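Determining UCS-2 Code Points for UTF-8 Characters in PHP
The task at hand is to extract the UCS-2 code point of each character in a given UTF-8 string. A custom PHP function can do this.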
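First, it helps to understand the UTF-8 encoding scheme. Each character is represented by a sequence of 1 to 4 bytes, depending on its Unicode code point. The leading bits of a byte indicate its role:
0: 1-byte character
110: 2-byte character
1110: 3-byte character
11110: 4-byte character
10: continuation byte
11111: invalid character
Once the number of bytes is known, the code point can be extracted with bitwise operations. For example, ñ is encoded as the two bytes 0xC3 0xB1, so its code point is ((0xC3 & 0x1F) << 6) | (0xB1 & 0x3F) = 241.
Custom PHP function:
Based on the analysis above, here is a custom PHP function that takes a single UTF-8 character as input and returns its code point. Note that UCS-2 only covers U+0000 to U+FFFF, so a value returned for a 4-byte sequence lies outside UCS-2.
<code class="php">function get_ucs2_codepoint($char) {
    // Get the first (lead) byte
    $firstByte = ord($char);

    // Determine the number of bytes from the lead byte's high bits
    if ($firstByte < 0x80) {        // 0xxxxxxx
        $bytes = 1;
    } elseif ($firstByte < 0xC0) {  // 10xxxxxx: continuation byte, not a valid lead byte
        return -1;
    } elseif ($firstByte < 0xE0) {  // 110xxxxx
        $bytes = 2;
    } elseif ($firstByte < 0xF0) {  // 1110xxxx
        $bytes = 3;
    } elseif ($firstByte < 0xF8) {  // 11110xxx
        $bytes = 4;
    } else {                        // 11111xxx: invalid character
        return -1;
    }

    // Reject empty or truncated input
    if (strlen($char) < $bytes) {
        return -1;
    }

    // Mask off the prefix bits and combine the payload bits
    switch ($bytes) {
        case 1:
            $codePoint = $firstByte;
            break;
        case 2:
            $codePoint = ($firstByte & 0x1F) << 6;
            $codePoint |= ord($char[1]) & 0x3F;
            break;
        case 3:
            $codePoint = ($firstByte & 0x0F) << 12;
            $codePoint |= (ord($char[1]) & 0x3F) << 6;
            $codePoint |= ord($char[2]) & 0x3F;
            break;
        case 4:
            $codePoint = ($firstByte & 0x07) << 18;
            $codePoint |= (ord($char[1]) & 0x3F) << 12;
            $codePoint |= (ord($char[2]) & 0x3F) << 6;
            $codePoint |= ord($char[3]) & 0x3F;
            break;
    }

    return $codePoint;
}</code>
Example usage:
To use the function, pass a UTF-8 character as input:
<code class="php">$char = "ñ";
$codePoint = get_ucs2_codepoint($char);
echo "UCS-2 code point: $codePoint\n";</code>
Output:
UCS-2 code point: 241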
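The function above handles a single character, while the stated task concerns a whole string. One way to bridge that gap is to split the string on character boundaries and map each character to its code point. The following is a minimal sketch, assuming the mbstring extension is loaded and PHP 7.2 or later is available (for `mb_ord`). The helper name `utf8_string_to_codepoints` is hypothetical and does not come from the original article.

```php
<?php
// Hypothetical helper (not from the original article): returns the code
// point of every character in a UTF-8 string, in order.
function utf8_string_to_codepoints(string $text): array
{
    // The /u modifier makes PCRE treat the subject as UTF-8, so an empty
    // pattern splits between characters rather than between bytes.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split() returns false when the input is not valid UTF-8.
    if ($chars === false) {
        return [];
    }

    // mb_ord() (mbstring, PHP >= 7.2) decodes one character to its code point.
    return array_map(function ($c) {
        return mb_ord($c, 'UTF-8');
    }, $chars);
}

print_r(utf8_string_to_codepoints("añ€"));
```

Any value above 0xFFFF in the result lies outside UCS-2, so a caller that strictly needs UCS-2 may want to check for that and handle it explicitly.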