Determining the UCS-2 Code Points of UTF-8 Characters in PHP

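The task is to extract the UCS-2 code point of each character in a given UTF-8 string. A custom PHP function can be defined for this. UCS-2 covers only the Basic Multilingual Plane (U+0000 to U+FFFF), and within that range its values are the same as the Unicode code points.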

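First, it is important to understand the UTF-8 encoding scheme. Each character is represented by a sequence of 1 to 4 bytes, depending on its Unicode code point. The code point ranges for each sequence length are as follows:

  • U+0000 to U+007F: 1 byte
  • U+0080 to U+07FF: 2 bytes
  • U+0800 to U+FFFF: 3 bytes
  • U+10000 to U+10FFFF: 4 bytes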

To determine the number of bytes in a character, check the leading bits of its first byte:

  • 0: 1-byte character
  • 110: 2-byte character
  • 1110: 3-byte character
  • 11110: 4-byte character
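To make the leading-bit patterns above visible, the following minimal sketch prints each byte of a few characters in binary. The helper name `utf8_bytes_as_binary` is hypothetical and introduced only for this illustration; the strings are written with `\x` escapes so the bytes do not depend on how the source file is saved.

```php
<?php
// Hypothetical helper (not from the original text): render each byte of a
// string as 8 binary digits so the lead-byte prefix can be read directly.
function utf8_bytes_as_binary(string $s): array
{
    $out = [];
    // unpack('C*', ...) yields every byte as an unsigned integer
    foreach (unpack('C*', $s) as $byte) {
        $out[] = str_pad(decbin($byte), 8, '0', STR_PAD_LEFT);
    }
    return $out;
}

echo implode(' ', utf8_bytes_as_binary("A")), "\n";            // 01000001
echo implode(' ', utf8_bytes_as_binary("\xC3\xB1")), "\n";     // ñ: 11000011 10110001
echo implode(' ', utf8_bytes_as_binary("\xE2\x82\xAC")), "\n"; // €: 11100010 10000010 10101100
```

The first byte of each line starts with 0, 110, and 1110 respectively, and every following byte starts with 10, which marks it as a continuation byte.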

<code class="php">function get_ucs2_codepoint($char)
{
    // Initialize the code point
    $codePoint = 0;

    // Get the first (lead) byte
    $firstByte = ord($char);

    // Determine the number of bytes from the lead byte
    if ($firstByte < 0x80) {          // 0xxxxxxx
        $bytes = 1;
    } elseif ($firstByte < 0xC0) {
        // 10xxxxxx is a continuation byte, not a valid lead byte
        return -1;
    } elseif ($firstByte < 0xE0) {    // 110xxxxx
        $bytes = 2;
    } elseif ($firstByte < 0xF0) {    // 1110xxxx
        $bytes = 3;
    } elseif ($firstByte < 0xF8) {    // 11110xxx
        $bytes = 4;
    } else {
        // Invalid character
        return -1;
    }

    // Reject empty or truncated input
    if (strlen($char) < $bytes) {
        return -1;
    }

    // Mask off the prefix bits, then shift and combine the payload bits
    switch ($bytes) {
        case 1:
            $codePoint = $firstByte;
            break;
        case 2:
            $codePoint = ($firstByte & 0x1F) << 6;
            $codePoint |= ord($char[1]) & 0x3F;
            break;
        case 3:
            $codePoint = ($firstByte & 0x0F) << 12;
            $codePoint |= (ord($char[1]) & 0x3F) << 6;
            $codePoint |= ord($char[2]) & 0x3F;
            break;
        case 4:
            // The result is above U+FFFF, which is outside the UCS-2 range
            $codePoint = ($firstByte & 0x07) << 18;
            $codePoint |= (ord($char[1]) & 0x3F) << 12;
            $codePoint |= (ord($char[2]) & 0x3F) << 6;
            $codePoint |= ord($char[3]) & 0x3F;
            break;
    }

    return $codePoint;
}</code>

<code class="php">$char = "ñ";
$codePoint = get_ucs2_codepoint($char);
echo "UCS-2 code point: $codePoint\n";</code>


UCS-2 code point: 241
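Since the stated goal is to handle every character in a string, the same lead-byte logic can be applied in a loop. The sketch below is one possible way to do that, not the original author's code: `utf8_codepoints` is a hypothetical name, and returning `null` for malformed input is an assumption made for this example. It checks that continuation bytes have the `10xxxxxx` form but does not reject overlong encodings or surrogate code points.

```php
<?php
// Hypothetical helper (not part of the original text): walk a UTF-8 string and
// return the code point of every character, or null if the input is malformed.
function utf8_codepoints(string $s): ?array
{
    $points = [];
    $len = strlen($s);
    for ($i = 0; $i < $len; ) {
        $lead = ord($s[$i]);
        // Pick the sequence length and the payload bits of the lead byte
        if ($lead < 0x80) {
            $n = 1; $cp = $lead;
        } elseif ($lead >= 0xC0 && $lead < 0xE0) {
            $n = 2; $cp = $lead & 0x1F;
        } elseif ($lead >= 0xE0 && $lead < 0xF0) {
            $n = 3; $cp = $lead & 0x0F;
        } elseif ($lead >= 0xF0 && $lead < 0xF8) {
            $n = 4; $cp = $lead & 0x07;
        } else {
            return null; // continuation byte or invalid lead byte
        }
        if ($i + $n > $len) {
            return null; // truncated sequence
        }
        // Append 6 payload bits from each continuation byte
        for ($k = 1; $k < $n; $k++) {
            $b = ord($s[$i + $k]);
            if (($b & 0xC0) !== 0x80) {
                return null; // not a 10xxxxxx continuation byte
            }
            $cp = ($cp << 6) | ($b & 0x3F);
        }
        $points[] = $cp;
        $i += $n;
    }
    return $points;
}

// "ñ€" written as explicit bytes
echo implode(', ', utf8_codepoints("\xC3\xB1\xE2\x82\xAC")), "\n"; // 241, 8364
```

Any returned value greater than 0xFFFF comes from a 4-byte sequence and has no UCS-2 representation, so a caller that strictly needs UCS-2 would have to treat such values as an error.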
