
How can I convert a UTF-8 string to UCS-2 code points in PHP 4 or 5?

Linda Hamilton
2024-10-30 18:04:31

Getting UCS-2 Code Points for UTF-8 Strings in PHP 4 or 5

The simplest way to get UCS-2 code points for a UTF-8 string is to let an existing PHP extension do the decoding. For example, the iconv extension can convert a string from UTF-8 to UCS-2.
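
As a minimal sketch, assuming the iconv extension is enabled and its backend accepts the charset name UCS-2BE (GNU libiconv and glibc both do), you can convert the string to big-endian UCS-2 and read the result back as 16-bit integers with unpack(). The function name utf8_to_ucs2_codepoints_iconv() is hypothetical. How iconv reports characters above U+FFFF can vary by backend and PHP version, so test the failure path with your own data.

```php
<?php
// Hypothetical helper: UTF-8 string -> array of UCS-2 code points.
// Assumes ext/iconv is enabled and its backend knows "UCS-2BE".
function utf8_to_ucs2_codepoints_iconv($str)
{
    if ($str === '') {
        return array();
    }
    // Suppress the notice iconv raises on unconvertible input
    $ucs2 = @iconv('UTF-8', 'UCS-2BE', $str);
    if ($ucs2 === false) {
        return false; // invalid UTF-8 or not representable in UCS-2
    }
    // 'n*' reads unsigned 16-bit big-endian values; keys start at 1,
    // so re-index the result from 0
    return array_values(unpack('n*', $ucs2));
}

print_r(utf8_to_ucs2_codepoints_iconv("A\xC3\xA9\xE2\x82\xAC"));
```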

If you prefer a custom solution, you need to understand the UTF-8 format. Each code point is stored in 1 to 4 bytes depending on its value, using these bit patterns:

  • 1 byte (U+0000 to U+007F): 0xxxxxxx
  • 2 bytes (U+0080 to U+07FF): 110xxxxx 10xxxxxx
  • 3 bytes (U+0800 to U+FFFF): 1110xxxx 10xxxxxx 10xxxxxx
  • 4 bytes (U+10000 to U+10FFFF): 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

To determine the number of bytes in a character, examine its first (lead) byte: a leading 0 bit means a 1-byte character, 110 means 2 bytes, 1110 means 3 bytes, and 11110 means 4 bytes. Bytes that begin with 10 are continuation bytes and never start a character.
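
The prefix rules above translate directly into bit tests on the lead byte. The helper below uses a hypothetical name chosen for illustration; it returns 0 for bytes that cannot begin a sequence, which lets a caller reject malformed input.

```php
<?php
// Hypothetical helper: length of a UTF-8 sequence given its lead byte
// (an integer from ord()). Returns 0 if the byte cannot start one.
function utf8_sequence_length($leadByte)
{
    if (($leadByte & 0x80) === 0x00) return 1; // 0xxxxxxx
    if (($leadByte & 0xE0) === 0xC0) return 2; // 110xxxxx
    if (($leadByte & 0xF0) === 0xE0) return 3; // 1110xxxx
    if (($leadByte & 0xF8) === 0xF0) return 4; // 11110xxx
    return 0;                                  // 10xxxxxx or 11111xxx
}

echo utf8_sequence_length(ord("\xE2")), "\n"; // lead byte of the euro sign
```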

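Once you know the character's length, mask off the prefix bits of each byte and shift the remaining payload bits into place to rebuild the code point. Note that UCS-2 cannot represent characters above U+FFFF, so 4-byte sequences have no UCS-2 equivalent.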
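
As a worked example, here are the masks and shifts applied by hand to two characters whose encodings are easy to check: é (U+00E9, bytes C3 A9) and the euro sign € (U+20AC, bytes E2 82 AC).

```php
<?php
// Worked example: strip the prefix bits from each byte, then shift
// the remaining payload bits into position and combine them with OR.
$e = ((0xC3 & 0x1F) << 6)         // 110 00011  -> 00011
   | (0xA9 & 0x3F);               // 10 101001  -> 101001
$euro = ((0xE2 & 0x0F) << 12)     // 1110 0010  -> 0010
      | ((0x82 & 0x3F) << 6)      // 10 000010  -> 000010
      | (0xAC & 0x3F);            // 10 101100  -> 101100
printf("U+%04X U+%04X\n", $e, $euro); // prints "U+00E9 U+20AC"
```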

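For reference, here is a function that works in PHP 4 and 5. It decodes the first character of the string passed to it, so the string must begin with a complete UTF-8 sequence: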

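```php
function get_ucs2_codepoint($char)
{
    // Decode the first UTF-8 character of $char. Returns false when
    // the lead byte is invalid or the character lies above U+FFFF.
    $byte = ord($char);
    if ($byte < 0x80) {
        // 1 byte: 0xxxxxxx
        return $byte;
    } elseif ($byte < 0xC0) {
        // 10xxxxxx is a continuation byte and cannot start a character
        return false;
    } elseif ($byte < 0xE0) {
        // 2 bytes: 110xxxxx 10xxxxxx
        return (($byte & 0x1F) << 6) | (ord($char[1]) & 0x3F);
    } elseif ($byte < 0xF0) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return (($byte & 0x0F) << 12) | ((ord($char[1]) & 0x3F) << 6) | (ord($char[2]) & 0x3F);
    } else {
        // 4-byte sequences encode U+10000 and above, which UCS-2 cannot
        // represent (returning false avoids confusion with U+0000)
        return false;
    }
}
```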
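
Because get_ucs2_codepoint() decodes only the first character of its argument, converting a whole string also means advancing by each character's byte length. The following self-contained sketch of that loop uses a hypothetical name, utf8_to_ucs2_array(), repeats the same decoding inline so it can run on its own, and deliberately does not verify that continuation bytes really start with 10.

```php
<?php
// Hypothetical helper: decode a whole UTF-8 string into UCS-2 code
// points. Returns false on a bad lead byte, a truncated sequence, or
// a 4-byte sequence (U+10000 and above). Continuation bytes are not
// validated.
function utf8_to_ucs2_array($str)
{
    $codepoints = array();
    $len = strlen($str);
    $i = 0;
    while ($i < $len) {
        $b0 = ord($str[$i]);
        if ($b0 < 0x80) {                       // 0xxxxxxx
            $codepoints[] = $b0;
            $i += 1;
        } elseif ($b0 >= 0xC0 && $b0 < 0xE0) {  // 110xxxxx 10xxxxxx
            if ($i + 1 >= $len) return false;
            $codepoints[] = (($b0 & 0x1F) << 6)
                          | (ord($str[$i + 1]) & 0x3F);
            $i += 2;
        } elseif ($b0 >= 0xE0 && $b0 < 0xF0) {  // 1110xxxx + 2 continuation bytes
            if ($i + 2 >= $len) return false;
            $codepoints[] = (($b0 & 0x0F) << 12)
                          | ((ord($str[$i + 1]) & 0x3F) << 6)
                          | (ord($str[$i + 2]) & 0x3F);
            $i += 3;
        } else {
            // continuation byte, 4-byte lead, or invalid byte
            return false;
        }
    }
    return $codepoints;
}

print_r(utf8_to_ucs2_array("A\xC3\xA9\xE2\x82\xAC"));
```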

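Remember that this function handles only characters representable in UCS-2, that is, code points up to U+FFFF. If you need full Unicode, including characters above U+FFFF, decode 4-byte sequences yourself or rely on an extension such as iconv or mbstring. PHP 6, which was planned to bring native Unicode strings, was never released.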
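
If you can rely on an extension, one option for full Unicode, assuming iconv is enabled and its backend supports the charset name UTF-32BE, is to convert to UTF-32BE and unpack 32-bit integers. The function name utf8_to_codepoints() is hypothetical.

```php
<?php
// Hypothetical helper: UTF-8 string -> array of Unicode code points,
// including those above U+FFFF. Assumes ext/iconv supports "UTF-32BE".
function utf8_to_codepoints($str)
{
    if ($str === '') {
        return array();
    }
    $utf32 = @iconv('UTF-8', 'UTF-32BE', $str);
    if ($utf32 === false) {
        return false; // invalid UTF-8 input
    }
    // 'N*' reads unsigned 32-bit big-endian values; keys start at 1
    return array_values(unpack('N*', $utf32));
}

print_r(utf8_to_codepoints("A\xF0\x9F\x98\x80"));
```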
