How can I decode Unicode escape sequences in PHP?

Linda Hamilton
2024-12-23 12:02:10

Decoding Unicode Escape Sequences in PHP

If you're dealing with Unicode escape sequences like "\u00ed" in PHP, you'll need a way to decode them into proper UTF-8 encoded characters. One way to do this is with preg_replace_callback(), which finds each escape sequence and replaces it with the result of a callback function.

The following call replaces every such escape sequence in $str with the character it denotes:

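// '\\\\u' in a single-quoted PHP string reaches the regex engine as \\u,
// which matches a literal backslash followed by the letter u.
$str = preg_replace_callback('/\\\\u([0-9a-fA-F]{4})/', function ($match) {
    return mb_convert_encoding(pack('H*', $match[1]), 'UTF-8', 'UCS-2BE');
}, $str);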

This regular expression matches Unicode escape sequences like "\u00ed" and captures the four hexadecimal digits in the parenthesized group. The callback passes those digits to pack('H*', ...), which turns them into two raw bytes, and mb_convert_encoding() then converts those bytes from big-endian UCS-2 into the corresponding UTF-8 character.
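As a self-contained illustration of the call described above, the sketch below wraps it in a small helper and runs it on a sample string. The function name decodeUnicodeEscapes() and the sample input are made up for this example, and the code assumes the mbstring extension is enabled. The pattern is written with four backslashes inside the single-quoted PHP string so that the regex engine receives \\u, a literal backslash followed by u.

```php
<?php
// Hypothetical helper name, chosen only for this illustration.
function decodeUnicodeEscapes(string $str): string
{
    // Match a literal backslash, the letter u, and exactly four hex digits.
    $decoded = preg_replace_callback('/\\\\u([0-9a-fA-F]{4})/', function (array $match): string {
        // pack('H*', '00ed') yields the two bytes 0x00 0xED (big-endian UCS-2).
        return mb_convert_encoding(pack('H*', $match[1]), 'UTF-8', 'UCS-2BE');
    }, $str);

    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $decoded ?? $str;
}

// Single quotes keep \u00ed as six literal characters in the PHP string.
$input = 'Mart\u00edn';
echo decodeUnicodeEscapes($input), PHP_EOL; // prints: Martín
```

Text without any escape sequences passes through unchanged, so the helper can be applied to mixed input.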

If you're working with C/C++/Java/JSON-style escape sequences, which are based on UTF-16, keep the same regular expression and change only the source encoding passed to mb_convert_encoding():

// Same pattern as before; only the source encoding differs.
$str = preg_replace_callback('/\\\\u([0-9a-fA-F]{4})/', function ($match) {
    return mb_convert_encoding(pack('H*', $match[1]), 'UTF-8', 'UTF-16BE');
}, $str);

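Specifying 'UTF-16BE' as the source encoding tells mb_convert_encoding() to interpret the packed bytes as big-endian UTF-16 code units before converting them to UTF-8. Because the pattern matches one escape sequence at a time, a character outside the Basic Multilingual Plane, which is written as two consecutive escapes (a surrogate pair), reaches the callback as two separate halves rather than as one unit.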
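Characters outside the Basic Multilingual Plane are written in UTF-16 based escapes as a surrogate pair, for example "\ud83d\ude00", and a lone surrogate is not valid UTF-16 on its own. The sketch below is one possible variation, not part of the snippet above: it extends the pattern so that a high surrogate followed by a low surrogate is matched as a single unit and decoded together. The helper name decodeUtf16Escapes() is hypothetical, and the mbstring extension is assumed.

```php
<?php
// Hypothetical helper name; a variation on the UTF-16BE snippet, not the original code.
function decodeUtf16Escapes(string $str): string
{
    // First alternative: high surrogate (D800-DBFF) followed by low surrogate (DC00-DFFF).
    // Second alternative: any single \uXXXX escape.
    $pattern = '/\\\\u[dD][89abAB][0-9a-fA-F]{2}\\\\u[dD][c-fC-F][0-9a-fA-F]{2}'
             . '|\\\\u[0-9a-fA-F]{4}/';

    $decoded = preg_replace_callback($pattern, function (array $match): string {
        // Drop the literal \u markers, leaving 4 hex digits (or 8 for a pair).
        $hex = str_replace('\\u', '', $match[0]);
        return mb_convert_encoding(pack('H*', $hex), 'UTF-8', 'UTF-16BE');
    }, $str);

    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $decoded ?? $str;
}

echo decodeUtf16Escapes('caf\u00e9 \ud83d\ude00'), PHP_EOL; // prints: café 😀
```

Listing the surrogate-pair alternative first matters here: the regex engine tries alternatives from left to right, so a pair is consumed whole before the single-escape branch can split it.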
