Decoding Unicode Escape Sequences in PHP
If you're dealing with Unicode escape sequences like "\u00ed" in PHP, you need a way to decode them into UTF-8 characters. The preg_replace_callback() function, combined with mb_convert_encoding() from the mbstring extension, provides a solution to this problem.
To decode Unicode escape sequences using preg_replace_callback(), replace each match with its UTF-8 character:
$str = preg_replace_callback('/\\\\u([0-9a-fA-F]{4})/', function ($match) { return mb_convert_encoding(pack('H*', $match[1]), 'UTF-8', 'UCS-2BE'); }, $str);
The pattern matches escape sequences like "\u00ed" and captures the four hexadecimal digits of the code point. Four backslashes are needed in the single-quoted PHP string: PHP reduces them to two, which the regex engine reads as one literal backslash (a pattern containing a bare \u does not compile in PCRE). The callback then uses pack() to turn the hex digits into two raw bytes and mb_convert_encoding() to convert those bytes from UCS-2BE to the corresponding UTF-8 character.
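To make the callback's two steps concrete, here is a small worked example for the single escape \u00ed. It is only an illustration of what happens to one match, assuming the mbstring extension is loaded; the variable names are chosen for this example.

```php
<?php
// Worked example for one escape: \u00ed (the letter í, U+00ED).
$hex = '00ed';              // the four hex digits the regex captures

// pack('H*', ...) turns the hex string into raw bytes: 0x00 0xED.
$bytes = pack('H*', $hex);

// Interpret those two bytes as big-endian UCS-2 and re-encode as UTF-8.
$char = mb_convert_encoding($bytes, 'UTF-8', 'UCS-2BE');

echo bin2hex($bytes), "\n"; // 00ed
echo bin2hex($char), "\n";  // c3ad (the UTF-8 bytes of í)
```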
If you're working with C/C++/Java/JSON-style escape sequences, which are based on UTF-16, use the same pattern but convert from UTF-16BE instead:
$str = preg_replace_callback('/\\\\u([0-9a-fA-F]{4})/', function ($match) { return mb_convert_encoding(pack('H*', $match[1]), 'UTF-8', 'UTF-16BE'); }, $str);
By specifying 'UTF-16BE' as the encoding for mb_convert_encoding(), you ensure that the UTF-16 big-endian format is correctly decoded into UTF-8.
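One caveat when matching a single \uXXXX at a time: in UTF-16 based formats, characters outside the Basic Multilingual Plane are written as two escapes (a surrogate pair, such as \ud83d\ude00), and converting each half on its own cannot produce a valid character. The following is a minimal sketch of one way to handle this, not the only one: it matches a whole run of consecutive escapes and converts the run in one call. It assumes the mbstring extension is available, and decode_unicode_escapes() is a hypothetical helper name, not a PHP built-in.

```php
<?php
// Hypothetical helper (name chosen for illustration only).
function decode_unicode_escapes(string $str): string
{
    // Match a run of consecutive \uXXXX escapes so that both halves of a
    // surrogate pair are converted together as one UTF-16BE byte sequence.
    $decoded = preg_replace_callback(
        '/(?:\\\\u[0-9a-fA-F]{4})+/',
        function (array $match): string {
            // Drop every "\u" prefix, leaving only the hex digits.
            $hex = str_replace('\\u', '', $match[0]);
            return mb_convert_encoding(pack('H*', $hex), 'UTF-8', 'UTF-16BE');
        },
        $str
    );

    // preg_replace_callback() returns null on error; fall back to the input.
    return $decoded ?? $str;
}

echo decode_unicode_escapes('caf\u00e9 \ud83d\ude00'), "\n"; // café 😀
```

Text without any escapes passes through unchanged, since the callback only runs on matches.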