
Why Do Special Unicode Characters Appear Distorted After JSON Encoding?

Barbara Streisand
2024-12-10 18:03:17


Interpreting "Special" Unicode Characters Encoded as JSON

When a string containing "special" (non-ASCII) Unicode characters is passed to PHP's json_encode, the characters can appear distorted in the output:

echo json_encode(['foo' => '馬']);
// Output: {"foo":"\u99ac"}

The character has not been corrupted, though. It has been escaped, and the reason lies in how JSON represents strings.

JSON Encoding Standard

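JSON's string syntax is based on ECMAScript string literals (ECMA-262, 5th edition, Section 7.8.4); ECMAScript is the standard that JavaScript implements. Inside a JSON string, any character may be written as \u followed by four hexadecimal digits giving its UTF-16 code unit, and characters above U+FFFF need two such escapes (a surrogate pair). For a character in the Basic Multilingual Plane, such as 馬 (U+99AC), the four digits are simply its code point: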

"\u99ac"

This escape is equivalent to writing the literal "馬": any compliant JSON parser decodes both to the same character.
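A quick way to check this equivalence is to decode both forms with PHP's json_decode and compare the results. The sketch below assumes PHP 7 or later and a UTF-8 encoded source file; the variable names are illustrative only.

```php
<?php
// Two JSON documents that differ only in how the string value is written.
$escaped = '{"foo":"\u99ac"}'; // single quotes: PHP keeps \u99ac as six literal characters
$literal = '{"foo":"馬"}';     // the character stored directly as UTF-8 bytes

// Decode both into associative arrays.
$fromEscaped = json_decode($escaped, true);
$fromLiteral = json_decode($literal, true);

var_dump($fromEscaped === $fromLiteral); // bool(true)
echo $fromEscaped['foo'], PHP_EOL;       // 馬
```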

PHP's JSON Encoding Preference

By default, PHP's json_encode escapes every non-ASCII character as a \uXXXX sequence. The JSON specification does not require this escaping, but the result is valid JSON and consists entirely of ASCII characters.
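To make the escaping concrete, here is a sketch that rebuilds the \uXXXX sequence for a single code point and compares it with what json_encode emits. The helper name jsonEscapeCodePoint is hypothetical, not a PHP function. It assumes PHP 7.0+ (for the "\u{...}" string syntax) and json_encode's default flags, which write the hexadecimal digits in lowercase.

```php
<?php
// Hypothetical helper (not part of PHP): returns the JSON escape for one code point.
function jsonEscapeCodePoint(int $cp): string
{
    if ($cp <= 0xFFFF) {
        // Basic Multilingual Plane: one UTF-16 code unit, equal to the code point.
        return sprintf('\\u%04x', $cp);
    }
    // Above U+FFFF: split into a high and a low surrogate, as UTF-16 does.
    $offset = $cp - 0x10000;
    $high = 0xD800 | ($offset >> 10);
    $low = 0xDC00 | ($offset & 0x3FF);
    return sprintf('\\u%04x\\u%04x', $high, $low);
}

// Compare with json_encode's default output (surrounding quotes stripped).
$samples = [
    "\u{99AC}"  => 0x99AC,  // 馬
    "\u{1F600}" => 0x1F600, // grinning face emoji, outside the BMP
];
foreach ($samples as $char => $cp) {
    $fromPhp = trim(json_encode($char), '"');
    $ours = jsonEscapeCodePoint($cp);
    printf("U+%04X  %s  %s\n", $cp, $ours, $fromPhp === $ours ? 'matches' : 'differs');
}
```

The surrogate-pair branch is why an emoji shows up in PHP's output as two escapes rather than one.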

Customizing Encoding

If you would rather keep the characters as they are, pass the JSON_UNESCAPED_UNICODE flag (available since PHP 5.4.0):

echo json_encode(['foo' => '馬'], JSON_UNESCAPED_UNICODE);
// Output: {"foo":"馬"}

Keep in mind that this flag is a matter of preference, not a requirement: Unicode characters are transmitted correctly in JSON either way.
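The trade-off can be checked directly: both encodings decode back to the same PHP value, and the practical difference is size and readability. In UTF-8, 馬 takes three bytes, while its escape \u99ac takes six. The sketch below assumes PHP 7.0+ and a UTF-8 source file; $data is an illustrative value.

```php
<?php
$data = ['foo' => '馬'];

$escaped = json_encode($data);                         // {"foo":"\u99ac"}
$literal = json_encode($data, JSON_UNESCAPED_UNICODE); // {"foo":"馬"}

// Both documents decode back to exactly the original array.
var_dump(json_decode($escaped, true) === $data); // bool(true)
var_dump(json_decode($literal, true) === $data); // bool(true)

// strlen() counts bytes: the escaped form is longer but contains only ASCII.
echo strlen($escaped), ' vs ', strlen($literal), PHP_EOL; // 16 vs 13
```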

