utf-8 - character encoding in php
<code>$str1 = "\xe4\xb8\xad"; $str2 = '\xe4\xb8\xad'; $str3 = '中';</code>
Can you explain in detail the difference between the three, and whether they can be converted into each other?
First time answering a question on segmentfault.
In PHP strings, double quotes and single quotes have different meanings.
Escape sequences are interpreted inside double quotes. They are not interpreted inside single quotes.
Inside double quotes, text of the form $xxxx is replaced by the value of the corresponding variable. Single quotes have no such effect.
E.g.:
<code class="php">$abc = '123';
echo "$abc"; // outputs 123
echo '$abc'; // outputs $abc
echo "\n";   // outputs a newline character
echo '\n';   // outputs the two characters \n (a backslash and an n)</code>
Back to the question,
The UTF-8 encoding of the Chinese character "中" is the three bytes 0xe4, 0xb8, 0xad.
So in a double-quoted string, "\xe4\xb8\xad" is interpreted as "中". The leading \x means that the two characters after it are a byte value written in hexadecimal. This is loosely similar to &#xe4; in HTML, although the HTML form names a code point rather than a byte.
In a single-quoted string, \xe4\xb8\xad is output literally.
If your environment encoding is UTF-8, str1 and str3 are equivalent. If you echo either one directly, "中" is output. Compared at the binary level they are the same three bytes, so they are completely equal. Strings in PHP are stored as raw bytes, in whatever encoding the source file was saved in.
If your environment encoding is not UTF-8 (GBK, for example), str1 is displayed as garbled text, and str1 and str3 are no longer equivalent.
As for str2, it always outputs \xe4\xb8\xad (without quotation marks). In a single-quoted string, only the single quotation mark itself needs to be escaped, as \' (and a literal backslash can be written as \\). Everything else is treated as ordinary characters.
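The points above can be checked at the byte level. The following is a minimal sketch, assuming the script file itself is saved as UTF-8; `strlen()` counts bytes and `bin2hex()` prints them.

```php
<?php
// Assumption: this file is saved as UTF-8.
$str1 = "\xe4\xb8\xad"; // double quotes: each \xNN escape becomes one raw byte
$str2 = '\xe4\xb8\xad'; // single quotes: backslashes and letters stay literal
$str3 = '中';           // whatever bytes the editor wrote for this character

foreach (['str1' => $str1, 'str2' => $str2, 'str3' => $str3] as $name => $value) {
    // strlen() counts bytes, not characters; bin2hex() shows the bytes
    printf("%s: %d bytes, hex %s\n", $name, strlen($value), bin2hex($value));
}

var_dump($str1 === $str3); // true only when the file is saved as UTF-8
var_dump($str1 === $str2); // false: 3 bytes versus 12 bytes
```

If the same file were saved as GBK instead, the first comparison would be expected to come out false, because `$str3` would then hold different bytes.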
I'll only explain the difference between the first and the second, that is, the difference between double quotes and single quotes.
Double quotes: escape sequences and variables inside the quotes are interpreted
Single quotes: escape sequences and variables inside the quotes are not interpreted
<code class="php">$a = 123;
echo "output:$a"; // output:123
echo 'output:$a'; // output:$a

// The examples below apply only to php-cli on Linux
echo "new line\nsecond line";
/* Breaks the line, output:
new line
second line
*/
echo 'no new line\n aaa';
/* Does not break the line, output:
no new line\n aaa
*/</code>
Escaping works, nothing else works
PHP itself does not distinguish character encodings. In other words, $str1 is a three-byte string, and its three bytes are (in hexadecimal) E4 B8 AD. Interpreted as UTF-8, they are the character 中.
$str2 is a 12-byte string, made up of exactly the characters you typed.
$str3 depends on the encoding your source file is saved in. If you save it as UTF-8, it is the same three bytes as $str1. If you save it as GBK, it is the two bytes D6 D0. If you save it as BIG5, it is the two bytes A4 A4.
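On the question of converting between the forms, here is a rough sketch. It assumes `stripcslashes()` is acceptable for decoding the literal `\xNN` text, and it only runs the re-encoding step if the mbstring extension is available. The variable names `$decoded` and `$escaped` are made up for illustration.

```php
<?php
$str1 = "\xe4\xb8\xad"; // three raw bytes
$str2 = '\xe4\xb8\xad'; // twelve literal characters

// $str2 -> $str1: stripcslashes() decodes C-style escapes such as \xNN.
$decoded = stripcslashes($str2);

// $str1 -> $str2: write every byte back out as a literal \xNN escape.
$escaped = '';
foreach (str_split(bin2hex($str1), 2) as $hexByte) {
    $escaped .= '\x' . $hexByte; // '\x' in single quotes is a backslash plus x
}

var_dump($decoded === $str1);
var_dump($escaped === $str2);

// Re-encoding the UTF-8 bytes into other encodings (needs mbstring).
if (function_exists('mb_convert_encoding')) {
    echo 'GBK:  ', bin2hex(mb_convert_encoding($str1, 'GBK', 'UTF-8')), "\n";
    echo 'BIG5: ', bin2hex(mb_convert_encoding($str1, 'BIG-5', 'UTF-8')), "\n";
}
```

Note that `stripcslashes()` decodes other C-style escapes too (`\n`, octal, and so on), so it is only a reasonable choice when the input is known to contain nothing but `\xNN` sequences.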
UTF-8, GBK, BIG5, and many other language encodings are all ASCII-compatible, which means ASCII characters are encoded the same way in each of them. So whichever encoding you save the file in, PHP is not affected and your code keeps working. For non-ASCII characters, however, there is a big difference.
So in order for non-ASCII characters in PHP to be displayed normally, you must ensure that the encoding the file is saved in and the encoding declared for the output are consistent. If the output is HTML, the encoding is declared through the meta tag or in the HTTP header. If they are inconsistent, garbled characters will appear.
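As an illustration of keeping the two sides consistent, the sketch below declares the same charset in the HTTP header and in a meta tag. It assumes the script file is saved as UTF-8; the variable names `$charset` and `$page` are made up for this example.

```php
<?php
// Assumption: this file is saved as UTF-8, so UTF-8 is what we declare.
$charset = 'UTF-8';

// Declare the encoding in the HTTP header (only possible before any output).
if (!headers_sent()) {
    header('Content-Type: text/html; charset=' . $charset);
}

// Declare the same encoding inside the HTML with a meta tag.
$page = '<!DOCTYPE html><html><head><meta charset="' . $charset . '"></head>'
      . '<body>' . htmlspecialchars('中', ENT_QUOTES, $charset) . '</body></html>';

echo $page;
```

If the file were saved as GBK instead, the declared charset would need to change to match, or the content would need to be converted to UTF-8 before output.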