
A BOM (<feff>) in JSON data prevents PHP's json_decode function from parsing it

WBOY
2016-07-13 10:26:32

Yesterday a colleague ran into a strange problem: the following JSON fails JSON validation and cannot be parsed by PHP's json_decode function.


[
{
"title": "",
"pinyin": ""
}
]

As you may have guessed, it contains an invisible special character. Opening the file in Vim reveals it:

[
{
<feff>"title": "",
"pinyin": ""
}
]

There is an extra character in front of "title", which Vim displays as <feff>. If you have read about the BOM (byte order mark) before, you will recognize it. For an introduction, see another article: "String encoding, garbled characters, BOM, and related problems in computers: a detailed explanation".
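The failure can be reproduced directly in PHP. The snippet below is a minimal sketch, assuming the BOM is the UTF-8 byte sequence EF BB BF and sits immediately before "title"; the variable names are illustrative only.

```php
<?php
// Hypothetical reproduction: the same JSON, with a UTF-8 BOM (EF BB BF)
// placed directly before "title", as in the problem file.
$bom  = "\xEF\xBB\xBF";
$json = "[\n{\n" . $bom . "\"title\": \"\",\n\"pinyin\": \"\"\n}\n]";

$result = json_decode($json, true);

var_dump($result);                 // NULL when decoding fails
echo json_last_error_msg(), "\n";  // human-readable reason for the failure

// The BOM is invisible when printed, but strpos() can locate it by byte offset.
$offset = strpos($json, $bom);
```

Checking json_last_error() (or json_last_error_msg()) after a NULL result helps distinguish a parse failure from a document that legitimately decodes to null.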


View the hexadecimal content of the file with the xxd command on Linux:

0000000: 5b 0a 20 20 20 20 7b 0a 20 20 20 20 20 20 20 20  [.    {.        
0000010: ef bb bf 22 74 69 74 6c 65 22 3a 20 22 22 2c 0a  ..."title": "",.
0000020: 20 20 20 20 20 20 20 20 22 70 69 6e 79 69 6e 22          "pinyin"
0000030: 3a 20 22 22 0a 20 20 20 20 7d 0a 5d 0a           : "".    }.].

You can see that the special character in front of "title" has the hexadecimal value ef bb bf, which is the UTF-8 BOM. The BOM byte sequences and the encodings they indicate are as follows:

Starting bytes    Charset/encoding
EF BB BF          UTF-8
FE FF             UTF-16/UCS-2, big endian (UTF-16BE)
FF FE             UTF-16/UCS-2, little endian (UTF-16LE)
FF FE 00 00       UTF-32/UCS-4, little endian
00 00 FE FF       UTF-32/UCS-4, big endian
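The BOM signatures can be turned into a small detection routine. This is only a sketch: detect_bom() and its return labels are hypothetical names, and it uses the standard assignment in which FE FF marks UTF-16 big endian and FF FE marks UTF-16 little endian. It inspects only the first bytes of the string.

```php
<?php
/**
 * Return the encoding implied by a leading BOM, or null if none is found.
 * The function name and return labels are illustrative only.
 */
function detect_bom($bytes)
{
    // Longer signatures first: FF FE 00 00 (UTF-32LE) begins with FF FE (UTF-16LE).
    $signatures = array(
        "\x00\x00\xFE\xFF" => 'UTF-32BE',
        "\xFF\xFE\x00\x00" => 'UTF-32LE',
        "\xEF\xBB\xBF"     => 'UTF-8',
        "\xFE\xFF"         => 'UTF-16BE',
        "\xFF\xFE"         => 'UTF-16LE',
    );
    foreach ($signatures as $signature => $encoding) {
        // strncmp() is binary safe, so NUL bytes in the signature are compared too.
        if (strncmp($bytes, $signature, strlen($signature)) === 0) {
            return $encoding;
        }
    }
    return null;
}
```

Because the check is anchored at offset 0, it would not catch a BOM buried in the middle of a file, as in the example in this article; that case needs a search over the whole content.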

Once the cause is known, the fix is easy: find the BOM and delete it. The BOM-related commands on Linux are:

BOM operations in Vim

#Add BOM
:set bomb
#Delete BOM
:set nobomb
#Query BOM
:set bomb?
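When the input cannot be cleaned at the source, the BOM can also be removed on the PHP side before decoding. The following is a minimal sketch; json_decode_without_bom() is a hypothetical helper, not a PHP built-in, and it assumes the data is UTF-8.

```php
<?php
/**
 * Decode JSON after removing every UTF-8 BOM sequence from the input.
 * json_decode_without_bom() is a hypothetical helper, not a PHP built-in.
 */
function json_decode_without_bom($json, $assoc = true)
{
    // str_replace() removes the sequence wherever it occurs, not only at
    // offset 0, because in the problem file the BOM sits mid-document.
    $clean = str_replace("\xEF\xBB\xBF", '', $json);
    return json_decode($clean, $assoc);
}

// Usage with the same content as the problem file.
$json = "[\n{\n\xEF\xBB\xBF\"title\": \"\",\n\"pinyin\": \"\"\n}\n]";
$data = json_decode_without_bom($json);
```

Note the trade-off: this also strips U+FEFF characters that appear inside string values, which may or may not be acceptable for a given data set.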

Find files that contain a UTF-8 BOM
grep -I -r -l $'\xEF\xBB\xBF' /path
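Where grep is not available, a similar search can be sketched in PHP. find_bom_files() is a hypothetical helper; unlike grep -I it does not skip binary files, and it reads each file fully into memory, so it suits small source trees only.

```php
<?php
/**
 * Return paths of files under $dir whose contents include a UTF-8 BOM.
 * find_bom_files() is a hypothetical helper written for illustration.
 */
function find_bom_files($dir)
{
    $matches = array();
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        if (!$file->isFile()) {
            continue;
        }
        $contents = file_get_contents($file->getPathname());
        // Search the whole file: the BOM may sit anywhere, not only at the start.
        if ($contents !== false && strpos($contents, "\xEF\xBB\xBF") !== false) {
            $matches[] = $file->getPathname();
        }
    }
    sort($matches);
    return $matches;
}
```

Calling find_bom_files('/path') returns a sorted array of matching paths, which can then be printed or passed to a cleanup step.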

You can also reject commits that contain a BOM with an svn hook (the following code comes from the Internet and has not been verified):

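#!/bin/sh
# Pre-commit hook: reject commits whose added or modified files start with a UTF-8 BOM.

REPOS="$1"
TXN="$2"

SVNLOOK=/usr/bin/svnlook

# List changed paths, skipping deletions (status D) and directories (trailing slash).
# Note: paths containing spaces are not handled by this simple loop.
FILES=`$SVNLOOK changed -t "$TXN" "$REPOS" | awk '$1 !~ /^D/ && $2 !~ /\/$/ {print $2}'`

for FILE in $FILES; do
    # Pipe the content straight into head so the raw bytes are preserved.
    if $SVNLOOK cat -t "$TXN" "$REPOS" "$FILE" | head -c 3 | xxd -i | grep -q '0xef, 0xbb, 0xbf'; then
        echo "BOM found in $FILE" 1>&2
        exit 1
    fi
done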


Finally, a reminder: on Windows, avoid editing code with editors such as Notepad that automatically add a BOM, since this can easily cause problems like the one above.
