Solution to garbled characters encountered when splitting GBK Chinese
With a GBK-encoded string like the following, explode() does not return the correct result:
$result = explode("|", "滕华弢|海清");
The cause is the character "弢" (pronounced tao; it doesn't matter if you don't recognize it, I didn't either). Its GBK encoding is 8f7c, and unfortunately the ASCII value of "|" is also 7c, so explode() splits in the middle of the character.
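To see the collision at the byte level, here is a minimal sketch. The string is written as raw GBK bytes so that it does not depend on how the PHP file itself is saved. Only the pair 8f 7c comes from the example above; the other byte values (bb aa for 华, ba a3 for 海, c7 e5 for 清) and the variable names are filled in for illustration.

```php
<?php
// GBK bytes: 华 (bb aa), 8f 7c (the problem character), "|", 海 (ba a3), 清 (c7 e5).
$gbk = "\xBB\xAA\x8F\x7C|\xBA\xA3\xC7\xE5";

// explode() compares raw bytes, so it also cuts at the 7c inside the 8f 7c pair.
$parts = explode("|", $gbk);

foreach ($parts as $i => $part) {
    // bin2hex() prints the raw bytes of each piece.
    echo $i, ': ', bin2hex($part), "\n";
}
```

Three pieces come back instead of two: the first ends with a dangling lead byte 8f, and the second is empty because the trail byte 7c and the real separator sit next to each other.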
This is not an isolated case. GBK double-byte codes range from 0x8140 to 0xFEFE, so in theory any character whose low (second) byte is 7c has the same problem, for example:
倈(827c), 億(837c), 秧(b17c), 鴴(e57c), and so on.
For such cases there are two options:
1. First, we can transcode the string to UTF-8, explode it, and then convert the pieces back to GBK. This is the more cumbersome method.
2. Second, we can use a regular expression and replace "splitting" with "matching out":
preg_match_all("/([\x81-\xfe][\x40-\xfe])+/", $gbk_str, $matches); // With the pattern written this way, $matches[0] is the array of matched words.
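The first option (transcode, split, transcode back) can be sketched as follows. This is only an illustration under a few assumptions: the mbstring extension is loaded, the delimiter is plain ASCII, the input is valid GBK, and PHP is 7.4 or later (for the arrow function). The function name gbk_explode is made up for this example and is not part of PHP.

```php
<?php
// Hypothetical helper: split a GBK string on an ASCII delimiter
// by going through UTF-8 (requires the mbstring extension).
function gbk_explode(string $delimiter, string $gbk): array
{
    // In UTF-8, bytes below 0x80 never appear inside a multibyte character,
    // so an ASCII delimiter such as "|" can only match a real "|".
    $utf8 = mb_convert_encoding($gbk, 'UTF-8', 'GBK');

    $parts = explode($delimiter, $utf8);

    // Convert every piece back to GBK for the caller.
    return array_map(
        fn(string $part): string => mb_convert_encoding($part, 'GBK', 'UTF-8'),
        $parts
    );
}

// Same kind of input as in the article: the first name ends in the bytes 8f 7c.
$names = gbk_explode("|", "\xBB\xAA\x8F\x7C|\xBA\xA3\xC7\xE5");

foreach ($names as $name) {
    echo bin2hex($name), "\n";
}
```

The two conversions are the price of this approach. The regular expression above avoids them, but it only picks out runs of double-byte characters, so any ASCII text inside a field would be dropped from the matches.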