Solution to php splitting GBK Chinese garbled characters

WBOY
2016-07-25 09:00:11
When PHP splits a GBK-encoded Chinese string, the result can easily come out garbled. This article explains why that happens and describes two ways to work around it.

For a GBK-encoded string like the following, explode() does not return the correct result: $result = explode("|", "滕华弢|海清");

The cause is the character "弢" (pronounced tāo; it doesn't matter if you don't recognize it, I didn't either). Its GBK encoding is 0x8F7C, and unfortunately the ASCII value of "|" is also 0x7C. Because explode() compares raw bytes, it treats the second byte of "弢" as a delimiter and cuts the character in half.
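The collision is easy to reproduce with raw bytes. The sketch below builds a short GBK string from hex escapes, so it does not depend on the encoding of the source file. The surrounding characters 中 (0xD6D0) and 文 (0xCEC4) are stand-ins chosen for illustration; they are not the names from the example above.

```php
<?php
// GBK bytes for: 中 (D6 D0), the character encoded as 8F 7C, "|", 文 (CE C4).
$gbk = "\xD6\xD0\x8F\x7C" . "|" . "\xCE\xC4";

// explode() compares bytes, so the trail byte 0x7C also counts as "|".
$parts = explode("|", $gbk);

echo count($parts), "\n";        // 3 pieces instead of the expected 2
foreach ($parts as $part) {
    echo bin2hex($part), "\n";   // d6d08f, then an empty line, then cec4
}
```

The first piece ends with a dangling lead byte (0x8F), which is what shows up as a garbled character when the result is displayed.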

Other characters have the same problem. GBK double-byte codes span the range 0x8140 to 0xFEFE, so in theory any character whose low (second) byte is 0x7C triggers it, for example 倈 (0x827C), 億 (0x837C), and the characters encoded as 0xB17C and 0xE57C, among others.
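Since the trouble comes from the low byte, one way to see the structure of the encoding is to walk the string byte by byte and treat every byte in 0x81 to 0xFE as the start of a two-byte character. The function below is a hypothetical helper, not something from this article or from PHP itself; it is a minimal sketch that assumes well-formed GBK input and a single-byte ASCII delimiter.

```php
<?php
// Hypothetical helper (name and signature are assumptions for illustration):
// split a GBK string on a one-byte ASCII delimiter without cutting
// double-byte characters in half.
function gbk_explode(string $delimiter, string $gbk): array
{
    $parts = [];
    $current = '';
    $length = strlen($gbk);

    for ($i = 0; $i < $length; $i++) {
        $byte = ord($gbk[$i]);
        if ($byte >= 0x81 && $byte <= 0xFE && $i + 1 < $length) {
            // Lead byte: copy it with its trail byte, without inspecting
            // the trail byte, then skip past the pair.
            $current .= $gbk[$i] . $gbk[$i + 1];
            $i++;
        } elseif ($gbk[$i] === $delimiter) {
            // A real delimiter: it sits outside any double-byte character.
            $parts[] = $current;
            $current = '';
        } else {
            $current .= $gbk[$i];
        }
    }
    $parts[] = $current;

    return $parts;
}

// 中 (D6 D0), the character encoded as 8F 7C, "|", 文 (CE C4)
$fields = gbk_explode("|", "\xD6\xD0\x8F\x7C" . "|" . "\xCE\xC4");
foreach ($fields as $field) {
    echo bin2hex($field), "\n";  // d6d08f7c, then cec4
}
```

Unlike a plain explode(), this keeps the 0x7C trail byte attached to its lead byte, so the string splits into two intact fields.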

There are two ways to handle this. First, you can convert the string to UTF-8, call explode(), and then convert the pieces back to GBK. This works but is the more cumbersome method. Second, you can use a regular expression and turn "split on the separator" into "match the fields": preg_match_all("/([\x81-\xfe][\x40-\xfe])+/", $gbk_str, $matches); // byte ranges are hard-coded for GBK. After this call, $matches[0] is the array of resulting words.
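Both approaches can be sketched side by side. The function names below are made up for illustration, and the first one assumes the mbstring extension is installed and accepts 'GBK' as an encoding name. The test string again uses 中 (0xD6D0) and 文 (0xCEC4) as stand-in characters around the problematic byte pair 0x8F7C.

```php
<?php
// Method 1 (assumes mbstring): convert to UTF-8, split, convert back.
// In UTF-8 every byte of a multi-byte character is >= 0x80, so an ASCII
// delimiter can never appear inside a character.
function split_via_utf8(string $delimiter, string $gbk): array
{
    $utf8 = mb_convert_encoding($gbk, 'UTF-8', 'GBK');
    $result = [];
    foreach (explode($delimiter, $utf8) as $piece) {
        $result[] = mb_convert_encoding($piece, 'GBK', 'UTF-8');
    }
    return $result;
}

// Method 2: match runs of GBK double-byte characters instead of splitting.
// The separator 0x7C is below 0x81, so it can never start a pair and
// therefore ends each run.
function match_gbk_runs(string $gbk): array
{
    preg_match_all('/(?:[\x81-\xfe][\x40-\xfe])+/', $gbk, $matches);
    return $matches[0];
}

$gbk = "\xD6\xD0\x8F\x7C" . "|" . "\xCE\xC4";

$viaUtf8  = split_via_utf8("|", $gbk);
$viaRegex = match_gbk_runs($gbk);

foreach ($viaRegex as $field) {
    echo bin2hex($field), "\n";  // d6d08f7c, then cec4
}
```

One difference worth keeping in mind: the pattern only matches double-byte characters, so any ASCII letters or digits inside a field are left out of the matches, whereas the UTF-8 round trip keeps them.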

Those are two ways to work around the explode() problem with GBK-encoded strings in PHP.


