
Fixing garbled characters when splitting GBK Chinese strings (PHP tutorial)

WBOY
2016-07-21 14:59:54

With a GBK-encoded string like the following, explode() does not return the correct result:

$result = explode("|", "滕华弢|海清");

The cause is the character "弢" (pronounced tao; don't worry if you don't recognize it, I didn't either). Its GBK encoding is 8f7c, and unfortunately the ASCII value of "|" is also 7c, so explode() splits in the middle of that character.
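The failure can be reproduced with a short script. This is only a sketch: it assumes the file is saved as UTF-8 and that the iconv extension is available with GBK support. The problem character is written as the raw bytes 8f 7c so the result does not depend on the editor's encoding.

```php
<?php
// Ordinary characters are converted from UTF-8 literals; the problem
// character is spelled out as its two GBK bytes, 0x8f 0x7c.
$gbk = iconv('UTF-8', 'GBK', '滕华') . "\x8f\x7c" . '|' . iconv('UTF-8', 'GBK', '海清');

// explode() works on bytes, so it also splits on the 0x7c that is the
// second byte of the two-byte character.
$parts = explode('|', $gbk);

echo count($parts), "\n"; // 3, not the expected 2
foreach ($parts as $part) {
    // Hex dump of each piece; the first one ends with a dangling 0x8f.
    echo bin2hex($part), "\n";
}
```

The first piece ends with an orphaned lead byte and the second piece is empty, which is what shows up as garbled text.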

There are many such cases. GBK two-byte codes range from 0x8140 to 0xFEFE, so in theory any character whose low byte is 7c has this problem, for example:

倈 (827c), 億 (837c), and the characters encoded as b17c and e57c, among others.
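To check whether a given GBK string is affected, one can walk it character by character and look for two-byte characters whose second byte equals the delimiter. The function name `findTrailByteCollisions` is hypothetical, and the sketch assumes the input is well-formed GBK (lead bytes in 0x81-0xfe, each followed by one trail byte).

```php
<?php
// Hypothetical helper: returns the byte offsets of two-byte GBK characters
// whose trail byte is identical to the single-byte delimiter.
function findTrailByteCollisions(string $gbk, string $delim = '|'): array
{
    $hits = [];
    $len = strlen($gbk);
    for ($i = 0; $i < $len; ) {
        $byte = ord($gbk[$i]);
        if ($byte >= 0x81 && $byte <= 0xfe && $i + 1 < $len) {
            // Lead byte of a two-byte character: inspect its trail byte.
            if ($gbk[$i + 1] === $delim) {
                $hits[] = $i;
            }
            $i += 2;
        } else {
            // Single-byte (ASCII) character, e.g. a real delimiter.
            $i += 1;
        }
    }
    return $hits;
}

// Bytes: 82 7c (two-byte char), a real "|", 83 7c (two-byte char), d6 d0.
$sample = "\x82\x7c" . '|' . "\x83\x7c" . "\xd6\xd0";
print_r(findTrailByteCollisions($sample));
```

An empty result means a plain explode() on that delimiter is safe for that particular string.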

There are two ways to handle this:

1. Convert the string to UTF-8, explode() it, and convert the pieces back to GBK. This works but is the more cumbersome approach.
2. Use a regular expression and, instead of splitting on the separator, match the words themselves:

preg_match_all("/([\x81-\xfe][\x40-\xfe])+/", $gbk_str, $matches); // with the byte ranges written this way, $matches[0] is the array of resulting words

Note that this pattern matches only runs of two-byte GBK characters, so any ASCII characters inside a field are not captured.
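The two approaches above can be compared side by side. This is a minimal sketch, not a definitive implementation: `explodeGbk` is a hypothetical helper name, and the code assumes a UTF-8 source file and an iconv extension that supports GBK.

```php
<?php
// Hypothetical helper for approach 1: round-trip through UTF-8, where no
// multi-byte character contains the byte 0x7c, then convert back to GBK.
function explodeGbk(string $delim, string $gbk): array
{
    $utf8 = iconv('GBK', 'UTF-8', $gbk);
    if ($utf8 === false) {
        throw new InvalidArgumentException('Input is not valid GBK');
    }
    return array_map(
        static function (string $part): string {
            return iconv('UTF-8', 'GBK', $part);
        },
        explode($delim, $utf8)
    );
}

// Test input: the problem character is given as raw GBK bytes 8f 7c.
$gbk = iconv('UTF-8', 'GBK', '滕华') . "\x8f\x7c" . '|' . iconv('UTF-8', 'GBK', '海清');

// Approach 1: transcode, split, transcode back.
$viaUtf8 = explodeGbk('|', $gbk);

// Approach 2: match runs of two-byte GBK characters instead of splitting.
preg_match_all("/([\x81-\xfe][\x40-\xfe])+/", $gbk, $matches);
$viaRegex = $matches[0];

// Both results are GBK byte strings, so compare them directly.
var_dump($viaUtf8 === $viaRegex);
```

The transcoding version keeps ASCII characters inside fields, while the regular expression is shorter but only suitable when every field consists purely of two-byte characters.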
