
Handling Chinese strings in PHP with the mbstring extension

WBOY
2016-07-15 13:30:20

Supporting multiple languages means dealing with multi-byte characters. PHP's built-in string length function strlen cannot handle Chinese strings correctly, because it returns only the number of bytes the string occupies. For GB2312-encoded Chinese, strlen returns twice the number of Chinese characters. For UTF-8 text the byte count is 1 to 3 times the character count, since ASCII characters take one byte and common Chinese characters take three.
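The difference between byte count and character count can be seen in a minimal sketch like the one below. It assumes the source file is saved as UTF-8 and that mbstring is loaded; GBK is used here as a stand-in for GB2312 (it is a superset of it), and the variable names are only illustrative.

```php
<?php
// Assumption: this file is saved as UTF-8 and the mbstring extension is loaded.
$utf8 = '中文';                                       // two Chinese characters
$gbk  = mb_convert_encoding($utf8, 'GBK', 'UTF-8');  // same text re-encoded as GBK

$utf8Bytes = strlen($utf8);              // bytes in the UTF-8 form (3 per character)
$gbkBytes  = strlen($gbk);               // bytes in the GBK form (2 per character)
$chars     = mb_strlen($utf8, 'UTF-8');  // actual number of characters

echo "UTF-8 bytes: $utf8Bytes, GBK bytes: $gbkBytes, characters: $chars\n";
```

Only mb_strlen reports the two characters a reader would count; strlen reports 6 or 4 depending on the encoding.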

The mbstring extension handles this problem much better. mb_strlen is used like strlen, except that it takes an optional second parameter that specifies the character encoding. For example, mb_strlen($str, 'UTF-8') returns the length of the UTF-8 string $str in characters. If the second parameter is omitted, PHP's internal encoding is used. Calling mb_internal_encoding() with no argument returns the current internal encoding. There are two ways to set it:

1. Set mbstring.internal_encoding = UTF-8 in php.ini (this directive is deprecated as of PHP 5.6, where default_charset is used instead)
2. Call mb_internal_encoding("GBK")
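As a rough illustration of the runtime approach, the sketch below reads the internal encoding, sets it to UTF-8, and then calls mb_strlen with and without the encoding argument. The sample string and variable names are assumptions made for the example, and the file is assumed to be saved as UTF-8.

```php
<?php
// Assumption: UTF-8 source file; 2 Chinese characters followed by 3 ASCII letters.
$str = '中文abc';

$previous = mb_internal_encoding();    // no argument: returns the current internal encoding
mb_internal_encoding('UTF-8');         // set the internal encoding used by later mb_* calls

$implicit = mb_strlen($str);           // encoding omitted: the internal encoding applies
$explicit = mb_strlen($str, 'UTF-8');  // encoding passed explicitly
$bytes    = strlen($str);              // byte count, for comparison

echo "internal encoding: " . mb_internal_encoding() . "\n";
echo "characters: $implicit (explicit: $explicit), bytes: $bytes\n";

mb_internal_encoding($previous);       // restore the earlier setting
```

Passing the encoding explicitly is usually the safer habit, since the result then does not depend on how a particular server is configured.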

Besides mb_strlen, mbstring provides several functions for cutting strings. mb_substr cuts by characters and mb_strcut cuts by bytes, but neither of them produces half a character. The two functions also interpret the length argument differently: mb_strcut returns a result whose byte length (strlen) is at most the given length, while mb_substr returns a result that contains exactly the given number of characters, as long as the string is long enough. See the example below.

```php
<?php
// Assumes the file is saved as UTF-8.
$str = '我是一串比较长的中文-www.jefflei.com';
// Cut by characters: the first 6 characters.
echo "mb_substr:" . mb_substr($str, 0, 6, 'utf-8');
echo "\n";
// Cut by bytes: at most 6 bytes, without splitting a character.
echo "mb_strcut:" . mb_strcut($str, 0, 6, 'utf-8');
?>
```

The output is as follows:
mb_substr:我是一串比较
mb_strcut:我是

mb_substr returns six characters, while mb_strcut returns only the two characters that fit within six bytes.
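The "no half characters" behaviour is easiest to see when the byte length does not fall on a character boundary. The following sketch, which assumes a UTF-8 source file and uses a shortened version of the sample string, compares a raw substr cut with mb_strcut and mb_substr at a length of 7.

```php
<?php
// Assumption: UTF-8 source file, so each Chinese character below takes 3 bytes.
$str = '我是一串比较长的中文';

$byBytes = substr($str, 0, 7);              // raw byte cut: stops in the middle of the third character
$cut     = mb_strcut($str, 0, 7, 'UTF-8');  // byte budget of 7, moved back to a character boundary
$sub     = mb_substr($str, 0, 7, 'UTF-8');  // 7 characters, whatever their byte size

// The raw cut is no longer valid UTF-8; the mb_* results are.
var_dump(mb_check_encoding($byBytes, 'UTF-8'));
echo $cut, ' (', strlen($cut), " bytes)\n";
echo $sub, ' (', strlen($sub), " bytes)\n";
```

mb_strcut is the natural choice when a storage limit is expressed in bytes (for example a fixed-width database column), while mb_substr fits limits expressed in visible characters.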

Note that mbstring is an extension, not part of the PHP core. Before using it, make sure that mbstring support was enabled when PHP was compiled, and then configure it:
(1) Compile PHP with --enable-mbstring
(2) Edit /usr/local/lib/php.ini:
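default_charset = "UTF-8"
mbstring.language = zh-cn
mbstring.internal_encoding = UTF-8

Here zh-cn is a language code and is valid only for mbstring.language. default_charset and mbstring.internal_encoding expect an encoding name, shown here as UTF-8 (use GBK if your data is GBK-encoded).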
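Whether the extension is actually available can be checked at runtime before any mb_* function is called. The snippet below is a small sketch of such a check; the messages and the variable name are illustrative only.

```php
<?php
// Runtime check that mbstring is available before relying on mb_* functions.
$hasMbstring = extension_loaded('mbstring');

if ($hasMbstring) {
    echo "mbstring is loaded; internal encoding: " . mb_internal_encoding() . "\n";
} else {
    echo "mbstring is missing; rebuild PHP with --enable-mbstring or enable the extension\n";
}
```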

The mbstring extension covers a lot of ground, and also includes email-related functions such as mb_send_mail.

