Using mb_substr() in PHP to truncate Chinese strings without garbled characters

伊谢尔伦 (Original)
2016-11-26 14:43:09

PHP provides several substring functions, of which substr and mb_substr are the most commonly used. substr counts bytes, so a Chinese character takes 2 units in GBK and 3 units in UTF-8. mb_substr counts characters, so once the encoding is specified, each Chinese character counts as 1 unit.
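
As a rough illustration of the difference, the sketch below compares the two functions on a short UTF-8 string. It assumes the mbstring extension is loaded and the source file is saved as UTF-8; the variable names are made up for this example.

```php
<?php
// Assumes the mbstring extension is loaded and this file is saved as UTF-8.
$text = "中文字符";

$byteCount = strlen($text);              // strlen/substr count bytes
$charCount = mb_strlen($text, 'UTF-8');  // mb_* functions count characters

// Cutting after 4 bytes ends in the middle of the second character.
$byByte = substr($text, 0, 4);
// Cutting after 2 characters keeps both characters whole.
$byChar = mb_substr($text, 0, 2, 'UTF-8');

var_dump($byteCount);                           // int(12)
var_dump($charCount);                           // int(4)
var_dump(mb_check_encoding($byByte, 'UTF-8'));  // bool(false): broken byte sequence
var_dump($byChar === "中文");                    // bool(true)
```

The broken byte sequence in $byByte is what shows up as garbled characters on the page.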

mb_substr Usage

string mb_substr(string $str, int $start[, int $length[, string $encoding]]);

mb_substr performs a multibyte-safe substr() operation based on the number of characters. Positions are counted from the start of str: the first character is at position 0, the second at position 1, and so on:

str: the string to extract the substring from.

start: the starting position.

length: the maximum number of characters to return. If omitted, the substring runs to the end of str.

encoding: the character encoding. If omitted, the internal character encoding is used.
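
The parameters above can be tried out with a small sketch like the following. It assumes the mbstring extension is loaded and the file is saved as UTF-8; the sample string and variable names are only illustrative.

```php
<?php
// Assumes the mbstring extension is loaded and this file is saved as UTF-8.
mb_internal_encoding('UTF-8');  // used whenever the encoding argument is omitted

$str = "PHP中文教程";  // positions: P=0, H=1, P=2, 中=3, 文=4, 教=5, 程=6

$two  = mb_substr($str, 3, 2, 'UTF-8');  // start at position 3, at most 2 characters
$rest = mb_substr($str, 3);              // length omitted: runs to the end of $str
$long = mb_substr($str, 5, 100);         // length larger than what is left

var_dump($two);   // string(6) "中文"
var_dump($rest);  // string(12) "中文教程"
var_dump($long);  // string(6) "教程"
```

Passing the encoding explicitly, as in the first call, avoids depending on how the internal encoding happens to be configured.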

The truncation problem can then be solved with code like this:

$mess = mb_substr($message, 0, 19, 'gb2312');

Here gb2312 is a Simplified Chinese encoding. Pass whichever encoding the string is actually stored in, for example 'UTF-8'.

mb_substr handles mixed Chinese and English strings

substr sometimes cuts a Chinese character after a third or a half of its bytes, which shows up as garbled characters, so mb_substr is generally the better choice. Sometimes, though, mb_substr alone is not enough. Suppose I want to show a short caption for a thumbnail, where 5 Chinese characters fit exactly. If the text has more than 5 characters, keep the first 4 and append "...". That works for Chinese, but English letters and digits are narrower, so the same cut leaves the result too short. The following function solves this problem:

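<?php
class Onens {
    /**
     * Truncate a UTF-8 string by display width.
     *
     * @author gesion
     * @param string $str The original string (assumed to be UTF-8)
     * @param int    $len Maximum width in units: a Chinese or full-width character
     *                    counts as 2 units, an English letter or digit as 1.
     *                    For example, 12 means 6 Chinese/full-width characters
     *                    or 12 English letters/digits.
     * @param bool   $dot Whether to append "..." when the string exceeds $len
     * @return string
     */
    public static function g_substr($str, $len = 12, $dot = true) {
        $i = 0;              // byte offset of the current character
        $l = 0;              // width (in units) collected so far
        $c = 0;              // byte length of the character examined last
        $a = array();        // characters collected so far
        $n = strlen($str);   // total length in bytes
        // Stop at the end of the string instead of reading past it.
        while ($l < $len && $i < $n) {
            // The lead byte of a UTF-8 sequence tells how many bytes it spans.
            $b = ord($str[$i]);
            if ($b >= 240) {
                $c = 4;
            } elseif ($b >= 224) {
                $c = 3;
            } elseif ($b >= 192) {
                $c = 2;
            } else {
                $c = 1;
            }
            $w = ($c > 1) ? 2 : 1;       // multibyte characters count as 2 units
            if ($l + $w > $len) break;   // the character does not fit
            $a[] = substr($str, $i, $c);
            $i += $c;
            $l += $w;
        }
        if ($i < $n) {
            // Text is left over: drop about 2 units to make room for the dots.
            array_pop($a);
            if ($c == 1) {
                array_pop($a);
            }
            return implode('', $a) . ($dot ? '...' : '');
        }
        return implode('', $a);
    }
}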
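
The mbstring extension also ships width-aware helpers that may be worth trying before a hand-written function. The sketch below is not the author's method: it swaps in the built-in mb_strwidth and mb_strimwidth, which count East Asian wide characters as 2 columns. The wrapper name caption() and the sample strings are hypothetical. Note that mb_strimwidth counts the trim marker toward the width, so the cut points will differ from those of g_substr.

```php
<?php
// Assumes the mbstring extension is loaded and strings are UTF-8.
// caption() is a hypothetical helper name used only for this sketch.
function caption($text, $width = 10, $marker = '...') {
    // mb_strimwidth treats East Asian wide characters as 2 columns and
    // includes $marker in the width of a truncated result.
    return mb_strimwidth($text, 0, $width, $marker, 'UTF-8');
}

var_dump(mb_strwidth("中文", 'UTF-8'));  // int(4)
var_dump(mb_strwidth("abcd", 'UTF-8'));  // int(4)

$short   = caption("短标题");                         // fits, returned unchanged
$chinese = caption("这是一个比较长的图片说明");        // truncated to at most 10 columns
$english = caption("A fairly long English caption");  // truncated to at most 10 columns

echo $short, "\n", $chinese, "\n", $english, "\n";
```

Both truncated captions stay within the same display width, which is the goal the hand-written function is aiming for.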

