
Avoiding question marks when truncating Chinese strings in PHP

WBOY
2016-07-13 10:44:10

When PHP's built-in functions are used to truncate a string that contains Chinese characters, the result sometimes ends in question marks or other garbled output. The examples collected below truncate Chinese strings without splitting a character.

String handling problems of this kind in PHP come down to two steps:
1. Determine whether the string is encoded as GBK or as UTF-8.
2. Use the truncation method that matches that encoding.
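The first step can be sketched as follows. This is a minimal sketch, not the article's own method: it assumes the mbstring extension is loaded and that the input can only be UTF-8 or GBK, and the helper name guessEncoding is hypothetical. The sample strings are written as byte escapes so the result does not depend on how the file is saved.

```php
<?php
// Hypothetical helper: guess whether a byte string is UTF-8 or GBK.
// Assumes mbstring is available and that no other encodings occur.
function guessEncoding(string $str): string
{
    // Check UTF-8 first: its multi-byte sequences follow strict rules,
    // whereas many arbitrary byte pairs also pass as GBK.
    if (mb_check_encoding($str, 'UTF-8')) {
        return 'UTF-8';
    }
    if (mb_check_encoding($str, 'GBK')) {
        return 'GBK';
    }
    return 'unknown';
}

$utf8 = "\xE4\xB8\xAD\xE6\x96\x87"; // "中文" encoded as UTF-8
$gbk  = "\xD6\xD0\xCE\xC4";         // "中文" encoded as GBK

echo guessEncoding($utf8), "\n";
echo guessEncoding($gbk), "\n";
```

Pure ASCII input is valid in both encodings and is reported as UTF-8 here, which is harmless because ASCII bytes are cut the same way in either case.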

When substr is used to truncate Chinese text, the result is often garbled. substr counts bytes, but a Chinese character occupies two bytes in GBK/GB2312 and three bytes in UTF-8, so a cut that lands inside a character leaves a partial byte sequence that cannot be displayed.
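The effect is easy to reproduce. The snippet below is an illustrative sketch that assumes the mbstring extension is available. It compares a byte-based cut with a character-based cut on a UTF-8 string, and uses mb_check_encoding only to show whether the result is still valid UTF-8.

```php
<?php
// "中文字符" written as explicit UTF-8 bytes (3 bytes per character).
$str = "\xE4\xB8\xAD\xE6\x96\x87\xE5\xAD\x97\xE7\xAC\xA6";

$byBytes = substr($str, 0, 4);             // 4 bytes: one full character plus one stray byte
$byChars = mb_substr($str, 0, 2, 'UTF-8'); // 2 characters: 6 bytes

var_dump(mb_check_encoding($byBytes, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($byChars, 'UTF-8')); // bool(true)
var_dump(strlen($byChars));                     // int(6)
```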

One simple workaround is the truncation function below:


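// Truncate an overlong string (length is counted in bytes)
function curtStr($str, $len = 30) {
    if (strlen($str) > $len) {
        $str = substr($str, 0, $len);
        // Append a zero byte so a dangling lead byte is not paired with the ellipsis
        $str .= chr(0) . "…";
    }
    return $str;
}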


Note that the chr(0) above is not the same as null.

null means no value at all, whereas chr(0) is a one-byte string whose byte value is 0: 0x00 in hexadecimal, 00000000 in binary.

Although chr(0) displays nothing, it is still a character.
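A quick check makes the distinction concrete. This is only an illustration of the point above, and the variable name $nul is arbitrary.

```php
<?php
$nul = chr(0);

var_dump($nul === null);       // bool(false): it is a string, not null
var_dump(strlen($nul));        // int(1): it occupies one byte
var_dump(bin2hex($nul));       // string(2) "00"
var_dump(strlen("ab" . $nul)); // int(3): it counts toward the string length
```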

When a Chinese character is cut in half, the decoder follows the encoding rules and tries to interpret the leftover byte together with the bytes after it as one Chinese character, which is why garbled text appears. In a double-byte encoding such as GBK, a byte in the range 0x81 to 0xff followed by 0x00 is always displayed as "empty", so appending chr(0) to the result of substr can prevent the garbling. The following functions build on these two points to truncate Chinese strings accurately.

Truncating a UTF-8 encoded multi-byte string:
// Truncate a UTF-8 string; $from and $len are counted in characters
function utf8Substr($str, $from, $len)
{
    // Skip $from characters, capture up to $len characters, discard the rest
    return preg_replace('#^(?:[\x00-\x7F]|[\xC0-\xFF][\x80-\xBF]+){0,'.$from.'}'.
        '((?:[\x00-\x7F]|[\xC0-\xFF][\x80-\xBF]+){0,'.$len.'}).*#s',
        '$1', $str);
}
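When the limit is a byte budget rather than a character count, the mbstring function mb_strcut offers another way to avoid a dangling partial character. The sketch below is for comparison only and is not part of the original article. It assumes the mbstring extension is loaded.

```php
<?php
// "中文" as UTF-8 bytes (3 bytes per character).
$str = "\xE4\xB8\xAD\xE6\x96\x87";

$raw  = substr($str, 0, 4);             // exactly 4 bytes, ends inside the second character
$safe = mb_strcut($str, 0, 4, 'UTF-8'); // at most 4 bytes, trimmed back to a character boundary

var_dump(strlen($raw));                      // int(4)
var_dump(strlen($safe));                     // int(3)
var_dump(mb_check_encoding($safe, 'UTF-8')); // bool(true)
```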

A Chinese string truncation function that supports both UTF-8 and GB2312

/*
 Chinese string truncation function supporting UTF-8 and GB2312
 cut_str(string, length to cut, start offset, encoding);
 The encoding defaults to UTF-8
 The start offset defaults to 0
*/
function cut_str($string, $sublen, $start = 0, $code = 'UTF-8')
{
    if ($code == 'UTF-8') {
        // Match one complete UTF-8 character (1 to 4 bytes) at a time
        $pa = "/[\x01-\x7f]|[\xc2-\xdf][\x80-\xbf]|\xe0[\xa0-\xbf][\x80-\xbf]|[\xe1-\xef][\x80-\xbf][\x80-\xbf]|\xf0[\x90-\xbf][\x80-\xbf][\x80-\xbf]|[\xf1-\xf7][\x80-\xbf][\x80-\xbf][\x80-\xbf]/";
        preg_match_all($pa, $string, $t_string);

        if (count($t_string[0]) - $start > $sublen) {
            return join('', array_slice($t_string[0], $start, $sublen)) . "...";
        }
        return join('', array_slice($t_string[0], $start, $sublen));
    } else {
        // GB2312: offsets are given in double-byte units, so convert to bytes
        $start = $start * 2;
        $sublen = $sublen * 2;
        $strlen = strlen($string);
        $tmpstr = '';

        for ($i = 0; $i < $strlen; $i++) {
            if ($i >= $start && $i < ($start + $sublen)) {
                if (ord(substr($string, $i, 1)) > 129) {
                    $tmpstr .= substr($string, $i, 2); // lead byte: copy both bytes
                } else {
                    $tmpstr .= substr($string, $i, 1); // single-byte character
                }
            }
            // Skip the trail byte of a double-byte character
            if (ord(substr($string, $i, 1)) > 129) $i++;
        }
        if (strlen($tmpstr) < $strlen) $tmpstr .= "...";
        return $tmpstr;
    }
}

$str = "abcd string that needs to be intercepted";
echo cut_str($str, 8, 0, 'gb2312');
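Where the mbstring extension is available, the same two-encoding behaviour can be sketched with mb_substr, which accepts the encoding as a parameter. The helper name cutStrMb is hypothetical and this is not the original author's function. Unlike the GB2312 branch of cut_str, it counts the length and start offset in characters for every encoding.

```php
<?php
// Hypothetical alternative to cut_str(); assumes the mbstring extension.
// $sublen and $start are counted in characters, whatever the encoding.
function cutStrMb(string $string, int $sublen, int $start = 0, string $code = 'UTF-8'): string
{
    $cut = mb_substr($string, $start, $sublen, $code);
    // Append an ellipsis only when characters remain after the cut.
    if (mb_strlen($string, $code) - $start > $sublen) {
        $cut .= '...';
    }
    return $cut;
}

$utf8 = "abcd\xE4\xB8\xAD\xE6\x96\x87"; // "abcd中文" as UTF-8 bytes
$gbk  = "abcd\xD6\xD0\xCE\xC4";         // "abcd中文" as GBK bytes

var_dump(cutStrMb($utf8, 5) === "abcd\xE4\xB8\xAD...");      // bool(true)
var_dump(cutStrMb($gbk, 5, 0, 'GBK') === "abcd\xD6\xD0..."); // bool(true)
var_dump(cutStrMb($utf8, 6) === $utf8);                      // bool(true): nothing was cut
```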
