
PHP cuts Chinese characters without garbled characters

王林 (Original)
2019-09-17 13:03:41


In PHP, truncating a Chinese string with substr() can produce garbled characters. The reason is that Chinese and Western characters occupy different numbers of bytes, while the start and length parameters of substr() are counted in bytes. In GB2312 encoding, a Chinese character occupies 2 bytes and an English character occupies 1 byte. In UTF-8 encoding, a Chinese character usually occupies 3 bytes (rare characters outside the Basic Multilingual Plane take 4), while English letters and half-width punctuation occupy 1 byte.
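The byte counts described above can be checked directly. This is a minimal sketch, assuming the source file is saved as UTF-8 and the mbstring extension is enabled; 'EUC-CN' is the name mbstring uses for the GB2312 byte encoding.

```php
<?php
// Assumption: this file is saved as UTF-8 and mbstring is loaded.
$chinese = '中';
$english = 'a';

echo strlen($chinese), "\n";              // 3 (bytes in UTF-8)
echo mb_strlen($chinese, 'UTF-8'), "\n";  // 1 (characters)
echo strlen($english), "\n";              // 1

// The same character converted to GB2312 (EUC-CN) takes 2 bytes.
$gb = mb_convert_encoding($chinese, 'EUC-CN', 'UTF-8');
echo strlen($gb), "\n";                   // 2
```

strlen() always counts bytes, so its result depends on the encoding, while mb_strlen() counts characters in the encoding it is told to use.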

Truncating Chinese text directly with substr() may therefore produce garbled characters, because the cut can fall in the middle of a multi-byte character and "saw" it in half. Possible solutions:
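To see a character being cut in half, the following sketch truncates a UTF-8 string at a byte offset that falls inside the second character. It assumes a UTF-8 source file and the mbstring extension (used here only to validate the result).

```php
<?php
$str = '中文字符串';            // 5 characters, 15 bytes in UTF-8
$cut = substr($str, 0, 4);      // all 3 bytes of 中 plus the first byte of 文

echo bin2hex($cut), "\n";                                  // e4b8ade6
var_dump(mb_check_encoding($cut, 'UTF-8'));                // bool(false)
var_dump(mb_check_encoding(substr($str, 0, 6), 'UTF-8'));  // bool(true)
```

The trailing byte e6 is an incomplete character, which is what shows up as garbled output. A cut at 6 bytes happens to land on a character boundary and stays valid.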

1. Use mb_substr() from the mbstring extension, which counts characters rather than bytes and so avoids garbled output.

2. Write the truncation function yourself, although this is less efficient than using the mbstring extension.

3. If the truncated string is only going to be output, some sources suggest substr($str, 0, 30).chr(0). Note that this does not repair a character that has been cut in half, so it should not be relied on.
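As an illustration of option 2, here is a minimal sketch of a hand-written byte-limited truncation for UTF-8. The function name utf8_truncate_bytes is hypothetical, and the code assumes the input is valid UTF-8 and that $maxBytes is not negative.

```php
<?php
// Hypothetical helper: keep at most $maxBytes bytes of a valid UTF-8
// string without splitting a multi-byte character.
function utf8_truncate_bytes(string $str, int $maxBytes): string
{
    if (strlen($str) <= $maxBytes) {
        return $str;
    }
    $end = $maxBytes;
    // If the first excluded byte is a continuation byte (10xxxxxx), the cut
    // falls inside a character, so move back to that character's lead byte.
    while ($end > 0 && (ord($str[$end]) & 0xC0) === 0x80) {
        $end--;
    }
    return substr($str, 0, $end);
}

echo utf8_truncate_bytes('中文字符串', 4), "\n";  // 中
echo utf8_truncate_bytes('中文字符串', 7), "\n";  // 中文
```

It relies on the fact that UTF-8 continuation bytes always have the bit pattern 10xxxxxx, so a character boundary can be found by stepping backwards.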

substr() can truncate text, but it often causes problems when the text contains Chinese characters. In that case, use mb_substr() or mb_strcut() instead. Both are used like substr(), except that they take one extra parameter at the end that sets the encoding of the string. They require the mbstring extension, which is not always enabled; on Windows servers you may need to enable php_mbstring.dll in php.ini.
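Whether the extension is available can be checked at runtime before calling the mb_* functions. The helper name safe_substr_available below is hypothetical and only illustrates the check.

```php
<?php
// Hypothetical helper: report which character-aware substring function
// can be used on this PHP installation.
function safe_substr_available(): string
{
    if (extension_loaded('mbstring')) {
        return 'mb_substr';
    }
    if (extension_loaded('iconv')) {
        return 'iconv_substr';
    }
    return 'none';
}

echo safe_substr_available(), "\n";
```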

For example:

<?php
echo mb_substr('这样一来我的字符串就不会有乱码^_^', 0, 7, 'utf-8');
?>

Output: 这样一来我的字

<?php
echo mb_strcut('这样一来我的字符串就不会有乱码^_^', 0, 7, 'utf-8');
?>

Output: 这样

As the examples show, mb_substr() counts its start and length in characters, while mb_strcut() counts them in bytes but never returns half a character: here 7 bytes hold only two complete 3-byte characters.
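The difference in units is easiest to see by measuring both results. A small sketch, assuming a UTF-8 source file and mbstring:

```php
<?php
$str = '这样一来我的字符串就不会有乱码^_^';

$byChars = mb_substr($str, 0, 7, 'UTF-8');  // limit of 7 characters
$byBytes = mb_strcut($str, 0, 7, 'UTF-8');  // limit of 7 bytes

echo mb_strlen($byChars, 'UTF-8'), ' chars, ', strlen($byChars), " bytes\n";  // 7 chars, 21 bytes
echo mb_strlen($byBytes, 'UTF-8'), ' chars, ', strlen($byBytes), " bytes\n";  // 2 chars, 6 bytes
```

mb_strcut() is the better fit when the limit is a storage size in bytes, and mb_substr() when the limit is a number of visible characters.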

A PHP function for truncating a Chinese string without garbled characters (applicable to GB2312)

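function GBsubstr($string, $start, $length) {
    // Assumes GB2312 input. $start and $length are byte counts, and
    // $start must fall on a character boundary.
    if (strlen($string) - $start <= $length) {
        // Nothing to truncate: return the rest of the string unchanged.
        return (string) substr($string, $start);
    }
    $str = '';
    $len = $start + $length;
    for ($i = $start; $i < $len; $i++) {
        if (ord($string[$i]) > 0xa0) {
            // Lead byte of a double-byte character: copy both bytes. The
            // result may be one byte longer than $length so that the last
            // character stays whole.
            $str .= substr($string, $i, 2);
            $i++;
        } else {
            $str .= $string[$i];
        }
    }
    return $str . '...';
}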
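When mbstring is available, the same byte-limited truncation of GB2312 text can be done with mb_strcut(). This sketch is an alternative to the hand-written loop, not a replacement prescribed by the article; it assumes a UTF-8 source file and uses 'EUC-CN', the name mbstring uses for the GB2312 byte encoding.

```php
<?php
// Build a GB2312 string from a UTF-8 literal: 5 characters, 10 bytes.
$gb = mb_convert_encoding('中文字符串', 'EUC-CN', 'UTF-8');

// Ask for at most 5 bytes; the cut is moved back to a character boundary.
$cut = mb_strcut($gb, 0, 5, 'EUC-CN');

echo strlen($gb), "\n";                                     // 10
echo strlen($cut), "\n";                                    // 4
echo mb_convert_encoding($cut, 'UTF-8', 'EUC-CN'), "\n";    // 中文
```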

A function for truncating Chinese strings without garbled characters, applicable to UTF-8

function substr_text($str, $start, $length, $charset = "utf-8", $suffix = "")
{
    // Prefer the multibyte-aware extensions when they are available.
    if (function_exists("mb_substr")) {
        return mb_substr($str, $start, $length, $charset) . $suffix;
    } elseif (function_exists('iconv_substr')) {
        return iconv_substr($str, $start, $length, $charset) . $suffix;
    }
    // Fallback: split the string into characters with a byte-level pattern
    // for the given charset, then take the requested slice.
    $re['utf-8']  = '/[\x01-\x7f]|[\xc2-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf]{2}|[\xf0-\xf7][\x80-\xbf]{3}/';
    $re['gb2312'] = '/[\x01-\x7f]|[\xb0-\xf7][\xa0-\xfe]/';
    $re['gbk']    = '/[\x01-\x7f]|[\x81-\xfe][\x40-\xfe]/';
    $re['big5']   = '/[\x01-\x7f]|[\x81-\xfe]([\x40-\x7e]|[\xa1-\xfe])/';
    preg_match_all($re[strtolower($charset)], $str, $match);
    $slice = join("", array_slice($match[0], $start, $length));
    return $slice . $suffix;
}
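For UTF-8 input specifically, the regex fallback can be written more compactly with PCRE's u modifier, which makes the pattern operate on code points instead of bytes. This is a sketch under that assumption; the function name utf8_substr_regex is hypothetical.

```php
<?php
// Hypothetical helper: character-based substring for UTF-8 using only PCRE.
function utf8_substr_regex(string $str, int $start, int $length): string
{
    // "u" makes "." match one UTF-8 code point; "s" lets it match newlines.
    // preg_match_all() returns false if $str is not valid UTF-8.
    if (preg_match_all('/./us', $str, $match) === false) {
        return '';
    }
    return implode('', array_slice($match[0], $start, $length));
}

echo utf8_substr_regex('这样一来我的字符串', 2, 3), "\n";  // 一来我
```

Like the byte-level patterns, this builds an array of every character in the string, so it is slower than mb_substr() on long input.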
