Why using substr to truncate UTF-8 Chinese strings produces garbled characters

WBOY
2016-07-21 15:04:55

Truncating a UTF-8 Chinese string with substr often produces garbled characters. Why does this happen? This article explains.
Consider the following code (the file is encoded as UTF-8):


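<?php
// Note: the original article presumably used a Chinese sentence here; it appears in translation.
$str = 'Everyone knows that strlen and mb_strlen are functions to find the length of a string';
// strlen() counts bytes; mb_strlen() counts characters in the given encoding
echo strlen($str) . "\n" . mb_strlen($str, 'utf-8');
?>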

With the article's original Chinese string (the sentence above is a translation), running the code prints:
66
34
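The two numbers differ because the functions count different things. strlen counts bytes: in UTF-8 a Chinese character occupies three bytes, while an ASCII letter occupies one. mb_strlen counts characters, so every character counts as 1 regardless of how many bytes it uses. The output above is consistent with a string of 16 Chinese characters and 18 single-byte characters: 16 × 3 + 18 = 66 bytes, and 16 + 18 = 34 characters. substr, like strlen, works on bytes, so a cut that falls in the middle of a three-byte character leaves an incomplete byte sequence behind. That incomplete sequence is what shows up as garbled characters.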
The following function truncates a UTF-8 string without splitting a character:

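// Truncate a UTF-8 string to about $cutlength display units without splitting a character.
// Multi-byte characters and uppercase ASCII letters count as 1 unit;
// all other single-byte characters count as 0.5.
function cutstr($sourcestr, $cutlength) {
    $returnstr = '';
    $i = 0; // current byte offset
    $n = 0; // display units copied so far
    $str_length = strlen($sourcestr); // length in bytes
    while (($n < $cutlength) && ($i < $str_length)) {
        $temp_str = substr($sourcestr, $i, 1);
        $ascnum = ord($temp_str); // value of the lead byte
        if ($ascnum >= 240) {
            // 4-byte character (for example, emoji)
            $returnstr = $returnstr . substr($sourcestr, $i, 4);
            $i = $i + 4;
            $n++;
        }
        elseif ($ascnum >= 224) {
            // 3-byte character (most Chinese characters)
            $returnstr = $returnstr . substr($sourcestr, $i, 3);
            $i = $i + 3;
            $n++;
        }
        elseif ($ascnum >= 192) {
            // 2-byte character
            $returnstr = $returnstr . substr($sourcestr, $i, 2);
            $i = $i + 2;
            $n++;
        }
        elseif (($ascnum >= 65) && ($ascnum <= 90)) {
            // Uppercase ASCII letter: counts as a full unit
            $returnstr = $returnstr . substr($sourcestr, $i, 1);
            $i = $i + 1;
            $n++;
        }
        else {
            // Any other single-byte character: counts as half a unit
            $returnstr = $returnstr . substr($sourcestr, $i, 1);
            $i = $i + 1;
            $n = $n + 0.5;
        }
    }
    // Append an ellipsis only if part of the string was actually left out
    if ($i < $str_length) {
        $returnstr = $returnstr . "...";
    }
    return $returnstr;
}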
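The thresholds 224 and 192 that cutstr tests against are the lowest values of the UTF-8 lead bytes for 3-byte and 2-byte characters (0xE0 and 0xC0); a lead byte of 240 (0xF0) or above starts a 4-byte character. The sketch below isolates that check. The helper name utf8_char_bytes is hypothetical and not part of the original article, and the function assumes its input is valid UTF-8 positioned at the start of a character.

```php
<?php
// Hypothetical helper for illustration only.
// Returns the number of bytes occupied by the UTF-8 character
// whose first byte is the first byte of $char.
function utf8_char_bytes(string $char): int
{
    $lead = ord($char[0]);
    if ($lead >= 0xF0) {
        return 4; // 11110xxx: 4-byte character
    }
    if ($lead >= 0xE0) {
        return 3; // 1110xxxx: 3-byte character (most Chinese characters)
    }
    if ($lead >= 0xC0) {
        return 2; // 110xxxxx: 2-byte character
    }
    return 1;     // 0xxxxxxx: ASCII
}

echo utf8_char_bytes('中'), "\n"; // 3
echo utf8_char_bytes('a'), "\n";  // 1
```

Advancing the byte offset by this amount at each step is what keeps a truncation loop on character boundaries.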

Usage example:

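<?php
// Note: the original article presumably used a Chinese sentence here; it appears in translation.
$str = 'The validity period is up to three months, after which the system will automatically delete this message';
//echo strlen($str);
//echo "\n" . mb_strlen($str, 'utf-8');
echo "\n" . $str;               // the full string
echo "\n" . cutstr($str, 24);   // the string truncated to about 24 display units
?>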
