Truncating Chinese strings in PHP
The following code is written for GB2312-encoded text. Truncating Chinese strings is a recurring headache in PHP. The usual fix is to treat any byte whose value is 128 or greater as part of a double-byte character, so that a character is never cut in half and displayed as garbage. That check alone still breaks on mixed Chinese and English text, special symbols, and similar cases. A more thorough version is given here, for reference only.
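To make the byte-value check concrete, here is a minimal sketch of the problem and of the parity test used to repair a bad cut position. The helper name `alignToCharBoundary` and the sample string are assumptions made for illustration, and the sketch assumes strict GB2312, where both bytes of a Chinese character are 128 or greater.

```php
<?php
// "中文" in GB2312 is the four bytes D6 D0 CE C4 (two bytes per character).
$text = "a\xD6\xD0\xCE\xC4";

// Cutting at a fixed byte count can split a character in half:
// this keeps "a" plus only the first byte of 中.
$broken = substr($text, 0, 2);

// Hypothetical helper: count the high bytes before $pos. An odd count means
// $pos sits between the two bytes of a character, so step back by one byte.
function alignToCharBoundary(string $s, int $pos): int
{
    $high = 0;
    $limit = min($pos, strlen($s));
    for ($i = 0; $i < $limit; $i++) {
        if (ord($s[$i]) >= 128) {
            $high++;
        }
    }
    return ($high % 2 !== 0) ? $pos - 1 : $pos;
}

// The repaired cut keeps only the complete character "a".
$safe = substr($text, 0, alignToCharBoundary($text, 2));
```

The parity test only works while every byte of a double-byte character is in the high range; encodings whose second byte may fall below 128 would need a different scan.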
Program description:
1. The $len parameter is counted in Chinese characters: one unit of $len equals two English (single-byte) characters, so that truncated strings come out at a uniform visual width.
2. If the $magic parameter is set to false, Chinese and English characters are counted alike and $len is the absolute number of characters.
3. Especially suitable for strings encoded with htmlspecialchars().
4. Correctly handles numeric character entities (the &#NNNNN; form) in GB2312 text.
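Item 3 matters because an encoded entity occupies several bytes but displays as a single character. The following sketch uses only the standard htmlspecialchars() and substr() functions; the sample string is made up for illustration.

```php
<?php
// ENT_QUOTES makes htmlspecialchars() encode single quotes as well.
$raw = "<b>'A&B'</b>";
$encoded = htmlspecialchars($raw, ENT_QUOTES);
// $encoded is now: &lt;b&gt;&#039;A&amp;B&#039;&lt;/b&gt;

// A plain byte cut can stop inside an entity and leave a fragment behind.
$fragment = substr($encoded, 0, 3);
// $fragment is "&lt" without the closing ";", which no longer renders as "<".
```

A truncation routine for such strings therefore has to recognise each entity and either keep it whole or drop it whole.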
Program code:
function FSubstr($title, $start, $len = "", $magic = true)
{
    /**
     * powered by Smartpig
     * mailto:d.einstein@263.net
     */
    $length = 0; // number of bytes to return
    // No length given: take everything up to the end of the string
    if ($len === "") $len = strlen($title);

    // If the start position falls in the middle of a double-byte
    // character, step back one byte
    if ($start > 0)
    {
        $cnum = 0;
        for ($i = 0; $i < $start; $i++)
        {
            if (ord(substr($title, $i, 1)) >= 128) $cnum++;
        }
        if ($cnum % 2 != 0) $start--;
    }

    if (strlen($title) <= $len) return substr($title, $start, $len);

    $alen = 0;    // single-width characters counted so far
    $blen = 0;    // double-width characters counted so far
    $realnum = 0; // characters counted so far, whatever their width
    for ($i = $start; $i < strlen($title); $i++)
    {
        $ctype = 0; // 1 if the current character is double-width
        $cstep = 0; // number of bytes the current character occupies
        $cur = substr($title, $i, 1);
        if ($cur == "&")
        {
            if (substr($title, $i, 4) == "&lt;")
            {
                $cstep = 4;
                $length += 4;
                $i += 3;
                $realnum++;
                if ($magic) $alen++;
            }
            else if (substr($title, $i, 4) == "&gt;")
            {
                $cstep = 4;
                $length += 4;
                $i += 3;
                $realnum++;
                if ($magic) $alen++;
            }
            else if (substr($title, $i, 5) == "&amp;")
            {
                $cstep = 5;
                $length += 5;
                $i += 4;
                $realnum++;
                if ($magic) $alen++;
            }
            else if (substr($title, $i, 6) == "&quot;")
            {
                $cstep = 6;
                $length += 6;
                $i += 5;
                $realnum++;
                if ($magic) $alen++;
            }
            else if (substr($title, $i, 6) == "&#039;")
            {
                $cstep = 6;
                $length += 6;
                $i += 5;
                $realnum++;
                if ($magic) $alen++;
            }
            // Numeric entity; anchored with ^ so it must begin at the current "&"
            else if (preg_match('/^&#(\d+);/', substr($title, $i, 8), $match))
            {
                $cstep = strlen($match[0]);
                $length += strlen($match[0]);
                $i += strlen($match[0]) - 1;
                $realnum++;
                if ($magic)
                {
                    $blen++;
                    $ctype = 1;
                }
            }
            else
            {
                // A bare "&" that starts no recognised entity
                // counts as one ordinary character
                $cstep = 1;
                $length += 1;
                $realnum++;
                if ($magic) $alen++;
            }
        }
        else
        {
            if (ord($cur) >= 128)
            {
                // Double-byte (Chinese) character
                $cstep = 2;
                $length += 2;
                $i += 1;
                $realnum++;
                if ($magic)
                {
                    $blen++;
                    $ctype = 1;
                }
            }
            else
            {
                // Single-byte character
                $cstep = 1;
                $length += 1;
                $realnum++;
                if ($magic) $alen++;
            }
        }

        if ($magic)
        {
            // Width is measured in half-width units; $len * 2 is the limit
            if (($blen * 2 + $alen) == ($len * 2)) break;
            if (($blen * 2 + $alen) == ($len * 2 + 1))
            {
                // A double-width character overshot the limit by one unit: drop it
                if ($ctype == 1) $length -= $cstep;
                break;
            }
        }
        else
        {
            if ($realnum == $len) break;
        }
    }
    return substr($title, $start, $length);
}
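The width-counting idea behind the function can also be sketched more compactly with a regular expression that splits the string into display units first. This is an illustrative alternative, not the author's code: the name `cutByWidth`, the pattern, and the sample string are all assumptions, the input is assumed to be strict GB2312, and the sketch ignores the start offset entirely.

```php
<?php
// Hypothetical helper: keep whole display units until the width budget,
// given in Chinese-character units, is used up.
function cutByWidth(string $s, int $maxChars): string
{
    // One unit is an entity produced by htmlspecialchars(), a numeric entity,
    // a pair of high bytes (one GB2312 character), or any single byte.
    $pattern = '/&(?:lt|gt|amp|quot|#039);|&#\d+;|[\x80-\xFF]{2}|./s';
    preg_match_all($pattern, $s, $m);

    $budget = $maxChars * 2; // budget in half-width units
    $used = 0;
    $out = '';
    foreach ($m[0] as $unit) {
        // Double-byte characters and numeric entities are double-width;
        // ASCII and the htmlspecialchars() entities are single-width.
        $isNumericEntity = strncmp($unit, '&#', 2) === 0 && $unit !== '&#039;';
        $width = (ord($unit[0]) >= 128 || $isNumericEntity) ? 2 : 1;
        if ($used + $width > $budget) {
            break; // the next unit would overshoot, so stop before it
        }
        $out .= $unit;
        $used += $width;
    }
    return $out;
}

// "A", 中 (D6 D0), "&lt;", "B", 文 (CE C4), a numeric entity, "C"
$sample = "A\xD6\xD0&lt;B\xCE\xC4&#20013;C";
$short = cutByWidth($sample, 2); // keeps "A", 中 and "&lt;" (4 half-width units)
```

Tokenising first keeps the width rule in one place, at the cost of building the full list of units even when only a short prefix is needed.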