
Truncating Chinese strings in PHP: complete code

巴扎黑 (original)
2016-11-22 11:44:48

When truncating Chinese strings in PHP, the usual technique is to check whether a byte's value (from ord()) is 128 or greater, which marks it as part of a double-byte character. The cut is then placed so that such a character is never split, which avoids incomplete characters and garbled output.
However, when Chinese and English are mixed and the string also contains special symbols (such as HTML entities), the problem is not so easy to solve.
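The simple byte check described above can be sketched as follows. This is a minimal illustration, not the article's code: the name `gbTruncateBytes` is hypothetical, and it assumes GB2312/GBK byte strings (not UTF-8). It walks the string from the start and treats any byte with `ord() >= 128` as the lead byte of a two-byte character, so the cut never falls between the two bytes. It knows nothing about HTML entities, which is the gap the FSubstr() function in this article addresses.

```php
<?php
// Hypothetical helper (not from the article): cut a GB2312/GBK byte string
// to at most $maxBytes bytes without splitting a double-byte character.
// Assumption: a byte >= 128 at a character boundary is the lead byte of a
// two-byte character, so it and the next byte are kept or dropped together.
function gbTruncateBytes(string $s, int $maxBytes): string
{
    $i = 0;
    $n = strlen($s);
    while ($i < $n) {
        // Number of bytes in the character that starts at offset $i.
        $step = (ord($s[$i]) >= 128) ? 2 : 1;
        if ($i + $step > $maxBytes) {
            break; // taking this character would exceed the limit
        }
        $i += $step;
    }
    return substr($s, 0, $i);
}

// "ab", then 中文 in GB2312 (0xD6D0, 0xCEC4), then "cd".
$s = "ab\xD6\xD0\xCE\xC4cd";
var_dump(gbTruncateBytes($s, 3) === "ab"); // bool(true): half of 中 is not kept
```

Because the scan always starts at a character boundary and skips two bytes at each lead byte, a GBK trail byte that happens to be below 128 also stays attached to its lead byte.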

The following function solves the problem of truncating Chinese strings comprehensively. Feel free to use it as a reference.

Explanation:
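1. The $len parameter is measured in Chinese characters: one unit of $len equals one Chinese character or two English characters, so that truncated text mixing both scripts looks more even.
2. If the $magic parameter is set to false, Chinese and English characters are treated the same, and exactly $len characters are taken regardless of width.
3. It is especially suitable for strings already encoded with htmlspecialchars(): entities such as &lt; and &amp; are counted as one English-width character and are never cut in half.
4. It correctly handles numeric character entities (the &#NNNN; form) in GB2312 text, counting each one as a Chinese character.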
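The counting rules in points 1 and 2 can be illustrated with a reduced sketch. It is not the article's function: `gbCutByWidth` is a hypothetical name, it assumes GB2312 text, and it ignores HTML entities. With `$magic` on, the budget is `$len * 2` half-width units, where a Chinese character costs 2 and an ASCII character costs 1; with `$magic` off, every character costs 1.

```php
<?php
// Hypothetical sketch of the $len / $magic counting rules (not the article's
// code). Assumes GB2312 text, where a byte >= 128 starts a two-byte character,
// and ignores HTML entities.
function gbCutByWidth(string $s, int $len, bool $magic = true): string
{
    // magic: budget of $len * 2 half-width units; otherwise $len characters.
    $budget = $magic ? $len * 2 : $len;
    $used = 0;   // units (or characters) consumed so far
    $bytes = 0;  // bytes of $s kept so far
    $n = strlen($s);
    while ($bytes < $n) {
        $wide = ord($s[$bytes]) >= 128;
        $step = $wide ? 2 : 1;              // bytes in this character
        $cost = ($magic && $wide) ? 2 : 1;  // budget this character uses
        if ($used + $cost > $budget) {
            break; // stop before going over the budget
        }
        $used += $cost;
        $bytes += $step;
    }
    return substr($s, 0, $bytes);
}
```

With `$s = "ab\xD6\xD0\xCE\xC4cd"` (ab中文cd in GB2312), `gbCutByWidth($s, 2)` keeps "ab中": the two letters use 2 units, 中 uses the remaining 2, and 文 no longer fits. With `$magic = false` the same call keeps only "ab", because every character then costs 1.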

Example:
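<?php
/**
 * Truncate a Chinese string; intended for GB2312-encoded text.
 * @link http://www.jbxue.com
 *
 * @param string     $title  String to truncate
 * @param int        $start  Starting byte offset
 * @param int|string $len    Maximum length ("" means the whole string)
 * @param bool       $magic  true: $len counts Chinese characters (2 English = 1);
 *                           false: every character counts as 1
 */
function FSubstr($title, $start, $len = "", $magic = true)
{
    $length = 0;
    if ($len == "") $len = strlen($title);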

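    // If $start lands on the second byte of a double-byte character (an odd
    // number of bytes >= 128 precede it), move it back one byte. This assumes
    // both bytes of a GB2312 character are >= 128.
    if ($start > 0)
    {
        $cnum = 0;
        for ($i = 0; $i < $start; $i++)
        {
            if (ord(substr($title, $i, 1)) >= 128) $cnum++;
        }
        if ($cnum % 2 != 0) $start--;

        unset($cnum);
    }

    // A string no longer than $len bytes cannot exceed the limit in either
    // mode, so return it without scanning.
    if (strlen($title) <= $len) return substr($title, $start, $len);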

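    $alen = 0;    // single-width units seen (magic mode)
    $blen = 0;    // double-width units seen (magic mode)

    $realnum = 0; // characters seen, regardless of width

    for ($i = $start; $i < strlen($title); $i++)
    {
        $ctype = 0;   // 1 if the current character is double-width
        $cstep = 0;   // bytes taken by the current character
        $cur = substr($title, $i, 1);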
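        if ($cur == "&")
        {
            // Entities produced by htmlspecialchars() (&lt; &gt; &amp;
            // &quot; &#039;) count as one single-width character and are
            // never split.
            if (substr($title, $i, 4) == "&lt;")
            {
                $cstep = 4;
                $length += 4;
                $i += 3;
                $realnum++;
                if ($magic)
                {
                    $alen++;
                }
            }
            else if (substr($title, $i, 4) == "&gt;")
            {
                $cstep = 4;
                $length += 4;
                $i += 3;
                $realnum++;
                if ($magic)
                {
                    $alen++;
                }
            }
            else if (substr($title, $i, 5) == "&amp;")
            {
                $cstep = 5;
                $length += 5;
                $i += 4;
                $realnum++;
                if ($magic)
                {
                    $alen++;
                }
            }
            else if (substr($title, $i, 6) == "&quot;")
            {
                $cstep = 6;
                $length += 6;
                $i += 5;
                $realnum++;
                if ($magic)
                {
                    $alen++;
                }
            }
            else if (substr($title, $i, 6) == "&#039;")
            {
                $cstep = 6;
                $length += 6;
                $i += 5;
                $realnum++;
                if ($magic)
                {
                    $alen++;
                }
            }
            else if (preg_match('/^&#(\d+);/', substr($title, $i, 8), $match))
            {
                // Numeric entity such as &#20013;: one double-width character.
                $cstep = strlen($match[0]);
                $length += strlen($match[0]);
                $i += strlen($match[0]) - 1;
                $realnum++;
                if ($magic)
                {
                    $blen++;
                    $ctype = 1;
                }
            }
            else
            {
                // A bare "&" that starts no known entity is an ordinary
                // single-byte character, so it must still be counted in $length.
                $cstep = 1;
                $length += 1;
                $realnum++;
                if ($magic)
                {
                    $alen++;
                }
            }
        }
        else
        {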
            if (ord($cur) >= 128)
            {
                // Lead byte of a double-byte character: take both bytes.
                $cstep = 2;
                $length += 2;
                $i += 1;
                $realnum++;
                if ($magic)
                {
                    $blen++;
                    $ctype = 1;
                }
            }
            else
            {
                // Single-byte (ASCII) character.
                $cstep = 1;
                $length += 1;
                $realnum++;
                if ($magic)
                {
                    $alen++;
                }
            }
        }

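        if ($magic)
        {
            // Budget is $len * 2 half-width units (double-width = 2, single = 1).
            if (($blen * 2 + $alen) == ($len * 2)) break;
            if (($blen * 2 + $alen) == ($len * 2 + 1))
            {
                // Overshot by one unit: the last character was double-width,
                // so drop it to stay within the budget.
                if ($ctype == 1)
                {
                    $length -= $cstep;
                }
                break;
            }
        }
        else
        {
            if ($realnum == $len) break;
        }
    }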

    unset($cur);
    unset($alen);
    unset($blen);
    unset($realnum);
    unset($ctype);
    unset($cstep);

    return substr($title, $start, $length);
}
?>
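For comparison, the same magic-mode rules (htmlspecialchars() entities count as one narrow character, &#NNNN; numeric entities and double-byte characters count as one wide character, and the budget is `$len * 2` half-width units) can be expressed more compactly by tokenizing first with preg_match_all() and then spending the budget token by token. This is a hypothetical sketch: the names `gbTokenWidth` and `gbEntitySubstr` are made up for illustration, it covers only `$magic = true` with `$start = 0`, and it assumes GB2312 text. It is not a drop-in replacement for FSubstr().

```php
<?php
// Hypothetical compact restatement of FSubstr()'s magic mode for
// htmlspecialchars()-encoded GB2312 text. Not the article's code.
function gbTokenWidth(string $tok): int
{
    if (ord($tok[0]) >= 128) {
        return 2; // double-byte GB2312 character
    }
    if ($tok !== '&#039;' && strncmp($tok, '&#', 2) === 0) {
        return 2; // numeric entity such as &#20013;
    }
    return 1; // ASCII byte or named entity (&lt; &gt; &amp; &quot; &#039;)
}

function gbEntitySubstr(string $s, int $len): string
{
    // Split into whole tokens: known entities first, then numeric entities
    // (up to 5 digits), then double-byte characters, then single ASCII bytes.
    preg_match_all(
        '/&(?:lt|gt|amp|quot|#039);|&#\d{1,5};|[\x80-\xFF][\x00-\xFF]?|[\x00-\x7F]/',
        $s,
        $m
    );
    $budget = $len * 2; // half-width units
    $used = 0;
    $out = '';
    foreach ($m[0] as $tok) {
        $w = gbTokenWidth($tok);
        if ($used + $w > $budget) {
            break; // next token would exceed the budget
        }
        $used += $w;
        $out .= $tok;
    }
    return $out;
}

echo gbEntitySubstr("&lt;b&gt;\xD6\xD0", 2), "\n"; // prints &lt;b&gt;
```

Because each token is either taken whole or not at all, an entity such as &gt; can never be cut into a fragment like &g, which is the failure that point 3 of the explanation guards against.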
