
The problem of truncating Chinese strings in PHP

WBOY (original)
2016-07-21 16:12:16

The following code is intended for GB2312-encoded text. Taking a substring of a Chinese string is a headache in PHP, because cutting in the middle of a double-byte character produces garbled output. The usual solution is to treat any byte whose value is greater than or equal to 128 as part of a double-byte character. However, that approach still runs into problems with mixed Chinese and English text, special symbols, and so on. Here is a more comprehensive version, offered for reference only:
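The byte test described above can be tried in isolation. The following is a minimal sketch, assuming the input really is GB2312; gbSplitChars is a hypothetical helper name that does not appear in the original code.

```php
<?php
// Hypothetical helper, not part of the original article.
// Splits a GB2312 byte string into characters: a byte >= 128 starts a
// double-byte character, anything else is a single-byte character.
function gbSplitChars($s)
{
    $chars = array();
    $n = strlen($s);
    for ($i = 0; $i < $n; $i++) {
        if (ord($s[$i]) >= 128 && $i + 1 < $n) {
            $chars[] = substr($s, $i, 2); // lead byte plus trail byte
            $i++;                         // skip the trail byte
        } else {
            $chars[] = $s[$i];
        }
    }
    return $chars;
}

// "PHP" followed by two GB2312 characters (bytes D6 D0 and CE C4)
$sample = "PHP\xD6\xD0\xCE\xC4";
echo count(gbSplitChars($sample)), "\n"; // prints 5
echo strlen($sample), "\n";              // prints 7
```

The gap between the two numbers (5 characters, 7 bytes) is why a plain substr() call can cut a Chinese character in half.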

Program description:

1. The len parameter is measured in Chinese characters: 1 len equals 2 English characters, so that truncated strings line up visually.

2. If the magic parameter is set to false, Chinese and English characters are treated equally and len is the absolute number of characters to take.

3. It is especially suitable for strings encoded with htmlspecialchars().

4. It correctly handles numeric character entities in GB2312 text (entities of the form &#NNNNN;).
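The counting rule in points 1 to 4 can be sketched as a small width calculator. This is only an illustration under the assumption that named entities count as one half-width unit and numeric entities as two, which is how the function treats them when magic is true; gbWidthUnits is a hypothetical name.

```php
<?php
// Hypothetical helper, not part of the original article.
// Returns the display width of an htmlspecialchars()-encoded GB2312 string
// in half-width units: ASCII characters and the entities produced by
// htmlspecialchars() count 1, double-byte characters and other numeric
// entities (&#NNNNN;) count 2.
function gbWidthUnits($s)
{
    $units = 0;
    $n = strlen($s);
    $i = 0;
    while ($i < $n) {
        // \G anchors the match at the offset passed as the fifth argument
        if ($s[$i] == "&" && preg_match('/\G&(lt|gt|amp|quot|#039|#\d+);/', $s, $m, 0, $i)) {
            $isNumeric = ($m[1][0] == "#" && $m[1] != "#039");
            $units += $isNumeric ? 2 : 1;
            $i += strlen($m[0]);
        } elseif (ord($s[$i]) >= 128) {
            $units += 2; // double-byte character
            $i += 2;
        } else {
            $units += 1; // plain single-byte character
            $i += 1;
        }
    }
    return $units;
}

// "a&lt;b" is three visible characters; the two GB2312 characters add 4 units
$sample = "a&lt;b\xD6\xD0\xCE\xC4";
echo gbWidthUnits($sample), "\n";     // prints 7
echo gbWidthUnits($sample) / 2, "\n"; // prints 3.5, the width measured in len
```

Dividing by two converts half-width units into the len unit used by the function, so this sample would need a len of 4 to be returned whole.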

Program code:
function FSubstr($title, $start, $len = "", $magic = true)
{
    /**
     *  powered by Smartpig
     *  mailto:d.einstein@263.net
     */

    $length = 0;
    // No length given: take everything up to the end of the string
    if ($len == "") $len = strlen($title);

    // If the start position falls inside a double-byte character, step back one byte
    if ($start > 0)
    {
        $cnum = 0;
        for ($i = 0; $i < $start; $i++)
        {
            if (ord(substr($title, $i, 1)) >= 128) $cnum++;
        }
        if ($cnum % 2 != 0) $start--;

        unset($cnum);
    }

    if (strlen($title) <= $len) return substr($title, $start, $len);

    $alen = 0;     // single-width characters counted so far
    $blen = 0;     // double-width characters counted so far

    $realnum = 0;  // characters counted so far, regardless of width

    for ($i = $start; $i < strlen($title); $i++)
    {
        $ctype = 0;  // 1 when the current character is double-width
        $cstep = 0;  // byte length of the current character or entity
        $cur = substr($title, $i, 1);
        if ($cur == "&")
        {
            if (substr($title, $i, 4) == "&lt;")
            {
                $cstep = 4;
                $length += 4;
                $i += 3;
                $realnum++;
                if ($magic)
                {
                    $alen++;
                }
            }
            else if (substr($title, $i, 4) == "&gt;")
            {
                $cstep = 4;
                $length += 4;
                $i += 3;
                $realnum++;
                if ($magic)
                {
                    $alen++;
                }
            }
            else if (substr($title, $i, 5) == "&amp;")
            {
                $cstep = 5;
                $length += 5;
                $i += 4;
                $realnum++;
                if ($magic)
                {
                    $alen++;
                }
            }
            else if (substr($title, $i, 6) == "&quot;")
            {
                $cstep = 6;
                $length += 6;
                $i += 5;
                $realnum++;
                if ($magic)
                {
                    $alen++;
                }
            }
            else if (substr($title, $i, 6) == "&#039;")
            {
                $cstep = 6;
                $length += 6;
                $i += 5;
                $realnum++;
                if ($magic)
                {
                    $alen++;
                }
            }
            else if (preg_match("/^&#(\d+);/i", substr($title, $i, 8), $match))
            {
                // Numeric entity: counted as one double-width character
                $cstep = strlen($match[0]);
                $length += strlen($match[0]);
                $i += strlen($match[0]) - 1;
                $realnum++;
                if ($magic)
                {
                    $blen++;
                    $ctype = 1;
                }
            }
            else
            {
                // A bare "&" that starts no known entity counts as one single-byte character
                $cstep = 1;
                $length += 1;
                $realnum++;
                if ($magic)
                {
                    $alen++;
                }
            }
        } else {
            if (ord($cur) >= 128)
            {
                // Double-byte GB2312 character
                $cstep = 2;
                $length += 2;
                $i += 1;
                $realnum++;
                if ($magic)
                {
                    $blen++;
                    $ctype = 1;
                }
            } else {
                $cstep = 1;
                $length += 1;
                $realnum++;
                if ($magic)
                {
                    $alen++;
                }
            }
        }

        if ($magic)
        {
            // Widths are compared in half-width units: $len Chinese characters = $len*2 units
            if (($blen*2 + $alen) == ($len*2)) break;
            if (($blen*2 + $alen) == ($len*2 + 1))
            {
                if ($ctype == 1)
                {
                    // The last double-width character overshoots the limit, so drop it
                    $length -= $cstep;
                    break;
                } else {
                    break;
                }
            }
        } else {
            if ($realnum == $len) break;
        }
    }

    unset($cur);
    unset($alen);
    unset($blen);
    unset($realnum);
    unset($ctype);
    unset($cstep);

    return substr($title, $start, $length);
}

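The start-position check near the top of the function can be exercised on its own. The sketch below assumes GB2312 input, where both bytes of every double-byte character are 128 or greater, so an odd count of high bytes before the offset means the offset splits a character; gbAlignStart is a hypothetical helper name.

```php
<?php
// Hypothetical helper, not part of the original article.
// Moves $start back by one byte when it would land on the second byte
// of a double-byte character. Relies on GB2312 using bytes >= 128 for
// both halves of every double-byte character.
function gbAlignStart($s, $start)
{
    $high = 0;
    for ($i = 0; $i < $start && $i < strlen($s); $i++) {
        if (ord($s[$i]) >= 128) $high++;
    }
    return ($high % 2 != 0) ? $start - 1 : $start;
}

$sample = "a\xD6\xD0\xCE\xC4"; // "a" plus two GB2312 characters
echo gbAlignStart($sample, 1), "\n"; // prints 1, already on a boundary
echo gbAlignStart($sample, 2), "\n"; // prints 1, offset 2 splits the first character
echo gbAlignStart($sample, 3), "\n"; // prints 3, start of the second character
```

The parity trick would not be safe for encodings whose trail bytes can fall below 128, so it should be read as specific to GB2312.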
