Truncating Chinese strings in PHP without garbled characters (ord(), substr() functions)

WBOY
2016-07-25 08:56:53
This article shows how to truncate Chinese (UTF-8) strings in PHP without producing garbled characters, using the ord() and substr() functions.

Note for PHP programming: in UTF-8, a Chinese character is normally encoded as 3 bytes, so 3 consecutive bytes must be counted as a single character. Cutting in the middle of such a sequence is what produces garbled output.
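
The byte counts can be checked directly. The sketch below compares strlen(), which counts bytes, with mb_strlen(), which counts characters, and inspects the lead byte with ord(). The sample strings are illustrative, and it assumes the file is saved as UTF-8 and the mbstring extension is enabled.

```php
<?php
// Assumption: this file is saved as UTF-8 and mbstring is loaded.
$han = '中';    // one Chinese character (U+4E2D)
$latin = 'a';   // one ASCII character

echo strlen($han), "\n";              // 3 (bytes)
echo mb_strlen($han, 'UTF-8'), "\n";  // 1 (character)
echo ord($han[0]), "\n";              // 228: lead byte of a 3-byte sequence (>= 224)
echo strlen($latin), "\n";            // 1
echo ord($latin), "\n";               // 97

// Cutting inside the 3-byte sequence leaves an invalid fragment,
// which is what shows up as garbled characters.
$broken = substr($han, 0, 2);
var_dump(mb_check_encoding($broken, 'UTF-8')); // bool(false)
```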

The following function truncates a Chinese string:

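<?php
// Cut a UTF-8 string to about $cutlength display characters without splitting a multi-byte sequence.
function cut_str($sourcestr,$cutlength)
{
   $returnstr='';
   $i=0;
   $n=0;
   $str_length=strlen($sourcestr); // length of the string in bytes
   while (($n<$cutlength) && ($i<$str_length)) // stop at the last byte, not one past it
   {
      $temp_str=substr($sourcestr,$i,1);
      $ascnum=ord($temp_str); // value of the byte at offset $i
      if ($ascnum>=240)    // lead byte of a 4-byte sequence (e.g. emoji, rare CJK)
      {
         $returnstr=$returnstr.substr($sourcestr,$i,4); // take the 4 bytes as one character
         $i=$i+4;            // advance 4 bytes
         $n++;            // count as 1 character
      }
      elseif ($ascnum>=224)    // lead byte of a 3-byte sequence (most Chinese characters)
      {
         $returnstr=$returnstr.substr($sourcestr,$i,3); // per UTF-8, 3 consecutive bytes form one character
         $i=$i+3;            // advance 3 bytes
         $n++;            // count as 1 character
      }
      elseif ($ascnum>=192) // lead byte of a 2-byte sequence
      {
         $returnstr=$returnstr.substr($sourcestr,$i,2); // per UTF-8, 2 consecutive bytes form one character
         $i=$i+2;            // advance 2 bytes
         $n++;            // count as 1 character
      }
      elseif ($ascnum>=65 && $ascnum<=90) // uppercase letter
      {
         $returnstr=$returnstr.substr($sourcestr,$i,1);
         $i=$i+1;            // still a single byte
         $n++;            // but counted as a full-width character for visual balance
      }
      else                // everything else, including lowercase letters and half-width punctuation
      {
         $returnstr=$returnstr.substr($sourcestr,$i,1);
         $i=$i+1;            // advance 1 byte
         $n=$n+0.5;        // counted as half the width of a full-width character
      }
   }
   if ($str_length>$i){
      $returnstr = $returnstr . "..."; // append an ellipsis when the string was cut
   }
   return $returnstr;
}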

The code above cuts a UTF-8 encoded string by character count rather than byte count. Lowercase letters and half-width punctuation are counted as half a character; to count each of them as a full character instead, change $n=$n+0.5; to $n=$n+1;
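
The half-weight rule used above resembles what the mbstring function mb_strimwidth() does: it counts half-width characters as 1 column and full-width characters as 2, and includes the trim marker in the requested width. The following is a minimal sketch of that alternative, not the article's cut_str(); it assumes mbstring is enabled, and the sample string and width are illustrative.

```php
<?php
// Illustrative input: 5 half-width letters followed by 4 full-width characters.
$title = 'Hello世界和平';

// Display width: 5 * 1 + 4 * 2 = 13 columns.
echo mb_strwidth($title, 'UTF-8'), "\n";

// Trim to at most 10 columns; the "..." marker is included in that width.
$short = mb_strimwidth($title, 0, 10, '...', 'UTF-8');
echo $short, "\n";
```

Unlike cut_str(), this treats uppercase and lowercase letters the same, so the two approaches can cut at slightly different positions.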

PHP also provides a built-in function for this in the mbstring extension: mb_substr($str, $start, $length, 'utf-8'). The parameters are the target string, the start position (counted in characters, beginning at 0), the number of characters to extract, and the character encoding of the string (utf-8).

Return value: $length characters beginning at the start position (the character at the start position is included in the count).

For example:

$str = '性的规定可广泛的覆盖大沙发扩大双方';
echo mb_substr($str,7,3,'utf-8');

Output: 的覆盖
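
Building on mb_substr(), a truncation helper with an ellipsis might look like the sketch below. The name mb_cut_str and its parameters are hypothetical, not part of PHP or of the original article; the type declarations assume PHP 7 or later, and mbstring must be enabled.

```php
<?php
// Hypothetical helper (not from the article): keep at most $maxChars characters
// and append a marker only when something was actually cut off.
function mb_cut_str(string $source, int $maxChars, string $marker = '...'): string
{
    if (mb_strlen($source, 'UTF-8') <= $maxChars) {
        return $source; // short enough, return unchanged
    }
    return mb_substr($source, 0, $maxChars, 'UTF-8') . $marker;
}

$str = '性的规定可广泛的覆盖大沙发扩大双方';
echo mb_cut_str($str, 5), "\n";    // 性的规定可...
echo mb_cut_str('短句', 5), "\n";  // 短句
```

Every character counts as 1 here regardless of its display width, which corresponds to the $n=$n+1; variant of cut_str().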


