Detailed example of how PHP uses custom functions to count the length of Chinese strings

怪我咯 (original)
2017-07-04 12:03:10

This article shows how to count the length of strings containing Chinese characters in PHP using custom functions, and walks through examples of how PHP detects, encodes, and handles Chinese text. The details are as follows:

Here a Chinese character counts as 2 and an English (ASCII) character counts as 1. The function assumes a double-byte encoding such as GB2312 or GBK.

Code:

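/**
 * Counts the length of a string that contains Chinese characters.
 * Assumes a double-byte encoding such as GB2312/GBK: each Chinese
 * character counts as 2 and each ASCII character counts as 1.
 */
function abslength($str)
{
  $len = strlen($str);
  $i = 0;
  while ($i < $len)
  {
    // A byte in the range 0xA1-0xFF starts a two-byte Chinese character.
    if (ord($str[$i]) >= 0xa1)
    {
      $i += 2;
    }
    else
    {
      $i += 1;
    }
  }
  return $i;
}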
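The function above relies on a double-byte encoding. For UTF-8 input, where a Chinese character usually takes three bytes, the same "Chinese counts as 2, ASCII counts as 1" rule could be sketched as follows. The name `displayLength` is hypothetical and not part of the original article, and treating every multi-byte character as width 2 is a simplifying assumption.

```php
<?php
/**
 * Hypothetical helper (not from the original article): counts ASCII
 * characters as 1 and every multi-byte UTF-8 character as 2.
 */
function displayLength(string $str): int
{
    // The /u modifier makes PCRE split on UTF-8 characters instead of bytes.
    // preg_split() returns false when the input is not valid UTF-8.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    $total = 0;
    foreach ($chars as $ch) {
        // strlen() is byte-based, so more than one byte means non-ASCII.
        $total += strlen($ch) > 1 ? 2 : 1;
    }
    return $total;
}

// "\u{4E2D}\u{6587}" is 中文 written with code point escapes (PHP 7+).
echo displayLength("\u{4E2D}\u{6587}abc"), "\n"; // 2 + 2 + 3 = 7
```

The code point escapes keep the example independent of the encoding of the source file itself.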

Also: determining character length in PHP for Chinese, English letters, and digits.

There are many ways to do this; here is a simple one.

mb_strlen($str, 'GBK');

The drawback is that the mbstring extension must be installed.
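One way to cope with a missing extension is to check for `mb_strlen()` at run time and fall back to another method. The sketch below is only an illustration: `charLength` is a hypothetical name, and the fallback path assumes UTF-8 input because PCRE's `/u` mode cannot decode GBK.

```php
<?php
/**
 * Hypothetical wrapper (not from the original article): uses mb_strlen()
 * when the mbstring extension is loaded, and otherwise falls back to a
 * PCRE-based count that only understands UTF-8.
 */
function charLength(string $str, string $encoding = 'UTF-8'): int
{
    if (function_exists('mb_strlen')) {
        return mb_strlen($str, $encoding);
    }
    if (strcasecmp($encoding, 'UTF-8') !== 0) {
        throw new RuntimeException("mbstring is required for $encoding");
    }
    // With /u each "." matches one UTF-8 character; /s lets it match newlines too.
    $count = preg_match_all('/./us', $str);
    if ($count === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $count;
}

// 中文abc written with code point escapes, so the bytes are UTF-8.
echo charLength("\u{4E2D}\u{6587}abc"), "\n"; // 5
```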

However, some issues remain, because the byte ranges differ from one encoding to another.

GB encoding rules: each Chinese character consists of two bytes. The first byte ranges from 0xA1 to 0xFE, 94 values in total, and the second byte also ranges from 0xA1 to 0xFE, again 94 values. These two bytes can therefore define 94 * 94 = 8836 code points, of which 6763 are actually assigned to Chinese characters.
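The arithmetic and the range check can be sketched in a few lines. The constant and function names here are hypothetical, and the check only tests the byte ranges given above; it does not verify that a code point is actually assigned.

```php
<?php
// Byte range shared by both bytes of a GB double-byte character.
const GB_BYTE_MIN = 0xA1;
const GB_BYTE_MAX = 0xFE;

/**
 * Hypothetical helper: checks whether the two bytes starting at $offset
 * both fall inside the GB double-byte range.
 */
function isGbPair(string $bytes, int $offset = 0): bool
{
    if ($offset < 0 || $offset + 1 >= strlen($bytes)) {
        return false; // not enough bytes left for a pair
    }
    $first  = ord($bytes[$offset]);
    $second = ord($bytes[$offset + 1]);
    return $first >= GB_BYTE_MIN && $first <= GB_BYTE_MAX
        && $second >= GB_BYTE_MIN && $second <= GB_BYTE_MAX;
}

$valuesPerByte = GB_BYTE_MAX - GB_BYTE_MIN + 1;
echo $valuesPerByte, "\n";                  // 94
echo $valuesPerByte * $valuesPerByte, "\n"; // 8836

var_dump(isGbPair("\xD6\xD0")); // GB2312 bytes for 中: true
var_dump(isGbPair("ab"));       // plain ASCII: false
```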

BIG5 encoding rules: each Chinese character consists of two bytes. The first byte ranges from 0x81 to 0xFE, 126 values in total. The second byte ranges from 0x40 to 0x7E or from 0xA1 to 0xFE, 157 values in total. These two bytes can therefore define 126 * 157 = 19782 code points. Some of these characters are in everyday use, such as "yi" (一) and "ding" (丁); they are called frequently used characters, and their BIG5 codes range from 0xA440 to 0xC671, 5401 characters in total. Rarer characters, such as "tan" and "diao", are called less frequently used characters and range from 0xC940 to 0xF9FE, 7652 characters in total. The rest are special characters.
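The BIG5 counts can be checked the same way. This is a minimal sketch with a hypothetical function name; it tests only the lead and trail byte ranges listed above, not whether a given code is assigned.

```php
<?php
/**
 * Hypothetical helper: checks whether two byte values form a pair inside
 * the BIG5 lead and trail byte ranges.
 */
function isBig5Pair(int $first, int $second): bool
{
    $leadOk  = $first >= 0x81 && $first <= 0xFE;
    $trailOk = ($second >= 0x40 && $second <= 0x7E)
            || ($second >= 0xA1 && $second <= 0xFE);
    return $leadOk && $trailOk;
}

$leadCount  = 0xFE - 0x81 + 1;                       // 126
$trailCount = (0x7E - 0x40 + 1) + (0xFE - 0xA1 + 1); // 63 + 94 = 157
echo $leadCount * $trailCount, "\n";                 // 19782

var_dump(isBig5Pair(0xA4, 0x40)); // true: first code of the frequently used block
var_dump(isBig5Pair(0xA4, 0x7F)); // false: 0x7F lies in the gap between the trail ranges
```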

A safer approach:

function StrLenW($str)
{
    $count = 0;
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++, $count++) {
        // A byte >= 128 starts a two-byte character, so skip its second byte.
        if (ord($str[$i]) >= 128) {
            $i++;
        }
    }
    return $count;
}
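Because this loop treats every byte of 128 or above as the start of a two-byte character, its result depends on the encoding of the input. The sketch below repeats the same logic under a hypothetical name, `doubleByteLength`, so that it runs on its own, and compares GBK bytes with UTF-8 bytes for the same text.

```php
<?php
// Same logic as StrLenW, copied under a hypothetical name so that the
// snippet is self-contained.
function doubleByteLength(string $str): int
{
    $count = 0;
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++, $count++) {
        if (ord($str[$i]) >= 128) {
            $i++; // skip the second byte of a two-byte character
        }
    }
    return $count;
}

$gbk  = "\xD6\xD0\xCE\xC4abc"; // 中文abc encoded as GBK (7 bytes)
$utf8 = "\u{4E2D}\u{6587}abc"; // 中文abc encoded as UTF-8 (9 bytes)

echo doubleByteLength($gbk), "\n";  // 5
echo doubleByteLength($utf8), "\n"; // 6, although the text has 5 characters
```

The UTF-8 result is off because each Chinese character there occupies three bytes, which the pair-skipping loop splits into one and a half "characters".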

Finally, the following version also handles UTF-8:

Code:

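/**
 * Purpose: count the character length of a string, including Chinese, English letters, and digits.
 * Parameters: the string to count and its encoding; the system currently uses UTF-8 throughout.
 * Change log:
 *   $str = "kds";
 *   echo sstrlen($str, 'utf-8');
 */
function sstrlen($str, $charset) {
    $n = 0; $p = 0; $c = 0;
    $len = strlen($str);
    if ($charset == 'utf-8') {
      for ($i = 0; $i < $len; $i++) {
        // The lead byte tells how many continuation bytes follow it.
        $c = ord($str[$i]);
        if ($c >= 252) {
          $p = 5;
        } elseif ($c >= 248) {
          $p = 4;
        } elseif ($c >= 240) {
          $p = 3;
        } elseif ($c >= 224) {
          $p = 2;
        } elseif ($c >= 192) {
          $p = 1;
        } else {
          $p = 0;
        }
        $i += $p; $n++;
      }
    } else {
      // Any other charset is treated as double-byte:
      // a byte above 127 starts a two-byte character.
      for ($i = 0; $i < $len; $i++) {
        $c = ord($str[$i]);
        if ($c > 127) {
          $p = 1;
        } else {
          $p = 0;
        }
        $i += $p; $n++;
      }
    }
    return $n;
}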
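The UTF-8 branch can also be written more compactly by counting every byte that is not a continuation byte. This is a different technique from the lead-byte table above, shown only as a sketch: `utf8Length` is a hypothetical name, and the function assumes well-formed UTF-8 input.

```php
<?php
/**
 * Hypothetical alternative (not from the original article): counts UTF-8
 * characters by counting the bytes that are not continuation bytes.
 * Continuation bytes have the bit pattern 10xxxxxx.
 */
function utf8Length(string $str): int
{
    $n = 0;
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        if ((ord($str[$i]) & 0xC0) !== 0x80) {
            $n++; // ASCII byte or lead byte of a multi-byte character
        }
    }
    return $n;
}

echo utf8Length("kds"), "\n";                 // 3
echo utf8Length("\u{4E2D}\u{6587}abc"), "\n"; // 5
echo utf8Length("\u{1F600}"), "\n";           // 1 (a four-byte character)
```

Because it never skips ahead, this version does not run past the end of a truncated string, although it still cannot detect malformed sequences.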
