
Detailed explanation of how to use custom PHP functions to count the length of Chinese strings

墨辰丷
Original · 2018-05-23 17:25:25

This article shows how to count the length of Chinese strings with custom PHP functions. Through examples, it covers how PHP can detect Chinese characters, how the relevant encodings are laid out, and the byte-level operations involved.

In the first function, each Chinese character counts as 2 and each English (ASCII) character counts as 1. It assumes the string is stored in a double-byte encoding such as GBK.

Code

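/**
 * Counts the length of a string that may contain Chinese characters.
 * Assumes a double-byte encoding such as GBK: each Chinese character
 * counts as 2 and each ASCII character counts as 1, so for well-formed
 * input the result equals the byte length.
 */
function abslength($str)
{
  $len = strlen($str);
  $i = 0;
  while ($i < $len)
  {
    // A byte in 0xA1-0xFF is treated as the start of a two-byte Chinese character.
    if (ord($str[$i]) >= 0xa1)
    {
      $i += 2;
    }
    else
    {
      $i += 1;
    }
  }
  return $i;
}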
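
The function above only makes sense for double-byte input. For UTF-8 input, the same "Chinese counts as 2, everything else counts as 1" rule could be sketched as follows. The name widthUtf8 is hypothetical, and the sketch assumes that only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) counts as Chinese; full-width punctuation and extension blocks are not treated as wide.

```php
<?php
// Hypothetical helper (not from the original article): for UTF-8 input, counts
// each character in U+4E00..U+9FFF as 2 and every other character as 1.
function widthUtf8(string $str): int
{
    // Total number of code points; the /u flag makes PCRE read the input as UTF-8.
    $total = preg_match_all('/./us', $str);
    // Number of code points in the basic CJK Unified Ideographs block.
    $wide = preg_match_all('/[\x{4E00}-\x{9FFF}]/u', $str);
    if ($total === false || $wide === false) {
        return -1; // the input was not valid UTF-8
    }
    return $total + $wide;
}

// "中文abc" written as UTF-8 byte escapes, so the file encoding does not matter.
echo widthUtf8("\xE4\xB8\xAD\xE6\x96\x87abc"), "\n"; // 7
```

Returning -1 for invalid UTF-8 is a design choice of this sketch; a caller might prefer an exception.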

Another approach: have PHP count the characters directly, whether they are Chinese characters, English letters, or digits.

There are many ways to do this. Here is a simple one.

mb_strlen($str, 'GBK');

The drawback is that it requires the mbstring extension.
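
One way to cope with a missing extension is to test for the function at run time. The wrapper below is a sketch under assumptions: the name gbkLength is hypothetical, and the fallback assumes well-formed GBK input in which every byte from 0x81 upward starts a two-byte character.

```php
<?php
// Hypothetical wrapper: prefer mb_strlen() when mbstring is loaded, otherwise
// fall back to a byte scan that assumes well-formed GBK input.
function gbkLength(string $str): int
{
    if (function_exists('mb_strlen')) {
        return mb_strlen($str, 'GBK');
    }
    $count = 0;
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++, $count++) {
        // GBK lead bytes are 0x81..0xFE; skip the trail byte that follows.
        if (ord($str[$i]) >= 0x81) {
            $i++;
        }
    }
    return $count;
}

// "中文abc" as GBK bytes: two 2-byte characters plus three ASCII letters.
echo gbkLength("\xD6\xD0\xCE\xC4abc"), "\n"; // 5
```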

However, some problems remain, because the byte layout differs from one encoding to another.

The GB (GB2312) encoding rules are as follows: each Chinese character consists of two bytes. The first byte ranges from 0xA1 to 0xFE, 94 values in total, and the second byte also ranges from 0xA1 to 0xFE, again 94 values. These two bytes can therefore define 94 * 94 = 8836 characters, of which 6763 are actually Chinese characters.
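
The GB byte ranges can be checked with a few lines of code. This is a minimal sketch; the helper name isGb2312Pair is hypothetical, and it only tests whether two byte values fall in the ranges described, not whether a character is actually assigned to that code.

```php
<?php
// Hypothetical helper: checks whether two byte values fall in the GB double-byte range.
function isGb2312Pair(int $lead, int $trail): bool
{
    return $lead >= 0xA1 && $lead <= 0xFE
        && $trail >= 0xA1 && $trail <= 0xFE;
}

// Each byte has 0xFE - 0xA1 + 1 = 94 possible values.
$perByte = 0xFE - 0xA1 + 1;
echo $perByte * $perByte, "\n"; // 8836

var_dump(isGb2312Pair(0xD6, 0xD0)); // bool(true): both bytes are in range
var_dump(isGb2312Pair(0x41, 0xA1)); // bool(false): 0x41 is ASCII "A"
```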

The BIG5 encoding rules are as follows: each Chinese character consists of two bytes. The first byte ranges from 0x81 to 0xFE, 126 values in total. The second byte ranges over 0x40-0x7E and 0xA1-0xFE, 157 values in total. These two bytes can therefore define 126 * 157 = 19782 characters. Some of them are in everyday use, such as 一 (yi) and 丁 (ding); these are called commonly used characters, and their BIG5 codes range from 0xA440 to 0xC67E, 5401 characters in total. Rarer ones, such as "tan" and "diao", are called less commonly used characters; they range from 0xC940 to 0xF9D5, 7652 characters in total. The rest are special characters.
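
The BIG5 counts can be reproduced by brute force. The sketch below uses two hypothetical helpers, isBig5Pair and countBig5Range, and only tests byte ranges; it does not know which codes have characters assigned.

```php
<?php
// Hypothetical helper: checks whether two byte values form a Big5 double-byte code.
function isBig5Pair(int $lead, int $trail): bool
{
    $leadOk = $lead >= 0x81 && $lead <= 0xFE;
    $trailOk = ($trail >= 0x40 && $trail <= 0x7E)
        || ($trail >= 0xA1 && $trail <= 0xFE);
    return $leadOk && $trailOk;
}

// Counts the valid codes between two 16-bit Big5 values (inclusive).
function countBig5Range(int $from, int $to): int
{
    $count = 0;
    for ($code = $from; $code <= $to; $code++) {
        // High byte is the lead byte, low byte is the trail byte.
        if (isBig5Pair($code >> 8, $code & 0xFF)) {
            $count++;
        }
    }
    return $count;
}

echo countBig5Range(0x8140, 0xFEFE), "\n"; // 19782, the whole 126 * 157 code space
```

Counted the same way, 0xA440 through 0xC67E contains 5401 codes and 0xC940 through 0xF9D5 contains 7652.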

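A safer approach treats any byte of 128 or above as the start of a two-byte character and counts that character as 1: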

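function StrLenW($str)
{
    $count = 0;
    $len = strlen($str);
    // Each loop iteration counts one character.
    for ($i = 0; $i < $len; $i++, $count++) {
        // A byte >= 128 starts a two-byte character, so skip its second byte.
        if (ord($str[$i]) >= 128) {
            $i++;
        }
    }
    return $count;
}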
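
To see what the double-byte assumption implies, the sketch below repeats the same loop under a hypothetical name, strLenDoubleByte, and feeds it the same two Chinese characters once as GBK bytes and once as UTF-8 bytes. The strings are written as byte escapes so the result does not depend on the source file's encoding.

```php
<?php
// Same loop as StrLenW, copied under a hypothetical name so this block runs on its own.
function strLenDoubleByte(string $str): int
{
    $count = 0;
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++, $count++) {
        if (ord($str[$i]) >= 128) {
            $i++; // skip the second byte of a two-byte character
        }
    }
    return $count;
}

$gbk  = "\xD6\xD0\xCE\xC4ab";          // "中文ab" as GBK: 2 bytes per Chinese character
$utf8 = "\xE4\xB8\xAD\xE6\x96\x87ab";  // "中文ab" as UTF-8: 3 bytes per Chinese character

echo strLenDoubleByte($gbk), "\n";  // 4
echo strLenDoubleByte($utf8), "\n"; // 5: the six high bytes are consumed in pairs, giving 3 instead of 2
```

The miscount on UTF-8 input is why a separate branch is needed for that encoding.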

Finally, the following version is more general: it handles UTF-8 as well as double-byte encodings.

Code:

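/**
 * Purpose: count the number of characters, including Chinese, English, and digits.
 * Parameters: the string to count, and its encoding (the system currently uses UTF-8 throughout).
 * Change log:
 * Usage:
 *   $str = "kds";
 *   echo sstrlen($str, 'utf-8');
 */
function sstrlen($str, $charset) {
    $n = 0; $p = 0; $c = '';
    $len = strlen($str);
    if ($charset == 'utf-8') {
      for ($i = 0; $i < $len; $i++) {
        $c = ord($str[$i]);
        // $p is the number of continuation bytes that follow the lead byte $c.
        // The comparisons use >= so that the lead bytes 0xE0 and 0xF0 are classified correctly.
        if ($c >= 252) {
          $p = 5;
        } elseif ($c >= 248) {
          $p = 4;
        } elseif ($c >= 240) {
          $p = 3;
        } elseif ($c >= 224) {
          $p = 2;
        } elseif ($c >= 192) {
          $p = 1;
        } else {
          $p = 0;
        }
        $i += $p; $n++;
      }
    } else {
      // Double-byte encodings: a byte above 127 starts a two-byte character.
      for ($i = 0; $i < $len; $i++) {
        $c = ord($str[$i]);
        if ($c > 127) {
          $p = 1;
        } else {
          $p = 0;
        }
        $i += $p; $n++;
      }
    }
    return $n;
}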
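
For UTF-8 specifically, the same count can be obtained with a shorter rule: count every byte that is not a continuation byte. This is an alternative sketch, not the article's method. The name utf8Length is hypothetical, and the input is assumed to be well-formed UTF-8.

```php
<?php
// Hypothetical alternative: count UTF-8 characters by counting every byte
// that is not a continuation byte (continuation bytes have the form 10xxxxxx).
function utf8Length(string $str): int
{
    $count = 0;
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        if ((ord($str[$i]) & 0xC0) !== 0x80) {
            $count++;
        }
    }
    return $count;
}

echo utf8Length("kds"), "\n";                          // 3
echo utf8Length("\xE4\xB8\xAD\xE6\x96\x87abc"), "\n";  // 5 ("中文abc" as UTF-8 bytes)
```

Because it never skips ahead, this variant cannot run past the end of a truncated string.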

That is the entire content of this article. I hope it is helpful for your study.

