How to get the length of mixed Chinese and English strings in PHP

WBOY
2016-07-13 10:28:52

When I was writing the form validation class of the framework tonight, I needed to determine whether the length of a certain string was within a specified range. Naturally, I thought of the strlen function in PHP.

$str = 'Hello world!';
echo strlen($str); // Outputs 12

However, PHP's built-in strlen computes the length as the number of bytes the string occupies, and mb_strlen counts characters only when it knows the string's encoding, either through its second argument or through the configured internal encoding; otherwise its result can match the byte count too. The number of bytes a Chinese character occupies differs between encodings: under GBK/GB2312 a Chinese character occupies 2 bytes, while under UTF-8 it occupies 3 bytes.

$str = '你好，世界！';
echo strlen($str); // Outputs 12 under GBK or GB2312, and 18 under UTF-8
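
To make the byte counts concrete, here is a minimal sketch. It assumes the script file itself is saved as UTF-8 and that the iconv and mbstring extensions are available; the variable names are illustrative only.

```php
<?php
// Assumption: this source file is saved as UTF-8.
$utf8 = '你好，世界！';                  // 6 characters
$gbk  = iconv('UTF-8', 'GBK', $utf8);    // the same text re-encoded as GBK

echo strlen($utf8), "\n";                // 18: 3 bytes per character
echo strlen($gbk), "\n";                 // 12: 2 bytes per character

// mb_strlen counts characters once it is told the encoding explicitly.
echo mb_strlen($utf8, 'UTF-8'), "\n";    // 6
echo mb_strlen($gbk, 'GBK'), "\n";       // 6
```

The same text therefore has one character count but two different byte counts, which is why a byte-based length check behaves differently from project to project.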

When validating the length of a string, we usually care about the number of characters rather than the number of bytes the string occupies. Consider this PHP code running under UTF-8:

$name = '张庚昌'; // a three-character Chinese name (Zhang Gengchang)
$len = strlen($name);
// Outputs FALSE, because three Chinese characters occupy 9 bytes under UTF-8
if ($len >= 3 && $len <= 8) {
    echo 'TRUE';
} else {
    echo 'FALSE';
}
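
For comparison, a character-based version of the same check might look like the sketch below. The helper name length_between is hypothetical (it is not part of any framework), the name used is just a stand-in three-character string, and the code assumes the mbstring extension and UTF-8 input.

```php
<?php
// Hypothetical helper: checks that the character count lies in [$min, $max].
function length_between($str, $min, $max, $encoding = 'UTF-8')
{
    $len = mb_strlen($str, $encoding); // characters, not bytes
    return $len >= $min && $len <= $max;
}

$name = '中文名'; // stand-in three-character string, 9 bytes in UTF-8

echo length_between($name, 3, 8) ? 'TRUE' : 'FALSE'; // TRUE
```

Counting characters lets the 3 to 8 rule mean the same thing for Chinese and English input.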

So is there a convenient, practical way to get the length of a string that contains Chinese characters? One option is to match the Chinese characters with a regular expression, divide their byte count by 2 under GBK/GB2312 or by 3 under UTF-8 to get the number of Chinese characters, and finally add the length of the non-Chinese part. But this is too troublesome.

WordPress has a piece of code for this, along the following lines:

$str = '你好，world！';
preg_match_all('/./us', $str, $match);
echo count($match[0]); // Outputs 9

The idea is to use a regular expression to split the string into individual characters and then count the matches directly, which gives the result we want.
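
Wrapped in a function, the same idea could be sketched as follows. The name utf8_char_count is hypothetical, and the function expects UTF-8 input.

```php
<?php
// Hypothetical wrapper around the regex approach; expects UTF-8 input.
function utf8_char_count($str)
{
    // preg_match_all returns the number of matches, or false when the
    // subject is not valid UTF-8 (the u modifier enforces this).
    return preg_match_all('/./us', $str);
}

var_dump(utf8_char_count('你好，world！')); // int(9)
var_dump(utf8_char_count("\xC4\xE3"));     // bool(false): these bytes are not valid UTF-8
```

Checking for false separately from 0 lets the caller tell an empty string apart from input in the wrong encoding.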

But the code above cannot handle GBK/GB2312 Chinese strings, because the u modifier treats the input as UTF-8. Each GBK/GB2312 Chinese character is two bytes that generally do not form a UTF-8 character, so the count comes out wrong (matched byte by byte, the number of Chinese characters would double). So I thought of the following approach:

$tmp = @iconv('gbk', 'utf-8', $str);
if (!empty($tmp)) {
    $str = $tmp;
}
preg_match_all('/./us', $str, $match);
echo count($match[0]);
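
One possible refinement, offered only as a sketch: because some UTF-8 byte sequences can also be decoded as GBK, converting unconditionally may alter a string that was already UTF-8. The variant below checks for valid UTF-8 first and converts only when that check fails. The name mixed_char_count is hypothetical, and the code assumes the mbstring and iconv extensions.

```php
<?php
// Hypothetical helper: trust the string if it is already valid UTF-8,
// and only fall back to a GBK conversion otherwise.
function mixed_char_count($str)
{
    if (!mb_check_encoding($str, 'UTF-8')) {
        $converted = @iconv('GBK', 'UTF-8', $str);
        if ($converted === false) {
            return false; // neither valid UTF-8 nor convertible from GBK
        }
        $str = $converted;
    }
    return preg_match_all('/./us', $str);
}

$utf8 = '你好，world！';
$gbk  = iconv('UTF-8', 'GBK', $utf8);

echo mixed_char_count($utf8), "\n"; // 9
echo mixed_char_count($gbk), "\n";  // 9
```

This is still a heuristic: a GBK string whose bytes happen to form valid UTF-8 would be counted as UTF-8.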

This is compatible with both GBK/GB2312 and UTF-8 encodings. It passed testing with a small amount of data, but I have not yet confirmed that it is completely correct. Advice from experts is welcome.

The intention above was to make the framework compatible with multiple encodings. In everyday development, however, a project's encoding is usually known in advance, so you can simply use the following function to get the string length:

int iconv_strlen ( string $str [, string $charset = ini_get("iconv.internal_encoding") ] )
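
A minimal usage sketch, assuming the project stores its strings as UTF-8 and the iconv extension is enabled:

```php
<?php
// Assumption: the project's strings (and this source file) are UTF-8.
$str = '你好，world！';

echo iconv_strlen($str, 'UTF-8'), "\n"; // 9 characters
echo strlen($str), "\n";                // 17 bytes
```

Passing the charset explicitly avoids depending on the iconv.internal_encoding setting.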
