[PHP] A simple and wonderful way to get the length of a Chinese string
When I was writing the form validation class of the framework tonight, I needed to determine whether the length of a certain string was within a specified range. Naturally, I thought of the strlen function in PHP.
$str = 'Hello world!';
echo strlen($str); // Output 12
However, PHP's built-in strlen counts the bytes a string occupies, not its characters. mb_strlen returns the character count only when the mbstring extension is available and the encoding it is given (or its default internal encoding) matches the string's actual encoding. The number of bytes a Chinese character occupies depends on the encoding: 2 bytes under GBK/GB2312, and 3 bytes for common Chinese characters under UTF-8.
$str = '你好，世界！';
echo strlen($str); // Output 12 under GBK or GB2312, output 18 under UTF-8
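The difference in byte counts can be reproduced in one script by re-encoding the same text. This is only a minimal sketch: it assumes the source file is saved as UTF-8 and that the iconv extension (enabled in most PHP builds) supports GBK on your system. The variable names are illustrative.

```php
<?php
// Assumption: this file is saved as UTF-8, so the literal below is UTF-8 encoded.
$utf8 = '你好，世界！'; // 6 characters, including full-width punctuation

// Re-encode the same text as GBK (assumes iconv knows the GBK charset).
$gbk = iconv('UTF-8', 'GBK', $utf8);

echo strlen($utf8), "\n"; // 18: each character takes 3 bytes in UTF-8
echo strlen($gbk), "\n";  // 12: each character takes 2 bytes in GBK
```

The character count is 6 in both cases; only the byte count that strlen reports changes.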
When validating the length of a string, we usually care about the number of characters, not the number of bytes the string occupies. For example, take this PHP code running on a UTF-8 string:
$name = '张耕畅'; // "Zhang Gengchang", three Chinese characters
$len = strlen($name);
// Outputs FALSE: three Chinese characters occupy 9 bytes under UTF-8, so strlen() returns 9
if ($len >= 3 && $len <= 8) {
    echo 'TRUE';
} else {
    echo 'FALSE';
}
So is there a convenient and practical way to get the length of a Chinese string? You could use a regular expression to find the Chinese characters, divide their byte count by 2 under GBK/GB2312 or by 3 under UTF-8, and then add the length of the non-Chinese part, but this is too much trouble. WordPress has a more elegant piece of code, shown here for reference:
$str = 'Hello，世界！';
preg_match_all('/./us', $str, $match);
echo count($match[0]); // Output 9
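The idea is to use a regular expression to split the string into single characters and then count the matches with count, which gives the result we want. The u modifier makes PCRE treat the pattern and the subject as UTF-8, so . matches one whole character rather than one byte, and the s modifier lets . match newlines as well. Note that this works only for UTF-8 strings; a GBK/GB2312 string has to be converted to UTF-8 first.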
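For form validation, the same trick can be wrapped in a small helper. The sketch below is one possible way to do it, not WordPress's own code: the function name utf8_char_count and the sample name are hypothetical, and it assumes the input is UTF-8 and PHP 7 or later.

```php
<?php
// Hypothetical helper (illustrative name): counts the characters in a UTF-8 string.
function utf8_char_count(string $str): int
{
    // 'u' treats pattern and subject as UTF-8; 's' lets '.' match newlines too.
    // preg_match_all() returns the number of full matches, i.e. the character count.
    $count = preg_match_all('/./us', $str);
    if ($count === false) {
        // With the 'u' modifier, matching fails when the subject is not valid UTF-8.
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }
    return $count;
}

$name = '李小龙'; // illustrative three-character name
$len = utf8_char_count($name);

// The range check now works on characters instead of bytes.
echo ($len >= 3 && $len <= 8) ? 'TRUE' : 'FALSE'; // TRUE
```

Throwing on invalid input is a design choice made for this sketch; a validation class might prefer to treat such input as a failed check instead.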