Example of getting the character length of a UTF-8 string in PHP

WBOY
2016-07-13 10:40:05

When I was writing the form validation class of the framework tonight, I needed to determine whether the length of a certain string was within a specified range. Naturally, I thought of the strlen function in PHP.

The code is as follows


$str = 'Hello world!';
echo strlen($str); // Outputs 12

Now test with Chinese text:

The code is as follows

$str = '你好,世界!'; // six full-width characters
echo strlen($str); // Outputs 12 under GBK or GB2312, 18 under UTF-8

PHP's built-in strlen function cannot correctly count the characters in a Chinese string: it returns the number of bytes the string occupies. For GB2312-encoded Chinese, the value returned by strlen is twice the number of Chinese characters, while for UTF-8 encoded Chinese it is three times the number (under UTF-8, one Chinese character occupies 3 bytes).
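The byte counts described above can be checked on a single character. This is a minimal sketch, assuming the source file is saved as UTF-8 and the iconv extension is available; the variable names are only for illustration.

```php
<?php
// Assumption: this file is saved as UTF-8, so the literal below is UTF-8 bytes.
$char = '中';

// strlen() counts bytes, not characters.
echo strlen($char), "\n"; // 3 (UTF-8 uses three bytes for this character)

// Convert the same character to GBK and count its bytes again.
$gbk = iconv('UTF-8', 'GBK', $char);
echo strlen($gbk), "\n";  // 2 (GBK uses two bytes for this character)
```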

The following function is taken from WordPress. It counts characters accurately, but note that it only works on strings encoded in UTF-8.

The code is as follows


function utf8_strlen($string = null) {
    // Split the string into individual characters (the u modifier treats it as UTF-8)
    preg_match_all("/./us", $string, $match);
    // Return the number of characters
    return count($match[0]);
}
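As a usage sketch of the same regular-expression technique, the variant below relies on the return value of preg_match_all, which is the number of matches, or false when the subject is not valid UTF-8 under the u modifier. The name utf8_char_count is hypothetical and is not part of WordPress or the original article; the example also assumes the file is saved as UTF-8.

```php
<?php
// Hypothetical helper (not from the original article): counts UTF-8 characters
// and returns false when the input is not valid UTF-8.
function utf8_char_count($string)
{
    // With the u modifier, preg_match_all() returns false on malformed UTF-8;
    // otherwise it returns the number of matches, one per character.
    return preg_match_all('/./us', (string) $string);
}

$str = '中文abc'; // assumption: this file is saved as UTF-8

var_dump(strlen($str));                // int(9): 3 + 3 + 1 + 1 + 1 bytes
var_dump(utf8_char_count($str));       // int(5): five characters
var_dump(utf8_char_count("\xD6\xD0")); // bool(false): GBK bytes, not valid UTF-8
```

Returning false for invalid input, instead of a count, makes it visible when a string in another encoding is passed in.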

However, the code above cannot handle GBK/GB2312 Chinese strings, because it treats its input as UTF-8: each GBK/GB2312 Chinese character is recognized as two characters, so the calculated number of Chinese characters doubles. So I came up with this idea: try converting the string from GBK to UTF-8 first, and count the converted string if the conversion succeeds.

The code is as follows

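// Try to convert from GBK to UTF-8; @ suppresses the notice iconv raises on failure
$tmp = @iconv('gbk', 'utf-8', $str);
if (!empty($tmp)) {
    $str = $tmp;
}
// Count the characters of the (possibly converted) string
preg_match_all('/./us', $str, $match);
echo count($match[0]);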
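One possible variation on this idea, sketched here as an assumption and not as the author's method, is to check whether the string is already valid UTF-8 before attempting the GBK conversion, so that UTF-8 input is never passed through iconv. The function name mixed_strlen is hypothetical, and the sketch assumes the input is either valid UTF-8 or GBK/GB2312.

```php
<?php
// Hypothetical helper: the name and the UTF-8-first ordering are assumptions,
// not the original author's code.
function mixed_strlen($str)
{
    // preg_match() with the u modifier returns false for malformed UTF-8,
    // so an empty pattern works as a validity check.
    if (preg_match('//u', $str) === false) {
        // Not valid UTF-8: assume GBK/GB2312 and convert.
        $converted = @iconv('GBK', 'UTF-8', $str);
        if ($converted === false) {
            return false; // neither valid UTF-8 nor convertible from GBK
        }
        $str = $converted;
    }
    // Count one match per character.
    return preg_match_all('/./us', $str);
}

$utf8 = '中文abc';                      // assumption: this file is saved as UTF-8
$gbk  = iconv('UTF-8', 'GBK', $utf8);  // the same text as GBK bytes

var_dump(mixed_strlen($utf8)); // int(5)
var_dump(mixed_strlen($gbk));  // int(5)
```

This still cannot tell the two encodings apart in every case, since a byte sequence could in principle be valid in both, so it should be treated as a heuristic.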

This is compatible with both GBK/GB2312 and UTF-8 encoding. It passed my tests with a small amount of data, but I have not yet confirmed that it is completely correct.
