
The difference between strlen, mb_strlen, substr(), mb_substr() and mb_strcut in PHP

WBOY
2016-07-13 17:00:01

This article explains in detail the differences between strlen, mb_strlen, substr(), mb_substr() and mb_strcut and how to use each of them.

About using the mb_* string splitting functions:
Configuration on Windows
The php_mbstring.dll extension must be installed.
php_mbstring.dll must also be enabled in php.ini.
Configuration on Linux is easy to find online.
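
Before relying on any mb_* function, it can help to confirm that the extension is actually loaded. The following is a minimal sketch, not part of the original article; the variable name $mbstringLoaded is only illustrative.

```php
<?php
// Sketch: check for the mbstring extension before calling any mb_* function.
$mbstringLoaded = extension_loaded('mbstring');

if ($mbstringLoaded) {
    echo "mbstring is loaded\n";
} else {
    // Without the extension, every mb_* call is an undefined function.
    echo "mbstring is missing: enable it in php.ini, then restart PHP\n";
}
```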

The code is as follows:

<?php
// The file must be saved as UTF-8 when testing.
$str = '中文a字1符';
echo strlen($str) . "\n";              // 14
echo mb_strlen($str, 'utf8') . "\n";   // 6
echo mb_strlen($str, 'gbk') . "\n";    // 8
echo mb_strlen($str, 'gb2312') . "\n"; // 10
?>


Result analysis: strlen counts bytes, and a Chinese character takes 3 bytes in UTF-8, so the length of '中文a字1符' is 3*4+2=14. mb_strlen with the encoding set to UTF8 counts each Chinese character as 1, so the length of the same string is 6. The results for 'gbk' and 'gb2312' (8 and 10) come from reading the UTF-8 bytes under the wrong encoding, so they are not meaningful character counts.

When no encoding is passed, mb_strlen uses the internal encoding, which can be obtained with mb_internal_encoding().
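
A small sketch of how the default encoding interacts with mb_strlen may make this clearer. It assumes the file is saved as UTF-8 and that mbstring is loaded; the variable names are illustrative only.

```php
<?php
$str = '中文a字1符';

$previous = mb_internal_encoding();    // read the current default encoding

mb_internal_encoding('UTF-8');         // set the default explicitly
$implicit = mb_strlen($str);           // no encoding argument: the internal encoding is used
$explicit = mb_strlen($str, 'UTF-8');  // encoding passed explicitly

echo $implicit, "\n"; // 6
echo $explicit, "\n"; // 6

mb_internal_encoding($previous);       // restore the previous setting
```

Passing the encoding on every call, as the article's examples do, avoids depending on whatever the server happens to have configured.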

Using these two functions together, you can calculate the width occupied by a mixed Chinese and English string, where a Chinese character occupies 2 and an English character occupies 1. This works for UTF-8 text because each Chinese character contributes 3 bytes and 1 character, and (3 + 1) / 2 = 2:

echo (strlen($str) + mb_strlen($str,'UTF8')) / 2;
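
The formula can be wrapped in a function and compared with mbstring's own mb_strwidth(). This is only a sketch: mixedWidth is a hypothetical helper name, and it assumes the text contains nothing but ASCII and 3-byte UTF-8 characters (a 2-byte character such as 'é' would break the arithmetic).

```php
<?php
// Hypothetical helper: width of a UTF-8 string where a Chinese character
// counts as 2 and an ASCII character counts as 1.
function mixedWidth(string $str): int
{
    $bytes = strlen($str);             // 3 per Chinese character, 1 per ASCII character
    $chars = mb_strlen($str, 'UTF-8'); // 1 per character either way
    return intdiv($bytes + $chars, 2); // (3c + a + c + a) / 2 = 2c + a
}

$str = '中文a字1符';
echo mixedWidth($str), "\n";           // 10
echo mb_strwidth($str, 'UTF-8'), "\n"; // 10
```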


PHP's built-in string length function strlen cannot handle Chinese strings correctly: it only returns the number of bytes the string occupies. For GB2312-encoded Chinese, the value returned by strlen is twice the number of Chinese characters, while for UTF-8-encoded Chinese it is three times the number (in UTF-8, one Chinese character occupies 3 bytes).
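
The two-bytes versus three-bytes difference can be observed by converting the same text between encodings. The sketch below assumes the file is saved as UTF-8 and uses mb_convert_encoding() to produce the GB2312 bytes; the variable names are illustrative.

```php
<?php
$utf8   = '中文a字1符';                                    // 4 Chinese + 2 ASCII characters
$gb2312 = mb_convert_encoding($utf8, 'GB2312', 'UTF-8'); // same text, GB2312 bytes

echo strlen($utf8), "\n";                 // 14 = 4*3 + 2
echo strlen($gb2312), "\n";               // 10 = 4*2 + 2
echo mb_strlen($utf8, 'UTF-8'), "\n";     // 6
echo mb_strlen($gb2312, 'GB2312'), "\n";  // 6
```

The character count stays at 6 as long as the encoding passed to mb_strlen matches the actual bytes.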

String splitting

The substr() function can cut text, but it counts bytes, so text that contains Chinese characters often ends up with a broken character at the cut. In this case, use mb_substr() or mb_strcut(). They are used like substr(), except that one more parameter is added at the end to set the encoding of the string. However, most servers do not enable php_mbstring.dll by default; you need to enable it in php.ini. For example:

<?php
// The string means "This way my string will not be garbled ^_^".
echo mb_substr('这样一来我的字符串就不会有乱码^_^', 0, 7, 'utf-8');
?>
Output: 这样一来我的字

<?php
echo mb_strcut('这样一来我的字符串就不会有乱码^_^', 0, 7, 'utf-8');
?>
Output: 这样

As the example shows, mb_substr cuts by characters, while mb_strcut cuts by bytes, but neither of them produces half a character.
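
To see what "half a character" looks like in practice, the three functions can be run on the same string and the results checked with mb_check_encoding(). This is a sketch that assumes the file is saved as UTF-8; the variable names are illustrative.

```php
<?php
$str = '这样一来我的字符串就不会有乱码^_^';

// 7 raw bytes: '这样' (6 bytes) plus the first byte of '一'.
$byBytes = substr($str, 0, 7);
// 7 characters.
$byChars = mb_substr($str, 0, 7, 'UTF-8');
// At most 7 bytes, rounded down to a whole character.
$byCut   = mb_strcut($str, 0, 7, 'UTF-8');

var_dump(mb_check_encoding($byBytes, 'UTF-8')); // bool(false): ends with a partial character
var_dump(mb_check_encoding($byChars, 'UTF-8')); // bool(true)
var_dump(mb_check_encoding($byCut, 'UTF-8'));   // bool(true)

echo strlen($byCut), "\n";                      // 6
```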

About the mbstring functions:

PHP's mbstring extension provides multi-byte character handling. Its most common use is cutting multi-byte Chinese text, which avoids half characters. Because it is a PHP extension, it also performs better than some custom multi-byte cutting functions.

The mbstring extension provides two functions with similar purposes, mb_substr and mb_strcut. Here is how the manual explains them.

mb_substr
mb_substr() returns the portion of str specified by the start and length parameters.

mb_substr() performs a multi-byte safe substr() operation based on the number of characters. Position is counted from the beginning of str. The first character's position is 0, the second character's position is 1, and so on.

mb_strcut
mb_strcut() returns the portion of str specified by the start and length parameters.

mb_strcut() performs an operation equivalent to mb_substr() by a different method: start and length are counted in bytes. If the start position is the second or a later byte of a multi-byte character, it starts from the first byte of that character.

It extracts from str a string that is no longer than length bytes and that does not end in the middle of a multi-byte character or a shift sequence.
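
The start-position rule is easier to see with a start value that lands inside a character. In the sketch below (file saved as UTF-8, variable names illustrative), byte offset 4 is the second byte of '是', which occupies bytes 3 to 5, while character offset 4 is '比'.

```php
<?php
$str = '我是一串比较长的中文';

// Start is a byte offset: 4 falls inside '是', so the cut moves back to byte 3.
$cut = mb_strcut($str, 4, 6, 'UTF-8');
// Start is a character offset: character 4 is '比', and 6 characters are taken.
$sub = mb_substr($str, 4, 6, 'UTF-8');

echo $cut, "\n"; // 是一
echo $sub, "\n"; // 比较长的中文
```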

For another example, here is a piece of text cut with mb_substr and mb_strcut respectively:

<?php
// The string means "I am a relatively long string of Chinese-www.webjx.com".
$str = '我是一串比较长的中文-www.webjx.com';

echo "mb_substr:" . mb_substr($str, 0, 6, 'utf-8');

echo "\n";

echo "mb_strcut:" . mb_strcut($str, 0, 6, 'utf-8');
?>

The output is as follows:

mb_substr:我是一串比较
mb_strcut:我是
Test code:

<?php
/**
 * Truncate a string by characters.
 * @param string $content text to shorten
 * @param int    $length  target length in characters (the suffix counts toward it)
 * @param string $etc     suffix appended when the text is cut
 * @return string
 */
function Truncate($content, $length, $etc = '...') {
    if ($length == 0) {
        return '';
    } elseif (mb_strlen($content, 'utf-8') > $length) {
        $charset = 'utf-8';
        // Leave room for the suffix, measured in the same encoding.
        $length -= min($length, mb_strlen($etc, $charset));
        $content = mb_substr($content, 0, $length, $charset) . $etc;
    }
    return $content;
}

$str = '伏尔泰(1694~1778)法国资产阶级启蒙思想家,哲学家,史学家,文学家。伏尔泰原名F.M.阿鲁埃。';

echo strlen($str);                     // length in bytes
echo "\n";
echo mb_strlen($str, 'utf-8');         // length in characters
echo "\n";
echo mb_strcut($str, 0, 35, 'utf-8');  // cut by bytes
echo "\n";
echo mb_substr($str, 0, 35, 'utf-8');  // cut by characters
echo "\n";
echo Truncate($str, 35);               // truncate with a suffix
?>
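
Truncate() limits the number of characters. When the limit is a number of bytes instead (for example, a storage field with a fixed byte size), mb_strcut can play the same role. The following is only a sketch: truncateBytes is a hypothetical helper that does not appear in the article, and it assumes UTF-8 input and a $maxBytes value at least as large as the suffix.

```php
<?php
// Hypothetical helper: shorten $content to at most $maxBytes bytes,
// suffix included, without cutting a character in half.
function truncateBytes(string $content, int $maxBytes, string $etc = '...'): string
{
    if (strlen($content) <= $maxBytes) {
        return $content; // already fits
    }
    // Reserve room for the suffix (assumes $maxBytes >= strlen($etc)).
    $budget = max(0, $maxBytes - strlen($etc));
    // mb_strcut counts bytes but stops at a character boundary.
    return mb_strcut($content, 0, $budget, 'UTF-8') . $etc;
}

$str = '伏尔泰原名F.M.阿鲁埃。';
echo truncateBytes($str, 12), "\n"; // 伏尔泰...
echo truncateBytes($str, 11), "\n"; // 伏尔...
```

With a limit of 11 bytes only 8 are left for the text, which is not enough for a third 3-byte character, so the result is shorter than the limit instead of ending in a broken character.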

