Determining string length in PHP: the strlen() and mb_strlen() functions

巴扎黑
2016-11-09 14:38:49

strlen()

PHP strlen() function

Definition and usage

The strlen() function returns the length of a string, measured in bytes.

Syntax

strlen(string)

Parameters: string
Description: Required. Specifies the string to check.

The code is as follows

<?php
$str = '中文a字1符';
echo strlen($str);
echo '<br />';
echo mb_strlen($str, 'UTF-8');
// Output:
// 14
// 6
?>

Result analysis: strlen() counts bytes, and each Chinese character is 3 bytes in UTF-8, so the length of "中文a字1符" is 3*4+2=14.
With mb_strlen() and the encoding set to UTF-8, each Chinese character counts as 1, so the length of "中文a字1符" is 6.
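
To make the 3*4+2 arithmetic visible, the string can be split into characters and each one measured with strlen(). This is only a sketch: utf8_char_bytes() is a hypothetical helper name, and it assumes the input is valid UTF-8.

```php
<?php
// Sketch: list each character of a UTF-8 string with its byte count.
// utf8_char_bytes() is a hypothetical helper, not a PHP built-in.
function utf8_char_bytes(string $str): array
{
    // The /u modifier makes PCRE split on UTF-8 characters instead of bytes.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    $result = [];
    foreach ($chars as $ch) {
        $result[] = [$ch, strlen($ch)]; // [character, size in bytes]
    }
    return $result;
}

$total = 0;
foreach (utf8_char_bytes('中文a字1符') as [$ch, $bytes]) {
    echo $ch, ' => ', $bytes, " byte(s)\n";
    $total += $bytes;
}
echo 'total: ', $total, "\n"; // total: 14
```

The four Chinese characters report 3 bytes each and the two ASCII characters report 1 byte each, which is where the 14 comes from.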


mb_strlen() function

Note that mb_strlen() is not a PHP core function; it is provided by the mbstring extension. Before using it, make sure the extension is loaded. On Windows this means php.ini must contain the line "extension=php_mbstring.dll" (written as "extension=mbstring" from PHP 7.2 on), and the line must not be commented out. Otherwise calling mb_strlen() fails with an undefined function error.
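
Code that may run on servers without mbstring can check for the function first. The following is a minimal sketch, assuming UTF-8 input; char_count() is a hypothetical helper name, and the fallback relies on PCRE's /u modifier rather than on anything from the original article.

```php
<?php
// Sketch: count UTF-8 characters with or without the mbstring extension.
// char_count() is a hypothetical helper used for illustration.
function char_count(string $str): int
{
    if (function_exists('mb_strlen')) {
        return mb_strlen($str, 'UTF-8');
    }
    // Fallback: with /u, "." matches one UTF-8 character;
    // /s lets it match newlines as well.
    return (int) preg_match_all('/./us', $str);
}

echo extension_loaded('mbstring') ? "mbstring is loaded\n" : "mbstring is missing\n";
echo char_count('中文a字1符'), "\n"; // 6
```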

The code is as follows

<?php
$str = '中文a字1符';
// Average the byte count and the character count
echo (strlen($str) + mb_strlen($str, 'UTF-8')) / 2;
// Output:
// 10
?>

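For "中文a字1符", strlen($str) returns 14 and mb_strlen($str, 'UTF-8') returns 6, so the average is 10. This is the display width of the string when each Chinese character occupies two columns and each ASCII character occupies one: 4*2+2=10. The shortcut works because each Chinese character here is 3 bytes in UTF-8, giving (3+1)/2=2, while each ASCII character gives (1+1)/2=1.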
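
The mbstring extension also offers mb_strwidth(), which measures display width directly. The sketch below compares it with the averaging trick; the second string is an added illustration (not from the original) of a case where averaging breaks down, because "é" is 2 bytes rather than 3.

```php
<?php
$str = '中文a字1符';

// Averaging bytes and characters works only when every
// non-ASCII character is exactly 3 bytes long.
$byAverage = (strlen($str) + mb_strlen($str, 'UTF-8')) / 2;

// mb_strwidth() counts East Asian wide characters as 2 and ASCII as 1.
$byWidth = mb_strwidth($str, 'UTF-8');

echo $byAverage, "\n"; // 10
echo $byWidth, "\n";   // 10

// "é" is 2 bytes in UTF-8, so the average is no longer a whole width.
$accented = 'é';
$accentedAverage = (strlen($accented) + mb_strlen($accented, 'UTF-8')) / 2;
echo $accentedAverage, "\n";                   // 1.5
echo mb_strwidth($accented, 'UTF-8'), "\n";    // 1
```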

The difference between the two

The code is as follows

<?php
// The file must be saved as UTF-8 for this test
$str='中文a字1符';
echo strlen($str).'<br>';//14
echo mb_strlen($str,'utf8').'<br>';//6
echo mb_strlen($str,'gbk').'<br>';//8
echo mb_strlen($str,'gb2312').'<br>';//10
?>


Result analysis: strlen() counts bytes, and each Chinese character is 3 bytes in UTF-8, so the length of "中文a字1符" is 3*4+2=14. With mb_strlen() and the encoding set to UTF-8, each Chinese character counts as 1, so the length is 6. The values 8 and 10 for 'gbk' and 'gb2312' come from reading the UTF-8 bytes with the wrong encoding, so they do not correspond to any real character count.
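
The encoding passed to mb_strlen() has to match the actual bytes of the string. As a rough sketch, converting the string to GBK first with mb_convert_encoding() gives consistent numbers: 2 bytes per Chinese character and 6 characters in total. This assumes the mbstring build supports GBK.

```php
<?php
$utf8 = '中文a字1符';

// Re-encode the UTF-8 string as GBK (2 bytes per Chinese character).
$gbk = mb_convert_encoding($utf8, 'GBK', 'UTF-8');

echo strlen($gbk), "\n";           // 10  (2*4 + 2 bytes)
echo mb_strlen($gbk, 'GBK'), "\n"; // 6   (characters)

// mb_check_encoding() can confirm what a string really contains.
echo mb_check_encoding($utf8, 'UTF-8') ? "valid UTF-8\n" : "not UTF-8\n";
```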

The functions above handle simple cases of mixed Chinese and English text, but they are often not enough in practice. Some better approaches follow.

The following code gets the length of a mixed Chinese and English string, where 1 Chinese character counts as 1 unit and 2 English characters count as 1 unit. You can modify the rule yourself.

The code is as follows

/**
 * Get the length of a mixed Chinese/English string.
 * @param string $str     the string
 * @param string $charset the encoding of $str
 * @return float the length, where 1 Chinese character = 1 unit and 2 English characters = 1 unit
 */
function strLength($str, $charset = 'utf-8') {
    if ($charset == 'utf-8') $str = iconv('utf-8', 'gb2312', $str);
    $num = strlen($str);
    $cnNum = 0;
    for ($i = 0; $i < $num; $i++) {
        // In GB2312 a byte above 127 starts a 2-byte character
        if (ord(substr($str, $i, 1)) > 127) {
            $cnNum++;
            $i++; // skip the second byte of the character
        }
    }
    $enNum = $num - ($cnNum * 2);
    $number = ($enNum / 2) + $cnNum;
    return ceil($number);
}
// Test: every call outputs 15. Passing 'gb2312' assumes this file is saved as GB2312;
// pass 'utf-8' instead if the file is saved as UTF-8.
$str1 = '测试测试测试测试测试测试测试测';
$str2 = 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa';
$str3 = 'aa测试aa测试aa测试aa测试aaaaaa';
echo strLength($str1, 'gb2312');
echo strLength($str2, 'gb2312');
echo strLength($str3, 'gb2312');
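
The same counting rule can be applied to UTF-8 strings without converting to GB2312 first. The sketch below is one possible variant, not the original author's method: str_length_utf8() is a hypothetical name, and it treats every non-ASCII character as "Chinese" (1 unit), which is an assumption.

```php
<?php
// Sketch: 1 non-ASCII character = 1 unit, 2 ASCII characters = 1 unit.
// str_length_utf8() is a hypothetical helper; input is assumed to be valid UTF-8.
function str_length_utf8(string $str): int
{
    // All characters (the /u modifier matches whole UTF-8 characters).
    $total = (int) preg_match_all('/./us', $str);
    // Single-byte (ASCII) characters only.
    $ascii = (int) preg_match_all('/[\x00-\x7F]/', $str);
    $wide = $total - $ascii;
    return (int) ceil($ascii / 2 + $wide);
}

echo str_length_utf8('测试测试测试测试测试测试测试测'), "\n";   // 15
echo str_length_utf8('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'), "\n"; // 15
echo str_length_utf8('aa测试aa测试aa测试aa测试aaaaaa'), "\n";   // 15
```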


String truncation functions

UTF-8 encoding: in UTF-8, one Chinese character occupies 3 bytes

The code is as follows

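function msubstr($str, $start, $len) {
 $tmpstr = "";
 $strlen = $start + $len; // byte offset at which to stop
 for($i = 0; $i < $strlen; $i++){
  // A byte above 127 starts a multibyte character, assumed here to be 3 bytes long
  $width = ord(substr($str, $i, 1)) > 127 ? 3 : 1;
  // Copy only characters at or after byte offset $start
  if($i >= $start)
   $tmpstr .= substr($str, $i, $width);
  $i += $width - 1;
 }
 return $tmpstr;
}
echo msubstr("一二三天下致公english",0,10); // 一二三天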
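
When mbstring is available, its built-in functions cover both ways of truncating. As a short sketch: mb_substr() takes its offset and length in characters, while mb_strcut() takes them in bytes and avoids splitting a multibyte character.

```php
<?php
$str = '一二三天下致公english';

// Offsets and lengths in characters.
$firstFour = mb_substr($str, 0, 4, 'UTF-8');
$middle    = mb_substr($str, 3, 4, 'UTF-8');

// Offsets and lengths in bytes; a character that does not fit is dropped whole.
$tenBytes  = mb_strcut($str, 0, 10, 'UTF-8');

echo $firstFour, "\n"; // 一二三天
echo $middle, "\n";    // 天下致公
echo $tenBytes, "\n";  // 一二三 (9 bytes; 天 would exceed the 10-byte limit)
```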


GB2312 encoding: in GB2312, one Chinese character occupies 2 bytes

The code is as follows

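<?php
function msubstr($str, $start, $len) {
   $tmpstr = "";
   $strlen = $start + $len; // byte offset at which to stop
   // Original author's heuristic: shorten by 2 if the string contains a run of digits or whitespace
   if(preg_match('/[\d\s]{2,}/',$str)){$strlen=$strlen-2;}
   for($i = 0; $i < $strlen; $i++) {
       // A byte above 0xa0 starts a 2-byte GB2312 character
       $width = ord(substr($str, $i, 1)) > 0xa0 ? 2 : 1;
       // Copy only characters at or after byte offset $start
       if($i >= $start)
           $tmpstr .= substr($str, $i, $width);
       $i += $width - 1;
   }
   return $tmpstr;
}
?>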


A more compatible function

The code is as follows

function cc_msubstr($str, $start, $length, $charset="utf-8", $suffix=true)
{
 if(function_exists("mb_substr"))
  return mb_substr($str, $start, $length, $charset);
 elseif(function_exists('iconv_substr')) {
  return iconv_substr($str,$start,$length,$charset);
 }
 // Fallback: split the string into characters with a byte pattern for each encoding
 $re['utf-8']   = "/[\x01-\x7f]|[\xc2-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf]{2}|[\xf0-\xff][\x80-\xbf]{3}/";
 $re['gb2312'] = "/[\x01-\x7f]|[\xb0-\xf7][\xa0-\xfe]/";
 $re['gbk']   = "/[\x01-\x7f]|[\x81-\xfe][\x40-\xfe]/";
 $re['big5']   = "/[\x01-\x7f]|[\x81-\xfe]([\x40-\x7e]|[\xa1-\xfe])/";
 preg_match_all($re[$charset], $str, $match);
 $slice = join("",array_slice($match[0], $start, $length));
 // Note: $suffix is applied only on this fallback path
 if($suffix) return $slice."…";
 return $slice;
}
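
In cc_msubstr() the suffix is only added on the regex fallback path, and it is added even when nothing was cut off. A possible refinement, sketched here under the assumption that mbstring is available, appends the suffix only when the string was actually shortened. truncate_chars() is a hypothetical helper name, not part of the original code.

```php
<?php
// Sketch: truncate to $length characters and append $suffix only if text was removed.
// truncate_chars() is a hypothetical helper; it requires the mbstring extension.
function truncate_chars(string $str, int $length, string $suffix = '…', string $charset = 'UTF-8'): string
{
    if (mb_strlen($str, $charset) <= $length) {
        return $str; // short enough, leave untouched
    }
    return mb_substr($str, 0, $length, $charset) . $suffix;
}

echo truncate_chars('一二三天下致公english', 4), "\n"; // 一二三天…
echo truncate_chars('abc', 5), "\n";                  // abc
```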
