
Truncating strings in PHP (mixed Chinese and English strings)

WBOY, 2016-07-13 16:56:34

This article walks through string truncation in PHP, starting with the built-in substr() function and ending with functions that correctly truncate Chinese, English, and mixed Chinese/English strings. Readers who need this may find it a useful reference.

substr(): return part of a string.

Syntax: string substr(string $string, int $start [, int $length]);

Return value: string

Function type: data processing

Description

This function returns length bytes of string, starting at byte position start (counted from 0). If start is negative, the position is counted from the end of the string. If the optional length is given and is negative, that many bytes are left off the end of the string. Because substr() counts bytes rather than characters, it can cut a multi-byte Chinese character in half and produce garbled output; the functions later in this article work around that.
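To see the problem concretely, the sketch below (an illustration added here, not part of the original article, and assuming the PHP file is saved as UTF-8) cuts a UTF-8 string with substr() and checks the result with preg_match() and the /u modifier, which returns false when its subject is not valid UTF-8.

```php
<?php
// Assumes this file is saved as UTF-8, so each Chinese character below is 3 bytes.
$s = "中文abc";
echo strlen($s), "\n"; // 9: 2 characters x 3 bytes + 3 ASCII bytes

// substr() counts bytes: 4 bytes = "中" plus only the first byte of "文".
$cut = substr($s, 0, 4);
// 6 bytes happens to end exactly on a character boundary: "中文".
$whole = substr($s, 0, 6);

// preg_match() with the /u modifier returns false for invalid UTF-8 input.
var_dump(preg_match('//u', $cut) === 1);   // bool(false): "文" was cut in half
var_dump(preg_match('//u', $whole) === 1); // bool(true)
```

Any fixed byte length can land inside a character like this, which is why the functions below inspect the bytes around the cut.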

Usage Example

<?php
echo substr("abcdef", 1, 3);  // returns "bcd"
echo substr("abcdef", -2);    // returns "ef"
echo substr("abcdef", -3, 1); // returns "d"
echo substr("abcdef", 1, -1); // returns "bcde"
?>
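Truncating a GB2312 Chinese string:

<?php
// Truncate a GB2312-encoded string without splitting a Chinese character.
// $start and $len are byte counts; each Chinese character takes 2 bytes.
function mysubstr($str, $start, $len) {
    $tmpstr = "";
    $strlen = $start + $len;
    // Scan from byte 0 so that $i always lands on a character boundary.
    for ($i = 0; $i < $strlen; $i++) {
        if (ord(substr($str, $i, 1)) > 0xa0) {
            // Lead byte of a 2-byte GB2312 character: take both bytes.
            if ($i >= $start) {
                $tmpstr .= substr($str, $i, 2);
            }
            $i++;
        } elseif ($i >= $start) {
            // Single-byte (ASCII) character.
            $tmpstr .= substr($str, $i, 1);
        }
    }
    return $tmpstr;
}
?>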

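The function above handles only GB2312-encoded text, in which a byte greater than 0xA0 begins a two-byte Chinese character. It does not work on UTF-8 strings, where a Chinese character usually takes three bytes.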
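The GB2312 rule can be checked in isolation. The helper gb2312_chars() below is a hypothetical name, not part of the article; it is a minimal sketch that splits a GB2312 string into characters with the same test mysubstr() uses (a byte above 0xA0 starts a two-byte character). The sample bytes assume the GB2312 codes D6D0 for "中" and CEC4 for "文".

```php
<?php
// Hypothetical helper for illustration: split a GB2312 string into characters
// using the mysubstr() rule (a byte above 0xA0 starts a 2-byte character).
function gb2312_chars(string $str): array
{
    $chars = [];
    $n = strlen($str);
    for ($i = 0; $i < $n; $i++) {
        if (ord($str[$i]) > 0xA0 && $i + 1 < $n) {
            $chars[] = substr($str, $i, 2); // lead byte + trail byte
            $i++;                           // skip the trail byte
        } else {
            $chars[] = $str[$i];            // single-byte (ASCII) character
        }
    }
    return $chars;
}

// "中文" in GB2312 (assumed D6 D0 CE C4), followed by ASCII "ab".
$gb = "\xD6\xD0\xCE\xC4ab";
$chars = gb2312_chars($gb);
echo count($chars), "\n"; // 4
```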

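Truncating a UTF-8 encoded multi-byte string:

<?php
// Truncate a UTF-8 string: skip the first $from characters, then keep up to $len characters.
function utf8Substr($str, $from, $len)
{
    // One UTF-8 character is either an ASCII byte (0x00-0x7F) or a lead byte
    // (0xC0-0xFF) followed by one or more continuation bytes (0x80-0xBF).
    return preg_replace('#^(?:[\x00-\x7F]|[\xC0-\xFF][\x80-\xBF]+){0,' . $from . '}' .
        '((?:[\x00-\x7F]|[\xC0-\xFF][\x80-\xBF]+){0,' . $len . '}).*#s',
        '$1', $str);
}
?>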
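The character pattern inside utf8Substr() (an ASCII byte, or a lead byte followed by continuation bytes) can also be used on its own. The sketch below is an illustration with a hypothetical constant name, UTF8_CHAR, and assumes the source file is saved as UTF-8; it lists the characters of a string with preg_match_all(). The pattern works on raw byte ranges, so no /u modifier is needed.

```php
<?php
// Hypothetical constant holding the one-character pattern from utf8Substr().
const UTF8_CHAR = '#[\x00-\x7F]|[\xC0-\xFF][\x80-\xBF]+#';

$s = "测试ab"; // assumes a UTF-8 source file
$n = preg_match_all(UTF8_CHAR, $s, $m);
echo $n, "\n";                  // 4
echo implode('|', $m[0]), "\n"; // 测|试|a|b
```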


The next function, utf8_substr(), works like substr() but never leaves a half-cut (garbled) UTF-8 character at either end of the result:

/*
 * utf8_substr: same as substr(), except that it never produces garbled characters.
 * Parameters: $str is a UTF-8 string; $start and $length are byte offsets
 *             with the same meaning as in substr().
 * Return: the substring, widened at either end to include any character
 *         that substr() cut in half.
 */
function utf8_substr($str, $start, $length = null)
{
    // First, take an ordinary byte-based substring.
    // (Before PHP 8.0, substr() with a null length returned "", so omit it instead.)
    $res = (string) ($length === null ? substr($str, $start) : substr($str, $start, $length));
    $strlen = strlen($str);

    // Turn a negative $start into an absolute byte offset.
    if ($start < 0) {
        $start = max(0, $strlen + $start);
    }

    // Up to 6 bytes after the end of the result (the code allows for
    // sequences of up to 6 bytes, as in the original UTF-8 definition).
    $next_start = $start + strlen($res);
    $next_segm = (string) substr($str, $next_start, 6);

    // Up to 6 bytes before the start of the result.
    $prev_start = max(0, $start - 6);
    $prev_segm = (string) substr($str, $prev_start, $start - $prev_start);

    // If the result ends in the middle of a character, the following bytes are
    // continuation bytes (0x80-0xBF): append them.
    if (preg_match('@^([\x80-\xBF]{0,5})@', $next_segm, $bytes) && !empty($bytes[1])) {
        $res .= $bytes[1];
    }

    // If the result starts with a continuation byte, prepend the lead byte
    // (0xC0-0xFD) and any continuation bytes that precede it.
    if ($res !== '') {
        $ord0 = ord($res[0]);
        if (128 <= $ord0 && $ord0 <= 191
            && preg_match('@[\xC0-\xFD][\x80-\xBF]{0,5}$@', $prev_segm, $bytes)) {
            $res = $bytes[0] . $res;
        }
    }

    return $res;
}
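The 6-byte checks in utf8_substr() rely on one property of UTF-8: continuation bytes always have the bit pattern 10xxxxxx (0x80-0xBF), so any other byte starts a character. The helper utf8_char_start() below is a hypothetical name used only for illustration; it is a minimal sketch that applies the same property in a different way, stepping a byte offset back to the start of its character instead of widening the substring outward.

```php
<?php
// Hypothetical helper for illustration: move a byte offset back to the first
// byte of the UTF-8 character that contains it. A byte b is a continuation
// byte when (b & 0xC0) === 0x80, i.e. it matches 10xxxxxx.
function utf8_char_start(string $str, int $pos): int
{
    while ($pos > 0 && $pos < strlen($str) && (ord($str[$pos]) & 0xC0) === 0x80) {
        $pos--;
    }
    return $pos;
}

$s = "ab测试"; // assumes a UTF-8 source file: "测" is bytes 2-4, "试" bytes 5-7
echo utf8_char_start($s, 3), "\n"; // 2: byte 3 is inside "测"
echo utf8_char_start($s, 5), "\n"; // 5: already a character boundary
```

Stepping back drops a partial character, while utf8_substr() pulls it in; which behaviour is preferable depends on whether the result may exceed the requested byte length.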

Test data:

$str = 'dfjdjf测13f试65&2数据fdj(1就mfe&……就';
var_dump( utf8_substr( $str , 22 , 12 ) ); echo "\n";
var_dump( utf8_substr( $str , 22 , -6 ) ); echo "\n";
var_dump( utf8_substr( $str , 9 , 12 ) ); echo "\n";
var_dump( utf8_substr( $str , 19 , 12 ) ); echo "\n";
var_dump( utf8_substr( $str , 28 , -6 ) ); echo "\n";

Output (the truncated strings contain no garbled characters; feel free to test it and report any bugs):
string(12) "据fdj"
string(26) "据fdj(1就mfe&…"
string(13) "13f试65&2数"
string(12) "数据fd"
string(20) "dj(1就mfe&…"

Here I share one that I use regularly.

Next, let's look at a function for truncating Chinese strings.

function MooCutstr($string, $length, $dot = ' ...') {
    // $charset is a global set elsewhere in the application, e.g. 'utf-8' or 'gbk'.
    global $charset;

    if (strlen($string) <= $length) {
        return $string;
    }
    // Decode HTML entities first so that an entity such as &amp; is not split.
    $string = str_replace(array('&amp;', '&quot;', '&lt;', '&gt;'), array('&', '"', '<', '>'), $string);
    $strcut = '';
    if (strtolower($charset) == 'utf-8') {
        // $n: byte position; $tn: byte length of the last character read;
        // $noc: width so far (ASCII counts 1, each multi-byte character counts 2).
        $n = $tn = $noc = 0;
        while ($n < strlen($string)) {
            $t = ord($string[$n]);
            if ($t == 9 || $t == 10 || (32 <= $t && $t <= 126)) {
                $tn = 1; $n++; $noc++;
            } elseif (194 <= $t && $t <= 223) {
                $tn = 2; $n += 2; $noc += 2;
            } elseif (224 <= $t && $t <= 239) {
                // 0xE0-0xEF: 3-byte sequence (most Chinese characters and punctuation)
                $tn = 3; $n += 3; $noc += 2;
            } elseif (240 <= $t && $t <= 247) {
                $tn = 4; $n += 4; $noc += 2;
            } elseif (248 <= $t && $t <= 251) {
                $tn = 5; $n += 5; $noc += 2;
            } elseif ($t == 252 || $t == 253) {
                $tn = 6; $n += 6; $noc += 2;
            } else {
                $n++;
            }
            if ($noc >= $length) {
                break;
            }
        }
        // If the last character pushed the width past $length, drop it.
        if ($noc > $length) {
            $n -= $tn;
        }
        $strcut = substr($string, 0, $n);
    } else {
        // GBK/GB2312: a byte above 127 starts a 2-byte character.
        for ($i = 0; $i < $length; $i++) {
            $strcut .= ord($string[$i]) > 127 ? $string[$i] . $string[++$i] : $string[$i];
        }
    }
    //$strcut = str_replace(array('&', '"', '<', '>'), array('&amp;', '&quot;', '&lt;', '&gt;'), $strcut);

    return $strcut . $dot;
}
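MooCutstr() measures length as a display width rather than a byte or character count: tab, newline and printable ASCII count as 1, and every multi-byte UTF-8 character counts as 2, roughly matching how wide Chinese characters appear on screen. The helper cut_width() below is a hypothetical name, not part of the article; it is a minimal sketch that computes only that width with the same lead-byte ranges (the obsolete 5- and 6-byte forms are left out). Lead bytes 224-239 (0xE0-0xEF) all start three-byte sequences; 239 (0xEF) begins, for example, the fullwidth Chinese comma "，" (EF BC 8C).

```php
<?php
// Hypothetical helper for illustration: the width MooCutstr() compares with $length.
// Tab, newline and printable ASCII count as 1; each multi-byte UTF-8 character counts as 2.
function cut_width(string $s): int
{
    $width = 0;
    $n = 0;
    $len = strlen($s);
    while ($n < $len) {
        $t = ord($s[$n]);
        if ($t == 9 || $t == 10 || (32 <= $t && $t <= 126)) {
            $n += 1; $width += 1; // single-byte character
        } elseif (194 <= $t && $t <= 223) {
            $n += 2; $width += 2; // 2-byte sequence
        } elseif (224 <= $t && $t <= 239) {
            $n += 3; $width += 2; // 3-byte sequence, e.g. most Chinese characters
        } elseif (240 <= $t && $t <= 247) {
            $n += 4; $width += 2; // 4-byte sequence
        } else {
            $n += 1;              // any other byte is skipped without counting
        }
    }
    return $width;
}

echo cut_width("PHP截取"), "\n"; // 7: 3 ASCII + 2 Chinese characters x 2
echo cut_width("，"), "\n";      // 2: lead byte 0xEF (239)
```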

Source: http://www.bkjia.com/PHPjc/631589.html