A function that truly truncates strings according to the rules of UTF-8 encoding (a UTF-8 version of substr)

WBOY
2016-07-21 15:14:56

The code is as follows:

/*
 * Function: works like substr() on byte offsets, but never returns garbled
 *           text: a UTF-8 character split at either edge of the cut is kept whole
 * Parameters: $str    the UTF-8 string
 *             $start  byte offset; may be negative (counted from the end)
 *             $length byte length; may be negative, or null for "to the end"
 * Return: the substring, possibly a few bytes longer than requested
 */
function utf8_substr( $str , $start , $length = null ){
    // Cut normally first. A null length means "to the end of the string".
    $res = $length === null ? substr( $str , $start ) : substr( $str , $start , $length );
    $res = (string) $res; // substr() returns false on failure before PHP 8
    $strlen = strlen( $str );
    // If start is negative, convert it to an absolute byte offset.
    if ( $start < 0 ){
        $start = max( 0 , $strlen + $start );
    }
    /* Then check whether the characters at both edges of the cut are complete */
    // Take up to 6 bytes that follow the cut.
    $next_start = $start + strlen( $res ); // position just past the cut
    $next_len = $next_start + 6 <= $strlen ? 6 : max( 0 , $strlen - $next_start );
    $next_segm = (string) substr( $str , $next_start , $next_len );
    // Take up to 6 bytes that precede the cut.
    $prev_start = $start - 6 > 0 ? $start - 6 : 0;
    $prev_segm = (string) substr( $str , $prev_start , $start - $prev_start );
    // If the bytes after the cut begin with continuation bytes (0x80-0xBF),
    // the last character was split: append them to the result.
    if ( preg_match( '@^([\x80-\xBF]{0,5})[\xC0-\xFD]?@' , $next_segm , $bytes ) ){
        if ( !empty( $bytes[1] ) ){
            $res .= $bytes[1];
        }
    }
    // If the result starts with a continuation byte, the first character was
    // split: prepend its leading bytes, taken from just before the cut.
    $ord0 = $res === '' ? 0 : ord( $res[0] );
    if ( 128 <= $ord0 && 191 >= $ord0 ){
        if ( preg_match( '@[\xC0-\xFD][\x80-\xBF]{0,5}$@' , $prev_segm , $bytes ) ){
            if ( !empty( $bytes[0] ) ){
                $res = $bytes[0] . $res;
            }
        }
    }
    return $res;
}
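
The same widening idea can also be written as a plain loop over bytes instead of regular expressions. The sketch below is not the function above: `utf8_widen_cut` is a hypothetical name, it takes absolute byte offsets `[$start, $end)` rather than substr-style arguments, and it assumes PHP 7 or later and well-formed UTF-8 input.

```php
<?php
// Hypothetical helper (not from the article): widen the byte range
// [$start, $end) of $str until both edges sit on UTF-8 character boundaries.
function utf8_widen_cut(string $str, int $start, int $end): string
{
    $len = strlen($str);
    $start = max(0, min($start, $len));
    $end = max($start, min($end, $len));

    // A continuation byte has the bit pattern 10xxxxxx (0x80-0xBF).
    $isContinuation = function (string $byte): bool {
        return (ord($byte) & 0xC0) === 0x80;
    };

    // Move the start back to the lead byte of the character it falls inside.
    while ($start > 0 && $start < $len && $isContinuation($str[$start])) {
        $start--;
    }
    // Move the end forward past the continuation bytes of a split character.
    while ($end < $len && $isContinuation($str[$end])) {
        $end++;
    }
    return substr($str, $start, $end - $start);
}

// "a", then U+6D4B encoded as the three bytes E6 B5 8B, then "b".
$s = "a" . "\xE6\xB5\x8B" . "b";
var_dump(utf8_widen_cut($s, 0, 2) === "a" . "\xE6\xB5\x8B"); // end fell inside the character
var_dump(utf8_widen_cut($s, 2, 5) === "\xE6\xB5\x8B" . "b"); // start fell inside the character
```

Like utf8_substr, this keeps a split character whole rather than dropping it, so the result can be longer than the requested byte range.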

Test data:

$str = 'dfjdjf test 13f test 65&2 data fdj (1 on mfe&...on';
var_dump( utf8_substr( $str , 22 , 12 ) ); echo "\n";
var_dump( utf8_substr( $str , 22 , -6 ) ); echo "\n";
var_dump( utf8_substr( $str , 9 , 12 ) ); echo "\n";
var_dump( utf8_substr( $str , 19 , 12 ) ); echo "\n";
var_dump( utf8_substr( $str , 28 , -6 ) ); echo "\n";

Output (no garbled characters in the truncated strings; everyone is welcome to test and report bugs):
string(12) "According tofスdj"
string(26) "According tofスdj(1 mfe&…"
string(13) "13f try 65&2 number"
string(12) "Datafd"
string(20) "dj(1 is mfe&…"
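
One way to test for garbled output is to check that each result is still well-formed UTF-8. A minimal sketch, assuming PCRE was built with UTF-8 support (the default in PHP): `is_valid_utf8` is a hypothetical helper name, and it relies on `preg_match()` returning `false` when the `/u` modifier meets a malformed subject.

```php
<?php
// Hypothetical helper (not from the article): true when $s is well-formed UTF-8.
// With the /u modifier, preg_match() returns false for a malformed subject
// instead of 1, so an empty pattern is enough to validate the whole string.
function is_valid_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

// "a", then U+6D4B encoded as the three bytes E6 B5 8B, then "b".
$s = "a" . "\xE6\xB5\x8B" . "b";
var_dump(is_valid_utf8(substr($s, 0, 2))); // bool(false): plain substr cut inside the character
var_dump(is_valid_utf8(substr($s, 0, 4))); // bool(true): cut on a character boundary
```

Running the same check on every value returned by utf8_substr is a quick way to look for bugs in the boundary handling.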
