A function that cuts strings according to UTF-8 encoding rules (a UTF-8 version of substr)

WBOY
2016-07-25 09:03:34
    <?php
    /*
     * Function: works like substr() with byte offsets, but never splits a
     *           multi-byte UTF-8 character, so the result has no garbled characters.
     *           A cut that lands inside a character is widened to include all of it.
     * Parameters: $str    UTF-8 string
     *             $start  byte offset; a negative value counts from the end
     *             $length byte count; a negative value stops that many bytes before
     *                     the end, null means "to the end of the string"
     * Return: the substring, made of whole characters only
     */
    function utf8_substr( $str , $start , $length = null ){
        $strlen = strlen( $str );
        // Convert start and length into absolute byte offsets, as substr() does.
        if ( $start < 0 ){
            $start = max( 0 , $strlen + $start );
        }
        if ( $length === null ){
            $end = $strlen;
        }
        elseif ( $length < 0 ){
            $end = $strlen + $length;
        }
        else{
            $end = min( $strlen , $start + $length );
        }
        if ( $start >= $end ){
            return '';
        }
        // First cut normally.
        $res = substr( $str , $start , $end - $start );
        /* Then check whether the characters at both edges of the cut are complete. */
        // Take up to 6 bytes after the cut. Leading continuation bytes (0x80-0xBF)
        // belong to the last character of $res, so append them.
        $next_segm = substr( $str , $end , 6 );
        if ( preg_match( '@^([\x80-\xBF]{0,5})[\xC0-\xFD]?@' , $next_segm , $bytes ) ){
            if ( !empty( $bytes[1] ) ){
                $res .= $bytes[1];
            }
        }
        // If $res starts with a continuation byte, its first character is incomplete.
        $ord0 = ord( $res[0] );
        if ( 128 <= $ord0 && 191 >= $ord0 ){
            // Take up to 6 bytes before the cut and prepend the lead byte (0xC0-0xFD)
            // together with any continuation bytes that follow it.
            $prev_start = $start - 6 > 0 ? $start - 6 : 0;
            $prev_segm = substr( $str , $prev_start , $start - $prev_start );
            if ( preg_match( '@[\xC0-\xFD][\x80-\xBF]{0,5}$@' , $prev_segm , $bytes ) ){
                $res = $bytes[0] . $res;
            }
        }
        return $res;
    }
    ?>

Test ---

    <?php
    $str = 'dfjdjf test 13f test 65&2 datafddj(1 on mfe&...on';
    var_dump( utf8_substr( $str , 22 , 12 ) ); echo "\n";
    var_dump( utf8_substr( $str , 22 , -6 ) ); echo "\n";
    var_dump( utf8_substr( $str , 9 , 12 ) ); echo "\n";
    var_dump( utf8_substr( $str , 19 , 12 ) ); echo "\n";
    var_dump( utf8_substr( $str , 28 , -6 ) ); echo "\n";
    ?>

Display results (cut without garbled characters):

    string(12) "According to fdj"
    string(26) "According to fdj (1 is mfe&..."
    string(13) "13f try 65&2 number"
    string(12) "data fd"
    string(20) "dj(1justmfe&..."
