
Truncating mixed Chinese and English strings in PHP

WBOY
2016-07-25 09:11:56

Counting and truncating strings that mix Chinese and English needs no custom functions. PHP's mb (mbstring) extension provides native functions that handle string truncation easily.

First, the common functions for measuring and truncating a string by width.

mb_strwidth($str, $encoding) returns the display width of a string.
  $str: the string to measure
  $encoding: the encoding to use, such as utf8 or gbk

mb_strimwidth($str, $start, $width, $tail, $encoding) truncates a string to a given width.
  $str: the string to truncate
  $start: the position to start from, usually 0 (the beginning of the string)
  $width: the maximum width of the result
  $tail: the string appended after the truncated text, commonly "..."
  $encoding: the encoding to use

Example:

  1. /**
  2. * utf8 encoding format
  3. * 1 Chinese character occupies 3 bytes
  4. * What we hope is that 1 Chinese character occupies 2 bytes,
  5. * Because from the width point of view, the position occupied by 2 English letters is equivalent to 1 Chinese character
  6. */
  7. // Test string
  8. $str = 'aaaaahahaaaaahahahaaa';
  9. echo strlen($str); // Only strlen is used to output 25 bytes
  10. // The encoding must be specified, otherwise PHP's internal code mb_internal_encoding() will be used to view the internal code
  11. // Use mb_strwidth to output a string with a width of 20 and use utf8 encoding
  12. echo mb_strwidth($ str, 'utf8');
  13. // Only intercept if the width is greater than 10
  14. if(mb_strwidth($str, 'utf8')>10){
  15. // Set to intercept from 0 here, take 10 appends. .., use utf8 encoding
  16. // Note that the appended... will also be calculated into the length
  17. $str = mb_strimwidth($str, 0, 10, '...', 'utf8');
  18. }
  19. //The final output is aaaa... 4 a's are counted as 4 1's, 2 are counted as 3 points, and 3 are counted as 4+2+3=9
  20. // Isn't it very simple? Some people have said why. Isn’t 9 10?
  21. // Because "Ah" happens to be followed by "Ah", Chinese counts 2, 9+2=11 exceeds the setting, so removing 1 is 9
  22. echo $str;

Other truncation functions:

mb_strlen($str, $encoding) returns the length of a string in characters.
  $str: the string to measure
  $encoding: the encoding to use

mb_substr($str, $start, $length, $encoding) extracts part of a string.
  $str: the string to extract from
  $start: the character position to start from
  $length: the number of characters to take
  $encoding: the encoding to use

These two functions work much like strlen() and substr(). The difference is that they take an encoding and count in characters of that encoding rather than in bytes.

An example using these two functions:

  1. /**
  2. * utf8 encoding format
  3. * 1 Chinese occupies 3 bytes
  4. */
  5. $str = 'aa12ahaa';
  6. echo strlen($str); // Direct output length is 9
  7. // Output length is 7. Why 7?
  8. // Note that after setting the encoding here, whether it is Chinese or English, the length of each is 1
  9. // a a 1 2 ah a a
  10. // 1+1+1+1+1+1+1 = 7
  11. // Is it exactly 7 characters?
  12. echo mb_strlen($str, 'utf8');
  13. // The same is true for mb_substr
  14. // I only want 5 characters now
  15. echo mb_substr($str, 0, 5, ' utf8'); // Output aa12

The mb extension contains many other useful functions that are not covered here. If you are interested, see the relevant sections of the PHP manual.


