
Solution to PHP Chinese string truncation without garbled characters

WBOY (original)
2016-07-25 09:07:24
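    <?php
    // Truncate a double-byte Chinese string without splitting a character.
    // Assumes GB2312-style text, where both bytes of a Chinese character are above 0xA0.
    // $start and $length are byte offsets: bytes from $start up to (not including) $length are copied.
    // A negative $length works on the reversed string, i.e. it counts from the end.
    function substring($str, $start, $length) {
        $len = $length;
        if ($length < 0) {
            $str = strrev($str);
            $len = -$length;
        }
        $len = ($len < strlen($str)) ? $len : strlen($str);
        $tmpstr = "";
        for ($i = $start; $i < $len; $i++) {
            if (ord(substr($str, $i, 1)) > 0xa0) {
                // Lead byte of a two-byte character: copy both bytes together.
                $tmpstr .= substr($str, $i, 2);
                $i++;
            } else {
                $tmpstr .= substr($str, $i, 1);
            }
        }
        if ($length < 0) $tmpstr = strrev($tmpstr);
        return $tmpstr;
    }
    ?>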

Usage example:

  1. $str1 = 'I am a relatively long string of Chinese without English';

  2. $str2 = 'I am a relatively long string of Chinese with yingwen';

  3. $len = strlen($str1);

  4. echo '
    '.$len; //return 28

  5. $len = strlen ($str2);

  6. echo '
    '.$len; //return 29

  7. echo '
    ';

  8. echo substring($str1, 0 , 11);
  9. echo '
    ';
  10. echo substring($str2, 0, 11);
  11. echo '
    ';
  12. echo substring($str1, 16, 28);
  13. echo '
    ';
  14. echo substring($str2, 16, 29);
  15. ?>
Copy the code

The result shows: 28 29 I am a string of comparisons I am a string of comparisons Chinese without English Chinese with yingwen

This function is handy for, say, shortening a long file name. If you would rather keep both ends of the name and put "..." in the middle, you can do it like this:

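    <?php
    // Shorten $str to about $size bytes, keeping the head and the tail with "..." in between.
    function formatName($str, $size) {
        $len = strlen($str);
        if ($len > $size) {
            $half = (int) ($size / 2); // whole byte offsets only
            $part1 = substring($str, 0, $half);
            // Note: the tail offset can still land in the middle of a character.
            $part2 = substring($str, $len - $half, $len);
            return $part1 . "..." . $part2;
        } else {
            return $str;
        }
    }
    ?>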

I also came across a very simple Chinese truncation trick online. In my tests it worked well:

    <?php
    echo substr($str1, 0, 10) . chr(0);
    ?>

How it works: chr(0) is not null. null means no value at all, whereas chr(0) is a real character whose byte value is 0 (0x00 in hexadecimal, 00000000 in binary). It displays nothing, but it is still a character. When a Chinese character is cut in half, the decoder follows the encoding rules and pairs the leftover lead byte with whatever byte comes next, interpreting the two as one Chinese character. That is where the garbled output comes from. A byte in the range 0x81 to 0xff followed by 0x00 is always displayed as "empty", so appending chr(0) to the result of substr prevents the garbled characters.
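
To make the half-character situation concrete, here is a minimal sketch that checks whether a byte-level cut left a lead byte without its partner. It assumes GB2312-style text in which any byte above 0xA0 begins a two-byte character. The function name hasDanglingLeadByte is made up for this illustration.

```php
<?php
// Report whether a GB2312-style byte string ends with a lead byte whose
// trail byte was cut off. Assumes every byte above 0xA0 starts a 2-byte character.
function hasDanglingLeadByte(string $bytes): bool
{
    $len = strlen($bytes);
    $i = 0;
    while ($i < $len) {
        if (ord($bytes[$i]) > 0xA0) {
            if ($i + 1 >= $len) {
                return true; // lead byte with nothing after it
            }
            $i += 2; // skip lead byte and trail byte together
        } else {
            $i += 1; // single-byte (ASCII) character
        }
    }
    return false;
}

// "中文" written as raw GB2312 bytes, so the example does not depend on the file encoding.
$gb = "\xD6\xD0\xCE\xC4";

var_dump(hasDanglingLeadByte(substr($gb, 0, 2))); // cut on a character boundary
var_dump(hasDanglingLeadByte(substr($gb, 0, 3))); // cut through the second character
```

The second call is the case where substr alone would hand the decoder an orphaned lead byte.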

Update 2012-07-05: the method above works well, but garbled characters still show up occasionally, and I have not tracked down the cause. The following method, however, has held up in repeated testing on UTF-8 text.

Note: this method counts a Chinese character as 1 unit of length and an English letter as 1 unit as well, so choose the truncation length accordingly.

How to calculate the length:

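    <?php
    // Count the characters (not bytes) in a UTF-8 string.
    function strlen_UTF8($str)
    {
        $len = strlen($str);
        $n = 0;
        for ($i = 0; $i < $len; $i++) {
            $x = substr($str, $i, 1);
            // The lead byte as an 8-digit binary string, e.g. "11100100".
            $a = base_convert(ord($x), 10, 2);
            $a = substr('00000000' . $a, -8);
            if (substr($a, 0, 1) == 0) {
                // 0xxxxxxx: single-byte (ASCII) character
            } elseif (substr($a, 0, 3) == 110) {
                $i += 1; // 110xxxxx: skip 1 continuation byte
            } elseif (substr($a, 0, 4) == 1110) {
                $i += 2; // 1110xxxx: skip 2 continuation bytes
            } elseif (substr($a, 0, 5) == 11110) {
                $i += 3; // 11110xxx: skip 3 continuation bytes
            }
            $n++;
        }
        return $n;
    } // End strlen_UTF8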

    // String truncation function:

    // $start and $lenth are counted in characters, not bytes.
    function subString_UTF8($str, $start, $lenth)
    {
        $len = strlen($str);
        $r = array();
        $n = 0; // characters skipped so far
        $m = 0; // characters collected so far
        for ($i = 0; $i < $len; $i++) {
            $x = substr($str, $i, 1);
            $a = base_convert(ord($x), 10, 2);
            $a = substr('00000000' . $a, -8);
            if ($n < $start) {
                // Still before $start: skip the whole character.
                if (substr($a, 0, 1) == 0) {
                } elseif (substr($a, 0, 3) == 110) {
                    $i += 1;
                } elseif (substr($a, 0, 4) == 1110) {
                    $i += 2;
                } elseif (substr($a, 0, 5) == 11110) {
                    $i += 3;
                }
                $n++;
            } else {
                // Copy the whole character (1 to 4 bytes).
                if (substr($a, 0, 1) == 0) {
                    $r[] = substr($str, $i, 1);
                } elseif (substr($a, 0, 3) == 110) {
                    $r[] = substr($str, $i, 2);
                    $i += 1;
                } elseif (substr($a, 0, 4) == 1110) {
                    $r[] = substr($str, $i, 3);
                    $i += 2;
                } elseif (substr($a, 0, 5) == 11110) {
                    $r[] = substr($str, $i, 4);
                    $i += 3;
                } else {
                    $r[] = '';
                }
                if (++$m >= $lenth) {
                    break;
                }
            }
        }
        return join($r);
    } // End subString_UTF8

    // Usage is the same as before. For example, formatName can be implemented as follows
    // (with a small adjustment for the length of Chinese characters):

    function formatName($str, $size) {
        $len = strlen_UTF8($str); // length in characters
        $one_len = strlen($str);  // length in bytes
        if ($one_len == 0) {
            return $str; // nothing to shorten, and this avoids dividing by zero
        }
        // Scale $size by the character-to-byte ratio, so text with more Chinese characters gets a smaller budget.
        $size = $size * 1.5 * $len / $one_len;
        if ($len > $size) {
            $part1 = subString_UTF8($str, 0, $size / 2);
            $part2 = subString_UTF8($str, $len - ($size / 2), $len);
            return $part1 . "..." . $part2;
        } else {
            return $str;
        }
    }
    ?>

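
To see what the length adjustment in formatName does, it helps to isolate the formula $size * 1.5 * characters / bytes. In the sketch below, scaledSize is a hypothetical helper that only reproduces that one step; the character and byte counts are passed in by hand and assume UTF-8 with 3 bytes per Chinese character.

```php
<?php
// The scaling step from formatName, isolated: $size * 1.5 * characters / bytes.
function scaledSize(float $size, int $charCount, int $byteCount): float
{
    return $size * 1.5 * $charCount / $byteCount;
}

// 10 ASCII letters: 10 characters, 10 bytes, so the ratio is 1.
$asciiBudget = scaledSize(20, 10, 10);
// 10 Chinese characters in UTF-8: 10 characters, 30 bytes, so the ratio is 1/3.
$chineseBudget = scaledSize(20, 10, 30);

echo $asciiBudget, "\n";
echo $chineseBudget, "\n";
```

With a requested size of 20, pure ASCII text is allowed 30 characters and pure Chinese text 10, which roughly evens out the on-screen width if a Chinese character is drawn wider than a Latin letter.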


