Smarty: fixing garbled output when truncating mixed Chinese and English strings in multiple encodings
This article presents a fix for the garbled output that appears when Smarty truncates strings mixing Chinese and English characters in different encodings. The fix is a replacement for the truncate modifier, called smartTruncate. The method is as follows:
Most website pages need to cut strings down to a substring at some point, and this is where truncate comes in handy. However, it is only suitable for English text. With Chinese text, truncate can produce garbled characters. With mixed Chinese and English strings, cutting the same number of characters gives different display widths, so lines look uneven on the page. This is because one Chinese character is roughly as wide as two English characters. In addition, truncate does not handle GB2312, UTF-8 and other encodings at the same time.
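The garbling described above can be reproduced with a few lines of PHP. This is a minimal sketch that assumes the cut is byte-based, as with PHP's substr; the `$isValidUtf8` closure is a hypothetical helper introduced only for this illustration.

```php
<?php
// "中文" in UTF-8 is six bytes: E4 B8 AD (中) and E6 96 87 (文).
$text = "\xE4\xB8\xAD\xE6\x96\x87";

// A byte-based cut at 4 keeps all of "中" plus only the first byte of "文".
$cut = substr($text, 0, 4);

// Hypothetical helper: preg_match() with the /u flag returns false
// when the subject is not valid UTF-8, and 1 for the empty pattern otherwise.
$isValidUtf8 = function (string $s): bool {
    return preg_match('//u', $s) === 1;
};

var_dump($isValidUtf8($text)); // bool(true)
var_dump($isValidUtf8($cut));  // bool(false)
var_dump(bin2hex($cut));       // string(8) "e4b8ade6"
```

The dangling byte E6 at the end of the cut string is what a browser renders as a garbled character.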
The improved modifier is smartTruncate. File name: modifier.smartTruncate.php
The specific code is as follows:
<?php
// Return 1 if $string is valid UTF-8, 0 otherwise. Results are cached per string.
function smartDetectUTF8($string)
{
    static $result = array();

    if (!array_key_exists($key = md5($string), $result)) {
        $utf8 = "
            /^(?:
                  [\x09\x0A\x0D\x20-\x7E]            # ASCII
                | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
                | \xE0[\xA0-\xBF][\x80-\xBF]         # excluding overlongs
                | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
                | \xED[\x80-\x9F][\x80-\xBF]         # excluding surrogates
                | \xF0[\x90-\xBF][\x80-\xBF]{2}      # planes 1-3
                | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
                | \xF4[\x80-\x8F][\x80-\xBF]{2}      # plane 16
            )+$/xs
        ";
        $result[$key] = preg_match(trim($utf8), $string);
    }

    return $result[$key];
}

// Display width of $string: 1.0 per multibyte character, 0.5 per single-byte character.
function smartStrlen($string)
{
    $result = 0;
    // A non-ASCII character is taken to be 3 bytes in UTF-8 and 2 bytes otherwise (GB2312).
    $number = smartDetectUTF8($string) ? 3 : 2;

    for ($i = 0; $i < strlen($string); $i += $bytes) {
        $bytes = ord(substr($string, $i, 1)) > 127 ? $number : 1;
        $result += $bytes > 1 ? 1.0 : 0.5;
    }

    return $result;
}

// Like substr(), but $start and $length are measured in display-width units.
function smartSubstr($string, $start, $length = null)
{
    $result = '';
    $number = smartDetectUTF8($string) ? 3 : 2;

    if ($start < 0) {
        $start = max(smartStrlen($string) + $start, 0);
    }

    // Advance $i to the byte offset that corresponds to $start width units.
    for ($i = 0; $i < strlen($string); $i += $bytes) {
        if ($start <= 0) {
            break;
        }
        $bytes = ord(substr($string, $i, 1)) > 127 ? $number : 1;
        $start -= $bytes > 1 ? 1.0 : 0.5;
    }

    if (is_null($length)) {
        $result = substr($string, $i);
    } else {
        // Copy whole characters until the width budget is used up.
        for ($j = $i; $j < strlen($string); $j += $bytes) {
            if ($length <= 0) {
                break;
            }
            if (($bytes = ord(substr($string, $j, 1)) > 127 ? $number : 1) > 1) {
                // A multibyte character needs a full unit of remaining width.
                if ($length < 1.0) {
                    break;
                }
                $result .= substr($string, $j, $bytes);
                $length -= 1.0;
            } else {
                $result .= substr($string, $j, 1);
                $length -= 0.5;
            }
        }
    }

    return $result;
}

// Smarty modifier: same parameters as truncate, with $length in display-width units.
function smarty_modifier_smartTruncate($string, $length = 80, $etc = '...', $break_words = false, $middle = false)
{
    if ($length == 0) {
        return '';
    }

    if (smartStrlen($string) > $length) {
        // Reserve room for the ellipsis string.
        $length -= smartStrlen($etc);
        if (!$break_words && !$middle) {
            // Drop a trailing partial word so the cut falls on whitespace.
            $string = preg_replace('/\s+?(\S+)?$/', '', smartSubstr($string, 0, $length + 1));
        }
        if (!$middle) {
            return smartSubstr($string, 0, $length) . $etc;
        } else {
            return smartSubstr($string, 0, $length / 2) . $etc . smartSubstr($string, -$length / 2);
        }
    } else {
        return $string;
    }
}
?>
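The code above provides all the functionality of the original truncate and works with both GB2312 and UTF-8 encodings. When measuring length, a Chinese character counts as 1.0 and an English character counts as 0.5, so truncated substrings come out at an even display width. Note that the code treats every non-ASCII character as three bytes when the string is valid UTF-8 and as two bytes otherwise, which holds for common Chinese characters.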
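The weighting rule can be seen in isolation in the following sketch. It is a simplified, UTF-8-only restatement for illustration, not the plugin's code: `sketchWidth` is a hypothetical name, and the literal Chinese characters assume the file is saved as UTF-8.

```php
<?php
// Hypothetical helper: the article's width rule, restated for UTF-8 input only.
function sketchWidth(string $s): float
{
    $width = 0.0;
    for ($i = 0, $n = strlen($s); $i < $n; $i += $bytes) {
        // Assumption carried over from the article: any byte above 127
        // starts a 3-byte character (true for common Chinese characters).
        $bytes = ord($s[$i]) > 127 ? 3 : 1;
        // Multibyte characters weigh 1.0, single-byte characters 0.5.
        $width += $bytes > 1 ? 1.0 : 0.5;
    }
    return $width;
}

var_dump(sketchWidth("abcd")); // float(2)
var_dump(sketchWidth("中文")); // float(2)
var_dump(sketchWidth("A中B")); // float(2)
```

All three strings report the same width even though they contain four, two and three characters respectively, which is why cuts made by width line up visually.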
The plugin is used like any other Smarty modifier. Here is a simple test:
{$content|smartTruncate:5:".."} (where $content is "A中B华C人D民E共F和G国H")
Display: A中B华C.. (A Chinese character counts as 1.0, an English character counts as 0.5, and the width of the ellipsis string is included in the limit: here the two dots use 1.0, leaving 4.0 for the text.)
Whether you use GB2312 or UTF-8 encoding, the result is correct, which is one of the reasons I put the word "smart" in the plugin name.
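The arithmetic behind the test result can be traced with a short standalone sketch. It assumes UTF-8 input and a file saved as UTF-8, and `sketchTruncate` is a hypothetical name; it restates only the width budget, not the plugin's word-boundary or middle-truncation options.

```php
<?php
// Hypothetical, UTF-8-only restatement of the width-budget arithmetic.
function sketchTruncate(string $s, float $limit, string $etc = '..'): string
{
    // Split into characters; the /u flag makes PCRE treat the input as UTF-8.
    $split = function (string $str): array {
        return preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    };
    // Multibyte characters weigh 1.0, single-byte characters 0.5.
    $weight = function (string $ch): float {
        return strlen($ch) > 1 ? 1.0 : 0.5;
    };

    $chars = $split($s);
    if (array_sum(array_map($weight, $chars)) <= $limit) {
        return $s; // Already fits, nothing to cut.
    }

    // Reserve room for the ellipsis string, then spend what is left.
    $budget = $limit - array_sum(array_map($weight, $split($etc)));
    $out = '';
    foreach ($chars as $ch) {
        $w = $weight($ch);
        if ($w > $budget) {
            break; // The next character would not fit.
        }
        $out .= $ch;
        $budget -= $w;
    }
    return $out . $etc;
}

echo sketchTruncate("A中B华C人D民E共F和G国H", 5, ".."); // A中B华C..
```

With a limit of 5 and ".." costing 1.0, the budget for text is 4.0. "A中B华C" uses 3.5, and the next character "人" needs 1.0, so the cut falls there.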