Smarty: fixing garbled output when truncating mixed Chinese and English strings in multiple encodings
This article presents a solution to the garbled output that appears when Smarty truncates strings mixing Chinese and English characters in different encodings. It is built around smartTruncate, an improved version of the original truncate modifier, and should be of practical use to anyone facing the same problem. The specific method is as follows:
Almost every website page needs to truncate strings for display at some point. This is where truncate comes in handy, but it only works well for English text. With Chinese text, truncate produces garbled characters. With mixed Chinese and English strings, truncating to the same number of characters gives different display widths, so the results look uneven on the page. This is because one Chinese character is roughly as wide as two English characters. In addition, truncate cannot handle GB2312, UTF-8 and other encodings at the same time.
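To make the garbling concrete, here is a minimal sketch of what happens when a multi-byte string is cut at a byte offset. It assumes UTF-8 input and uses PHP's substr as a stand-in for a byte-oriented truncation; isValidUtf8 is a hypothetical helper introduced for illustration, not part of Smarty.

```php
<?php
// "中文AB" written as UTF-8 bytes: each Chinese character takes 3 bytes.
$text = "\xE4\xB8\xAD\xE6\x96\x87AB";

// Cutting at byte 4 keeps "中" plus only the first byte of "文".
$cut = substr($text, 0, 4);

// Hypothetical helper: preg_match() with the 'u' flag fails on invalid UTF-8.
function isValidUtf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

var_dump(isValidUtf8($text)); // bool(true)
var_dump(isValidUtf8($cut));  // bool(false): the last character is incomplete
var_dump(strlen($cut));       // int(4)
```

The dangling byte at the end of the cut string is what a browser renders as a garbled character.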
The improved modifier, smartTruncate, goes in a file named modifier.smartTruncate.php.
The specific code is as follows:
<?php
// Returns 1 if $string is valid UTF-8, 0 otherwise. Results are cached per string.
function smartDetectUTF8($string)
{
    static $result = array();

    if (!array_key_exists($key = md5($string), $result)) {
        $utf8 = "
            /^(?:
                [\x09\x0A\x0D\x20-\x7E]             # ASCII
              | [\xC2-\xDF][\x80-\xBF]              # non-overlong 2-byte
              | \xE0[\xA0-\xBF][\x80-\xBF]          # excluding overlongs
              | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}   # straight 3-byte
              | \xED[\x80-\x9F][\x80-\xBF]          # excluding surrogates
              | \xF0[\x90-\xBF][\x80-\xBF]{2}       # planes 1-3
              | [\xF1-\xF3][\x80-\xBF]{3}           # planes 4-15
              | \xF4[\x80-\x8F][\x80-\xBF]{2}       # plane 16
            )+$/xs
        ";
        $result[$key] = preg_match(trim($utf8), $string);
    }

    return $result[$key];
}

// Display length: a multi-byte character counts as 1.0, a single-byte character as 0.5.
function smartStrlen($string)
{
    $result = 0;
    // Bytes per multi-byte character: 3 for UTF-8, otherwise 2 (GB2312).
    $number = smartDetectUTF8($string) ? 3 : 2;

    for ($i = 0; $i < strlen($string); $i += $bytes) {
        $bytes = ord(substr($string, $i, 1)) > 127 ? $number : 1;
        $result += $bytes > 1 ? 1.0 : 0.5;
    }

    return $result;
}

// Like substr(), but $start and $length are measured in display-length units.
function smartSubstr($string, $start, $length = null)
{
    $result = '';
    $number = smartDetectUTF8($string) ? 3 : 2;

    if ($start < 0) {
        $start = max(smartStrlen($string) + $start, 0);
    }

    // Skip forward to the byte offset that corresponds to $start.
    for ($i = 0; $i < strlen($string); $i += $bytes) {
        if ($start <= 0) {
            break;
        }
        $bytes = ord(substr($string, $i, 1)) > 127 ? $number : 1;
        $start -= $bytes > 1 ? 1.0 : 0.5;
    }

    if (is_null($length)) {
        $result = substr($string, $i);
    } else {
        for ($j = $i; $j < strlen($string); $j += $bytes) {
            if ($length <= 0) {
                break;
            }
            $bytes = ord(substr($string, $j, 1)) > 127 ? $number : 1;
            if ($bytes > 1) {
                // Never emit part of a multi-byte character.
                if ($length < 1.0) {
                    break;
                }
                $result .= substr($string, $j, $bytes);
                $length -= 1.0;
            } else {
                $result .= substr($string, $j, 1);
                $length -= 0.5;
            }
        }
    }

    return $result;
}

function smarty_modifier_smartTruncate($string, $length = 80, $etc = '...', $break_words = false, $middle = false)
{
    if ($length == 0) {
        return '';
    }

    if (smartStrlen($string) > $length) {
        // Reserve room for the trailing marker.
        $length -= smartStrlen($etc);
        if (!$break_words && !$middle) {
            $string = preg_replace('/\s+?(\S+)?$/', '', smartSubstr($string, 0, $length + 1));
        }
        if (!$middle) {
            return smartSubstr($string, 0, $length) . $etc;
        } else {
            return smartSubstr($string, 0, $length / 2) . $etc . smartSubstr($string, -$length / 2);
        }
    } else {
        return $string;
    }
}
?>
The code above keeps all the original functionality of truncate and is compatible with both GB2312 and UTF-8. When measuring length, a Chinese character counts as 1.0 and an English character counts as 0.5, so truncated substrings come out visually even.
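The 1.0 / 0.5 counting rule can be illustrated on its own with a short sketch. This is not the plug-in's code: displayWidth is a hypothetical helper that handles UTF-8 input only, assumes PHP 7 or later, and assumes the source file itself is saved as UTF-8.

```php
<?php
// Hypothetical helper (UTF-8 only): a single-byte character counts 0.5,
// any multi-byte character counts 1.0.
function displayWidth(string $s): float
{
    // The 'u' flag makes PCRE split on code points rather than bytes.
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    $width = 0.0;
    foreach ($chars as $char) {
        $width += strlen($char) > 1 ? 1.0 : 0.5;
    }
    return $width;
}

var_dump(displayWidth('ABCD'));    // float(2)
var_dump(displayWidth('中华'));    // float(2)
var_dump(displayWidth('A中B华C')); // float(3.5)
```

Four English letters and two Chinese characters both measure 2.0, which is why strings cut to the same measured length line up on the page.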
There is nothing special about using the plug-in. Here is a simple test:
The code is as follows:
{$content|smartTruncate:5:".."} ($content is "A中B华C人D民E共F和G国H")
Display: A中B华C.. (a Chinese character counts as 1.0, an English character counts as 0.5, and the length of the ellipsis string is included in the limit)
Whether you use GB2312 or UTF-8 encoding, you will find that the results are correct, which is one of the reasons I put the word smart in the plug-in name.
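The encoding independence comes from detecting UTF-8 first and falling back to two bytes per character otherwise. The sketch below shows that decision in isolation. bytesPerWideChar is a hypothetical helper, and it swaps the plug-in's explicit byte-range pattern for PCRE's built-in UTF-8 check (the 'u' flag), which is a different technique with the same intent.

```php
<?php
// Hypothetical helper: how many bytes to consume for one Chinese character,
// assuming the input is either UTF-8 or GB2312 and nothing else.
function bytesPerWideChar(string $s): int
{
    // Valid UTF-8 means 3 bytes per Chinese character; otherwise assume GB2312 (2 bytes).
    return preg_match('//u', $s) === 1 ? 3 : 2;
}

$utf8   = "\xE4\xB8\xAD"; // "中" encoded as UTF-8
$gb2312 = "\xD6\xD0";     // "中" encoded as GB2312

var_dump(bytesPerWideChar($utf8));   // int(3)
var_dump(bytesPerWideChar($gb2312)); // int(2)
```

Note that this two-way choice, like the plug-in's, treats every non-ASCII UTF-8 character as three bytes long, so it suits Chinese text but not characters that UTF-8 encodes in two or four bytes.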