Fixing garbled Chinese characters when truncating strings with PHP's substr function

WBOY
2016-07-25 08:58:11
This article explains how to avoid garbled output when truncating strings that contain Chinese characters with PHP's substr function. It is offered as a reference for anyone who runs into the problem.

PHP's string truncation function, substr:

string substr ( string $string , int $start [, int $length ] )
Returns the part of $string that begins at position $start and is at most $length bytes long.

The substr function cuts by bytes, not by characters. A Chinese character occupies 2 bytes in GB2312 and 3 bytes in UTF-8. If a cut at a given length falls in the middle of a Chinese character, the returned string ends with an incomplete byte sequence and displays as garbled text.
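The byte counts above can be checked directly. The following is a minimal sketch that assumes the source file is saved as UTF-8 and that the mbstring extension is loaded; the variable names are only illustrative. EUC-CN is used as the encoding name because it is the usual byte encoding of GB2312.

```php
<?php
// Assumption: this file is saved as UTF-8 and mbstring is available.
$text = "中文abc";

// strlen() counts bytes: 2 Chinese characters x 3 bytes + 3 ASCII bytes.
echo strlen($text), "\n";                      // 9

// Cutting after 4 bytes splits the second Chinese character.
$cut = substr($text, 0, 4);
var_dump(mb_check_encoding($cut, 'UTF-8'));    // bool(false)

// Cutting on a character boundary (3 bytes) leaves valid UTF-8.
$ok = substr($text, 0, 3);
var_dump(mb_check_encoding($ok, 'UTF-8'));     // bool(true)

// The same character takes 2 bytes in EUC-CN (the byte encoding of GB2312).
$gb = mb_convert_encoding("中", 'EUC-CN', 'UTF-8');
echo strlen($gb), "\n";                        // 2
```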

Two solutions are provided below for your reference.

1. Use the mb_substr function instead
string mb_substr ( string $str , int $start [, int $length [, string $encoding ]] )
Similar to substr(), but it counts in characters rather than bytes, so a character is never split and the result is never garbled.
Disadvantage: the length becomes a character count, not a byte count. When the result is used for display, Chinese and English results of the same length differ widely in displayed width.
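The trade-off can be seen in a short sketch, again assuming UTF-8 text and the mbstring extension. mb_strwidth() and mb_strimwidth() are not mentioned in the original article; they are standard mbstring functions, shown here only as one possible way to measure and limit display width.

```php
<?php
// Assumption: UTF-8 text and the mbstring extension.
$zh = "一二三四五六七";
$en = "abcdefg";

// mb_substr() counts characters, so no character is ever split.
$zhCut = mb_substr($zh, 0, 3, 'UTF-8');   // "一二三"
$enCut = mb_substr($en, 0, 3, 'UTF-8');   // "abc"

// Same character count, but the Chinese result is twice as wide:
// mb_strwidth() counts a Chinese character as 2 columns.
echo mb_strwidth($zhCut, 'UTF-8'), "\n";  // 6
echo mb_strwidth($enCut, 'UTF-8'), "\n";  // 3

// mb_strimwidth() truncates by display width instead of character count.
$byWidth = mb_strimwidth("0一二三四五六七", 0, 4, '', 'UTF-8');
echo $byWidth, "\n";
```

If mbstring is available and display width is the only concern, this may be enough on its own; the custom function in the second solution does not depend on mbstring.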

2. Build a custom function that enhances substr
Chinese characters are counted as 2 length units, so truncated results in mixed Chinese and English text end up with similar display widths. A trailing incomplete character is discarded, so the output is never garbled. The function supports both UTF-8 and GB2312, the encodings commonly used for Chinese text, which makes it broadly applicable.

The complete code follows (strtolower is used so that the encoding name is compared case-insensitively):

<?php
/**
* Enhanced string truncation function
* Truncates Chinese text without producing garbled characters
* edit bbs.it-home.org
*/
function getstr($string, $length, $encoding  = 'utf-8') {
    $string = trim($string);
 
    if($length && strlen($string) > $length) {
        // Truncate the string
        $wordscut = '';
        if(strtolower($encoding) == 'utf-8') {
            // UTF-8 encoding
            $n = 0;
            $tn = 0;
            $noc = 0;
            while ($n < strlen($string)) {
                $t = ord($string[$n]);
                if($t == 9 || $t == 10 || (32 <= $t && $t <= 126)) {
                    $tn = 1;
                    $n++;
                    $noc++;
                } elseif(194 <= $t && $t <= 223) {
                    $tn = 2;
                    $n += 2;
                    $noc += 2;
                } elseif(224 <= $t && $t <= 239) { // 239 (0xEF) is also a 3-byte lead byte
                    $tn = 3;
                    $n += 3;
                    $noc += 2;
                } elseif(240 <= $t && $t <= 247) {
                    $tn = 4;
                    $n += 4;
                    $noc += 2;
                } elseif(248 <= $t && $t <= 251) {
                    $tn = 5;
                    $n += 5;
                    $noc += 2;
                } elseif($t == 252 || $t == 253) {
                    $tn = 6;
                    $n += 6;
                    $noc += 2;
                } else {
                    $n++;
                }
                if ($noc >= $length) {
                    break;
                }
            }
            if ($noc > $length) {
                $n -= $tn;
            }
            $wordscut = substr($string, 0, $n);
        } else {
            for($i = 0; $i < $length - 1; $i++) {
                if(ord($string[$i]) > 127) {
                    $wordscut .= $string[$i].$string[$i + 1];
                    $i++;
                } else {
                    $wordscut .= $string[$i];
                }
            }
        }
        $string = $wordscut;
    }
    return trim($string);
}
 
// Examples
echo getstr("0一二三四五六七",1).'<br />';  // 0
echo getstr("0一二三四五六七",2).'<br />';  // 0
echo getstr("0一二三四五六七",3).'<br />';  // 0一
echo getstr("0一二三四五六七",4).'<br />';  // 0一
echo getstr("0一二三四五六七",5).'<br />';  // 0一二
echo getstr("0一a二b三四五六七",1).'<br />';    // 0
echo getstr("0一a二b三四五六七",2).'<br />';    // 0
echo getstr("0一a二b三四五六七",3).'<br />';    // 0一
echo getstr("0一a二b三四五六七",4).'<br />';    // 0一a
echo getstr("0一a二b三四五六七",5).'<br />';    // 0一a
// This function is adapted from the getstr() function in UCHome 1.5.
?>
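For comparison, roughly the same weighting rule (1 unit for a single-byte character, 2 units for a multi-byte one, with any character that does not fit dropped whole) can be sketched more compactly with PCRE's UTF-8 mode. The function name cut_by_width is hypothetical and not part of the original code. The sketch assumes valid UTF-8 input, handles UTF-8 only, and does not trim whitespace as getstr() does.

```php
<?php
// Hypothetical helper, not from the original article.
// Assumption: $string is valid UTF-8.
function cut_by_width(string $string, int $limit): string
{
    // Split into whole UTF-8 characters; preg_split() returns false
    // if the input is not valid UTF-8.
    $chars = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return '';
    }

    $result = '';
    $used = 0;
    foreach ($chars as $char) {
        // Multi-byte characters count as 2 units, single-byte ones as 1.
        $weight = strlen($char) > 1 ? 2 : 1;
        if ($used + $weight > $limit) {
            break; // the next character does not fit, so drop it entirely
        }
        $result .= $char;
        $used += $weight;
    }
    return $result;
}

echo cut_by_width("0一二三四五六七", 3), "\n";     // 0一
echo cut_by_width("0一二三四五六七", 4), "\n";     // 0一
echo cut_by_width("0一a二b三四五六七", 4), "\n";   // 0一a
```

Working on whole characters avoids the lead-byte range checks entirely, at the cost of building an array of all characters before truncating.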

