Truncating mixed Chinese and English strings in PHP

WBOY
2016-07-13 10:33:49

Today I ran into the problem of truncating strings that mix Chinese and English. In GBK, each Chinese character occupies two bytes. If the string is entirely Chinese, substr() with an even byte count does the job, but when Chinese and English are mixed a byte-based cut can land in the middle of a character. I found a good function in some code I had collected earlier that handles this truncation well:

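// Cut $string to about $length columns (ASCII counts 1, a multi-byte character counts 2)
// and append $dot. Strings of at most $length bytes are returned unchanged.
function get_word($string, $length, $dot = '..', $charset = 'gbk') {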
    
    if(strlen($string) <= $length) {
        return $string;
    }
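    // Drop &nbsp; and decode the basic HTML entities so that each counts as one character.
    $string = str_replace(array('&nbsp;', '&amp;', '&quot;', '&lt;', '&gt;'), array('', '&', '"', '<', '>'), $string);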
    $strcut = '';
    if(strtolower($charset) == 'utf-8') {
        // $n: byte offset, $tn: byte length of the last character read, $noc: columns counted so far
        $n = $tn = $noc = 0;
        while($n < strlen($string)) {
            $t = ord($string[$n]);
            if($t == 9 || $t == 10 || (32 <= $t && $t <= 126)) {
                $tn = 1; $n++; $noc++;
            } elseif(194 <= $t && $t <= 223) {
                $tn = 2; $n += 2; $noc += 2;
            } elseif(224 <= $t && $t <= 239) {
                $tn = 3; $n += 3; $noc += 2;
            } elseif(240 <= $t && $t <= 247) {
                $tn = 4; $n += 4; $noc += 2;
            } elseif(248 <= $t && $t <= 251) {
                $tn = 5; $n += 5; $noc += 2;
            } elseif($t == 252 || $t == 253) {
                $tn = 6; $n += 6; $noc += 2;
            } else {
                $n++;
            }
            if($noc >= $length) {
                break;
            }
        }
        if($noc > $length) {
            $n -= $tn;
        }
        $strcut = substr($string, 0, $n);
    } else {
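        // GBK and other double-byte charsets: a byte above 127 starts a two-byte character.
        $len = strlen($string);
        for($i = 0; $i < $length && $i < $len; $i++) {
            $strcut .= (ord($string[$i]) > 127 && $i + 1 < $len) ? $string[$i].$string[++$i] : $string[$i];
        }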
    }
    
    return $strcut.$dot;
}
$str = "欢迎 visit 简明 bkjia";
$str_result = get_word($str, 12);
echo $str_result;

Test run results:

欢迎 visit..
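
To make the byte-versus-character problem concrete, here is a minimal sketch, assuming UTF-8 input and that the mbstring extension is loaded. It shows substr() cutting through a character, and then, as a built-in alternative that is not part of the original code, mb_strimwidth(), which counts a Chinese character as two columns and includes the trim marker in the requested width. The variable names are illustrative only.

```php
<?php
// "欢迎 visit 简明 bkjia" written as explicit UTF-8 bytes, so the example
// does not depend on the encoding of the source file.
$str = "\xE6\xAC\xA2\xE8\xBF\x8E visit \xE7\xAE\x80\xE6\x98\x8E bkjia";

// substr() counts bytes: 4 bytes is one whole 3-byte character plus the
// first byte of the next one, so the result is no longer valid UTF-8.
$broken = substr($str, 0, 4);

// preg_match() with the /u modifier fails (returns false) on invalid UTF-8.
$broken_is_valid = preg_match('//u', $broken) === 1;

// mb_strimwidth() works in display columns and never splits a character.
// The width of 12 includes the two columns taken by the ".." marker.
$trimmed = mb_strimwidth($str, 0, 12, '..', 'UTF-8');

var_dump($broken_is_valid);
echo $trimmed, "\n";
```

For GBK data the same call could be made with 'GBK' as the last argument, assuming the installed mbstring build supports that encoding.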
