How to truncate Chinese strings in PHP
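The simplest way to extract part of a string in PHP is the substr() function. However, substr() counts bytes, so it is only safe for single-byte (English) text; applied to Chinese, it can cut a multi-byte character in half and produce garbled output. mb_substr() is often suggested instead: it counts characters rather than bytes and handles mixed Chinese and English text, provided the mbstring extension is installed and the correct encoding is passed. The functions below are alternatives that do not depend on mbstring.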
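The following function extracts part of a GB2312-encoded Chinese string. $start and $len are byte offsets, and a double-byte character is never split:
The code is as follows:
    function mysubstr($str, $start, $len) {
        $tmpstr = "";
        $end = $start + $len;
        // Walk from byte 0 so double-byte boundaries stay aligned.
        for ($i = 0; $i < $end; $i++) {
            $pos = $i;
            if (ord(substr($str, $i, 1)) > 0xa0) {
                $char = substr($str, $i, 2);  // lead byte: take both bytes
                $i++;
            } else {
                $char = substr($str, $i, 1);
            }
            // Keep only characters that begin at or after $start.
            if ($pos >= $start) $tmpstr .= $char;
        }
        return $tmpstr;
    }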
Substring functions for Chinese text in UTF-8 and GB2312
A UTF-8 substring function
To support multiple languages, strings in the database are often stored as UTF-8. During website development you may need to extract part of such a string with PHP. To avoid garbled characters, you can write the following UTF-8 substring function.
For background on UTF-8, see the UTF-8 FAQ.
A UTF-8 encoded character consists of 1 to 4 bytes, and the exact number can be determined from the first byte. (Four-byte sequences are used for characters outside the Basic Multilingual Plane, but here we assume no more than 3 bytes.)
If the first byte is greater than or equal to 224 (0xE0), it and the following 2 bytes form one UTF-8 character.
If the first byte is greater than or equal to 192 (0xC0) and less than 224, it and the following 1 byte form one UTF-8 character.
Otherwise the first byte is itself a single-byte character (English letters, digits, and basic punctuation).
The code is as follows:
function utf8Substr($str, $from, $len)
UTF-8 strings can be truncated separately. Program description:
1. The $len parameter is measured in Chinese characters: one unit of $len equals two English characters, which keeps truncated text visually even in page layouts.
2. If the $magic parameter is set to false, Chinese and English characters are treated equally and $len is the absolute number of characters.
3. Especially suitable for strings encoded with htmlspecialchars().
The code is as follows:
    function FSubstr($title, $start, $len = "", $magic = true) {
        /**
         * powered by Smartpig
         * mailto:d.einstein@263.net
         */
        $length = 0;
        if ($len == "") $len = strlen($title);

        // If $start falls in the middle of a double-byte character, move it back one byte
        if ($start > 0) {
            $cnum = 0;
            for ($i = 0; $i < $start; $i++) {
                if (ord(substr($title, $i, 1)) >= 128) $cnum++;
            }
            if ($cnum % 2 != 0) $start--;
            unset($cnum);
        }

        if (strlen($title) <= $len) return substr($title, $start, $len);

        $alen = 0;     // single-width characters counted so far
        $blen = 0;     // double-width characters counted so far
        $realnum = 0;  // characters counted so far, regardless of width

        for ($i = $start; $i < strlen($title); $i++) {
            $ctype = 0;  // 1 when the current character is double-width
            $cstep = 0;  // byte length of the current character
            $cur = substr($title, $i, 1);
            if ($cur == "&") {
                // An HTML entity counts as one character
                if (substr($title, $i, 4) == "&lt;") {
                    $cstep = 4; $length += 4; $i += 3; $realnum++;
                    if ($magic) { $alen++; }
                } else if (substr($title, $i, 4) == "&gt;") {
                    $cstep = 4; $length += 4; $i += 3; $realnum++;
                    if ($magic) { $alen++; }
                } else if (substr($title, $i, 5) == "&amp;") {
                    $cstep = 5; $length += 5; $i += 4; $realnum++;
                    if ($magic) { $alen++; }
                } else if (substr($title, $i, 6) == "&quot;") {
                    $cstep = 6; $length += 6; $i += 5; $realnum++;
                    if ($magic) { $alen++; }
                } else if (substr($title, $i, 6) == "&#039;") {
                    $cstep = 6; $length += 6; $i += 5; $realnum++;
                    if ($magic) { $alen++; }
                } else if (preg_match("/^&#(\d+);/i", substr($title, $i, 8), $match)) {
                    // Numeric entity: treated as a double-width character
                    $cstep = strlen($match[0]);
                    $length += strlen($match[0]);
                    $i += strlen($match[0]) - 1;
                    $realnum++;
                    if ($magic) { $blen++; $ctype = 1; }
                }
            } else {
                if (ord($cur) >= 128) {
                    // Double-byte character
                    $cstep = 2; $length += 2; $i += 1; $realnum++;
                    if ($magic) { $blen++; $ctype = 1; }
                } else {
                    // Single-byte character
                    $cstep = 1; $length += 1; $realnum++;
                    if ($magic) { $alen++; }
                }
            }

            if ($magic) {
                // Stop once the weighted width reaches $len Chinese characters
                if (($blen * 2 + $alen) == ($len * 2)) break;
                if (($blen * 2 + $alen) == ($len * 2 + 1)) {
                    // Overshot by half a unit: drop the last double-width character
                    if ($ctype == 1) {
                        $length -= $cstep;
                    }
                    break;
                }
            } else {
                if ($realnum == $len) break;
            }
        }

        return substr($title, $start, $length);
    }
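Item 3 of the program description (handling strings encoded with htmlspecialchars()) can be illustrated in isolation. The following is a simplified sketch that uses a regular expression instead of the byte-by-byte loop in FSubstr, so it is a different technique from the one above. The helper name entitySafeCut is hypothetical, and the sketch assumes ASCII text in which only the listed entities occur.

```php
<?php
// Hypothetical helper: keep the first $count visible characters of an
// htmlspecialchars()-encoded ASCII string without splitting an entity.
function entitySafeCut($encoded, $count) {
    // Each match is either one whole entity or one single byte.
    preg_match_all('/&(?:lt|gt|amp|quot|#\d+);|./s', $encoded, $m);
    return implode('', array_slice($m[0], 0, $count));
}

$encoded = htmlspecialchars('a<b>c');   // "a&lt;b&gt;c"
echo substr($encoded, 0, 3), "\n";       // a&l  (entity cut in half)
echo entitySafeCut($encoded, 3), "\n";   // a&lt;b
```

A plain substr() on encoded text can leave a dangling "&l" in the page, while counting each entity as one character keeps the markup intact.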