
PHP sample code for detecting Chinese characters in GBK and UTF-8

怪我咯 (original post), 2017-07-06 10:51:37

PHP can detect Chinese text by checking byte or character ranges. In GBK a Chinese character takes two bytes; in UTF-8 a common Chinese character (U+4E00 to U+9FA5) takes three bytes. The ranges for each encoding are listed below.
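As a quick check of the byte counts, the JavaScript sketch below prints the encoding of 中 (U+4E2D) both ways. JavaScript's built-in TextEncoder only produces UTF-8, so the GBK bytes (D6 D0) are hard-coded; the constant and helper names are illustrative, not from any library.

```javascript
// "中" (U+4E2D) in GBK: hard-coded, because TextEncoder only emits UTF-8.
const gbkZhong = Uint8Array.from([0xd6, 0xd0]);
// The same character encoded as UTF-8.
const utf8Zhong = new TextEncoder().encode("中");

// Format a byte array as space-separated lowercase hex.
function toHex(bytes) {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join(" ");
}

console.log(`GBK:   ${gbkZhong.length} bytes (${toHex(gbkZhong)})`);   // GBK:   2 bytes (d6 d0)
console.log(`UTF-8: ${utf8Zhong.length} bytes (${toHex(utf8Zhong)})`); // UTF-8: 3 bytes (e4 b8 ad)
```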

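1. Encoding ranges

GBK (GB2312/GB18030), byte values:
\x00-\x7f  ASCII, one byte per character (printable characters are \x20-\x7e)
\x81-\xfe  lead byte of a double-byte character
\x40-\xfe  trail byte of a double-byte character (excluding \x7f)
\xa1-\xfe  both bytes of a GB2312 character; Chinese characters use lead bytes \xb0-\xf7
\x80-\xff  any byte with the high bit set, i.e. part of a non-ASCII (for example Chinese) character
GB18030 additionally has four-byte sequences whose second byte is \x30-\x39; the two-byte logic below does not handle them.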
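These byte ranges translate directly into checks on a lead/trail pair. The JavaScript sketch below assumes the standard GBK layout (lead byte 0x81-0xFE, trail byte 0x40-0xFE except 0x7F) and the GB2312 hanzi block (lead 0xB0-0xF7, trail 0xA1-0xFE); both function names are hypothetical.

```javascript
// True if (lead, trail) is a structurally valid GBK double-byte pair.
// Assumed layout: lead 0x81-0xFE, trail 0x40-0xFE except 0x7F.
function isGbkPair(lead, trail) {
  return lead >= 0x81 && lead <= 0xfe &&
         trail >= 0x40 && trail <= 0xfe && trail !== 0x7f;
}

// True if (lead, trail) falls in the GB2312 hanzi block:
// lead 0xB0-0xF7, trail 0xA1-0xFE.
function isGb2312Hanzi(lead, trail) {
  return lead >= 0xb0 && lead <= 0xf7 && trail >= 0xa1 && trail <= 0xfe;
}

console.log(isGbkPair(0xd6, 0xd0), isGb2312Hanzi(0xd6, 0xd0)); // true true ("中")
console.log(isGbkPair(0xa1, 0xa1), isGb2312Hanzi(0xa1, 0xa1)); // true false (full-width space)
console.log(isGbkPair(0x41, 0x42));                            // false (two ASCII letters)
```

The full-width space A1 A1 is a valid GBK pair but not a hanzi, which is why a plain \x80-\xff test reports more than just Chinese characters.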
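UTF-8 (Unicode), code points (each takes three bytes in UTF-8):
\u4e00-\u9fa5  Chinese (CJK Unified Ideographs, original set)
\u3130-\u318f  Korean (Hangul Compatibility Jamo)
\uac00-\ud7a3  Korean (Hangul Syllables)
\u3040-\u30ff  Japanese kana (Hiragana and Katakana); Japanese kanji share the Chinese range
Note: these are Unicode code points, not byte values. Hangul syllables lie above \u9fa5, so they never match the Chinese range.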
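Since these are code point ranges, they are simplest to apply to decoded text rather than to bytes. A minimal JavaScript classifier, assuming the input is already a decoded string; the labels are illustrative, and the Japanese branch only recognizes kana (U+3040-U+30FF) because kanji fall inside the Chinese range.

```javascript
// Classify a single character by its Unicode code point.
function classifyChar(ch) {
  const cp = ch.codePointAt(0);
  if (cp >= 0x4e00 && cp <= 0x9fa5) return "chinese";       // CJK Unified Ideographs (original set)
  if (cp >= 0xac00 && cp <= 0xd7a3) return "korean";        // Hangul Syllables
  if (cp >= 0x3130 && cp <= 0x318f) return "korean";        // Hangul Compatibility Jamo
  if (cp >= 0x3040 && cp <= 0x30ff) return "japanese-kana"; // Hiragana and Katakana
  return "other";
}

// for...of iterates by code point, not by UTF-16 code unit.
for (const ch of "中한あa") {
  console.log(ch, classifyChar(ch));
}
```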
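Regular expression examples (PHP):
preg_replace('/[\x80-\xff]/', '', $str);          // remove every non-ASCII byte (GBK or UTF-8)
preg_replace('/[\x{4e00}-\x{9fa5}]/u', '', $str);  // remove Chinese characters from a UTF-8 string
PCRE does not understand \u escapes, so the code point range is written as \x{4e00}-\x{9fa5}, and the /u modifier is required so the pattern works on characters instead of bytes.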
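For comparison, a JavaScript sketch of the same two filters (helper names hypothetical): the first operates on raw bytes and so takes a Uint8Array, while the second operates on decoded text, where the \u escape is valid.

```javascript
// Byte-level filter: drop every byte with the high bit set (0x80-0xFF),
// like preg_replace('/[\x80-\xff]/', '', $str). Works on GBK or UTF-8 bytes.
function stripHighBytes(bytes) {
  return bytes.filter((b) => b < 0x80);
}

// Character-level filter: drop CJK ideographs U+4E00-U+9FA5 from decoded text,
// like preg_replace('/[\x{4e00}-\x{9fa5}]/u', '', $str) on UTF-8 input.
function stripChinese(text) {
  return text.replace(/[\u4e00-\u9fa5]/g, "");
}

const utf8 = new TextEncoder().encode("a中b");
console.log(Array.from(stripHighBytes(utf8))); // [ 97, 98 ]
console.log(stripChinese("PHP中文网"));         // PHP
```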
2. Code examples

The code is as follows:

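// Check whether the content contains Chinese (any double-byte character) - GBK (PHP)
function check_is_chinese($s) {
    // A byte in \x80-\xff followed by another byte is a GBK double-byte character
    return preg_match('/[\x80-\xff]./', $s);
}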
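The pattern matches any byte in \x80-\xff that is followed by one more byte. A JavaScript mirror over a byte array (hypothetical name) makes two edge cases visible: a lone lead byte at the end of the input does not match, and PCRE's "." refuses a following newline.

```javascript
// Mirror of preg_match('/[\x80-\xff]./', $s) on a GBK byte array:
// true if some byte in 0x80-0xFF is followed by another byte.
// PCRE's "." does not match "\n" by default, so 0x0A is excluded as the follower.
function containsGbkDoubleByte(bytes) {
  for (let i = 0; i + 1 < bytes.length; i++) {
    if (bytes[i] >= 0x80 && bytes[i + 1] !== 0x0a) return true;
  }
  return false;
}

console.log(containsGbkDoubleByte([0x61, 0xd6, 0xd0])); // true  ("a中" in GBK)
console.log(containsGbkDoubleByte([0x61, 0x62]));       // false ("ab")
console.log(containsGbkDoubleByte([0x61, 0xd6]));       // false (truncated lead byte)
```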
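// Get the string length in characters - GBK (PHP)
function gb_strlen($str) {
    $count = 0;
    $bytes = strlen($str);
    for ($i = 0; $i < $bytes; $i++) {
        // A byte >= 0x80 is the lead byte of a double-byte character: skip its trail byte
        if (ord($str[$i]) >= 0x80) $i++;
        $count++;
    }
    return $count;
}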
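The same byte walk in JavaScript, assuming the input is a GBK-encoded byte array (the name gbStrlen is hypothetical). It inherits the two-byte assumption, so a GB18030 four-byte character would be counted as two.

```javascript
// Count characters in a GBK byte array: a byte >= 0x80 starts a
// double-byte character, so its trail byte is skipped.
function gbStrlen(bytes) {
  let count = 0;
  for (let i = 0; i < bytes.length; i++) {
    if (bytes[i] >= 0x80) i++; // skip the trail byte
    count++;
  }
  return count;
}

// "a中文b" in GBK: 61, D6 D0, CE C4, 62
console.log(gbStrlen([0x61, 0xd6, 0xd0, 0xce, 0xc4, 0x62])); // 4
```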
// Truncate a string to its first $len characters - GBK (PHP)
function gb_substr($str, $len) {
    $count = 0;
    $bytes = strlen($str);
    for ($i = 0; $i < $bytes; $i++) {
        if ($count == $len) break;
        // Skip the trail byte of a double-byte character
        if (ord($str[$i]) >= 0x80) $i++;
        $count++;
    }
    return substr($str, 0, $i);
}
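A JavaScript sketch of the same truncation on a GBK byte array (hypothetical name gbSubstr). Stepping a whole character at a time is what guarantees the cut never lands between a lead byte and its trail byte.

```javascript
// Return the bytes of the first `len` characters of a GBK byte array,
// never splitting a double-byte character.
function gbSubstr(bytes, len) {
  let count = 0;
  let i = 0;
  while (i < bytes.length && count < len) {
    i += bytes[i] >= 0x80 ? 2 : 1; // double-byte or single-byte character
    count++;
  }
  return bytes.slice(0, Math.min(i, bytes.length));
}

const gbk = Uint8Array.from([0x61, 0xd6, 0xd0, 0xce, 0xc4, 0x62]); // "a中文b"
console.log(Array.from(gbSubstr(gbk, 2), (b) => b.toString(16)));  // [ '61', 'd6', 'd0' ]
```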
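// Count the characters in a string - UTF-8 (PHP)
// Returns false if a byte cannot start a UTF-8 sequence.
function utf8_strlen($str) {
    $count = 0;
    $bytes = strlen($str);
    for ($i = 0; $i < $bytes; $i++) {
        $value = ord($str[$i]);
        if ($value > 127) {
            // Skip the continuation bytes of a multi-byte sequence
            if ($value >= 192 && $value <= 223) $i++;          // 2-byte sequence
            elseif ($value >= 224 && $value <= 239) $i += 2;   // 3-byte sequence
            elseif ($value >= 240 && $value <= 247) $i += 3;   // 4-byte sequence
            else return false;                                 // not a UTF-8 lead byte
        }
        $count++;  // one character per sequence
    }
    return $count;
}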
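The lead-byte rules used above can be sketched in JavaScript over a UTF-8 byte array (hypothetical name utf8Strlen; it returns -1 instead of stopping on invalid input). For valid input the result should equal the number of code points, which the last line compares against Array.from.

```javascript
// Count characters in a UTF-8 byte array by reading each lead byte:
// 0xC0-0xDF starts a 2-byte, 0xE0-0xEF a 3-byte, 0xF0-0xF7 a 4-byte sequence.
// Returns -1 for a byte that cannot start a sequence.
function utf8Strlen(bytes) {
  let count = 0;
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i];
    if (b >= 0xc0 && b <= 0xdf) i += 1;
    else if (b >= 0xe0 && b <= 0xef) i += 2;
    else if (b >= 0xf0 && b <= 0xf7) i += 3;
    else if (b >= 0x80) return -1; // stray continuation byte or invalid lead
    count++;
  }
  return count;
}

const text = "a中😀";
console.log(utf8Strlen(new TextEncoder().encode(text))); // 3
console.log(Array.from(text).length);                    // 3 (code points)
```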
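// Extract $length characters starting at character $position - UTF-8 (PHP)
// Returns false if a byte cannot start a UTF-8 sequence.
function utf8_substr($str, $position, $length) {
    $bytes = strlen($str);
    $start_byte = $bytes;  // byte offset where the substring starts
    $end_byte = $bytes;    // byte offset where the substring ends
    $start_count = 0;      // character index at $start_byte
    $count = 0;            // characters seen so far
    for ($i = 0; $i < $bytes; $i++) {
        if ($count >= $position && $start_byte > $i) {
            $start_byte = $i;
            $start_count = $count;
        }
        // Stop only after the start is found and $length characters are taken
        if ($start_byte <= $i && ($count - $start_count) >= $length) {
            $end_byte = $i;
            break;
        }
        $value = ord($str[$i]);
        if ($value >= 192 && $value <= 223) $i++;          // 2-byte sequence
        elseif ($value >= 224 && $value <= 239) $i += 2;   // 3-byte sequence
        elseif ($value >= 240 && $value <= 247) $i += 3;   // 4-byte sequence
        elseif ($value > 127) return false;                // not a UTF-8 lead byte
        $count++;
    }
    return substr($str, $start_byte, $end_byte - $start_byte);
}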
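When the text is already a decoded string, slicing by character needs no byte walking at all. A JavaScript sketch under that assumption (hypothetical name); in PHP, mb_substr($str, $position, $length, 'UTF-8') plays the same role for valid UTF-8.

```javascript
// Slice by code point rather than by UTF-16 code unit, so characters
// outside the BMP (such as emoji) are never cut in half.
function utf8Substr(text, position, length) {
  return Array.from(text).slice(position, position + length).join("");
}

console.log(utf8Substr("a中文b", 1, 2)); // 中文
console.log(utf8Substr("😀中", 0, 1));   // 😀
```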
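// Check whether a string contains Korean (Hangul) characters (JavaScript; strings are UTF-16)
function checkKoreaChar(str) {
    for (let i = 0; i < str.length; i++) {
        const code = str.charCodeAt(i);
        // Hangul Compatibility Jamo (U+3131-U+318E) or Hangul Syllables (U+AC00-U+D7A3)
        if ((code > 0x3130 && code < 0x318F) || (code >= 0xAC00 && code <= 0xD7A3)) {
            return true;
        }
    }
    return false;
}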
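The loop can also be written as one regular expression over the same two Hangul ranges; a sketch with a hypothetical name:

```javascript
// Same test as checkKoreaChar, as a single regular expression:
// Hangul Compatibility Jamo U+3131-U+318E or Hangul Syllables U+AC00-U+D7A3.
function containsKorean(str) {
  return /[\u3131-\u318e\uac00-\ud7a3]/.test(str);
}

console.log(containsKorean("안녕하세요")); // true
console.log(containsKorean("中文"));       // false
```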
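// Check whether a string contains characters above U+00FF, such as Chinese (JavaScript)
// Each such character is replaced by two, so the length changes only if one exists.
function check_chinese_char(s) {
    return s.length !== s.replace(/[^\x00-\xff]/g, "**").length;
}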
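The same replacement trick also yields a width in which every code unit above U+00FF counts as two, roughly the GBK byte length of Chinese text. A sketch with a hypothetical name; like the function above, it works on UTF-16 code units, so an emoji (a surrogate pair) counts as four.

```javascript
// Length in which each UTF-16 code unit above U+00FF counts as 2,
// using the same replacement trick as check_chinese_char.
function doubleByteWidth(s) {
  return s.replace(/[^\x00-\xff]/g, "**").length;
}

console.log(doubleByteWidth("ab中文")); // 6
console.log(doubleByteWidth("abc"));    // 3
```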
