
PHP code for detecting Chinese characters and encodings

WBOY
2016-07-21 15:37:20

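1. Encoding ranges

GBK (GB2312/GB18030)
\x00-\xff  the full single-byte range; on a Unicode string, [^\x00-\xff] roughly matches the characters that take two bytes in GBK
\x20-\x7f  ASCII (the printable characters are \x20-\x7e)
\xa1-\xff  Chinese (GB2312 uses \xa1-\xfe for both bytes of a character)
\x80-\xff  Chinese (in GBK, any byte with the high bit set belongs to a double-byte character; lead bytes are \x81-\xfe, trail bytes \x40-\xfe)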
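The GBK byte ranges above can be checked directly on raw bytes. The sketch below is a hypothetical JavaScript helper, hasGbkDoubleByte, that assumes its input is a Uint8Array already encoded as GBK (it decodes nothing) and looks for a lead byte in \x81-\xfe followed by a trail byte in \x40-\xfe other than \x7f. GB18030 four-byte sequences are not handled.

```javascript
// Hypothetical helper: reports whether raw GBK bytes contain a double-byte character.
// Assumes lead bytes 0x81-0xFE and trail bytes 0x40-0xFE (excluding 0x7F);
// GB18030 four-byte sequences are not recognised.
function hasGbkDoubleByte(bytes) {
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i];
    if (b < 0x80) continue; // ASCII byte: a single-byte character
    const next = bytes[i + 1];
    const isLead = b >= 0x81 && b <= 0xfe;
    const isTrail =
      next !== undefined && next >= 0x40 && next <= 0xfe && next !== 0x7f;
    if (isLead && isTrail) return true; // valid lead/trail pair found
  }
  return false;
}

// "a" followed by the GBK bytes of 中 (0xD6 0xD0)
console.log(hasGbkDoubleByte(Uint8Array.of(0x61, 0xd6, 0xd0)));
```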
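UTF-8 (Unicode); the ranges below are Unicode code points, not UTF-8 bytes
\u4e00-\u9fa5  Chinese (CJK Unified Ideographs as of Unicode 1.1; the block now extends to \u9fff)
\u3130-\u318f  Korean (Hangul Compatibility Jamo)
\uac00-\ud7a3  Korean (Hangul Syllables)
\u0800-\u4e00  Japanese (only a rough range that also covers many other scripts; the kana blocks are Hiragana \u3040-\u309f and Katakana \u30a0-\u30ff)
Note: Hangul syllables start at \uac00, above \u9fa5, so a \u4e00-\u9fa5 pattern never matches them.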
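As a rough illustration of the code point ranges above, here is a hypothetical classifyChar helper in JavaScript. It assumes one character per call and, for Japanese, swaps the broad \u0800-\u4e00 range for the kana blocks (\u3040-\u30ff), since the broad range also covers unrelated scripts. Japanese kanji share the CJK range, so they are reported as "chinese".

```javascript
// Hypothetical helper: classify one character by the Unicode ranges listed above.
function classifyChar(ch) {
  const cp = ch.codePointAt(0);
  if (cp >= 0x4e00 && cp <= 0x9fa5) return "chinese"; // CJK Unified Ideographs (Unicode 1.1 part)
  if ((cp >= 0x3130 && cp <= 0x318f) || (cp >= 0xac00 && cp <= 0xd7a3)) {
    return "korean"; // Hangul Compatibility Jamo or Hangul Syllables
  }
  if (cp >= 0x3040 && cp <= 0x30ff) return "japanese-kana"; // Hiragana + Katakana
  if (cp <= 0x7f) return "ascii";
  return "other";
}

for (const ch of ["中", "한", "あ", "A", "é"]) {
  console.log(ch, classifyChar(ch));
}
```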
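Regex examples (PHP):
preg_replace('/[\x80-\xff]/', '', $str);            // GBK: remove every byte with the high bit set, i.e. all double-byte characters including Chinese
preg_replace('/[\x{4e00}-\x{9fa5}]/u', '', $str);   // UTF-8: PCRE has no \u escape, so use \x{...} together with the /u modifier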
2. Code example

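// Check whether a GBK-encoded string contains Chinese (double-byte) characters (PHP)
function check_is_chinese($s){
    // A byte in \x80-\xff followed by any byte is a GBK double-byte character
    return preg_match('/[\x80-\xff]./', $s) === 1;
}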
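// Get the string length in characters - GBK (PHP)
function gb_strlen($str){
    $count = 0;
    for ($i = 0; $i < strlen($str); $i++) {
        $s = substr($str, $i, 1);
        // A lead byte (\x80-\xff) starts a double-byte character: skip its trail byte
        if (preg_match('/[\x80-\xff]/', $s)) ++$i;
        ++$count;
    }
    return $count;
}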
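The same counting idea as gb_strlen, sketched in JavaScript over raw bytes. The name gbkCharCount is hypothetical, and the input is assumed to be a Uint8Array holding GBK bytes; like the PHP version, any byte in \x80-\xff is treated as the lead byte of a two-byte character.

```javascript
// Hypothetical port of gb_strlen: count characters in a GBK byte array.
function gbkCharCount(bytes) {
  let count = 0;
  for (let i = 0; i < bytes.length; i++) {
    if (bytes[i] >= 0x80) i++; // lead byte: skip its trail byte
    count++;
  }
  return count;
}

// "a" followed by two double-byte characters
console.log(gbkCharCount(Uint8Array.of(0x61, 0xd6, 0xd0, 0xce, 0xc4)));
```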
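// Truncate a string to $len characters - GBK (PHP)
function gb_substr($str, $len){
    $count = 0;
    for ($i = 0; $i < strlen($str); $i++) {
        if ($count == $len) break;
        // Double-byte character: step over the trail byte so it is never split
        if (preg_match('/[\x80-\xff]/', substr($str, $i, 1))) ++$i;
        ++$count;
    }
    // $i is the byte offset just past the last character kept
    return substr($str, 0, $i);
}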
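A JavaScript sketch of the truncation in gb_substr, assuming the input is a Uint8Array of GBK bytes; the name gbkTruncate is hypothetical. It walks character by character, advancing two bytes for any lead byte in \x80-\xff, so a double-byte character is never cut in half.

```javascript
// Hypothetical port of gb_substr: keep the first `len` characters of GBK bytes.
function gbkTruncate(bytes, len) {
  let count = 0;
  let i = 0;
  while (i < bytes.length && count < len) {
    i += bytes[i] >= 0x80 ? 2 : 1; // double-byte or single-byte character
    count++;
  }
  // Clamp in case the input ends with a lone lead byte
  return bytes.subarray(0, Math.min(i, bytes.length));
}

console.log(Array.from(gbkTruncate(Uint8Array.of(0x61, 0xd6, 0xd0, 0xce, 0xc4), 2)));
```

subarray returns a view on the same buffer rather than a copy; call slice instead if the result must be independent.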
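// Count string length - UTF-8 (PHP)
// Note: $count is incremented once inside the branch and once after it, so an
// ASCII byte counts as 1 and every multibyte character counts as 2 (a width-style
// length, like a Chinese character's GBK byte length). For a plain character
// count, mb_strlen($str, 'UTF-8') can be used instead.
function utf8_strlen($str) {
    $count = 0;
    for ($i = 0; $i < strlen($str); $i++) {
        $value = ord($str[$i]);
        if ($value > 127) {
            $count++;
            // The lead byte tells how many continuation bytes to skip
            if ($value >= 192 && $value <= 223) $i++;              // 2-byte sequence
            elseif ($value >= 224 && $value <= 239) $i = $i + 2;   // 3-byte sequence (most Chinese)
            elseif ($value >= 240 && $value <= 247) $i = $i + 3;   // 4-byte sequence
            else die('Not a UTF-8 compatible string');
        }
        $count++;
    }
    return $count;
}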
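The lead-byte rules used in utf8_strlen can be sketched in JavaScript as follows. The name utf8CharCount is hypothetical, and the input is assumed to be a Uint8Array of UTF-8 bytes. Unlike utf8_strlen above, which adds 1 twice for every multibyte character, this sketch counts each character once; it does not validate continuation bytes.

```javascript
// Hypothetical helper: count characters in UTF-8 bytes using the lead byte only.
function utf8CharCount(bytes) {
  let count = 0;
  for (let i = 0; i < bytes.length; ) {
    const b = bytes[i];
    let size;
    if (b < 0x80) size = 1;                    // 0xxxxxxx: ASCII
    else if (b >= 0xc0 && b <= 0xdf) size = 2; // 110xxxxx
    else if (b >= 0xe0 && b <= 0xef) size = 3; // 1110xxxx (most Chinese)
    else if (b >= 0xf0 && b <= 0xf7) size = 4; // 11110xxx
    else throw new Error("Not a UTF-8 compatible string");
    i += size;
    count++;
  }
  return count;
}

console.log(utf8CharCount(new TextEncoder().encode("a中文😀")));
```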
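// Extract a substring - UTF-8 (PHP)
// $position and $length use the same units as utf8_strlen (ASCII = 1,
// multibyte character = 2). For character-based slicing, mb_substr() can be used.
function utf8_substr($str, $position, $length){
    $start_position = strlen($str);
    $start_byte = 0;
    $end_position = strlen($str);
    $started = false;
    $count = 0;
    for ($i = 0; $i < strlen($str); $i++) {
        // First character boundary at or past $position: the slice begins here
        if (!$started && $count >= $position) {
            $start_position = $i;
            $start_byte = $count;
            $started = true;
        }
        // Stop once $length units have been taken after the start
        if ($started && ($count - $start_byte) >= $length) {
            $end_position = $i;
            break;
        }
        $value = ord($str[$i]);
        if ($value > 127) {
            $count++;
            if ($value >= 192 && $value <= 223) $i++;
            elseif ($value >= 224 && $value <= 239) $i = $i + 2;
            elseif ($value >= 240 && $value <= 247) $i = $i + 3;
            else die('Not a UTF-8 compatible string');
        }
        $count++;
    }
    return substr($str, $start_position, $end_position - $start_position);
}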
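// Check whether a string contains Korean (Hangul) characters (JavaScript)
// JavaScript strings are UTF-16, so charCodeAt() returns the Unicode code unit
// regardless of the page's byte encoding.
function checkKoreaChar(str) {
    for (let i = 0; i < str.length; i++) {
        const code = str.charCodeAt(i);
        // Hangul Compatibility Jamo (U+3130-U+318F) or Hangul Syllables (U+AC00-U+D7A3)
        if ((code >= 0x3130 && code <= 0x318F) || (code >= 0xAC00 && code <= 0xD7A3)) {
            return true;
        }
    }
    return false;
}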
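// Check whether a string contains double-byte characters such as Chinese (JavaScript)
// [^\x00-\xff] matches any character above U+00FF (Chinese, but also Japanese,
// Korean, full-width punctuation, ...). Each match is replaced by two characters,
// so the length changes exactly when such a character is present.
function check_chinese_char(s){
    return s.length !== s.replace(/[^\x00-\xff]/g, "**").length;
}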
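Because [^\x00-\xff] matches far more than Chinese, a stricter test can use the \u4e00-\u9fa5 range instead. The sketch below puts both checks side by side; hasNonLatin1 and hasChineseChar are hypothetical names, and hasNonLatin1 gives the same answer as check_chinese_char.

```javascript
// Hypothetical helpers contrasting the broad and the strict check.

// true for any character above U+00FF (same result as check_chinese_char)
function hasNonLatin1(s) {
  return /[^\x00-\xff]/.test(s);
}

// true only for characters in the CJK range \u4e00-\u9fa5
function hasChineseChar(s) {
  return /[\u4e00-\u9fa5]/.test(s);
}

console.log(hasNonLatin1("한국"), hasChineseChar("한국"));
```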
