
Using PHP regular expressions to detect Chinese text in UTF-8 or GBK: approach and implementation

WBOY (original)
2016-07-13 17:17:49

UTF-8 matching: In JavaScript, it is very simple to determine whether a string is Chinese. For example:

The code is as follows:

var str = "php programming";
if (/^[ u4e00-u9fa5]+$/.test(str)) {
alert("This string is all in Chinese");
}else{
alert("This string is not all in Chinese") ;
}

In PHP, \x is used to write hexadecimal data, so a direct translation gives the following code:

$str = "php programming";
if (preg_match("/^[x4e00-x9fa5]+$/",$str)) {
print(" The string is all in Chinese");
} else {
print("The string is not all in Chinese");
}

No error is reported, and the result looks correct. But if $str is replaced with "编程" alone, the output is still "The string is not all in Chinese", so this check is not accurate.

Important: after consulting "Mastering Regular Expressions", I worked out a fuller explanation of [\x4e00-\x9fa5]. In PHP regular expressions, \x inside a character class introduces a hexadecimal character code. Written without braces, \x takes at most two hex digits, so \x4e00 is read as \x4e followed by the literal characters "00" rather than as the single code point U+4E00. A longer value must be wrapped in braces, as in \x{4e00}. In addition, any value greater than \x{FF} must be used together with the u modifier; otherwise the pattern is rejected as invalid.

The only related pattern commonly found online matches full-width characters: /^[\x80-\xff]*$/. No braces are needed there, because each value has only two hex digits.

[\u4e00-\u9fa5] matches Chinese in JavaScript, but PHP does not support the \u escape in patterns.

Since \x also denotes hexadecimal data, the JavaScript range can be carried over once it is written as \x{4e00}-\x{9fa5}. I changed the code as follows and found that it is now accurate:
The code is as follows:

$str = "php programming";
if (preg_match("/^[x{4e00}-x{9fa5}]+$/u",$str)) {
print("The string is all in Chinese ");
} else {
print("This string is not all in Chinese");
}

So the final correct regular expression for matching Chinese characters in UTF-8 encoded PHP is /^[\x{4e00}-\x{9fa5}]+$/u.

With reference to the article above, I wrote the following test code (copy it and save it as a .php file):
The code is as follows:

<?php
// Read the action from the query string; default to an empty string
$action = isset($_GET['action']) ? trim($_GET['action']) : '';
if ($action == "sub") {
    $str = isset($_POST['dir']) ? $_POST['dir'] : '';
    // GB2312 version (Chinese characters, letters, digits, underscore):
    // if (!preg_match("/^[" . chr(0xa1) . "-" . chr(0xff) . "A-Za-z0-9_]+$/", $str))
    // UTF-8 version (Chinese characters, letters, digits, underscore):
    if (!preg_match('/^[\x{4e00}-\x{9fa5}A-Za-z0-9_]+$/u', $str)) {
        // Escape user input before echoing it back into the page
        echo "The [" . htmlspecialchars($str) . "] you entered contains illegal characters";
    } else {
        echo "The [" . htmlspecialchars($str) . "] you entered is completely legal and passed!";
    }
}
?>

The script expects a form that posts a text field named dir, with action=sub in the query string. The form's label reads:

Input characters (numbers, letters, Chinese characters, underscores):

GBK: preg_match("/^[" . chr(0xa1) . "-" . chr(0xff) . "A-Za-z0-9_]+$/", $str); // GB2312: Chinese characters, letters, digits, and underscore
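As a rough illustration of the byte-range approach, the sketch below applies the same character class to strings written out as raw bytes. The function name isLegalGbName is hypothetical, and the byte values used for "编程" in GBK (B1 E0 B3 CC) are written by hand as an assumption, so that no conversion extension is needed.

```php
<?php
// Hypothetical helper; expects $bytes to be GB2312/GBK-encoded, not UTF-8.
function isLegalGbName(string $bytes): bool
{
    // No u modifier: the subject is matched byte by byte.
    // Any byte in 0xA1..0xFF is accepted as part of a double-byte character.
    return preg_match('/^[\xa1-\xffA-Za-z0-9_]+$/', $bytes) === 1;
}

// "php_编程" with the Chinese part given as GBK bytes (assumed values).
var_dump(isLegalGbName("php_\xb1\xe0\xb3\xcc"));

// Same characters as UTF-8 bytes: some bytes fall below 0xA1.
var_dump(isLegalGbName("\xe7\xbc\x96\xe7\xa8\x8b"));
```

This class only checks individual bytes, not that they form valid pairs. It suits the GB2312 range, where both bytes of a character lie in 0xA1..0xFE; GBK extension characters whose second byte is below 0xA1 are not handled by it.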
