[Repost] UTF-8 Chinese character regular expressions (PHP tutorial)
Original link: http://blog.csdn.net/wide288/article/details/30066639
$str = "编程"; // "programming" in Chinese
// if (!preg_match("/^[\x{4e00}-\x{9fa5}A-Za-z0-9_]+$/u", $str)) // UTF-8: Chinese characters, letters, digits, and underscore
if (!preg_match("/^[\x{4e00}-\x{9fa5}]+$/u", $str)) // UTF-8: Chinese characters only
{
    echo "The [" . $str . "] you entered contains illegal characters";
}
else
{
    echo "The [" . $str . "] you entered is completely legal and passed!";
}
-----------------------
UTF-8 matching:
In javascript, it is very simple to determine whether a string is Chinese. For example: var str = "php programming"; if (/^[u4e00-u9fa5] $/.test(str)) { alert("The string is all in Chinese"); } else{ alert("The string is not All in Chinese"); }
In php, x is used to represent hexadecimal data. Therefore, it is transformed into the following code: $str = "php programming"; if (preg_match("/^[x4e00-x9fa5] $/",$str)) { print("This string is all in Chinese"); } else { print("Not all of the string is in Chinese"); } It seems that no error is reported, and the judgment result is correct. However, if $str is replaced with the word "programming", the result still shows "Not all of the string is in Chinese." Chinese", it seems that this judgment is still not accurate enough.
Important: after consulting the book "Mastering Regular Expressions", I worked out a fuller explanation of [\x4e00-\x9fa5] myself.
In PHP regular expressions, [\x4e00-\x9fa5] comes down to how characters and character classes are written. \x{hex} denotes a character by its hexadecimal code. Note that hex may be 1-2 digits or 4 digits, but a 4-digit value must be wrapped in curly braces; without them, \x4e00 is read as the two-digit escape \x4e followed by the literal characters "00".
Also, if hex is greater than \x{FF}, the pattern must use the u modifier; otherwise the pattern is rejected with an error.
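The three cases described above can be compared side by side. This is a minimal sketch, assuming the script file is saved as UTF-8; the sample string is an illustration only.

```php
<?php
$str = "编程"; // UTF-8 encoded, 6 bytes

// 1) No braces: \x4e is read as the single character "N", followed by
//    literal "0", a range from "0" to byte 0x9f, then "a" and "5".
//    The lead byte of 编 (0xE7) is outside that class, so nothing matches.
var_dump(preg_match('/^[\x4e00-\x9fa5]+$/', $str)); // int(0)

// 2) Braces but no u modifier: the value is above \x{FF}, so the pattern
//    fails to compile. PHP emits a warning and preg_match() returns false
//    (@ hides the warning in this sketch).
var_dump(@preg_match('/^[\x{4e00}-\x{9fa5}]+$/', $str)); // bool(false)

// 3) Braces plus u modifier: the class is the code point range U+4E00..U+9FA5.
var_dump(preg_match('/^[\x{4e00}-\x{9fa5}]+$/u', $str)); // int(1)
```

Case 1 is the misleading one: it compiles without complaint and simply never matches Chinese text, which is why the first attempt looked as if it worked.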
Only regular rules for matching full-width characters can be found on the Internet: ^[x80-xff]*^ / , you can match Chinese without adding braces [u4e00-u9fa5], but PHP does not support it. However, since the hexadecimal data represented by x, why is it different from the range x4e00-x9fa5 provided in js? So I changed to the code below and found that it was really accurate: $str = "php programming"; if (preg_match("/^[x{4e00}-x{9fa5}] $/u",$str) ) { print("This string is all Chinese"); } else { print("This string is not all Chinese"); }
This gave me the final correct expression for matching Chinese characters in UTF-8 with PHP regular expressions: /^[\x{4e00}-\x{9fa5}]+$/u. Based on the article above, I wrote the following test code (copy it and save it as a .php file):
GBK:
preg_match("/^[" . chr(0xa1) . "-" . chr(0xff) . "A-Za-z0-9_]+$/", $str); // GB2312: Chinese characters, letters, digits, and underscore
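The GBK variant works on raw bytes rather than code points. Here is a minimal sketch, assuming the subject string is already GBK/GB2312 encoded; the sample bytes and variable names are illustrative and not from the original article.

```php
<?php
// Hypothetical sample: the two bytes 0xB1 0xE0 are the GBK encoding of 编.
$gbk = "\xB1\xE0" . "_php01";

// Build the class from raw bytes 0xA1..0xFF. There is no u modifier here,
// because the subject is not UTF-8 and is matched byte by byte.
$pattern = "/^[" . chr(0xa1) . "-" . chr(0xff) . "A-Za-z0-9_]+$/";

var_dump(preg_match($pattern, $gbk));     // int(1)
var_dump(preg_match($pattern, "php 01")); // int(0): the space is not in the class
```

Because the class tests single bytes, it does not verify that the bytes form complete two-byte characters. It also fits GB2312, where both bytes of a character lie in 0xA1..0xFE; GBK extension characters can have a second byte below 0xA1 and would need a wider range.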