
UTF-8 Chinese character regular expression

WBOY
2016-08-08 09:19:13

Original link: http://blog.csdn.net/wide288/article/details/30066639

$str = "编程";
// if (!preg_match("/^[\x{4e00}-\x{9fa5}A-Za-z0-9_]+$/u", $str)) // UTF-8: Chinese characters, letters, digits, and underscore
if (!preg_match("/^[\x{4e00}-\x{9fa5}]+$/u", $str)) // UTF-8: Chinese characters only
{
    echo "The [".$str."] you entered contains illegal characters!";
}
else
{
    echo "The [".$str."] you entered is completely legal and passed!";
}

-----------------------

UTF-8 matching:

In javascript, it is very simple to determine whether a string is Chinese. For example: var str = "php programming"; if (/^[u4e00-u9fa5]+$/.test(str)) { alert("This string is all in Chinese"); } else{ alert("This string Not all are in Chinese"); }

In php, x is used to represent hexadecimal data. Therefore, it is transformed into the following code: $str = "php programming"; if (preg_match("/^[x4e00-x9fa5]+$/",$str)) { print("This string is all in Chinese"); } else { print("Not all of the string is in Chinese"); } It seems that the error is no longer reported, and the judgment result is correct. However, if $str is replaced with the word "programming", the result still shows "Not all of the string is in Chinese". It's Chinese." It seems that this judgment is still not accurate enough.
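The behaviour described above can be reproduced with a short sketch. It assumes PHP 7 or later with the PCRE extension and a source file saved as UTF-8; the pattern is single-quoted here (an illustrative choice, not the article's) so that PCRE itself reads the escapes. Without braces, \x takes at most two hex digits, so the class is not the range the author intended.

```php
<?php
// Sketch: without braces, \x consumes at most two hex digits, so
// [\x4e00-\x9fa5] is the class { "N", "0", bytes 0x30-0x9F, "a", "5" }.
$pattern = '/^[\x4e00-\x9fa5]+$/';

// ASCII letters fall inside the byte range 0x30-0x9F, so they match.
$asciiResult = preg_match($pattern, 'php');
var_dump($asciiResult);    // int(1)

// "编程" is E7 BC 96 E7 A8 8B in UTF-8; byte 0xE7 is outside the class.
$chineseResult = preg_match($pattern, '编程');
var_dump($chineseResult);  // int(0)
```

So the pattern accepts plain ASCII text and rejects real Chinese text, which is the opposite of what was wanted.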

Important: after consulting the book "Mastering Regular Expressions", I wrote my own expanded explanation of [\x4e00-\x9fa5]:

In PHP's regular expressions, [\x4e00-\x9fa5] is actually a mix of literal characters and a character class. \x{hex} expresses a hexadecimal number. Note that hex can be 1-2 digits or 4 digits, but if it is 4 digits, curly brackets must be added; without them, \x4e00 is read as the byte \x4e followed by the literal characters "00".

In addition, a hex value greater than \x{FF} must be used with the u modifier; otherwise the pattern fails to compile and an error is reported.
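A minimal sketch of the u modifier requirement, assuming PHP 7 or later with the PCRE extension and a UTF-8 source file. The variable names are illustrative only, and the @ operator is used just to keep the compile warning out of the output.

```php
<?php
// Sketch: \x{4e00} is larger than 0xFF, so it only compiles in UTF-8 mode (/u).
$str = '编程';

// Without /u the pattern fails to compile: preg_match() raises a warning
// (suppressed here with @) and returns false.
$withoutU = @preg_match('/^[\x{4e00}-\x{9fa5}]+$/', $str);
var_dump($withoutU);  // bool(false)

// With /u the subject is treated as UTF-8 and the range is compared by code point.
$withU = preg_match('/^[\x{4e00}-\x{9fa5}]+$/u', $str);
var_dump($withU);     // int(1)
```

Because preg_match() returns false on failure and 0 on no match, comparing the result with === 1 avoids confusing a broken pattern with a non-matching string.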
You can only find regular rules for matching full-width characters on the Internet: ^[x80-xff]*^/ , you can not add curly brackets here [u4e00- u9fa5] can match Chinese, but PHP does not support it. However, since the hexadecimal data represented by x, why is it different from the range x4e00-x9fa5 provided in js? So I changed to the code below and found that it was really accurate: $str = "php programming"; if (preg_match("/^[x{4e00}-x{9fa5}]+$/u",$str )) { print("This string is all Chinese"); } else { print("This string is not all Chinese"); }
So the correct expression for matching Chinese characters under UTF-8 encoding in PHP is /^[\x{4e00}-\x{9fa5}]+$/u. Referring to the article above, I wrote the following test code (copy the code and save it as a .php file).


GBK:
preg_match("/^[".chr(0xa1)."-".chr(0xff)."A-Za-z0-9_]+$/", $str); // GB2312: Chinese characters, letters, digits, and underscore
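A small sketch of how the GBK pattern behaves, assuming PHP 7 or later. The subject has to be GB2312/GBK bytes rather than UTF-8, so the test string is written as raw bytes; the variable names and sample inputs are illustrative assumptions.

```php
<?php
// Sketch: "\xB1\xE0\xB3\xCC" is "编程" encoded in GB2312 (assumed test data).
$gbk = "\xB1\xE0\xB3\xCC";

// Byte class 0xA1-0xFF plus ASCII letters, digits, and underscore; no /u modifier,
// because the subject is not UTF-8.
$pattern = '/^[' . chr(0xa1) . '-' . chr(0xff) . 'A-Za-z0-9_]+$/';

$allowed = preg_match($pattern, $gbk . '_php5');
var_dump($allowed);   // int(1)

// A space is not in the class, so this subject is rejected.
$rejected = preg_match($pattern, $gbk . ' php');
var_dump($rejected);  // int(0)
```

Note that this class tests each byte on its own, so it does not check that the bytes form valid two-byte pairs. It fits the GB2312 byte range; GBK's extended area also uses lead bytes from 0x81 and trail bytes from 0x40, which this class does not fully cover.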


