
Analysis of problems matching Chinese characters with PHP regular expressions

WBOY
2016-07-13 17:49:48

$str = "People's Republic of China 123456789abcdefg";
echo preg_match("/^[\u4e00-\u9fa5_a-zA-Z0-9]{3,15}$/", $str);


Run the code above and see what message appears:

Warning: preg_match(): Compilation failed: PCRE does not support \L, \l, \N, \P, \p, \U, \u, or \X at offset 3 in F:http://www.hzhuti.com/nokia/5800/ on line 2
According to this message, the PCRE library behind PHP's preg functions does not accept the following Perl escape sequences here: \L, \l, \N, \P, \p, \U, \u, or \X.
In UTF-8 mode, "\x{...}" is allowed; the content of the curly braces is a string of hexadecimal digits and is interpreted as the UTF-8 character with that code point.
The original hexadecimal escape sequence \xhh matches a two-byte UTF-8 character if its value is greater than 127.
So the problem can be solved with a byte range:
preg_match("/^[\x80-\xff_a-zA-Z0-9]{3,15}$/", $str);

or, in UTF-8 mode, with a \x{...} range:
preg_match('/[\x{2460}-\x{2468}]/u', $str);
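
As a small illustration of the difference between the two modes described above, here is a sketch. It assumes PHP 7 or later and a source file saved as UTF-8; the variable names and sample characters are made up for the demonstration.

```php
<?php
// Assumption: this file is saved as UTF-8, so "中" is the three bytes E4 B8 AD.
$han = "中";      // U+4E2D
$circled = "①";  // U+2460 (circled digit one)

// Byte mode (no /u): the class [\x80-\xff] matches one byte at a time,
// so a single Chinese character produces three matches.
$byteCount = preg_match_all('/[\x80-\xff]/', $han);

// UTF-8 mode (/u): \x{...} names a whole code point.
$isCircled = preg_match('/^[\x{2460}-\x{2468}]$/u', $circled);
$isHanCircled = preg_match('/^[\x{2460}-\x{2468}]$/u', $han);

var_dump($byteCount);     // int(3)
var_dump($isCircled);     // int(1)
var_dump($isHanCircled);  // int(0)
```

The last result shows that the range U+2460 to U+2468 covers circled digits only, so it does not match a Chinese character.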



Matching Chinese characters by internal code
I tested the method given there; the code is as follows:
$str = "php programming";
if (preg_match("/^[\x{2460}-\x{2468}]+$/u",$str)) {
print("This string is all in Chinese");
} else {
print("This string is not all Chinese");
}



This still did not correctly judge whether the string was Chinese, which is not surprising: U+2460 to U+2468 are the circled digits ① to ⑨, not Chinese characters. But since \x introduces a hexadecimal value, why does this range differ from the \u4e00-\u9fa5 range commonly used in JavaScript? So I changed the code to the following:
$str = "php programming";
if (preg_match("/^[\x4e00-\x9fa5]+$/u",$str)) {
print("This string is all in Chinese");
} else {
print("This string is not all Chinese");
}



I expected this to work, but another warning appeared:
Warning: preg_match() [function.preg-match]: Compilation failed: invalid UTF-8 string at offset 6 in test.php on line 3
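The expression was evidently still wrong. Inside a double-quoted PHP string, \x9f is converted by PHP itself into the single raw byte 0x9F before the pattern reaches PCRE, and a lone 0x9F is not valid UTF-8, hence the error at offset 6. Comparing with the expression in that article, I wrapped "4e00" and "9fa5" in "{" and "}" respectively, ran it again, and this time the check was accurate: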
$str = "php programming";
if (preg_match("/^[\x{4e00}-\x{9fa5}]+$/u",$str)) {
print("This string is all in Chinese");
} else {
print("This string is not all Chinese");
}
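
To see why the braces matter, it can help to look at the bytes PHP actually hands to PCRE. The following is a minimal sketch, assuming PHP 7 or later and a UTF-8 source file; the variable names are illustrative only.

```php
<?php
// Double quotes: PHP itself turns \x4e into "N" and \x9f into the raw byte 0x9F
// before PCRE sees the pattern, so the class is really [N00-<0x9F>a5].
$unbraced = "/^[\x4e00-\x9fa5]+$/u";
echo bin2hex($unbraced), "\n"; // 2f5e5b4e30302d9f61355d2b242f75

// The lone byte 0x9F is not valid UTF-8, so compilation fails under /u.
// @ silences the expected warning; preg_match() then returns false.
$unbracedResult = @preg_match($unbraced, "中文");

// Single quotes pass \x{4e00} through untouched for PCRE to interpret
// as a code point.
$braced = '/^[\x{4e00}-\x{9fa5}]+$/u';
$bracedResult = preg_match($braced, "中文");

var_dump($unbracedResult); // bool(false)
var_dump($bracedResult);   // int(1)
```

The byte 0x9F sits at position 6 of the pattern once the delimiter is removed, which agrees with the "offset 6" reported in the warning.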


So the final correct expression for matching Chinese characters in UTF-8 encoded text in PHP is /^[\x{4e00}-\x{9fa5}]+$/u.
Finally, a byte-level alternative:
//if (preg_match("/^[".chr(0xa1)."-".chr(0xff)."]+$/", $str)) { // Works only for GB2312
if (preg_match("/^[\x7f-\xff]+$/", $str)) { // Compatible with GB2312 and UTF-8
echo "Valid input";
} else {
echo "Invalid input";
}
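
The byte-range check and the code-point check answer slightly different questions. The sketch below compares them, assuming PHP 7 or later and a UTF-8 source file; the function names hasOnlyHighBytes and isAllChinese are hypothetical helpers, not part of PHP.

```php
<?php
// Byte-level check from the snippet above: every byte is in 0x7F..0xFF.
// (Hypothetical helper name.)
function hasOnlyHighBytes(string $s): bool
{
    return preg_match('/^[\x7f-\xff]+$/', $s) === 1;
}

// Code-point check using the UTF-8 expression derived earlier.
// (Hypothetical helper name.)
function isAllChinese(string $s): bool
{
    return preg_match('/^[\x{4e00}-\x{9fa5}]+$/u', $s) === 1;
}

var_dump(hasOnlyHighBytes("中文")); // bool(true)
var_dump(isAllChinese("中文"));     // bool(true)

// Hiragana "あ" is E3 81 82 in UTF-8: all high bytes, but not in U+4E00..U+9FA5.
var_dump(hasOnlyHighBytes("あ"));   // bool(true)
var_dump(isAllChinese("あ"));       // bool(false)

var_dump(hasOnlyHighBytes("php"));  // bool(false)
```

In other words, the byte-range version accepts any non-ASCII multibyte text, so it is better read as "contains no ASCII" than as "is Chinese".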



Double-byte character encoding ranges
1. GBK (GB2312/GB18030)
\x00-\xff GBK double-byte encoding range
\x20-\x7f ASCII
\xa1-\xff Chinese (GB2312)
\x80-\xff Chinese (GBK)
2. UTF-8 (Unicode)
\u4e00-\u9fa5 (Chinese)
\x3130-\x318F (Korean)
\xAC00-\xD7A3 (Korean)
\u0800-\u4e00 (Japanese)
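
Taking the Unicode ranges listed above at face value, they can be turned into \x{...} patterns for use with the /u modifier. This is only a sketch: inCodePointRange is a hypothetical helper, the code assumes PHP 7 or later and a UTF-8 source file, and the ranges are the ones given in this list rather than complete definitions of each script.

```php
<?php
// Hypothetical helper: true if every character of $s has a code point
// between $from and $to (inclusive).
function inCodePointRange(string $s, int $from, int $to): bool
{
    // Builds e.g. /^[\x{4E00}-\x{9FA5}]+$/u from the numeric bounds.
    $pattern = sprintf('/^[\x{%X}-\x{%X}]+$/u', $from, $to);
    return preg_match($pattern, $s) === 1;
}

var_dump(inCodePointRange("中文", 0x4E00, 0x9FA5)); // bool(true)  Chinese range
var_dump(inCodePointRange("한글", 0xAC00, 0xD7A3)); // bool(true)  Korean range
var_dump(inCodePointRange("한글", 0x4E00, 0x9FA5)); // bool(false)
var_dump(inCodePointRange("あ", 0x0800, 0x4E00));   // bool(true)  listed Japanese range
```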

Excerpted from php development
