Solutions to several problems with matching Chinese using PHP regular expressions
Using a JavaScript-style pattern such as /^[\u4e00-\u9fa5]+$/ with preg_match() produces this warning:

Warning: preg_match(): Compilation failed: PCRE does not support \L, \l, \N, \P, \p, \U, \u, or \X at offset 3 in F:\wwwroot\php\test.php on line 2

The reason is that PHP regular expressions do not support the following Perl escape sequences: \L, \l, \N, \P, \p, \U, \u, or \X. This list reflects the PCRE version PHP bundled at the time; current PHP versions do support \p, \P and \X. In UTF-8 mode (the /u modifier), "\x{...}" is allowed, where the content in the curly braces is a string of hexadecimal digits giving a character's code point. The original hexadecimal escape sequence \xhh matches a two-byte UTF-8 character if its value is greater than 127.

Solution:
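These rules can be checked with a short script. This sketch assumes PHP 7.3 or later (PCRE2). The warning text there differs from the one quoted above, but preg_match() still rejects \u and returns false. Single-quoted patterns are used so PHP passes every backslash to PCRE unchanged.

```php
<?php
// 1) A JavaScript-style \u escape is rejected at compile time: preg_match()
//    emits a warning (suppressed here with @) and returns false.
$jsStyle = @preg_match('/^[\u4e00-\u9fa5]+$/', '编程');
var_dump($jsStyle); // bool(false)

// 2) In UTF-8 mode (/u), \x{...} holds hex digits naming one code point:
//    \x{4e00} is U+4E00, the character 一.
$braced = preg_match('/^\x{4e00}$/u', "\u{4e00}");
var_dump($braced); // int(1)

// 3) Also in UTF-8 mode, \xhh above 0x7F names a code point, not a byte:
//    \xe9 is U+00E9 (é), which UTF-8 stores in two bytes.
$twoByte = preg_match('/^\xe9$/u', "\u{e9}");
var_dump($twoByte);          // int(1)
var_dump(strlen("\u{e9}"));  // int(2)
```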
Match Chinese characters by their internal code, that is, by their raw byte values rather than by Unicode code points. Testing it as provided:
This runs without errors, but it does not reliably determine whether a string is Chinese, because a byte-range test like this accepts any non-ASCII byte. Also, since \x denotes hexadecimal data, why should the range differ from the 4e00-9fa5 range used in JavaScript (\u4e00-\u9fa5)? So the code was modified to write the range as \x4e00-\x9fa5:
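This sketch reproduces the attempt. It assumes the pattern was written in a double-quoted PHP string with the /u modifier, which is consistent with the offset reported in the warning below. It shows what PHP does to \x4e00 before PCRE ever sees the pattern.

```php
<?php
// In a double-quoted PHP string, \x takes at most two hex digits: PHP turns
// "\x4e" into "N" and "\x9f" into the raw byte 0x9F, leaving "00" and "a5"
// as plain text. PCRE therefore receives the body ^[N00-<0x9F>a5]+$ .
$pattern = "/^[\x4e00-\x9fa5]+$/u";

// The stray 0x9F byte sits at offset 6 of the regex body (after "^[N00-"),
// matching the "invalid UTF-8 string at offset 6" warning.
var_dump(bin2hex(substr($pattern, 1, 7))); // string(14) "5e5b4e30302d9f"

// Under /u the pattern itself must be valid UTF-8, so compilation fails.
var_dump(@preg_match($pattern, '编程'));   // bool(false)
```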
This time a different warning appeared:

Warning: preg_match() [function.preg-match]: Compilation failed: invalid UTF-8 string at offset 6 in test.php on line 3

So I modified it again, wrapping "4e00" and "9fa5" in "{" and "}" respectively. I ran it again, and this time the result was accurate:
So, under UTF-8 encoding, the final correct regular expression for matching Chinese characters in PHP is /^[\x{4e00}-\x{9fa5}]+$/u. The final implementation code:
Example 2:
Appendix: double-byte character encoding ranges in PHP

1. GBK (GB2312/GB18030)
\x00-\xff  GBK double-byte encoding range
\x20-\x7f  ASCII
\xa1-\xff  Chinese (GB2312)
\x80-\xff  Chinese (GBK)

2. UTF-8 (Unicode code points; in a PHP pattern with /u, write them as \x{...})
\u4e00-\u9fa5  Chinese
\x3130-\x318F  Korean
\xAC00-\xD7A3  Korean
\u0800-\u4e00  Japanese (a very broad range; Hiragana and Katakana occupy U+3040 to U+30FF)
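To apply the table in code, use the byte ranges from part 1 without /u on GB2312 bytes, and write the code-point ranges from part 2 as \x{...} under /u. In this sketch, the GB2312 sample bytes and the helper name script_label() are assumptions. The Japanese entry deliberately uses the Hiragana and Katakana blocks instead of the table's broad U+0800 to U+4E00 range.

```php
<?php
// Part 1: GB2312 ranges apply to raw bytes, so no /u modifier.
// "\xd6\xd0\xce\xc4" is assumed to be 中文 already encoded as GB2312.
$gb2312 = "\xd6\xd0\xce\xc4";
var_dump(preg_match('/^[\xa1-\xff]+$/', $gb2312)); // int(1)

// Part 2: UTF-8 ranges as \x{...} code points under /u.
// Hypothetical helper; Japanese uses U+3040..U+30FF (kana) instead of the
// table's U+0800..U+4E00. Kanji fall in the Chinese range and get that label.
function script_label(string $str): string
{
    $classes = [
        'Chinese'  => '\x{4e00}-\x{9fa5}',
        'Korean'   => '\x{3130}-\x{318f}\x{ac00}-\x{d7a3}',
        'Japanese' => '\x{3040}-\x{30ff}',
    ];
    foreach ($classes as $label => $class) {
        if (preg_match('/^[' . $class . ']+$/u', $str) === 1) {
            return $label;
        }
    }
    return 'other';
}

foreach (['中文', '한국어', 'ひらがな', 'abc'] as $s) {
    echo $s, ' => ', script_label($s), "\n";
}
```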