
PHP regular expressions: matching Chinese characters

WBOY (original), 2016-07-13 10:44:59

To match Chinese characters with regular expressions in PHP, you need to understand the encoding of the string and the internal codes (byte values or code points) of Chinese characters. Once you do, matching Chinese characters accurately becomes quick and easy. Here is how it works.
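To see why the encoding matters, here is a minimal sketch, assuming the PHP source file itself is saved as UTF-8, that prints the raw bytes of the character 中 and counts how many "characters" PCRE sees with and without the u modifier. For comparison, GB2312 stores the same character as the two bytes D6 D0, so a byte-level pattern written for one encoding will not work on the other.

```php
<?php
// Assumes this file is saved as UTF-8, so "中" (U+4E2D) is stored as three bytes.
$char = "中";

// Raw bytes of the character in UTF-8.
echo bin2hex($char), PHP_EOL;                  // e4b8ad

// strlen() counts bytes, not characters.
echo strlen($char), PHP_EOL;                   // 3

// Without the u modifier, PCRE works on bytes: "." matches each byte.
echo preg_match_all('/./s', $char), PHP_EOL;   // 3

// With the u modifier, PCRE decodes UTF-8 and sees a single code point.
echo preg_match_all('/./su', $char), PHP_EOL;  // 1
```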


A first attempt at checking whether a string consists entirely of Chinese characters in PHP usually follows this idea:

<?php
$str = "php编程";
if (preg_match("/^[\u4e00-\u9fa5]+$/", $str)) {
    print("This string is entirely Chinese");
} else {
    print("This string is not entirely Chinese");
}
?>


 代码如下 复制代码
$str = "php编程";
if (preg_match("/^[x4e00-x9fa5]+$/",$str)) {
print("该字符串全部是中文");
} else {
print("该字符串不全部是中文");
}

However, you will soon find that PHP does not support this syntax and reports an error:

Warning: preg_match() [function.preg-match]: Compilation failed: PCRE does not support \L, \l, \N, \U, or \u at offset 3 in test.php on line 3
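The warning is not the only effect. A pattern that fails to compile makes preg_match() return false rather than 0, and because the if statement above only tests truthiness, the code then falls through to the else branch and prints a misleading answer. A minimal sketch that tells the cases apart by comparing with ===:

```php
<?php
// Same test string as above; assumes the file is saved as UTF-8.
$str = "php编程";

// Single quotes keep the backslashes, so PCRE receives \u and rejects it.
// The @ only silences the warning for this demonstration.
$result = @preg_match('/^[\u4e00-\u9fa5]+$/', $str);

if ($result === false) {
    // Compilation failed: this is neither a match (1) nor a non-match (0).
    echo "The pattern could not be compiled", PHP_EOL;
} elseif ($result === 1) {
    echo "This string is entirely Chinese", PHP_EOL;
} else {
    echo "This string is not entirely Chinese", PHP_EOL;
}
```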


At first I searched Google many times, hoping to get around the problem through the way regular expressions express hexadecimal values, and found that PHP uses \x to denote hexadecimal data. So I changed the code to the following:

<?php
$str = "php编程";
if (preg_match("/^[\x4e00-\x9fa5]+$/", $str)) {
    print("This string is entirely Chinese");
} else {
    print("This string is not entirely Chinese");
}
?>

No error is reported any more, and the result looks correct. However, if $str is changed to just the two characters "编程", the output is still "This string is not entirely Chinese", so the check is not accurate after all. The reason is that \x without braces reads at most two hex digits: \x4e00 is the character 0x4E ("N") followed by the literal "00", so the class never describes the range U+4E00 to U+9FA5. To match Chinese accurately, whether pure Chinese characters or Chinese characters plus full-width punctuation, you need a different approach for each encoding environment. Here are examples for two common encodings, GB2312 and UTF-8:
(1) In an ANSI (GB2312) programming environment:

$strtest = "yyg中文字符yyg";
$pregstr = "/([" . chr(0xb0) . "-" . chr(0xf7) . "][" . chr(0xa1) . "-" . chr(0xfe) . "])+/i";
if (preg_match($pregstr, $strtest, $matchArray)) {
    echo $matchArray[0];
}
// output: 中文字符

(2) In a UTF-8 programming environment:

$strtest = "yyg中文字符yyg";
$pregstr = "/[\x{4e00}-\x{9fa5}]+/u";
if (preg_match($pregstr, $strtest, $matchArray)) {
    echo $matchArray[0];
}
// output: 中文字符
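The UTF-8 pattern can also serve the original goal of checking whether a whole string is Chinese, simply by anchoring it with ^ and $. A minimal sketch, assuming UTF-8 input; the function name isAllChinese() is hypothetical, and like the pattern above it covers only U+4E00 to U+9FA5, not full-width punctuation:

```php
<?php
/**
 * Hypothetical helper: true if $str is non-empty and every character
 * lies in U+4E00..U+9FA5. Assumes $str is valid UTF-8 (invalid UTF-8
 * makes preg_match() return false, so the helper returns false).
 */
function isAllChinese(string $str): bool
{
    // \x{...} with braces takes a full code point; /u makes PCRE decode UTF-8.
    return preg_match('/^[\x{4e00}-\x{9fa5}]+$/u', $str) === 1;
}

var_dump(isAllChinese("编程"));     // bool(true)
var_dump(isAllChinese("php编程"));  // bool(false)

// For comparison, the brace-less form: \x4e is read as "N",
// so the class never covers U+4E00..U+9FA5 and even "编程" fails.
var_dump(preg_match('/^[\x4e00-\x9fa5]+$/u', "编程")); // int(0)
```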