How to use regular expressions to match Chinese characters in PHP

王林
2023-06-22 09:16:39

In PHP, regular expressions are a commonly used string-matching tool. They can check whether a string conforms to a specific format and so validate input values. Because Chinese characters lie outside the ASCII range and are stored as multi-byte sequences, the pattern and its modifiers need to be adjusted accordingly. This article introduces how to use regular expressions to match Chinese characters in PHP.

1. Understand Chinese character encoding

The character encodings most often encountered in PHP are UTF-8 and GBK. UTF-8 is a variable-length encoding that uses 1 to 4 bytes per character; common Chinese characters take 3 bytes. GBK is also variable-length: ASCII characters take one byte and Chinese characters take two bytes.

PCRE's u modifier treats both the pattern and the subject string as UTF-8, and the \x{...} code point syntax used in this article depends on it, so the string being matched must be in UTF-8. If the input arrives in GBK, you can use the mb_convert_encoding() function to convert it to UTF-8 before matching, for example:

// Simulate GBK input (this source file is assumed to be saved as UTF-8)
$str_gbk = mb_convert_encoding("中文", "GBK", "UTF-8");
// Convert to UTF-8 before applying a pattern with the u modifier
$str = mb_convert_encoding($str_gbk, "UTF-8", "GBK");
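
As a rough check of how the two encodings interact with the u modifier, the following sketch compares byte lengths and runs the same pattern against a UTF-8 string and its GBK conversion. It assumes the source file is saved as UTF-8 and that the mbstring extension is available; the variable names are illustrative only.

```php
<?php
// Assumption: this file is saved as UTF-8 and mbstring is loaded.
$utf8 = "中文";
$gbk  = mb_convert_encoding($utf8, "GBK", "UTF-8");

$utf8_bytes = strlen($utf8); // three bytes per character in UTF-8
$gbk_bytes  = strlen($gbk);  // two bytes per character in GBK

// The u modifier requires a valid UTF-8 subject, so the GBK bytes are rejected.
$on_utf8 = preg_match('/[\x{4e00}-\x{9fa5}]/u', $utf8);
$on_gbk  = preg_match('/[\x{4e00}-\x{9fa5}]/u', $gbk);

var_dump($utf8_bytes, $gbk_bytes, $on_utf8, $on_gbk);
// int(6) int(4) int(1) bool(false)
```

The false result for the GBK string is an error return, not a "no match" result, which is why the conversion to UTF-8 has to happen before matching.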

2. Match Chinese characters

  1. Match a single Chinese character

To match a single Chinese character, you can use the regular expression [\x{4e00}-\x{9fa5}] together with the u modifier. Here \x{...} denotes a Unicode code point written in hexadecimal, and \x{4e00} and \x{9fa5} are the first and last characters of the range commonly used for Chinese characters, namely "一" and "龥" respectively.
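
To see where the two bounds of this range come from, a small sketch using mb_ord() and mb_chr() can print the code points involved. It assumes a UTF-8 source file and PHP 7.2 or later; the variable names are illustrative.

```php
<?php
// Assumption: UTF-8 source file, PHP 7.2+ (mb_ord() and mb_chr()).
$first     = mb_ord("一", "UTF-8");    // code point of the lower bound
$last_char = mb_chr(0x9fa5, "UTF-8"); // character at the upper bound
$middle    = mb_ord("中", "UTF-8");    // an everyday character inside the range

printf("U+%04X %s U+%04X\n", $first, $last_char, $middle);
// U+4E00 龥 U+4E2D
```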

Code example:

$str = "中文"; // the source file is assumed to be saved as UTF-8
// Single quotes keep the backslashes for PCRE; the u modifier enables UTF-8 mode
preg_match('/[\x{4e00}-\x{9fa5}]/u', $str, $match);
echo $match[0];

The output result is:

中

  2. Match multiple Chinese characters

To match multiple Chinese characters, you can add a quantifier to the character class: * matches any number of Chinese characters (including zero), + matches at least one, and {n,m} matches n to m Chinese characters.

Code example:

$str = "中文编程真有意思";
// {2,} requires a run of at least two Chinese characters
preg_match('/[\x{4e00}-\x{9fa5}]{2,}/u', $str, $match);
echo $match[0];

The output result is:

中文编程真有意思
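
The difference between the quantifiers is easiest to see on a string that does not start with a Chinese character. The sketch below is only an illustration, assuming a UTF-8 source file; $han is a hypothetical helper variable holding the character class.

```php
<?php
// Assumption: UTF-8 source file. $han is a hypothetical helper variable.
$han = '[\x{4e00}-\x{9fa5}]';
$str = "PHP中文编程";

preg_match('/' . $han . '*/u', $str, $star);      // * can match zero characters
preg_match('/' . $han . '+/u', $str, $plus);      // + needs at least one
preg_match('/' . $han . '{2,3}/u', $str, $range); // {2,3} takes at most three

var_dump($star[0], $plus[0], $range[0]);
// string(0) "" string(12) "中文编程" string(9) "中文编"
```

Because * is satisfied by an empty match at the very first position, it returns an empty string here, so + or {n,m} is usually the safer choice when at least one Chinese character is expected.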

  3. Match Chinese characters and other characters

To match Chinese characters together with other characters in a string, you can add \w (which matches letters, digits, and the underscore) to the character class alongside \x{4e00}-\x{9fa5}, for example:

Code example:

$str = "中文AI编程真有意思123";
// \w adds letters, digits, and the underscore to the Chinese range
preg_match('/[\x{4e00}-\x{9fa5}\w]+/u', $str, $match);
echo $match[0];

The output result is:

中文AI编程真有意思123
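
For input validation, the pattern usually has to cover the whole string, not just find a match somewhere inside it. The following is a minimal sketch under assumptions: is_valid_nickname() is a hypothetical helper, the 2 to 16 character limit is an arbitrary choice, and the source file is saved as UTF-8.

```php
<?php
// Assumption: UTF-8 source file. is_valid_nickname() is a hypothetical helper
// and the 2-to-16 length limit is chosen only for illustration.
function is_valid_nickname(string $name): bool
{
    // \A and \z anchor the match to the whole string; a-zA-Z0-9 is spelled out
    // so that only ASCII letters and digits are accepted next to Chinese characters.
    return preg_match('/\A[\x{4e00}-\x{9fa5}a-zA-Z0-9]{2,16}\z/u', $name) === 1;
}

var_dump(is_valid_nickname("中文AI123")); // bool(true)
var_dump(is_valid_nickname("中文 AI"));   // bool(false): contains a space
var_dump(is_valid_nickname("中"));        // bool(false): shorter than two characters
```

Comparing the result with === 1 also treats an error return (for example, input that is not valid UTF-8) as a failed validation.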

3. Commonly used regular expression functions

  1. preg_match()

The preg_match() function performs a regular expression match on a string. It returns 1 if the pattern matches, 0 if it does not, and FALSE if an error occurs; the matched text is stored in the $matches argument.

Syntax: preg_match(string $pattern, string $subject [, array &$matches [, int $flags = 0 [, int $offset = 0]]])

Sample code:

$str = "中文编程真有意思";
// + matches the longest run of consecutive Chinese characters
preg_match('/[\x{4e00}-\x{9fa5}]+/u', $str, $match);
echo $match[0];

The output result is:

中文编程真有意思
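
The three possible return values of preg_match() can be told apart as in the sketch below. It assumes a UTF-8 source file; the byte string passed in the last call is simply an example of input that is not valid UTF-8.

```php
<?php
// Assumption: UTF-8 source file.
$pattern = '/[\x{4e00}-\x{9fa5}]+/u';

$found   = preg_match($pattern, "Hello 世界", $m); // match: $m[0] holds the text
$missing = preg_match($pattern, "Hello world");    // valid input, but no match
$broken  = preg_match($pattern, "\xD6\xD0");       // subject is not valid UTF-8

var_dump($found, $m[0], $missing, $broken);
// int(1) string(6) "世界" int(0) bool(false)

if ($broken === false) {
    // preg_last_error() reports why the call failed.
    echo preg_last_error() === PREG_BAD_UTF8_ERROR ? "bad UTF-8\n" : "other error\n";
}
```

Checking with === matters here, because both 0 and false are falsy in a plain if condition.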

  2. preg_match_all()

The preg_match_all() function finds all matches of a regular expression in a string. It stores them in the $matches array and returns the number of full matches, or FALSE on error.

Syntax: preg_match_all(string $pattern, string $subject [, array &$matches [, int $flags = PREG_PATTERN_ORDER [, int $offset = 0]]])

Sample code:

$str = "PHP是一门非常有用的编程语言,可以用来开发各种Web应用";
// Each run of consecutive Chinese characters is one match
preg_match_all('/[\x{4e00}-\x{9fa5}]+/u', $str, $match);
print_r($match[0]);

The output result is:

Array
(
    [0] => 是一门非常有用的编程语言
    [1] => 可以用来开发各种
    [2] => 应用
)
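
Since preg_match_all() returns the number of matches, it can also be used to count Chinese characters. The sketch below assumes a UTF-8 source file and uses a shortened sample string for illustration.

```php
<?php
// Assumption: UTF-8 source file.
$str = "PHP是编程语言";

// Without a quantifier each match is a single character, so the return
// value is the number of Chinese characters in the string.
$count = preg_match_all('/[\x{4e00}-\x{9fa5}]/u', $str, $matches);

echo $count, "\n";                    // 5
echo implode(" ", $matches[0]), "\n"; // 是 编 程 语 言
```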

  3. preg_replace()

The preg_replace() function searches a string for matches of a regular expression and replaces them.

Syntax: preg_replace(mixed $pattern, mixed $replacement, mixed $subject [, int $limit = -1 [, int &$count]])

Sample code:

$str = "我爱编程,编程使我快乐!";
// Remove every run of Chinese characters; the u modifier is required for \x{...}
$new_str = preg_replace('/[\x{4e00}-\x{9fa5}]+/u', "", $str);
echo $new_str;

The output result is:

,!
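
The opposite operation, keeping only the Chinese characters, can be sketched with a negated character class. This example assumes a UTF-8 source file; it also uses the optional $limit and $count arguments of preg_replace() to report how many replacements were made.

```php
<?php
// Assumption: UTF-8 source file.
$str = "我爱编程,编程使我快乐!";

// The negated class matches every run of non-Chinese characters;
// -1 means no limit, and $count receives the number of replacements.
$only_chinese = preg_replace('/[^\x{4e00}-\x{9fa5}]+/u', '', $str, -1, $count);

echo $only_chinese, "\n"; // 我爱编程编程使我快乐
echo $count, "\n";        // 2
```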

4. Summary

The above is how to use regular expressions to match Chinese characters in PHP, which can be used to verify the validity of input values. When using these patterns, pay attention to the encoding of the string (it should be UTF-8 when the u modifier is used), and choose the regular expression function that fits the task.
