PHP: PCRE regular expression Unicode character properties

伊谢尔伦
2016-11-21 17:25:25

Since PHP 4.4.0 and 5.1.0, three additional escape sequences are available to match generic character types when UTF-8 mode is selected. They are:

\p{xx}

A character with the xx property

\P{xx}

A character without the xx property

\X

An extended Unicode sequence

The property names represented by xx above are limited to the Unicode general category properties. Each character has exactly one such property, specified by a two-letter abbreviation. For compatibility with Perl, negation can be specified by placing a circumflex (^) immediately after the opening curly brace. For example, \p{^Lu} is equivalent to \P{Lu}.
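The positive and negated forms can be compared side by side. The following is a minimal sketch, not part of the original article: the sample string and variable names are illustrative assumptions, and the non-ASCII letters are written as UTF-8 byte escapes so the example does not depend on how the source file is encoded.

```php
<?php
// "Élan Ünder water": É is U+00C9 (bytes C3 89), Ü is U+00DC (bytes C3 9C).
$text = "\xC3\x89" . 'lan ' . "\xC3\x9C" . 'nder water';

// The /u modifier selects UTF-8 mode for the pattern and the subject.
// \p{Lu}: characters whose general category is "uppercase letter".
preg_match_all('/\p{Lu}/u', $text, $upper);

// \P{Lu} and \p{^Lu} are two spellings of the same negation.
preg_match_all('/\P{Lu}/u', $text, $notUpperA);
preg_match_all('/\p{^Lu}/u', $text, $notUpperB);

echo count($upper[0]), " uppercase letters found\n";
var_dump($notUpperA[0] === $notUpperB[0]); // both spellings select the same characters
```

Here `$upper[0]` holds the two accented capitals, and the two negated patterns return identical match lists.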

If only one letter is specified with \p or \P, it includes all the properties that start with that letter. In this case, in the absence of negation, the curly braces in the escape sequence are optional; the following two forms have the same effect:

\p{L}
\pL
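As a small illustration (the sample string is an assumption chosen for this sketch), both spellings can be run against the same input to confirm that they pick out the same characters, covering upper and lower case letters alike:

```php
<?php
$text = 'abc XYZ 123 _-!';

// Braced and unbraced single-letter forms of the "letter" property.
preg_match_all('/\p{L}/u', $text, $braced);
preg_match_all('/\pL/u', $text, $bare);

var_dump($braced[0] === $bare[0]);   // both forms select the same characters
echo implode('', $braced[0]), "\n";  // letters only: digits, spaces and punctuation are skipped
```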

Specifying case-insensitive matching has no effect on these escape sequences. For example, \p{Lu} always matches only uppercase letters.

Sets of Unicode characters are defined as belonging to certain scripts. A character from one of these sets can be matched using the script name. For example:

\p{Greek}

\P{Han}

Characters that do not belong to an identified script are grouped together as Common.
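Script names are used in the same way as the general category abbreviations. The sketch below is illustrative only: the sample text and variable names are assumptions, and the file is assumed to be saved as UTF-8.

```php
<?php
// Assumes this source file is saved as UTF-8.
$text = 'alpha α, beta β, kanji 漢字, digits 42';

preg_match_all('/\p{Greek}/u', $text, $greek); // Greek-script characters
preg_match_all('/\p{Han}/u', $text, $han);     // Han-script characters

// \P{Han} is the negation: remove every run of non-Han characters.
$hanOnly = preg_replace('/\P{Han}+/u', '', $text);

// ASCII digits belong to no particular script, so they fall under Common.
$digitIsCommon = preg_match('/^\p{Common}$/u', '4');

echo implode('', $greek[0]), "\n";
echo implode('', $han[0]), "\n";
echo $hanOnly, "\n";
var_dump($digitIsCommon === 1);
```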

The \X escape matches any number of Unicode characters that form an extended Unicode sequence. \X is equivalent to (?>\PM\pM*)

That is, it matches a character without the "mark" property, followed by zero or more characters with the "mark" property, and treats the whole sequence as an atomic group. Characters with the "mark" property are typically accents that affect the preceding character.
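The difference between matching single characters and matching whole sequences shows up with a combining accent. This is a minimal sketch under stated assumptions: the input (an "e" followed by the combining acute accent U+0301, then an "a") is chosen for illustration, and the accent is written as UTF-8 byte escapes.

```php
<?php
// "e" + U+0301 COMBINING ACUTE ACCENT (bytes CC 81), then a plain "a".
$text = "e\xCC\x81" . 'a';

preg_match_all('/./u', $text, $points);             // one match per code point
preg_match_all('/\X/u', $text, $sequences);         // base character plus its marks
preg_match_all('/(?>\PM\pM*)/u', $text, $expanded); // the expanded form quoted above

echo count($points[0]), " code points\n";
echo count($sequences[0]), " extended sequences\n";
var_dump($sequences[0] === $expanded[0]); // same split for this input
```

For this input the accent stays attached to its base letter under \X, whereas the dot sees it as a separate character.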

Matching characters by Unicode property is not fast, because PCRE has to search a structure that contains data for more than fifteen thousand characters. That is why the traditional escape sequences such as \d and \w do not use Unicode properties in PCRE.

