The regular expression language consists of two basic character types: literal (normal) text characters and metacharacters.
Metacharacters are what give regular expressions their processing power. A metacharacter can be a single character placed in [ ] (for example, [a] matches a single lowercase a) or a sequence of characters (for example, [a-d] matches any one of a, b, c, d, and \w matches any English letter, digit, or underscore). Common metacharacters are as follows:

Characters | Description | Special instructions |
---|---|---|
. | Matches any character except the newline character (\n) | ~ |
[abcde] | Matches any one of the characters a, b, c, d, e | The characters are in an "or" relationship |
[a-h] | Matches any one character between a and h | ~ |
[^fgh] | Matches any one character that is not f, g, or h | Adding ^ before the first character inside the square brackets [ ] indicates negation: no character listed inside the brackets is matched |
\w | Matches any one uppercase or lowercase English letter, digit from 0 to 9, or underscore; equivalent to [a-zA-Z0-9_] | ~ |
\W | The opposite of \w; equivalent to [^a-zA-Z0-9_] | ~ |
\s | Matches any whitespace character; equivalent to [ \f\n\r\t\v] | ~ |
\S | The opposite of \s; equivalent to [^\s] | ~ |
\d | Matches any single digit between 0 and 9; equivalent to [0-9] | ~ |
\D | The opposite of \d; equivalent to [^0-9] | ~ |
[\u4e00-\u9fa5] | Matches any single Chinese character | The Chinese characters are expressed here by their Unicode encoding |
\b | Matches the beginning or end of a word | ~ |
^ | Matches the beginning of the string | When placed before the first character inside square brackets, it means negation instead |
$ | Matches the end of the string | ~ |

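
A few rows of the table can be checked with a short sketch. It uses Python's `re` module as an assumption (the article itself does not prescribe an engine), and the sample strings are made up for illustration. Python patterns are Unicode-aware by default, so `re.ASCII` is passed to get the `[a-zA-Z0-9_]` meaning of `\w` described in the table.

```python
import re

text = "bag_7 好\tok"  # illustrative sample: letters, underscore, digit, space, Chinese character, tab

# \d: any single digit
digits = re.findall(r"\d", text)

# [a-h]: any one character in the range a to h
a_to_h = re.findall(r"[a-h]", text)

# [^fgh]: any one character that is NOT f, g, or h
not_fgh = re.findall(r"[^fgh]", "fig")

# \w with re.ASCII behaves like [a-zA-Z0-9_]
word_chars = re.findall(r"\w", text, re.ASCII)

# \s: whitespace characters (here a space and a tab)
spaces = re.findall(r"\s", text)

# [\u4e00-\u9fa5]: a single Chinese character
chinese = re.findall(r"[\u4e00-\u9fa5]", text)

# . matches everything except the newline
dots = re.findall(r".", "a\nb")

print(digits, a_to_h, not_fgh, word_chars, spaces, chinese, dots)
```

`re.findall` returns every non-overlapping match as a list, which makes it convenient for seeing exactly which characters each class accepts.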

Quantifiers:
A quantifier specifies how many times the preceding metacharacter may repeat. (In the table below, n and m are integers.)

Characters | Description | Special instructions |
---|---|---|
* | Matches 0 or more of the preceding metacharacter; equivalent to {0,} | ~ |
? | Matches 0 or 1 of the preceding metacharacter; equivalent to {0,1} | ~ |
+ | Matches at least 1 of the preceding metacharacter; equivalent to {1,} | ~ |
{n} | Matches exactly n of the preceding metacharacter | ~ |
{n,} | Matches at least n of the preceding metacharacter | ~ |
{n,m} | Matches n to m of the preceding metacharacter | ~ |
\b | Matches a word boundary | ~ |
^ | The string must start with the specified character | ~ |
$ | The string must end with the specified character | ~ |

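
The quantifiers above can be tried out with a small sketch, again assuming Python's `re` module and made-up sample inputs. `\b` is used on both sides of the digit runs so that each number is matched as a whole word.

```python
import re

# ?: the u may appear 0 or 1 times
optional = [w for w in ["color", "colour", "colouur"] if re.fullmatch(r"colou?r", w)]

# *: zero or more b; +: at least one b
star_accepts_a = re.fullmatch(r"ab*", "a") is not None
plus_accepts_a = re.fullmatch(r"ab+", "a") is not None

numbers = "7 42 123 4567 12345"

# {n}: exactly three digits
exactly_three = re.findall(r"\b\d{3}\b", numbers)

# {n,}: at least two digits
at_least_two = re.findall(r"\b\d{2,}\b", numbers)

# {n,m}: two to four digits
two_to_four = re.findall(r"\b\d{2,4}\b", numbers)

print(optional, star_accepts_a, plus_accepts_a)
print(exactly_three, at_least_two, two_to_four)
```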

Regular | Meaning |
---|---|
Windows98\|Windows2000\|WindowsXP | Matches Windows98 or Windows2000 or WindowsXP |
^Windows98\|Windows2000\|WindowsXP$ | Starts with Windows98, or contains Windows2000, or ends with WindowsXP. Note that ^ and $ each fall inside one branch of the alternation, because the only boundaries of an alternation are the beginning of the pattern, the end of the pattern, and parentheses |
Windows(98\|2000\|XP) | Windows followed by 98, 2000, or XP |

Summary: A multi-selection structure (alternation) can span many characters, but it never extends beyond the boundaries of the enclosing parentheses.
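
The difference between the ungrouped and grouped alternations can be seen in a minimal sketch (Python's `re` module is an assumption here, and the sample sentence is invented):

```python
import re

# Without parentheses, ^ belongs only to the first branch and $ only to the last.
loose = re.compile(r"^Windows98|Windows2000|WindowsXP$")

# With parentheses, the alternation is confined to the version suffix.
grouped = re.compile(r"^Windows(98|2000|XP)$")

sample = "I run Windows2000 daily"

loose_hit = loose.search(sample) is not None      # the middle branch matches anywhere
grouped_hit = grouped.search(sample) is not None  # the whole string must be one product name

version = grouped.search("WindowsXP").group(1)    # text captured by the parentheses

print(loose_hit, grouped_hit, version)
```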
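(\d{1,3}\.){3}\d{1,3}
A simple IP address matching expression.
((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?)
A stricter IP address matching expression: each of the four segments is limited to the range 0 to 255.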
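
To compare the two IP expressions, here is a small sketch assuming Python's `re` module; the helper name `is_ip` and the test addresses are illustrative only. `fullmatch` is used so that the whole input must be an address.

```python
import re

simple = re.compile(r"(\d{1,3}\.){3}\d{1,3}")
strict = re.compile(
    r"((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?)"
)

def is_ip(pattern, text):
    """Return True when the entire text matches the given pattern."""
    return pattern.fullmatch(text) is not None

# The simple pattern only checks the shape: one to three digits per segment.
simple_ok = is_ip(simple, "192.168.0.1")
simple_accepts_999 = is_ip(simple, "999.999.999.999")

# The stricter pattern also rejects segments above 255.
strict_ok = is_ip(strict, "192.168.0.1")
strict_max = is_ip(strict, "255.255.255.255")
strict_accepts_256 = is_ip(strict, "256.1.1.1")
strict_accepts_999 = is_ip(strict, "999.999.999.999")

print(simple_ok, simple_accepts_999, strict_ok, strict_max, strict_accepts_256, strict_accepts_999)
```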
Groups are numbered by their left parenthesis: counting from left to right, the first group is number 1, the second is 2, and so on. For example, group numbers make it possible to match a repeated word.
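
As an illustration of group numbering and of matching repeated words, the sketch below assumes Python's `re` module. Both patterns and the sample strings are hypothetical examples, not the ones from the original article. `\1` refers back to whatever group 1 matched.

```python
import re

# Nested groups are numbered by the position of their left parenthesis:
# group 1 = ((\d{4})-(\d{2})), group 2 = (\d{4}), group 3 = (\d{2}), group 4 = (\d{2})
date = re.match(r"((\d{4})-(\d{2}))-(\d{2})", "2024-05-17")
numbered = date.groups()

# \1 must repeat the exact text captured by group 1, so this finds doubled words.
repeated = re.compile(r"\b(\w+)\s+\1\b")
sentence = "this is is a test of of the the parser"
doubled = [hit.group(1) for hit in repeated.finditer(sentence)]

print(numbered)
print(doubled)
```

The leading `\b` keeps the pattern from starting in the middle of a word, which is why the `is` inside `this` is not treated as the first half of a repeated pair.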
In a regular expression, first use parentheses to form a group, then refer back to the content that the group matched later in the pattern, using \1, \2, etc. (The first pair of parentheses is \1, and so on.) If parentheses are nested inside parentheses, as in (\w(.?)), remember to use the left parenthesis ( as the marker and count the groups from left to right.
Advanced 3 - Look Around (Zero Width Assertion)
A lookaround matches a position rather than characters, just as ^ and $ do. Lookarounds do not consume characters.
Lookarounds are divided into four kinds:
(?=exp): the text to the right of the position can match exp. For example, (?=\d) means the right side of the current position is a digit.
(?!exp): the text to the right of the position cannot match exp. For example, (?!\d) means the right side of the current position is not a digit.
(?<=exp): the text to the left of (in front of) the position can match exp. For example, (?<=\d) means the left side of the current position is a digit.
(?<!exp): the text to the left of (in front of) the position cannot match exp. For example, (?<!\d) means the left side of the current position is not a digit.
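
The four lookarounds can be exercised with a short sketch, assuming Python's `re` module and invented sample strings. In every case the returned match contains only the text outside the lookaround, which shows that the assertion itself consumes no characters.

```python
import re

text = "a1 b2 c d3"

# (?=\d): a letter whose right side is a digit (the digit is not part of the match)
before_digit = re.findall(r"[a-z](?=\d)", text)

# (?!\d): a letter whose right side is not a digit
not_before_digit = re.findall(r"[a-z](?!\d)", text)

order = "cost $45, qty 3"

# (?<=\$): digits whose left side is a dollar sign
after_dollar = re.findall(r"(?<=\$)\d+", order)

# (?<![$\d]): digits whose left side is neither a dollar sign nor another digit
not_after_dollar = re.findall(r"(?<![$\d])\d+", order)

print(before_digit, not_before_digit, after_dollar, not_after_dollar)
```

The negative lookbehind also excludes a preceding digit; otherwise the `5` in `$45` would qualify, since the character on its left is `4` rather than `$`.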
Advanced 4 - Greedy and lazy matching
When a regular expression contains quantifiers that allow repetition (such as *, {3,12}, etc.), the usual behavior is to match as many characters as possible. Take the regular expression a.*b, which matches a string starting with a and ending with b. If you use it to search aabab, it will match the entire string aabab. This is called greedy matching.
Lazy matching is the opposite. .*? means matching any number of repetitions, but using the fewest repetitions that still allow the whole match to succeed. a.*?b matches the shortest string starting with a and ending with b. If applied to aabab, it will match aab and ab.
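
The aabab example can be reproduced directly. This sketch assumes Python's `re` module; the two patterns and the input come from the text above.

```python
import re

subject = "aabab"

# Greedy: .* takes as much as it can, so one match covers the whole string.
greedy = re.findall(r"a.*b", subject)

# Lazy: .*? takes as little as it can, so the search yields two shorter matches.
lazy = re.findall(r"a.*?b", subject)

print(greedy)
print(lazy)
```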
Summary: The difference between greedy and lazy mode is that lazy mode has one more question mark ? after the quantifier.
Advanced 5 - Priority of pattern matching
When using regular expressions, you need to pay attention to the order of matching. Usually, operators of the same priority are applied from left to right, and operators of different priorities are applied higher first and then lower. The matching priority of the various operators, from high to low, is shown in the following table.

Order | Metacharacters | Description |
---|---|---|
1 | \ | Escape character |
2 | (), (?:), (?=), [] | Pattern units and character classes (atom tables) |
3 | *, +, ?, {n}, {n,}, {n,m} | Repetition |
4 | ^, $, \b, \B, \A, \Z | Boundary restrictions |
5 | \| | Pattern selection (alternation) |

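
Two consequences of this ordering are sketched below, assuming Python's `re` module and made-up inputs. A quantifier (order 3) applies only to the single unit before it unless parentheses (order 2) build a larger unit, and alternation (order 5) is applied last, so boundary markers attach to individual branches.

```python
import re

# ab+ means a followed by one or more b, not one or more "ab".
plus_on_b = re.fullmatch(r"ab+", "abbb") is not None
plus_on_b_rejects_abab = re.fullmatch(r"ab+", "abab") is None

# Parentheses turn "ab" into one unit, so the quantifier repeats the pair.
plus_on_group = re.fullmatch(r"(ab)+", "abab") is not None

# ^ab|cd$ is read as (^ab) or (cd$), because | has the lowest priority.
branch_hit = re.search(r"^ab|cd$", "abxx") is not None

# Grouping the alternation makes ^ and $ apply to both branches.
grouped_hit = re.search(r"^(ab|cd)$", "abxx") is not None

print(plus_on_b, plus_on_b_rejects_abab, plus_on_group, branch_hit, grouped_hit)
```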
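1 Question: How should a regular expression be written to match the \$ in the string 333333\$33\33333?
2 Question: If the preg_match function in PHP uses single-quoted and double-quoted expressions to match the \$ above, how should they be written?
Answer:
Single quotes: '/\\\\\\$/'. (For the convenience of viewing, we split it into '/\\ \\ \\ $/')
Double quotes: "/\\\\\\\$/". (For the convenience of viewing, we split it into "/\\ \\ \\ \$/")
Explanation of the answer:
The pattern that must reach the regular expression engine is \\\$, where \\ matches the backslash and \$ matches the dollar sign. In a single-quoted PHP string, \\ is needed to represent one \, so we need 6 \ to generate the expression.
In a double-quoted string, besides writing \\ for each \, double quotes also need one more \ to escape $, so it requires 7 \.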