Detailed explanation of sub-patterns of regular expressions in PHP
This article explains sub-patterns (parenthesized groups) in PHP regular expressions. Readers who need to understand how sub-patterns work in PHP can use it as a reference.
Function prototype
mixed preg_replace (mixed pattern, mixed replacement, mixed subject [, int limit])
Description
Searches subject for matches of pattern and replaces them with replacement. If limit is specified, at most limit matches are replaced; if limit is omitted or is -1, all matches are replaced.
replacement may contain back-references of the form \n or $n, where n ranges from 0 to 99. \n (or $n) stands for the text matched by the nth sub-pattern of pattern, and \0 (or $0) stands for the text matched by the whole pattern.
A sub-pattern is a part of the regular expression in the pattern parameter that is enclosed in parentheses. Sub-patterns are numbered by counting their opening parentheses from left to right, starting from 1.
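The description above can be illustrated with a minimal sketch. The sample names and the variables $subject, $all, $first and $wrapped are made up for this illustration; only preg_replace and its limit parameter come from the text.

```php
<?php
// Hypothetical sample data: three "first last" names.
$subject = "Ada Lovelace; Alan Turing; Grace Hopper";
$pattern = '/(\w+) (\w+)/';

// $1 and $2 refer to the first and second parenthesized sub-patterns.
$all = preg_replace($pattern, '$2, $1', $subject);

// With limit = 1, only the first match is replaced.
$first = preg_replace($pattern, '$2, $1', $subject, 1);

// $0 refers to the text matched by the whole pattern.
$wrapped = preg_replace($pattern, '[$0]', $subject, 1);

echo $all, "\n";     // Lovelace, Ada; Turing, Alan; Hopper, Grace
echo $first, "\n";   // Lovelace, Ada; Alan Turing; Grace Hopper
echo $wrapped, "\n"; // [Ada Lovelace]; Alan Turing; Grace Hopper
```

Single quotes are used here so that PHP passes the backslashes and dollar signs to the regex engine unchanged.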
First, a preg_match example without sub-patterns. The code is as follows:
<?php
$time = date("Y-m-d H:i:s");
$pattern = "/\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}/i";
if (preg_match($pattern, $time, $arr)) {
    echo "<pre>";
    print_r($arr);
    echo "</pre>";
}
?>
Display results:
Array
(
[0] => 2012-06-23 03:08:45
)
Notice that the result contains only one element: the text that matched the pattern. If there is only one result, why is it stored in an array rather than directly as a string? The next example shows why.
The code is as follows:
<?php
$time = date("Y-m-d H:i:s");
$pattern = "/(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})/i";
if (preg_match($pattern, $time, $arr)) {
    echo "<pre>";
    print_r($arr);
    echo "</pre>";
}
?>
Note: only $pattern was modified. Each part of the matching pattern is now enclosed in parentheses ().
Execution result:
Array
(
[0] => 2012-06-23 03:19:23
[1] => 2012
[2] => 06
[3] => 23
[4] => 03
[5] => 19
[6] => 23
)
Summary: Parentheses can be used to group parts of the matching pattern. Each group is numbered automatically: counting left parentheses from left to right, the first group is number 1, the second is number 2, and so on. Group 0 corresponds to the entire regular expression. Once the pattern is grouped, a back-reference can be used to refer to the text matched by an earlier group. For example, \1 stands for the text matched by group 1, \2 for the text matched by group 2, and so on. We can further modify the code as follows:
<?php
$time = date("Y-m-d H:i:s");
$pattern = "/(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})/i";
// 年, 月, 日, 时, 分, 秒 are the Chinese units for year, month, day, hour, minute, second.
$replacement = "The format of \$time is: \\0<br>The replaced format is: \\1年\\2月\\3日 \\4时\\5分\\6秒";
print preg_replace($pattern, $replacement, $time);
if (preg_match($pattern, $time, $arr)) {
    echo "<pre>";
    print_r($arr);
    echo "</pre>";
}
?>
Note:
Because the replacement is a double-quoted string, a group reference needs two backslashes, such as \\1. In a single-quoted string, one backslash is enough, such as \1.
\\1 refers to the content captured by group 1 (2012), and \\6 refers to the content captured by group 6.
Execution result:
The format of $time is: 2012-06-23 03:30:31
The replaced format is: 2012年06月23日 03时30分31秒
Array
(
[0] => 2012-06-23 03:30:31
[1] => 2012
[2] => 06
[3] => 23
[4] => 03
[5] => 30
[6] => 31
)
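The quoting rule and the back-references described above can be checked with a short sketch. The fixed timestamp, the sample sentence and the variable names ($a, $b, $c, $hasRepeat, $m) are assumptions made for this illustration.

```php
<?php
// A fixed timestamp keeps the result reproducible.
$time = "2012-06-23 03:30:31";
$pattern = '/(\d{4})-(\d{2})-(\d{2})/';

// Single quotes: one backslash is enough.
$a = preg_replace($pattern, '\3/\2/\1', $time);

// Double quotes: the backslash must be doubled, otherwise "\3"
// would be read by PHP as an octal character escape.
$b = preg_replace($pattern, "\\3/\\2/\\1", $time);

// ${1} keeps the reference unambiguous when a digit follows it.
$c = preg_replace($pattern, '${1}0', $time);

// Back-reference inside the pattern itself: \1 must repeat
// exactly what group 1 matched, here a doubled word.
$hasRepeat = preg_match('/\b(\w+) \1\b/', 'this is is a test', $m);

echo $a, "\n";    // 23/06/2012 03:30:31
echo $c, "\n";    // 20120 03:30:31
echo $m[0], "\n"; // is is
```

Both $a and $b hold the same string; the only difference is how the PHP string literal is written.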
Advanced Regular Expressions
In addition to POSIX BRE and ERE, libutilitis also supports Advanced Regular Expressions (ARE), whose syntax is compatible with TCL 8.2. ARE mode is enabled by adding the prefix "***:" to the stRegEx parameter. This prefix overrides the bExtended option. Basically, ARE is a superset of ERE. It extends ERE with the following items:
1. Lazy matching (also called "non-greedy matching" or "shortest matching"): appending a '?' after '?', '*', '+' or '{m,n}' enables shortest matching, so that the quantified clause matches as few characters as possible while the overall match still succeeds (the default is to match as many characters as possible). For example, when "a.*b" is applied to "abab", it matches the entire string ("abab"); when "a.*?b" is used, it matches only the first two characters ("ab").
2. Back-references to subexpressions: in stRegEx, '\n' refers back to a previously defined subexpression. For example, "(a.*)\1" matches "abcabc".
3. Unnamed (non-capturing) subexpressions: "(?:expression)" creates a subexpression that is not captured, so it cannot be referenced with '\n' and is not returned as a sub-match.
4. Lookahead: the match succeeds only if a specified condition holds for the text that follows. There are two kinds, positive lookahead and negative lookahead. The syntax for positive lookahead is "(?=expression)". For example, "bai.*(?=yang)" matches the first four characters ("bai ") of "bai yang", and the match succeeds only if "yang" follows the text matched by "bai.*". The syntax for negative lookahead is "(?!expression)". For example, "bai.*(?!yang)" matches "bai shan", because "yang" does not appear after the text matched by "bai.*".
5. Mode-switching prefix: "***:" can be followed by an option string of the form "(?options)". The option string affects the semantics and behavior of the rest of the expression. It can be a combination of the following characters:
b - Switch to POSIX BRE mode, overriding the bExtended option.
e - Switch to POSIX ERE mode, overriding the bExtended option.
q - Switch to literal text matching mode: the characters in the expression are searched for as plain text and all regular expression semantics are disabled. This mode reduces regular expression matching to a simple string search. The "***=" prefix is a shortcut for it: "***=" is equivalent to "***:(?q)".
c - Perform case-sensitive matching. Overrides the bNoCase option.
i - Perform case-insensitive matching. Overrides the bNoCase option.
n - Enable newline-sensitive matching: '^' and '$' match at the beginning and end of each line; '.' and negated sets ('[^...]') do not match newline characters. This is equivalent to the 'pw' option string. Overrides the bNewLine option.
m - Same as 'n'.
p - '^' and '$' match only at the beginning and end of the entire string, not of lines; '.' and negated sets do not match newlines. Overrides the bNewLine option.
w - '^' and '$' match at the beginning and end of each line; '.' and negated sets match newlines. Overrides the bNewLine option.
s - '^' and '$' match only at the beginning and end of the entire string, not of lines; '.' and negated sets match newlines. Overrides the bNewLine option. This mode is the default in ARE state.
x - Turn on extended mode: whitespace characters and everything after the comment character '#' in the expression are ignored. For example:
@code@
(?x)
\s+ ([[:graph:]]+) # first number
\s+ ([[:graph:]]+) # second number
@code@
This is equivalent to "\s+([[:graph:]]+)\s+([[:graph:]]+)".
t - Turn off extended mode: whitespace and comments are not ignored. This mode is the default in ARE state.
6. Perl-style character class escapes and other escape sequences that differ from BRE/ERE mode:
Perl class   Equivalent POSIX expression   Description
-----------  ----------------------------  ------------------------------------------
\a           -                             Bell character
\A           -                             Regardless of the current mode, matches only at the beginning of the entire string
\b           -                             Backspace character ('\x08')
\B           -                             The escape character itself ('\\')
\cX          -                             Control-X (= X & 037)
\d           [[:digit:]]                   Decimal digits ('0'-'9')
\D           [^[:digit:]]                  Non-digit characters
\e           -                             Escape character ('\x1B')
\f           -                             Form feed ('\x0C')
\m           [[:<:]]                       Start of a word
\M           [[:>:]]                       End of a word
\n           -                             Newline character ('\x0A')
\r           -                             Carriage return character ('\x0D')
\s           [[:space:]]                   Whitespace characters
\S           [^[:space:]]                  Non-whitespace characters
\t           -                             Tab character ('\x09')
\uX          -                             16-bit UNICODE character (X∈[0000 .. FFFF])
\UX          -                             32-bit UNICODE character (X∈[00000000 .. FFFFFFFF])
\v           -                             Vertical tab character ('\x0B')
\w           [[:alnum:]_]                  Characters that make up words
\W           [^[:alnum:]_]                 Non-word characters
\xX          -                             8-bit character (X∈[00 .. FF])
\y           -                             Word boundary (\m or \M)
\Y           -                             Non-word boundary
\Z           -                             Regardless of the current mode, matches only at the end of the entire string
\0           -                             NULL, the empty character
\X           -                             Subexpression back-reference (X∈[1 .. 9])
\XX          -                             Subexpression back-reference or 8-bit character in octal notation
\XXX         -                             Subexpression back-reference or 8-bit character in octal notation
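Several of the extensions listed above (lazy quantifiers, back-references, non-capturing groups, lookahead and the extended mode) also exist in the PCRE engine behind PHP's preg_* functions, so they can be tried from PHP. The following is only a sketch under that assumption: it does not use libutilitis, the "***:" prefix is not PCRE syntax, and escapes such as \m, \M and \y are not covered. All variable names are made up for this illustration.

```php
<?php
// 1. Greedy vs. lazy quantifier.
preg_match('/a.*b/', 'abab', $greedy);   // $greedy[0] is "abab"
preg_match('/a.*?b/', 'abab', $lazy);    // $lazy[0] is "ab"

// 2. Back-reference: \1 repeats what group 1 matched.
preg_match('/(a.*)\1/', 'abcabc', $backref); // $backref[1] is "abc"

// 3. Non-capturing group: (?:...) does not get a group number,
//    so the month is group 1.
preg_match('/(?:\d{4})-(\d{2})/', '2012-06', $noncap);

// 4. Lookahead: the condition is checked but not consumed.
preg_match('/bai.*(?=yang)/', 'bai yang', $ahead);    // "bai "
preg_match('/bai.*(?!yang)/', 'bai shan', $notAhead); // "bai shan"

// 5. Extended mode: whitespace and '#' comments in the pattern are ignored.
$extended = '/(?x)
    \s+ ([[:graph:]]+)  # first number
    \s+ ([[:graph:]]+)  # second number
/';
preg_match($extended, ' 12 34', $nums); // $nums[1] is "12", $nums[2] is "34"

print_r([$greedy[0], $lazy[0], $backref[1], $noncap, $ahead[0], $notAhead[0], $nums]);
```

With a greedy ".*", the negative lookahead in step 4 is satisfied at the end of the string, so the whole of "bai shan" is matched.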