PHP: PCRE regular expression comments and recursive patterns
Comments
The character sequence (?# marks the start of a comment, which continues up to the next closing parenthesis. Nested parentheses are not allowed. The characters inside a comment take no part in the pattern matching.
If the PCRE_EXTENDED option is set, an unescaped # character outside a character class also starts a comment, which runs to the end of that line of the pattern.
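Both comment styles can be sketched in PHP as follows. In PHP the x pattern modifier corresponds to PCRE_EXTENDED; the date-like subject string and the variable names are made up for illustration.

```php
<?php
// Inline comments: everything between "(?#" and the next ")" is skipped.
$inline = '/\d{4}(?# four-digit year)-\d{2}(?# two-digit month)/';

// Extended mode (x modifier): unescaped whitespace is ignored, and an
// unescaped # outside a character class comments out the rest of the line.
$extended = '/
    \d{4}   # four-digit year
    -
    \d{2}   # two-digit month
/x';

$subject = 'released 2024-05';
preg_match($inline, $subject, $a);
preg_match($extended, $subject, $b);

// Both patterns describe the same thing, so they find the same text.
var_dump($a[0] === $b[0]);
```

The extended form is usually the easier one to maintain once a pattern grows beyond a single line.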
Recursive patterns
Consider the problem of matching a string in parentheses, allowing for unlimited nested parentheses. Without recursion, the best that can be done is to use a pattern that matches up to some fixed depth of nesting; arbitrary depth cannot be handled. Perl 5.6 provided an experimental facility that allows regular expressions to recurse, and the special item (?R) covers this specific use of recursion. The following PCRE pattern solves the parentheses problem (assuming the PCRE_EXTENDED option is set, so that whitespace is ignored): \( ( (?>[^()]+) | (?R) )* \). First it matches an opening parenthesis. Then it matches any number of substrings, each of which is either a sequence of non-parenthesis characters or a recursive match of the pattern itself (that is, a correctly parenthesized substring). Finally it matches a closing parenthesis.
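A minimal sketch of this pattern with preg_match, using the x modifier for PCRE_EXTENDED. The subject strings and variable names are illustrative assumptions.

```php
<?php
// The pattern from the text; x (PCRE_EXTENDED) permits the spaced layout.
$balanced = '/\( ( (?>[^()]+) | (?R) )* \)/x';

// A subject containing one correctly nested group.
$found = preg_match($balanced, 'call (ab(cd)ef) done', $m);
var_dump($found);   // 1 when a balanced group is found
var_dump($m[0]);    // the whole balanced substring, "(ab(cd)ef)"

// A subject whose parentheses are never closed.
$missing = preg_match($balanced, '(ab(cd');
var_dump($missing); // 0: no balanced group anywhere in the subject
```

Note that preg_match searches the whole subject, so it reports the first balanced group it can find rather than validating the entire string.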
This example pattern contains nested unlimited repeats, so using a once-only (atomic) subgroup to match the runs of non-parenthesis characters is important when the pattern is applied to strings that do not match. For example, when it is applied to (aaaaaaaaaaaaaaaaaaaaaaaaaaaa() it quickly yields "no match". Without the once-only subgroup, the match would run for a very long time, because there are so many different ways the + and * repeats can divide up the subject string, and every one of them has to be tried before failure can be reported.
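The quick failure can be sketched as below. One assumption goes beyond the text: the A modifier (PCRE_ANCHORED) is added so that the match is only attempted at the start of the subject, since an unanchored preg_match would otherwise be free to find the small balanced "()" at the end. The run of 50 a's is an arbitrary stand-in for the long subject above.

```php
<?php
// x = PCRE_EXTENDED (spaced layout), A = PCRE_ANCHORED (try position 0 only).
$pattern = '/\( ( (?>[^()]+) | (?R) )* \)/xA';

// Unbalanced subject: the outer "(" is never closed.
$subject = '(' . str_repeat('a', 50) . '()';

// The atomic group (?>...) swallows the run of a's in one piece and never
// gives characters back, so there is very little to retry before failing.
$result = preg_match($pattern, $subject);
var_dump($result);

// The non-atomic variant, \( ( [^()]+ | (?R) )* \), is deliberately not run
// here: per the text it may take a very long time on such a subject.
```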
The values set for any capturing subgroups are those from the outermost level of the recursion at which the subgroup value is set. If the pattern above is matched against (ab(cd)ef), the value of the capturing subgroup ends up as "ef", which is the last value taken on at the top level. If additional parentheses are added, giving \( ( ( (?>[^()]+) | (?R) )* ) \), then the string they capture is "ab(cd)ef", the contents of the top-level parentheses. If there are more than 15 capturing parentheses in a pattern, PCRE has to obtain extra memory to store data during a recursion, which it does by calling pcre_malloc and later releases via pcre_free. If no memory can be obtained, it saves data for the first 15 capturing parentheses only, because there is no way to report an out-of-memory error from within a recursion.
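The capture behaviour described above can be checked with a short sketch; the variable names are illustrative.

```php
<?php
$subject = '(ab(cd)ef)';

// Original pattern: group 1 is the repeated alternative.
$single = '/\( ( (?>[^()]+) | (?R) )* \)/x';
preg_match($single, $subject, $m1);
var_dump($m1[1]); // last top-level value of group 1: "ef"

// Extra parentheses around the repeat: group 1 now spans everything
// between the outer parentheses, group 2 is the repeated alternative.
$wrapped = '/\( ( ( (?>[^()]+) | (?R) )* ) \)/x';
preg_match($wrapped, $subject, $m2);
var_dump($m2[1]); // "ab(cd)ef"
var_dump($m2[2]); // "ef"
```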
Since PHP 4.3.3, (?1), (?2), and so on can also be used to recurse into a specific numbered subgroup. Named subgroups can be used as well: (?P>name) or (?&name).
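As a sketch of why recursing into a single subgroup is useful: the ^ and $ anchors can sit outside the recursed group, so the pattern validates a whole string instead of searching inside it. The group name "group" and the test subjects are assumptions made for this example.

```php
<?php
// Three spellings of the same idea: recurse into group 1 only.
$patterns = [
    'numbered (?1)'   => '/^ ( \( (?: (?>[^()]+) | (?1) )* \) ) $/x',
    'named (?P>name)' => '/^ (?P<group> \( (?: (?>[^()]+) | (?P>group) )* \) ) $/x',
    'named (?&name)'  => '/^ (?P<group> \( (?: (?>[^()]+) | (?&group) )* \) ) $/x',
];

$results = [];
foreach ($patterns as $label => $pattern) {
    $results[$label] = [
        'balanced'   => preg_match($pattern, '(a(b)c)'), // whole string balanced
        'unbalanced' => preg_match($pattern, '(a(b)c'),  // closing ")" missing
    ];
}
print_r($results);
```

With (?R) the anchors would be part of the recursed pattern and the nested calls could never succeed, which is why the anchored form recurses into the group instead.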
If the recursive subgroup syntax (by number or by name) is used outside the parentheses it refers to, it behaves like a subroutine call in a programming language. An earlier example pointed out that the pattern (sens|respons)e and \1ibility matches "sense and sensibility" and "response and responsibility", but not "sense and responsibility". If the pattern (sens|respons)e and (?1)ibility is used instead, it matches "sense and responsibility" as well as the other two strings. A reference of this kind matches the referenced subpattern afresh at that point. (Translator's note: a back reference matches only the text that the referenced subgroup previously matched, whereas the recursive reference syntax re-runs the referenced subpattern.)
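The difference between the back reference and the subroutine-style reference can be sketched like this. The ^ and $ anchors are an addition so that each subject must match in full.

```php
<?php
// \1 repeats the text group 1 matched; (?1) re-runs group 1's subpattern.
$backref    = '/^(sens|respons)e and \1ibility$/';
$subroutine = '/^(sens|respons)e and (?1)ibility$/';

$subjects = [
    'sense and sensibility',
    'response and responsibility',
    'sense and responsibility',
];

$table = [];
foreach ($subjects as $s) {
    $table[$s] = [
        'backref'    => preg_match($backref, $s),
        'subroutine' => preg_match($subroutine, $s),
    ];
}
print_r($table);
```

Only the third subject separates the two patterns: the back reference insists on "sens" again, while the subroutine call is free to match "respons".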
The maximum length of a subject string is the largest positive number that an int variable can hold. However, PCRE uses recursion to handle subgroups and unlimited repetition. This means that, for certain patterns, the available stack space may limit the size of the subject string that can be processed.
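In PHP, preg_match returns false rather than 0 when the engine gives up, for example after exceeding the pcre.backtrack_limit or pcre.recursion_limit settings, and preg_last_error() reports the reason. A hedged sketch of a wrapper that keeps "no match" and "engine error" apart; the function name matchOrThrow is hypothetical.

```php
<?php
// Hypothetical helper: true/false for match/no match, exception on error.
function matchOrThrow(string $pattern, string $subject): bool
{
    $result = preg_match($pattern, $subject);
    if ($result === false) {
        $code = preg_last_error();
        // Readable labels for the limit-related error codes.
        $names = [
            PREG_BACKTRACK_LIMIT_ERROR => 'backtrack limit exceeded',
            PREG_RECURSION_LIMIT_ERROR => 'recursion limit exceeded',
            PREG_JIT_STACKLIMIT_ERROR  => 'JIT stack limit exceeded',
        ];
        throw new RuntimeException($names[$code] ?? "PCRE error code $code");
    }
    return $result === 1;
}

$pattern = '/^(\((?:(?>[^()]+)|(?1))*\))$/';
var_dump(matchOrThrow($pattern, '(a(b)c)'));
var_dump(matchOrThrow($pattern, '(a(b)c'));
```

Checking with === false matters here, because a plain truthiness test would treat an engine failure the same as an ordinary non-match.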