Detailed introduction to back references in PHP PCRE regular expressions

伊谢尔伦 (Original)
2017-03-30 15:03:22

Outside of a character class, a backslash followed by a digit greater than 0 (and possibly further digits) is a back reference to a capturing group that appeared earlier (to its left) in the pattern, provided there have been that many capturing left parentheses before it.

However, if the decimal number following the backslash is less than 10, it is always treated as a back reference, and an error is raised only if the pattern as a whole does not contain that many capturing groups. In other words, for numbers less than 10 the referenced parentheses do not have to appear to the left of the reference. See the "Backslashes" section of the PCRE pattern syntax documentation for how digits after a backslash are handled.
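The two consequences of this rule can be checked with a minimal sketch like the one below. The patterns and subject strings are illustrative assumptions, not examples from the original text; the `@` operator only hides the compile warning so the return value can be inspected.

```php
<?php
// \2 refers to a group that exists nowhere in the pattern: this is a
// compile error, so preg_match() emits a warning and returns false.
$missing = @preg_match('/(a)\2/', 'aa');

// Forward reference: \2 points at a group that opens later in the pattern.
// The first iteration takes the (one) branch, which sets group 2;
// the second iteration can then match \2 followed by "two".
$forward = preg_match('/^(\2two|(one))+$/', 'oneonetwo');

var_dump($missing); // bool(false)
var_dump($forward); // int(1)
```

A forward reference like this is only useful inside a repeated group, because the referenced group must have matched in an earlier iteration.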

A back reference matches whatever the referenced capturing group actually matched in the subject string, rather than anything that the group's subpattern could match. Therefore, the pattern (sens|respons)e and \1ibility matches "sense and sensibility" and "response and responsibility", but not "sense and responsibility".

If case-sensitive matching is in force at the point of the back reference, the case of the letters matters. For example, ((?i)rah)\s+\1 matches "rah rah" and "RAH RAH", but not "RAH rah", even though the original capturing group itself is matched case-insensitively. (Annotation: the back reference expects exactly the content that the referenced group captured. It can be made case-insensitive by setting an internal option before the back reference or by adding a pattern modifier, but that controls its behavior from the outside.)

There may be more than one back reference to the same subgroup. If a subgroup was not actually used in a particular match, any back reference to it always fails. For example, the pattern (a|(bc))\2 always fails if it starts to match "a" rather than "bc".

Because there may be up to 99 back references, all digits following the backslash are taken as part of a potential back reference number. If the pattern continues with a digit character immediately after the back reference, some delimiter must be used to terminate the back reference. If the PCRE_EXTENDED option is set, this can be whitespace. Otherwise an empty comment can be used.
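The behaviors described here can be tried out with a short sketch such as the following. The patterns come from the text; the anchors (^ and $), the variable names, and the "aa1" subject are assumptions added so that each result is unambiguous.

```php
<?php
// The back reference \1 must repeat the text that group 1 actually captured.
$pattern = '/^(sens|respons)e and \1ibility$/';
$same1 = preg_match($pattern, 'sense and sensibility');       // 1
$same2 = preg_match($pattern, 'response and responsibility'); // 1
$mixed = preg_match($pattern, 'sense and responsibility');    // 0

// (?i) applies only inside group 1, so \1 itself compares case-sensitively.
$casePattern = '/^((?i)rah)\s+\1$/';
$lower = preg_match($casePattern, 'rah rah'); // 1
$upper = preg_match($casePattern, 'RAH RAH'); // 1
$mix   = preg_match($casePattern, 'RAH rah'); // 0

// Group 2 never participates when the "a" branch is taken, so \2 fails.
$unsetGroup = preg_match('/^(a|(bc))\2/', 'a');    // 0
$setGroup   = preg_match('/^(a|(bc))\2/', 'bcbc'); // 1

// An empty comment, or whitespace under the x modifier (PCRE_EXTENDED),
// ends the reference so that the following "1" is a literal digit.
$comment  = preg_match('/^(a)\1(?#)1$/', 'aa1'); // 1
$extended = preg_match('/^(a)\1 1$/x', 'aa1');   // 1
```

Single-quoted PHP strings are used throughout so that the backslashes reach the PCRE engine unchanged.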

A back reference that appears inside the subgroup it refers to fails when the subgroup is first used. For example, (a\1) never matches. However, such a reference can be useful inside a repeated subpattern. For example, the pattern (a|b\1)+ matches any number of "a"s as well as "aba", "ababba", and so on. (Annotation: because the subgroup contains an alternative that can complete a match on its own, the back reference has content to refer to once that match is complete.) At each iteration of the subpattern, the back reference matches the string that the subgroup matched in the previous iteration. For this to work, the pattern must guarantee that the first iteration does not need to match the back reference. This can be done with an alternative, as in the example above, or by applying a quantifier with a minimum of zero to the back reference.
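A minimal sketch of these cases follows. The patterns (a\1) and (a|b\1)+ are from the text; the anchors, the subject strings, and the (a\1?)+ pattern used to illustrate the zero-minimum quantifier are assumptions chosen for the demonstration.

```php
<?php
// The reference sits inside its own group and the group has not matched
// yet when \1 is first tried, so the pattern can never match.
$never = preg_match('/(a\1)/', 'aaaa'); // 0

// Inside a repeated group, \1 refers to what the group matched in the
// previous iteration: "a", then "b"+"a", then "b"+"ba".
$repeat = '/^(a|b\1)+$/';
$onlyA  = preg_match($repeat, 'aaa');    // 1
$aba    = preg_match($repeat, 'aba');    // 1
$longer = preg_match($repeat, 'ababba'); // 1
$broken = preg_match($repeat, 'abb');    // 0

// A quantifier with a minimum of zero lets the first iteration skip \1.
$optional = preg_match('/^(a\1?)+$/', 'aaa'); // 1
```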

As of PHP 5.2.2, the \g escape sequence can be used for both absolute and relative references to subpatterns. This escape sequence must be followed by an unsigned number or a negative number, optionally enclosed in braces. The sequences \1, \g1, and \g{1} are synonymous. This form removes the ambiguity inherent in writing a back reference as a backslash followed by a number: it helps distinguish back references from octal character codes, and it makes it easier to follow a back reference with a literal digit, as in \g{2}1.

A \g escape sequence followed by a negative number is a relative back reference. For example, (foo)(bar)\g{-1} matches the string "foobarbar", and (foo)(bar)\g{-2} matches "foobarfoo". This can be useful in long patterns as an alternative to keeping track of subgroup numbers in order to reference a particular earlier subgroup.
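The absolute and relative forms can be compared in a short sketch like this one. The (foo)(bar) patterns are from the text; the anchors, the "aa" and "abb1" subjects, and the variable names are assumptions.

```php
<?php
// \1, \g1 and \g{1} are three spellings of the same back reference.
$plain  = preg_match('/^(a)\1$/', 'aa');    // 1
$gBare  = preg_match('/^(a)\g1$/', 'aa');   // 1
$gBrace = preg_match('/^(a)\g{1}$/', 'aa'); // 1

// Braces end the reference number, so the trailing "1" is a literal digit.
$digit = preg_match('/^(a)(b)\g{2}1$/', 'abb1'); // 1

// Negative numbers count backwards from the position of the reference:
// \g{-1} is the most recently opened group, \g{-2} the one before it.
$last     = preg_match('/^(foo)(bar)\g{-1}$/', 'foobarbar'); // 1
$previous = preg_match('/^(foo)(bar)\g{-2}$/', 'foobarfoo'); // 1
$wrong    = preg_match('/^(foo)(bar)\g{-1}$/', 'foobarfoo'); // 0
```

Relative references keep working when a new group is inserted earlier in the pattern, which is the main reason to prefer them in long patterns.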

Back references to named subgroups can be written as (?P=name) or, as of PHP 5.2.2, as \k<name> or \k'name'. In addition, support for \k{name} and \g{name} was added in PHP 5.2.4.
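As a rough illustration, the sketch below runs the same doubled-word check with each named-reference syntax. The group name "word", the subject strings, and the use of sprintf() to build the patterns are assumptions made for the example.

```php
<?php
// Five spellings of a back reference to the named group "word".
$references = ['(?P=word)', '\k<word>', "\\k'word'", '\k{word}', '\g{word}'];

$doubled  = [];
$distinct = [];
foreach ($references as $ref) {
    // Each pattern has the form: ^(?P<word>\w+) <reference>$
    $pattern = sprintf('/^(?P<word>\w+) %s$/', $ref);
    $doubled[$ref]  = preg_match($pattern, 'hello hello'); // 1
    $distinct[$ref] = preg_match($pattern, 'hello world'); // 0
}

print_r($doubled);
```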
