php — PCRE regular expression one-time subgrouping

伊谢尔伦 (original)
2016-11-21 17:13:48

With both maximizing (greedy) and minimizing (lazy) repetition, when the rest of the pattern fails to match, the repeated item is normally re-evaluated to see whether a different number of repetitions allows the rest of the pattern to match. Sometimes it is useful to prevent this, either to change the nature of the match or to make it fail earlier than it otherwise would, when the pattern author knows there is no point in carrying on.

Consider, for example, the pattern \d+foo applied to the subject line 123456bar:

After matching all 6 digits, the matcher fails to match "foo". Its normal behavior is then to try again with \d+ matching only 5 digits, then 4, and so on, before ultimately failing. One-time subgroups (also known as once-only subpatterns or atomic groups) specify that once a part of the pattern has matched, it is not re-evaluated in this way, so the matcher gives up immediately the first time it fails to match "foo". The notation is another special kind of parenthesis, starting with (?>, as in (?>\d+)bar.
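As a minimal sketch of the example above, the following PHP snippet runs the ordinary and the one-time forms against the same subject. The variable names are illustrative only, and the comments describe the conceptual behavior; a modern PCRE build may apply its own optimizations, so no timing difference should be expected on such a short string.

```php
<?php
// Sketch only: compares an ordinary greedy repeat with a one-time subgroup.
$subject = '123456bar';

// Ordinary repeat: conceptually the matcher may retry with 5, 4, ... digits
// before giving up. The result is "no match".
$greedy = preg_match('/\d+foo/', $subject);

// One-time subgroup: the digits are locked once matched, so "foo" is tested
// only once per starting position. The result is still "no match".
$atomic = preg_match('/(?>\d+)foo/', $subject);

// The example pattern from the text, which does match this subject.
$bar = preg_match('/(?>\d+)bar/', $subject, $m);

var_dump($greedy, $atomic, $bar, $m[0]);
```

Both failing patterns give the same answer; the one-time subgroup only changes how much work is done before that answer is reached.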

This kind of parenthesis "locks up" the part of the pattern it contains once it has matched: a failure later in the pattern cannot backtrack into it. Backtracking past it to earlier items, however, works as usual. Put another way, a subgroup of this type matches the string that an identical standalone pattern would match if it were anchored at the current matching point in the subject string.
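The difference between backtracking into a one-time subgroup and backtracking past it can be illustrated with two small patterns. These patterns and the subject "abc" are made up for this sketch and do not come from the original text.

```php
<?php
// Backtracking PAST the subgroup still works.
// (?:ab|a) first takes "ab"; (?>b+) then finds no "b" and fails.
// The matcher returns to the alternation, takes "a" instead, and the
// one-time subgroup is tried afresh from the new position.
$outer = preg_match('/(?:ab|a)(?>b+)c/', 'abc', $m1);

// Backtracking INTO the subgroup is not allowed.
// Once (?>ab|a) has matched "ab", the "a" branch is never revisited,
// so the following "bc" cannot match.
$inner = preg_match('/(?>ab|a)bc/', 'abc');

var_dump($outer, $m1[0], $inner);
```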

A one-time subgroup is not a capturing subgroup. In simple cases such as the example above, it can be thought of as a maximizing repeat that must swallow everything it can. So, while both \d+ and \d+? will adjust the number of digits they match to let the rest of the pattern match, (?>\d+) can only match an entire sequence of digits.
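A short sketch can make both points concrete. The patterns below, which require a literal 6 after the digits, are chosen for this illustration and are not from the original text.

```php
<?php
$subject = '123456';

// Greedy: \d+ first takes all six digits, then gives one back so "6" can match.
$greedy = preg_match('/\d+6/', $subject, $g);

// Lazy: \d+? starts with one digit and grows until "6" can match.
$lazy = preg_match('/\d+?6/', $subject, $l);

// One-time subgroup: (?>\d+) takes the whole run of digits and will not give
// any back, so the trailing "6" can never match.
$atomic = preg_match('/(?>\d+)6/', $subject);

// The subgroup does not capture: only the whole match ends up in $m.
preg_match('/(?>\d+)bar/', '123456bar', $m);

var_dump($greedy, $lazy, $atomic, count($m));
```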

This construct can contain subpatterns of arbitrary complexity, and it can be nested.

One-time subgroups can be used together with lookbehind assertions to match efficiently at the end of the subject string. Consider a simple pattern such as abcd$ applied to a long string that does not match. Because matching proceeds from left to right, PCRE looks for each "a" in the subject and then checks whether what follows matches the rest of the pattern. If the pattern is written as ^.*abcd$, the initial .* first matches the entire string, but when this fails (because no "a" follows), it backtracks to match all but the last character, then all but the last two characters, and so on. Once again the search for "a" covers the entire string, this time from right to left, so nothing is gained. However, if the pattern is written as ^(?>.*)(?<=abcd), the .* part cannot backtrack; it can only match the entire string. The lookbehind assertion that follows then performs a single test on the last four characters. If that test fails, the match fails immediately. For long strings, this approach makes a significant difference to processing time.
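The end-of-string check can be sketched as follows. The subject strings and their length are arbitrary choices for this illustration, and they contain no newlines, since .* does not cross a newline unless the s modifier is used.

```php
<?php
// Two long subjects: one ends in "abcd", the other does not.
$hitSubject  = str_repeat('x', 10000) . 'abcd';
$missSubject = str_repeat('x', 10000) . 'abce';

// (?>.*) swallows the whole string and cannot give anything back;
// (?<=abcd) then tests only the last four characters.
$pattern = '/^(?>.*)(?<=abcd)/';

$hit  = preg_match($pattern, $hitSubject);
$miss = preg_match($pattern, $missSubject);

// The plain abcd$ pattern gives the same answers for these subjects.
$plainHit  = preg_match('/abcd$/', $hitSubject);
$plainMiss = preg_match('/abcd$/', $missSubject);

var_dump($hit, $miss, $plainHit, $plainMiss);
```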

When a pattern contains an unlimited repeat inside a subgroup that can itself be repeated an unlimited number of times, using a one-time subgroup is the only way to keep some failing matches from taking a very long time. The pattern (\D+|<\d+>)*[!?] matches an unlimited number of substrings, each consisting either of non-digits or of digits enclosed in <>, followed by ! or ?. When it matches, it runs quickly. However, when it is applied to "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" it takes a long time before reporting failure. This is because the string can be divided between the two repeats in a large number of ways, and all of them have to be tried. (The example ends with [!?] rather than a single character because both PCRE and Perl optimize for fast failure when a pattern ends with a single character: they remember the last single character required for a match and fail early if it does not appear in the string.) If the pattern is changed to ((?>\D+)|<\d+>)*[!?], sequences of non-digits cannot be broken up, and failure is reported quickly. (Translator's note: for the original pattern given here, the time taken grows rapidly as the subject string gets longer, so use it with caution.)
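A rough way to observe the difference in PHP is to cap the backtracking work and see which pattern still finishes cleanly. The subject length and the limit value below are arbitrary assumptions for this sketch. The exact outcome of the first call (a plain failure or an error such as PREG_BACKTRACK_LIMIT_ERROR) may depend on the PCRE version and on whether JIT is enabled.

```php
<?php
// Cap the backtracking work so the runaway case is cut off quickly.
// The value is an arbitrary choice for this sketch.
ini_set('pcre.backtrack_limit', '100000');

// No "!" or "?" anywhere, so neither pattern can match.
$subject = str_repeat('a', 40);

// Nested unlimited repeats: the run of "a" can be split between the inner
// and outer repeat in a huge number of ways.
$slow = preg_match('/(\D+|<\d+>)*[!?]/', $subject);
$slowError = preg_last_error();

// One-time subgroup: each run of non-digits is taken whole and never split.
$fast = preg_match('/((?>\D+)|<\d+>)*[!?]/', $subject);
$fastError = preg_last_error();

var_dump($slow, $slowError === PREG_BACKTRACK_LIMIT_ERROR);
var_dump($fast, $fastError === PREG_NO_ERROR);
```

preg_match() returns false rather than 0 when it stops because of an error, so checking preg_last_error() is what distinguishes "no match" from "gave up".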
