Personal understanding of regular expressions - lazy matching

WBOY
Original
2016-08-08 09:33:02

Problem description

Link to this article: http://www.hcoding.com/?p=130

When I first started learning regular expressions, I had a question. Suppose I need to match the characters between the first pair of "_" in the string "_abc_123_". As a beginner I wrote "/_\w*_/", but the text matched between the underscores was "abc_123" instead of "abc" (note that \w also matches "_"). An expert told me to add a question mark: with "/_\w*?_/", the result is "abc".
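The difference can be checked with a minimal sketch in Python's re module. The capture group and the variable names are assumptions added for illustration, so that only the text between the underscores is reported:

```python
import re

text = "_abc_123_"

# Greedy: \w* takes as much as it can (and \w also matches "_"),
# so the group runs up to the last underscore.
greedy = re.search(r"_(\w*)_", text).group(1)

# Lazy: \w*? takes as little as it can,
# so the group stops at the next underscore.
lazy = re.search(r"_(\w*?)_", text).group(1)

print(greedy)  # abc_123
print(lazy)    # abc
```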

We know that '?' used alone means "repeat zero times or once". When '?' appears after a repetition quantifier, however, it makes that quantifier lazy, that is, it matches as few characters as possible. The lazy quantifiers are:

  • *?: Repeat any number of times (including zero), but as few times as possible
  • +?: Repeat 1 or more times, but as few times as possible
  • ??: Repeat 0 or 1 times, but as few times as possible
  • {n,m}?: Repeat n to m times, but as few times as possible
  • {n,}?: Repeat n or more times, but as few times as possible
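To see each form side by side with its greedy counterpart, here is a small sketch in Python. The sample string "aaaa" and the concrete bounds 2 and 3 are assumptions chosen for illustration:

```python
import re

sample = "aaaa"

# (greedy pattern, lazy pattern) pairs, one per quantifier in the list above.
pairs = [
    (r"a*", r"a*?"),
    (r"a+", r"a+?"),
    (r"a?", r"a??"),
    (r"a{2,3}", r"a{2,3}?"),
    (r"a{2,}", r"a{2,}?"),
]

results = {}
for greedy, lazy in pairs:
    # re.match anchors at the start of the string, so both patterns
    # begin at the same position and differ only in how far they extend.
    results[lazy] = (re.match(greedy, sample).group(),
                     re.match(lazy, sample).group())

for lazy, (greedy_text, lazy_text) in results.items():
    print(f"{lazy:10} greedy={greedy_text!r:8} lazy={lazy_text!r}")
```

With nothing following the quantifier in the pattern, each lazy form stops at its minimum: zero characters for *? and ??, one for +?, and two for {2,3}? and {2,}?.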

Yes, "repeat as little as possible": this is the blunt, straightforward explanation of lazy matching.

So how should we understand "repeat as little as possible"? We can explain it through what are called ignore-priority quantifiers in regular expressions.

Ignore-priority (lazy) quantifiers

The quantifiers "*?", "+?", "??", "{n,m}?" and "{n,}?" are all ignore-priority (lazy) quantifiers. They are formed by appending ? to ?, +, * or {}. When matching, an ignore-priority quantifier first tries to skip the item it quantifies; only if the rest of the match then fails does the engine backtrack and try to match the item. For example, matching `ab??` against "abb" yields "a" rather than "ab". After the engine matches a, the ignore-priority quantifier makes it choose not to match b first and move on in the expression. Since the expression has already ended, the engine immediately reports a successful match. The following example explains step by step how ignore-priority quantifiers work.
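The `ab??` case can be reproduced with a short Python sketch. The second pattern, `ab??c` against "abc", is an extra example added here as an assumption, to show the backtracking branch that `ab??` alone never needs:

```python
import re

# Lazy: after matching "a", the engine skips "b" first; the pattern is
# already finished, so the match is reported immediately.
lazy_match = re.search(r"ab??", "abb").group()

# Greedy counterpart for comparison: "b" is taken when available.
greedy_match = re.search(r"ab?", "abb").group()

# Skipping "b" leaves "c" facing "b", which fails, so the engine
# backtracks, lets b?? consume "b", and then "c" matches.
backtracked = re.search(r"ab??c", "abc").group()

print(lazy_match)    # a
print(greedy_match)  # ab
print(backtracked)   # abc
```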

Example

Continuing with the example above, we use "/_\w*?_/" to match the characters between the first pair of "_" in "_abc_123_".

After the first '_' is matched, '\w*?', being an ignore-priority quantifier, first chooses to match no characters. The engine then tries the second '_' of the expression '/_\w*?_/' (the '_' after '\w*?') against the 'a' in the target string '_abc_123_'. That attempt fails, so the engine backtracks to '\w*?' and takes the branch it skipped: \w is tried against a and matches.

At the next step, should the engine match or skip? Because '\w*?' is an ignore-priority quantifier, it again chooses to skip, and the previous step repeats: '_' fails to match b, so '\w*?' takes the skipped branch and now covers ab. This cycle runs 3 times in total, until the '_' after '\w*?' matches the second '_' of the target string, and 'abc' is finally matched.

Process (after starting to match the first '_'):

  • The second '_' in the expression '/_\w*?_/' is tried against the 'a' in the target string '_abc_123_'. The match fails, so '\w*?' tries to match 'a' and succeeds.
  • The second '_' in the expression '/_\w*?_/' is tried against the 'b' in the target string '_abc_123_'. The match fails, so '\w*?' extends to 'ab' and succeeds.
  • The second '_' in the expression '/_\w*?_/' is tried against the 'c' in the target string '_abc_123_'. The match fails, so '\w*?' extends to 'abc' and succeeds.
  • The second '_' in the expression '/_\w*?_/' is tried against the second '_' in the target string '_abc_123_'. The match succeeds and matching ends. The result is abc.
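The process above can be imitated with a small hand-written trace in Python. This is only a simplified sketch for this one pattern: `trace_lazy` is a hypothetical helper, it assumes the text contains at least one '_', and it does not retry from later start positions the way a real engine would.

```python
import re

def trace_lazy(text):
    # Imitates how a backtracking engine applies _\w*?_ once the
    # first '_' of the pattern has matched the first '_' of the text.
    steps = []
    start = text.index("_")   # where the first '_' matches
    pos = start + 1           # next character to examine
    while pos < len(text):
        ch = text[pos]
        # Lazy choice: skip \w and try the closing '_' first.
        if ch == "_":
            steps.append(f"'_' matches {ch!r}: success")
            return text[start + 1:pos], steps
        # The closing '_' failed; backtrack and let \w take this character.
        if not re.fullmatch(r"\w", ch):
            break             # \w cannot take it either: no match here
        covered = text[start + 1:pos + 1]
        steps.append(f"'_' fails on {ch!r}; \\w*? now covers {covered!r}")
        pos += 1
    return None, steps

result, steps = trace_lazy("_abc_123_")
for step in steps:
    print(step)
print(result)  # abc
```

The trace prints three failed attempts (on 'a', 'b' and 'c') followed by one success, which agrees with what `re.search(r"_(\w*?)_", "_abc_123_")` captures.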

The above are my thoughts after reading the section on ignore-priority quantifiers in "Mastering Regular Expressions". If you find anything wrong, I will humbly accept your corrections. Thank you!

Original article; when reposting, please credit: JC & hcoding.com
