Personal understanding of regular expressions - lazy matching
Problem description
Link to this article: http://www.hcoding.com/?p=130
When I first started learning regular expressions, I had a question. Suppose I need to match the characters between the first pair of "_" in the string "_abc_123_". As a beginner I wrote "/_\w*_/", and the matching result was "abc_123" instead of "abc" (note that \w also matches "_", so the pattern runs all the way to the last "_"). An expert told me to add a question mark, "/_\w*?_/", and then the matching result is "abc".
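The difference is easy to check by running both patterns. The sketch below is only an illustration: it uses Python's `re` module as a stand-in for the slash-delimited patterns in the text (the slashes are delimiters and are not part of the Python pattern), and the parentheses are an added capture group, not part of the original patterns, so that the text between the underscores can be read directly.

```python
import re

text = "_abc_123_"

# Greedy: \w* grabs as much as it can. \w also matches '_', so it first
# runs to the end of the string and then gives back only the final '_'.
greedy = re.search(r"_(\w*)_", text)

# Lazy: \w*? grabs as little as it can, so it stops at the next '_'.
lazy = re.search(r"_(\w*?)_", text)

print(greedy.group(1))  # abc_123
print(lazy.group(1))    # abc
```

The whole match (`group(0)`) is "_abc_123_" in the greedy case and "_abc_" in the lazy case.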
We know that '?' used alone means "repeat zero or one time", whereas when '?' appears after a repetition quantifier, it makes that quantifier lazy, that is, it matches as few characters as possible. The descriptions of the lazy quantifiers all take the same form: repeat the stated number of times, but as few times as possible.
Yes, "repeat as few times as possible": this is a crude and straightforward explanation of lazy matching.
So how should we understand "as few repetitions as possible"? We can explain it through the ignore-priority quantifiers of regular expressions (more commonly called lazy or non-greedy quantifiers).
Ignore-priority quantifiers
The quantifiers "*?", "+?", "??", "{n,m}?", "{n,}?" are all ignore-priority quantifiers. They are formed by adding ? after ?, +, * and {}. When matching, an ignore-priority quantifier first tries to skip the item it applies to; only if the rest of the match then fails does the engine backtrack and try matching the item. For example, if `ab??` is matched against "abb", the result is "a" instead of "ab". Once the engine has matched a, it first chooses not to match b, because the quantifier is ignore-priority, and goes on to check the rest of the expression. It finds that the expression has ended, so it immediately reports a successful match. The following example explains the working principle of ignore-priority quantifiers step by step.
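The `ab??` case and a few of its relatives can be tried out directly. This is a small sketch using Python's `re` module as a stand-in engine; `first_match` is a hypothetical helper written for this illustration, and the patterns other than `ab??` are added examples rather than ones from the text.

```python
import re

def first_match(pattern, text):
    """Return the text of the first match, or None if there is none."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

print(first_match(r"ab?", "abb"))       # ab   greedy: match b if possible
print(first_match(r"ab??", "abb"))      # a    lazy: skip b, pattern is done
print(first_match(r"ab??c", "abc"))     # abc  skipping b fails, so backtrack
print(first_match(r"a+?", "aaa"))       # a    at least one, as few as possible
print(first_match(r"a{2,3}?", "aaaa"))  # aa   at least two, as few as possible
```

The third line shows that "skip first" is only a preference: when c cannot match after b has been skipped, the engine goes back and lets `b??` match b after all.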
Example
Continuing with the example above, use "/_\w*?_/" to match the characters between the first pair of "_" in "_abc_123_".
After the first '_' has been matched, '\w*?' first decides not to match any characters, because it is an ignore-priority quantifier. The engine then takes the second '_' of the expression '/_\w*?_/' (the '_' after '\w*?') and tries it against the 'a' in the target string '_abc_123_'. That attempt fails, so the engine backtracks and '\w*?' takes the branch it skipped earlier (use \w to match a, which succeeds).
In the next step, should it try to match or skip? Because '\w*?' is an ignore-priority quantifier, it chooses to skip again, and the previous step repeats: '_' fails to match b, the engine backtracks, and '\w*?' extends its match to ab. After these steps have been repeated 3 times in total, the '_' after '\w*?' finally matches the second '_' of the target string, and 'abc' is what lies between the two underscores.
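Process (after the first '_' has been matched):
1. '\w*?' skips; '_' is tried against 'a' and fails; backtrack, and \w matches 'a'.
2. '\w*?' skips; '_' is tried against 'b' and fails; backtrack, and \w matches 'b'.
3. '\w*?' skips; '_' is tried against 'c' and fails; backtrack, and \w matches 'c'.
4. '\w*?' skips; '_' is tried against the second '_' and succeeds, so the overall match is '_abc_'.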
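The same process can be replayed with a small hand-written simulation. This is only a sketch for this one pattern, not how a real regex engine is implemented: `trace_lazy` is a hypothetical function, it only tries the match starting at the first '_' in the text (a real engine would move on to later start positions after a failure), and it uses Python's `re` module just to test single characters against `\w`.

```python
import re

def trace_lazy(text):
    # Simulate matching _\w*?_ starting at the first '_' in text.
    # Returns (matched_text_or_None, list_of_step_descriptions).
    steps = []
    start = text.find("_")
    if start == -1:
        return None, steps          # no '_' to start from
    pos = start + 1                 # \w*? has consumed nothing so far
    while pos < len(text):
        ch = text[pos]
        # Preferred branch: skip \w and try the closing '_' right away.
        if ch == "_":
            steps.append(f"skip \\w, try '_' against {ch!r}: ok")
            return text[start:pos + 1], steps
        steps.append(f"skip \\w, try '_' against {ch!r}: fail")
        # Saved branch: backtrack and let \w consume one more character.
        if not re.fullmatch(r"\w", ch):
            return None, steps      # the saved branch fails as well
        steps.append(f"backtrack, \\w matches {ch!r}")
        pos += 1
    return None, steps              # ran out of text before a closing '_'

matched, steps = trace_lazy("_abc_123_")
for step in steps:
    print(step)
print(matched)  # _abc_
```

The printed trace contains three fail/backtrack pairs (for a, b and c) followed by one successful attempt, which matches the three repetitions described in the text.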
The above are my thoughts after reading the section on ignore-priority quantifiers in "Mastering Regular Expressions". If you find anything wrong, I will humbly accept your corrections. Thank you!
Original article; if you repost it, please credit: JC&hcoding.com