
.net c# regular expression balanced group/recursive matching

巴扎黑
2016-12-19 16:32:11


Balanced group/recursive matching

Note: the balancing group syntax introduced here is supported by the .NET Framework. Other languages and libraries may not support this feature, or may support it with a different syntax.

Sometimes we need to match a nestable, hierarchical structure such as (100 * (50 + 15)). Simply using \(.+\) will only match everything between the leftmost left parenthesis and the rightmost right parenthesis (we are discussing greedy mode here; lazy mode has the problem described next as well). If the numbers of left and right parentheses in the original string are unequal, as in (5 / (3 + 2))), then the numbers of the two in our match will not be equal either. Is there a way to match the longest run of properly paired parentheses, together with their content, in such a string?
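To make the problem concrete, here is a small C# sketch that runs the naive greedy pattern \(.+\) and its lazy variant \(.+?\) over the unbalanced string (5 / (3 + 2))) and counts the parentheses in each match. The class and method names are illustrative only, not part of the original article.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NaiveParens
{
    // Greedy: runs from the first '(' to the LAST ')' in the input.
    public static string Greedy(string input) => Regex.Match(input, @"\(.+\)").Value;

    // Lazy: runs from the first '(' to the FIRST ')' after it.
    public static string Lazy(string input) => Regex.Match(input, @"\(.+?\)").Value;

    // Number of times c occurs in s.
    public static int Count(string s, char c)
    {
        int n = 0;
        foreach (char ch in s)
            if (ch == c) n++;
        return n;
    }

    public static void Main()
    {
        const string input = "(5 / (3 + 2)))";
        foreach (string m in new[] { Greedy(input), Lazy(input) })
            Console.WriteLine($"{m}  -> '(' x{Count(m, '(')}, ')' x{Count(m, ')')}");
    }
}
```

Neither match is balanced: the greedy one swallows the stray trailing ) (2 left, 3 right), and the lazy one stops at the first ) (2 left, 1 right).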

To keep ( and \( from completely confusing your brain, let's use angle brackets instead of parentheses. Our question now becomes: in a string like xx <aa <bbb> <bbb> aa> yy, how do we capture the content inside the longest pair of matched angle brackets?

The following syntax constructs are needed here:

(?'group') Names the captured content group and pushes it onto the stack.
(?'-group') Pops the most recently pushed capture named group off the stack. If the stack is already empty, this group fails to match.
(?(group)yes|no) If the stack holds any capture named group, continues by matching the yes part of the expression; otherwise matches the no part.
(?!) A zero-width negative lookahead assertion. Because it contains no expression to look ahead for (and an empty expression always matches), trying to match it always fails.
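A minimal sketch of how these constructs cooperate, assuming .NET's System.Text.RegularExpressions. The toy pattern below is not from the original article: it accepts exactly n a's followed by the same number n of b's, a language that classical regular expressions cannot describe. The class name BalancingGroupBasics is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BalancingGroupBasics
{
    // (?'Open'a)    pushes a capture named Open for every 'a';
    // (?'-Open'b)   pops one Open for every 'b' and fails if none is left;
    // (?(Open)(?!)) fails the whole match if any Open capture remains.
    private static readonly Regex AnBn =
        new Regex(@"^(?'Open'a)+(?'-Open'b)+(?(Open)(?!))$");

    public static bool IsAnBn(string input) => AnBn.IsMatch(input);

    public static void Main()
    {
        foreach (var s in new[] { "ab", "aaabbb", "aaabb", "aabbb", "ba" })
            Console.WriteLine($"{s}: {IsAnBn(s)}");
    }
}
```

For "aaabb" one Open is left over, so (?(Open)(?!)) fails; for "aabbb" the third (?'-Open'b) finds the stack empty, so the last b cannot be consumed and the anchored match fails. Only "ab" and "aaabbb" are accepted.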
If you are not a programmer (or you call yourself one but don't know what a stack is), think of the three constructs above like this: the first writes a "group" on a blackboard, the second erases a "group" from the blackboard, and the third checks whether any "group" is still written on the blackboard. If there is, it continues with the yes part; otherwise it continues with the no part.

What we need to do is push an "Open" every time we encounter a left bracket and pop one every time we encounter a right bracket. At the end, we check whether the stack is empty: if it is not, there are more left brackets than right brackets, and the match should fail. The regular expression engine will then backtrack (discarding some of the leading or trailing characters) and try to make the whole expression match.
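The same bookkeeping can be written by hand, which may help if stacks are new to you. This is only a sketch for checking a whole string: it does not search for a balanced substring or backtrack the way the regex engine does. Because every entry on the stack is the same "Open", a simple counter stands in for the stack. AngleBracketCounter and IsBalanced are hypothetical names.

```csharp
using System;

public static class AngleBracketCounter
{
    // '<' pushes (depth++), '>' pops (depth--). A '>' with nothing to pop,
    // or leftover '<' at the end, means the brackets are unbalanced.
    public static bool IsBalanced(string s)
    {
        int depth = 0; // number of "Open" marks currently on the blackboard
        foreach (char c in s)
        {
            if (c == '<') depth++;
            else if (c == '>')
            {
                if (depth == 0) return false; // popping an empty stack fails
                depth--;
            }
        }
        return depth == 0; // the stack must be empty at the end
    }

    public static void Main()
    {
        Console.WriteLine(IsBalanced("<aa <bbb> <bbb> aa>")); // True
        Console.WriteLine(IsBalanced("<aa <bbb> <bbb> aa"));  // False
    }
}
```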


    <                         #the outermost left bracket
        [^<>]*                #the non-bracket content after the outermost left bracket
        (
            (
                (?'Open'<)    #met a left bracket: write an "Open" on the blackboard
                [^<>]*        #match the non-bracket content after that left bracket
            )+
            (
                (?'-Open'>)   #met a right bracket: erase one "Open"
                [^<>]*        #match the non-bracket content after that right bracket
            )+
        )*
        (?(Open)(?!))         #before the outermost right bracket, check whether any "Open" is still on the blackboard; if so, the match fails
    >                         #the outermost right bracket
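One way to try this pattern in C#, assuming .NET's Regex class. Written in this multi-line, commented form, the pattern must be compiled with RegexOptions.IgnorePatternWhitespace; otherwise the spaces, line breaks and # comments become part of the pattern. The class and method names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NestedAngleBrackets
{
    // The whitespace and '#' comments below are ignored only because the
    // pattern is compiled with RegexOptions.IgnorePatternWhitespace.
    private static readonly Regex Balanced = new Regex(@"
        <                       # outermost left bracket
        [^<>]*                  # non-bracket text after it
        (
            (
                (?'Open'<)      # left bracket: push an Open capture
                [^<>]*          # non-bracket text after it
            )+
            (
                (?'-Open'>)     # right bracket: pop an Open capture
                [^<>]*          # non-bracket text after it
            )+
        )*
        (?(Open)(?!))           # any Open left over: fail this attempt
        >                       # outermost right bracket
        ", RegexOptions.IgnorePatternWhitespace);

    // Returns the leftmost balanced <...> span, or "" if there is none.
    public static string FindOutermost(string input) => Balanced.Match(input).Value;

    public static void Main()
    {
        Console.WriteLine(FindOutermost("xx <aa <bbb> <bbb> aa> yy")); // <aa <bbb> <bbb> aa>
        Console.WriteLine(FindOutermost("<<a>"));                      // <a>
    }
}
```

The second call shows the backtracking described earlier: the attempt that starts at the first < cannot be completed, so the engine gives up that leading character, retries one position later, and matches <a>.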


The most common application of balancing groups is matching HTML. The following example matches nested <div> tags:



    <div[^>]*>[^<>]*(((?'Open'<div[^>]*>)[^<>]*)+((?'-Open'</div>)[^<>]*)+)*(?(Open)(?!))</div>
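A short usage sketch for the <div> pattern, again assuming System.Text.RegularExpressions; the sample HTML, class and method names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NestedDivs
{
    // <div ...> pushes Open, </div> pops it, and (?(Open)(?!))
    // rejects any match that would leave an unclosed <div> behind.
    private static readonly Regex Div = new Regex(
        @"<div[^>]*>[^<>]*(((?'Open'<div[^>]*>)[^<>]*)+((?'-Open'</div>)[^<>]*)+)*(?(Open)(?!))</div>");

    // Returns the leftmost complete (possibly nested) <div> element, or "".
    public static string FirstDiv(string html) => Div.Match(html).Value;

    public static void Main()
    {
        string html = "<div class='outer'>x<div>y</div>z</div><div>w</div>";
        Console.WriteLine(FirstDiv(html)); // <div class='outer'>x<div>y</div>z</div>

        // [^<>]* allows no other tags between the div tags, so this finds nothing.
        Console.WriteLine(Div.IsMatch("<div><p>hi</p></div>")); // False
    }
}
```

Because the text between tags is limited to [^<>]*, this pattern only handles <div> elements whose content contains no other kinds of tags.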
