javascript - Regular expression to match the content of the innermost bracket

WBOY
2016-08-04 09:19:46

Now there is a string:

<code>str1 = '(subject_id = "A" OR (status_id = "Open" AND (status_id = "C" OR level_id = "D")))'
</code>

or

<code>str2 = '(subject_id = "A" OR subject_id = "Food" OR (subject_id = "C" OR (status_id = "Open" AND (status_id = "C" OR (level_id = "D" AND subject_id = "(Cat)")))))'
</code>

I need a regular expression that matches the innermost pair of brackets in the string together with their contents, ignoring brackets that appear inside quotation marks, that is:

<code>str1 => (status_id = "C" OR level_id = "D")

str2 => (level_id = "D" AND subject_id = "(Cat)")
</code>

So, how should we write this super complex regular expression?

If this cannot be done with a regular expression, how can it be implemented in JS?


Update: for str1, I found a regular expression that produces the expected match:

<code>\([^()]+\)
</code>

But for str2, there is still no solution. I look forward to everyone’s answers!
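
To make the problem concrete, here is a minimal sketch that runs the pattern above against both strings. The variable names `simple`, `r1` and `r2` are only for illustration. The pattern has no notion of quotation marks, so on str2 it stops at the quoted `(Cat)`:

```javascript
const str1 = '(subject_id = "A" OR (status_id = "Open" AND (status_id = "C" OR level_id = "D")))';
const str2 = '(subject_id = "A" OR subject_id = "Food" OR (subject_id = "C" OR (status_id = "Open" AND (status_id = "C" OR (level_id = "D" AND subject_id = "(Cat)")))))';

// The pattern from the question: "(" + one or more non-bracket characters + ")".
const simple = /\([^()]+\)/;

const r1 = str1.match(simple)[0];
const r2 = str2.match(simple)[0];

console.log(r1); // (status_id = "C" OR level_id = "D")
console.log(r2); // (Cat)  <- the bracket inside the quotes, not the wanted group
```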

Reply content:

For str2, I found this:

<code>\([^()]*\"[^"]*\"[^()]*\)</code>
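
A quick check of this pattern, as a sketch only (the names `quoted`, `q1` and `q2` are made up here). It does return the wanted group for str2, but it appears to go wrong on str1: `[^"]*` is allowed to contain brackets, so the text between the closing quote of "Open" and the opening quote of "C" gets treated as if it were a quoted string.

```javascript
const str1 = '(subject_id = "A" OR (status_id = "Open" AND (status_id = "C" OR level_id = "D")))';
const str2 = '(subject_id = "A" OR subject_id = "Food" OR (subject_id = "C" OR (status_id = "Open" AND (status_id = "C" OR (level_id = "D" AND subject_id = "(Cat)")))))';

// Same pattern as above; the backslashes before the quotes are optional in JS.
const quoted = /\([^()]*"[^"]*"[^()]*\)/;

const q2 = str2.match(quoted)[0];
const q1 = str1.match(quoted)[0];

console.log(q2); // (level_id = "D" AND subject_id = "(Cat)")
console.log(q1); // (status_id = "Open" AND (status_id = "C" OR level_id = "D")
```

So this one should probably only be used when every innermost group is known to contain a quoted bracket.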

After looking at the requirements, I did not consider a regular expression at all; it seemed too complicated. Let's just use the traditional method.
You can borrow the idea of operator precedence, that is, use a stack data structure to obtain the contents of the innermost brackets.
Technical points:

  1. Match the innermost bracket

  2. Contents within quotation marks are not used as matching criteria

Start designing the algorithm based on this idea:
The algorithm calculates the startIndex and endIndex of the substring to be matched and then uses the substring() method to obtain the substring:

  • When a "(" character is matched, its index is pushed onto the stack. When the first ")" is matched, an index is popped off the stack; the substring between the two indices is the target string.

  • When a " character is matched, stop looking for "(" and ")" and do not resume until the next " is found.

This is an algorithm that I came up with through brainstorming. If there are any shortcomings, please feel free to add.
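
The steps above could be sketched roughly as follows. This is only one possible reading of the algorithm: the function name `firstInnermostGroup` is hypothetical, quotes are assumed to be plain `"` characters with no escaping, and "innermost" is taken to mean the first group that closes, as described in the list.

```javascript
// Returns the first bracket group that closes outside of quotation marks,
// or null if there is none. Assumes quotes are unescaped " characters.
function firstInnermostGroup(str) {
  const stack = [];      // indices of "(" seen outside quotes and not yet closed
  let inQuotes = false;  // true while scanning between a pair of " characters

  for (let i = 0; i < str.length; i++) {
    const ch = str[i];
    if (ch === '"') {
      inQuotes = !inQuotes;          // entering or leaving a quoted section
    } else if (inQuotes) {
      continue;                      // brackets inside quotes are ignored
    } else if (ch === '(') {
      stack.push(i);                 // remember where this group starts
    } else if (ch === ')' && stack.length > 0) {
      const startIndex = stack.pop();
      return str.substring(startIndex, i + 1); // endIndex is exclusive
    }
  }
  return null;
}

const str1 = '(subject_id = "A" OR (status_id = "Open" AND (status_id = "C" OR level_id = "D")))';
const str2 = '(subject_id = "A" OR subject_id = "Food" OR (subject_id = "C" OR (status_id = "Open" AND (status_id = "C" OR (level_id = "D" AND subject_id = "(Cat)")))))';

console.log(firstInnermostGroup(str1)); // (status_id = "C" OR level_id = "D")
console.log(firstInnermostGroup(str2)); // (level_id = "D" AND subject_id = "(Cat)")
```

Because the scan is a single pass with an explicit quote flag, it does not depend on where the quoted brackets happen to sit in the string.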

// Try it this way:
/(([^()]*?"[^"()]*([^"()]+)[^()]*?"[^() ]*)+)|([^()]+)/


Added:

Analyze the requirements > find a solution for each requirement > combine the solutions = problem solved

Requirements analysis:

  1. We need to match the form ( a )

  2. There are two possibilities for the characters contained in a, represented by a1 and a2

    1. a1 contains one or more strings of the form b " c " b,

      1. where b is a string that does not include ", ( or )

      2. where c is a string that does not include "

    2. a2 does not contain ( or )

Reverse derivation:

2.2 => a2 = [^()]*
2.1.1 => b = [^()"]*
2.1.2 => c = [^"]*
2.1 => a1 = (b"c"b)+ = (b"c")+b = ([^()"]*"[^"]*")+[^()"]*
1 => (a) = (a1)|(a2) = (([^()"]*"[^"]*")+[^()"]*)|([^()]*)

Regular expression:

<code>/\(([^\(\)\"]*\"[^\"]*\")+[^\(\)\"]*\)|\([^\(\)]*\)/</code>

Verification:

<code class="javascript">var reg = /\(([^\(\)\"]*\"[^\"]*\")+[^\(\)\"]*\)|\([^\(\)]*\)/;

'(the (quick "brown" fox "jumps over, (the) lazy" dog ))'
    .match(reg)[0];
//"(quick "brown" fox "jumps over, (the) lazy" dog )"

'(the ("(quick)" brown fox "jumps (over, the)" lazy) dog )'
    .match(reg)[0];
//"("(quick)" brown fox "jumps (over, the)" lazy)"

'(the (quick brown fox (jumps "over", ((the) "lazy"))) dog )'
    .match(reg)[0];
//"(the)"</code>
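
For completeness, a sketch applying the same expression to the two strings from the question (the names `a1`, `a2` and `a3` are only for illustration). It returns the wanted group for both. One caveat, as far as I can tell: the regex can also start matching inside a quoted string, so when a quoted bracket comes before the real innermost group, as in the third string below, the quoted bracket is returned instead.

```javascript
const reg = /\(([^\(\)\"]*\"[^\"]*\")+[^\(\)\"]*\)|\([^\(\)]*\)/;

const str1 = '(subject_id = "A" OR (status_id = "Open" AND (status_id = "C" OR level_id = "D")))';
const str2 = '(subject_id = "A" OR subject_id = "Food" OR (subject_id = "C" OR (status_id = "Open" AND (status_id = "C" OR (level_id = "D" AND subject_id = "(Cat)")))))';

const a1 = str1.match(reg)[0];
const a2 = str2.match(reg)[0];
console.log(a1); // (status_id = "C" OR level_id = "D")
console.log(a2); // (level_id = "D" AND subject_id = "(Cat)")

// Caveat: the quoted bracket appears before the innermost group here.
const a3 = '(a = "(x)" AND (b = 1))'.match(reg)[0];
console.log(a3); // (x)
```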

Then change it like this:

<code>substr=str.match(/\([^()]+\)/g)[0]
</code>

This gets the innermost bracket and its contents. Then check whether the character immediately before the match is " and whether the character immediately after it is ":

<code>index=str.indexOf(str.match(/\([^()]+\)/g)[0])
length=str.match(/\([^()]+\)/g)[0].length
str.substr(index+length,1)
str.substr(index-1,1)
</code>

If those quotation marks are not present, the match is the required answer. If they are, first replace substr in str with a placeholder, then match again, and finally put substr back in place of the placeholder:

<code>str.replace(substr,"&&&")
str.replace(substr,"&&&").match(/\([^()]+\)/g)[0]
str.replace(substr,"&&&").match(/\([^()]+\)/g)[0].replace("&&&",substr)
</code>
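
Put together, the steps could look something like the sketch below. The function name `innermostViaPlaceholder` is made up, the placeholder `&&&` is assumed not to occur in the input, and only a single quoted bracket is handled, just as in the snippets above.

```javascript
function innermostViaPlaceholder(str) {
  const pattern = /\([^()]+\)/;
  const placeholder = '&&&';            // assumption: never appears in str

  const first = str.match(pattern);
  if (first === null) return null;

  const substr = first[0];
  const index = first.index;
  const before = str.charAt(index - 1);            // '' when index is 0
  const after = str.charAt(index + substr.length);

  // Not wrapped in quotes: this is already the required answer.
  if (before !== '"' || after !== '"') return substr;

  // Wrapped in quotes: hide it, match again, then restore it.
  const masked = str.slice(0, index) + placeholder + str.slice(index + substr.length);
  const second = masked.match(pattern);
  if (second === null) return null;
  // A function is used as the replacement so "$" in substr is not interpreted.
  return second[0].replace(placeholder, () => substr);
}

const str1 = '(subject_id = "A" OR (status_id = "Open" AND (status_id = "C" OR level_id = "D")))';
const str2 = '(subject_id = "A" OR subject_id = "Food" OR (subject_id = "C" OR (status_id = "Open" AND (status_id = "C" OR (level_id = "D" AND subject_id = "(Cat)")))))';

console.log(innermostViaPlaceholder(str1)); // (status_id = "C" OR level_id = "D")
console.log(innermostViaPlaceholder(str2)); // (level_id = "D" AND subject_id = "(Cat)")
```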

The hard part of this problem is that the quotation marks need to be counted recursively, for example:

<code>(level_id = "D AND subject_id = "(Cat)"")</code>

Here (Cat) meets the requirement.

<code>\([^()]*?\"((?:[^\"\"]|\"(?1)\")*+)\"[^()]*?\)|\([^()]*?\)
</code>

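Cherish your life and stay away from regular expressions. This regex meets your requirement; it works in PHP (PHP supports recursion), but it cannot be used in Java or Python.

A suggested approach: find the index of ( and handle it by slicing the string.

I can't type the regex properly on my phone.
In the original poster's [^()], matching continues as long as the character is not ( or ).
Just remove the condition that excludes ( and change the greedy + to *?.

Code: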

console.log('(subject_id = “A” OR (status_id = “Open” AND (status_id = “C” OR level_id = “D”)))'.match(/(1*)/))
Hope this helps.

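Matching this with a regular expression gets fairly complicated. I suggest replacing the interfering sequences "( and )" with something else, for example "[ and ]", then applying a simple regex, and swapping them back afterwards.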
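
A minimal JS sketch of that swap idea, assuming the characters [ and ] do not otherwise appear in the string; the function name `innermostBySwapping` is hypothetical:

```javascript
function innermostBySwapping(str) {
  // Turn the quoted brackets "( ... )" into "[ ... ]" so they stop interfering.
  const masked = str.replace(/"\(/g, '"[').replace(/\)"/g, ']"');

  // Now the simple pattern from the question is enough.
  const m = masked.match(/\([^()]+\)/);
  if (m === null) return null;

  // Swap the square brackets back into round ones.
  return m[0].replace(/"\[/g, '"(').replace(/\]"/g, ')"');
}

const str1 = '(subject_id = "A" OR (status_id = "Open" AND (status_id = "C" OR level_id = "D")))';
const str2 = '(subject_id = "A" OR subject_id = "Food" OR (subject_id = "C" OR (status_id = "Open" AND (status_id = "C" OR (level_id = "D" AND subject_id = "(Cat)")))))';

console.log(innermostBySwapping(str1)); // (status_id = "C" OR level_id = "D")
console.log(innermostBySwapping(str2)); // (level_id = "D" AND subject_id = "(Cat)")
```

Note that this only covers brackets that sit directly next to the quotation marks; a value such as "a (b) c" would not be masked.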

A regex implementation in Python looks like this:

<code>import re

str1 = '(subject_id = "A" OR (status_id = "Open" AND (status_id = "C" OR level_id = "D")))'
str2 = '(subject_id = "A" OR subject_id = "Food" OR (subject_id = "C" OR (status_id = "Open" AND (status_id = "C" OR (level_id = "D" AND subject_id = "(Cat)")))))'

pat = re.compile(r"""(?<=[^"])
        \([^()]+?
        ("\(.+?\)")*
        \)
        (?=[^"])
        """, re.X)

print(pat.search(str1).group(0))
print(pat.search(str2).group(0))</code>

Output:

<code>(status_id = "C" OR level_id = "D")
(level_id = "D" AND subject_id = "(Cat)")
</code>