Detailed explanation of grouping in JavaScript regular expressions
I wrote an introductory article on regular expressions before and thought I understood them fairly well, but today I ran into another pitfall, probably because I was not careful enough. In this article I will focus on grouping in JavaScript regular expressions.
Grouping is used widely in regular expressions. A group, as I understand it, is a pair of parentheses (): each pair of parentheses is one group.
Groups can be divided into:
Capturing groups
Non-capturing groups
Capturing groups
Capturing groups show up in the results of functions such as match and exec: the text captured by each group is returned as the second item, the third item, and so on of the result array. Let's look at an example first:
var reg = /test(\d+)/;
var str = 'new test001 test002';
console.log(str.match(reg));
// ["test001", "001", index: 4, input: "new test001 test002"]
(\d+) in the code is a group (some people call it a sub-pattern; the two terms mean the same thing). In the example above, test001 is the full match.
The group result is the part of that full match (test001) that was matched by the sub-pattern \d+, which is obviously 001.
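To make the "second item, third item" layout concrete, here is a minimal sketch with two capturing groups. The pattern /(test)(\d+)/ and the variable names are my own illustration, reusing the string from the example above.

```javascript
// Two capturing groups: group 1 is the literal prefix, group 2 is the digits.
const twoGroupReg = /(test)(\d+)/;
const sampleStr = 'new test001 test002';

const viaMatch = sampleStr.match(twoGroupReg);
const viaExec = twoGroupReg.exec(sampleStr);

console.log(viaMatch[0]); // "test001" (full match)
console.log(viaMatch[1]); // "test"    (first group)
console.log(viaMatch[2]); // "001"     (second group)
console.log(viaExec[2], viaExec.index); // "001" 4
```

Without the g flag, match and exec return the same array shape, so either can be used to read the groups.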
But the case I ran into today looks like this:
var reg = /test(\d)+/;
var str = 'new test001 test002';
console.log(str.match(reg));
// ["test001", "1", index: 4, input: "new test001 test002"]
The only difference is that (\d+) has been changed to (\d)+. The full match is still test001, but the result of the first group is different.
Let's analyze the difference step by step.
(\d+) is a single group. Quantifiers are greedy by default, which means they match as much as possible, so \d+ matches all the digits: 001. The parentheses wrap the whole \d+, so the first group captures 001.
Now look at the second case. The quantifier in (\d)+ is also greedy: it matches 0, then 0, then 1, and stops only when no more digits follow.
The full match is therefore the same as in the first example, but here the group (\d) matches a single digit and the quantifier repeats the group.
I used to think the group would hold the first digit it matched, which is 0, but that understanding is wrong. Each repetition overwrites what the group captured before. Because the quantifier is greedy, the group is repeated as many times as possible,
and the group ends up holding the result of the last repetition, which is 1.
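The two patterns can be compared side by side. This is a small sketch with variable names of my own choosing; the last step (splitting the captured run) is just one possible way to get at the individual digits, not something the patterns above do for you.

```javascript
const repeatedGroupReg = /test(\d)+/; // quantifier outside the group
const wholeRunReg = /test(\d+)/;      // quantifier inside the group
const inputStr = 'new test001 test002';

const repeatedResult = inputStr.match(repeatedGroupReg);
const wholeResult = inputStr.match(wholeRunReg);

console.log(repeatedResult[1]); // "1": only the last repetition is kept
console.log(wholeResult[1]);    // "001": the group wraps the whole run

// To get each digit separately, capture the whole run and split it.
const digitList = wholeResult[1].split('');
console.log(digitList); // ["0", "0", "1"]
```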
With a non-greedy (lazy) quantifier, the group is repeated as few times as possible:
var reg = /test(\d)+?/;
var str = 'new test001 test002';
console.log(str.match(reg));
// ["test0", "0", index: 4, input: "new test001 test002"]
Here (\d) captures 0. More digits follow, but the lazy quantifier stops after a single repetition, so the full match is test0 rather than test001.
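A quick way to check what the non-greedy quantifier really returns is to run the variants next to each other. The third pattern, with \b after the quantifier, is my own addition: it sketches that a lazy quantifier still repeats as often as the rest of the pattern requires.

```javascript
const lazyDemoStr = 'new test001 test002';

const greedyResult = lazyDemoStr.match(/test(\d)+/);
const lazyResult = lazyDemoStr.match(/test(\d)+?/);
// A word boundary must follow, so the lazy quantifier has to keep going
// until the end of the digits before the pattern can succeed.
const lazyBoundaryResult = lazyDemoStr.match(/test(\d)+?\b/);

console.log(greedyResult[0], greedyResult[1]);             // "test001" "1"
console.log(lazyResult[0], lazyResult[1]);                 // "test0" "0"
console.log(lazyBoundaryResult[0], lazyBoundaryResult[1]); // "test001" "1"
```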
Non-capturing groups
var reg = /test(?:\d)+/;
var str = 'new test001 test002';
console.log(str.match(reg));
// ["test001", index: 4, input: "new test001 test002"]
A non-capturing group is for places where a pair of parentheses is needed but you do not want it to become a capturing group, that is, you do not want functions such as match and exec to return this group.
To make a group non-capturing, add ?: right after the opening parenthesis, i.e. (?:pattern).
The content matched by the group then no longer appears in the match result: compared with the earlier (\d)+ example, the second item, 1, is gone.
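One common place where parentheses are needed only for structure is alternation. The sketch below uses a made-up input string and made-up patterns (they are not from the examples above) to show how ?: changes the numbering of the remaining groups.

```javascript
// Hypothetical input; the first pair of parentheses only limits the scope of "|".
const altStr = 'demo7 test001';

// Both pairs capture, so the digits end up in group 2.
const capturingAlt = altStr.match(/(test|demo)(\d+)/);
// The first pair is non-capturing, so the digits move up to group 1.
const nonCapturingAlt = altStr.match(/(?:test|demo)(\d+)/);

console.log(capturingAlt[1], capturingAlt[2]); // "demo" "7"
console.log(nonCapturingAlt[1]);               // "7"
console.log(capturingAlt.length, nonCapturingAlt.length); // 3 2
```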
This article has focused on the difference between (\d+) and (\d)+, which is the pitfall I ran into today. If there are any mistakes, please correct me.