


A regular expression is a pattern that describes operations on strings. It is an important but complex tool for processing text data. So how can you master regular expressions quickly? This article recommends one learning method: through the AST. I hope it helps!
Regular expressions are used mainly to process strings. They are very convenient for string matching, extraction, replacement, and so on.
However, regular expressions are still somewhat hard to learn. Concepts such as greedy matching, non-greedy matching, capturing subgroups, and non-capturing subgroups are difficult not only for beginners; many people who have worked for several years do not understand them either.
So how can you learn regular expressions better, and master them quickly?
I recommend a way of learning regular expressions that I find works very well: learn through the AST.
Regular expression matching works by parsing the pattern string into an AST and then using that AST to match the target string.
After parsing, all the information in the pattern string is stored in the AST. AST stands for abstract syntax tree; as the name suggests, it is a tree organized according to the grammar. From the structure of the AST you can easily see which syntax regular expressions support.
How do you view the AST of a regular expression?
You can view it visually on the website astexplorer.net.
Switch the parse language to RegExp, and you get a visualization of the AST of a regular expression.
As mentioned before, the AST is a tree organized according to the grammar, so the various pieces of syntax can easily be read off from its structure.
So let’s learn the syntax from the perspective of the AST.
/abc/
Let’s start with a simple one. The regular expression /abc/ matches the string 'abc'.
Its AST contains three Char nodes with the values a, b, and c, and their type is simple. Matching then means traversing the AST and matching these three characters in turn.
Testing it with the exec API shows the shape of the result: the 0th element is the matched string, index is the starting position of the match, and input is the input string.
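As a minimal sketch of that exec test, the snippet below runs /abc/ against a sample string. The input "xxabcxx" and the variable name abcMatch are illustrative assumptions, not taken from the original article.

```javascript
// Run the pattern /abc/ against an assumed sample input.
const abcMatch = /abc/.exec("xxabcxx");

console.log(abcMatch[0]);    // "abc": the matched string
console.log(abcMatch.index); // 2: where the match starts in the input
console.log(abcMatch.input); // "xxabcxx": the original input string
```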
Let’s try special characters again:
/\d\d\d/
/\d\d\d/ matches three digits. \d is a metacharacter (meta char), a character with special meaning in regular expressions.
The AST shows this too: these nodes are also Char, but their type is meta rather than simple.
The metacharacter \d matches any digit.
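A short sketch of this behavior, with sample inputs chosen purely for illustration:

```javascript
// \d matches any single digit, so /\d\d\d/ needs three digits in a row.
const digitMatch = /\d\d\d/.exec("tel: 123456");
console.log(digitMatch[0]);    // "123": the first run of three digits
console.log(digitMatch.index); // 5

// With fewer than three consecutive digits there is no match.
const noDigitMatch = /\d\d\d/.exec("ab1c");
console.log(noDigitMatch);     // null
```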
Which is meta char and which is simple char can be seen at a glance through AST.
/[abc]/
Regular expressions let you specify a set of characters with [], which matches any one of those characters.
The AST shows that the characters are wrapped in a CharacterClass node. A character class matches any single character it contains.
Testing confirms this.
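For instance, under the assumption of the sample inputs below (they are not from the original article), a character class behaves like this:

```javascript
// [abc] matches exactly one character that is a, b, or c.
const classMatch = /[abc]/.exec("xyzbca");
console.log(classMatch[0]);    // "b": the first character found in the class
console.log(classMatch.index); // 3

// No character from the class appears, so nothing matches.
const noClassMatch = /[abc]/.exec("xyz");
console.log(noClassMatch);     // null
```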
/a{1,3}/
Regular expressions let you specify how many times a character is repeated, using the form {from,to}.
For example, /b{1,3}/ means the character b is repeated 1 to 3 times, and /[abc]{1,3}/ means the a/b/c character class is repeated 1 to 3 times.
In the AST this syntax is called Repetition. It has a quantifier attribute that represents the quantifier; here its type is range, from 1 to 3.
Regular expressions also support shorthand for some quantifiers: + means 1 or more times, * means 0 or more times, and ? means 0 or 1 times.
These are different types of quantifiers.
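The quantifier forms can be compared side by side. This is only a sketch: the test strings and variable names are assumptions made for illustration.

```javascript
const sample = "abbbbc";

// {1,3}: between 1 and 3 repetitions (greedy, so it takes 3 here).
const rangeResult = /b{1,3}/.exec(sample)[0];             // "bbb"
// +: one or more repetitions.
const plusResult = /b+/.exec(sample)[0];                  // "bbbb"
// *: zero or more repetitions, so "a" alone still matches.
const starResult = /ab*/.exec("ac")[0];                   // "a"
// ?: zero or one repetition.
const optionalResult = /ab?/.exec(sample)[0];             // "ab"
// A quantifier can also apply to a whole character class.
const classResult = /[abc]{1,3}/.exec("xcabbage")[0];     // "cab"

console.log(rangeResult, plusResult, starResult, optionalResult, classResult);
```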
Some readers may ask what the greedy attribute on the quantifier means.
It indicates whether this Repetition uses greedy or non-greedy matching.
If you add a ? after the quantifier, you will find that greedy becomes false, which means the match has switched to non-greedy.
So what do greedy and non-greedy mean?
Let’s see an example.
Repetition is greedy by default and keeps matching as long as the conditions are met, so in this example the whole of acbac is matched.
Add a ? after the quantifier to switch to non-greedy matching, and only the first occurrence is matched.
That is the difference between greedy and non-greedy matching. The AST makes it clear that greedy and non-greedy apply to the repetition syntax: the default is greedy, and adding a ? after the quantifier switches to non-greedy.
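The example pattern is not preserved in the text, so the sketch below assumes /a.*c/ against the string "acbac". That choice reproduces the behavior described (greedy matches acbac, non-greedy matches only the first ac), but the original author may have used a different pattern.

```javascript
const target = "acbac";

// Greedy: .* consumes as much as it can while still letting the final c match.
const greedyResult = /a.*c/.exec(target)[0];
console.log(greedyResult);    // "acbac"

// Non-greedy: .*? consumes as little as possible, stopping at the first c.
const lazyResult = /a.*?c/.exec(target)[0];
console.log(lazyResult);      // "ac"
```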
/(aaa)bbb(ccc)/
Regular expressions let you extract part of the matched string as a subgroup by wrapping it in ().
In the AST, the corresponding node is called Group.
You will also find that it has a capturing attribute, which defaults to true.
What does this mean?
This is the syntax for subgroup capture.
If you don’t want to capture the subgroup, you can write it as (?:aaa).
Then capturing becomes false.
What is the difference between capture and non-capture?
Let’s try:
So the capturing attribute of a Group indicates whether its content is extracted.
The AST shows that capturing applies to subgroups. The default is capturing, which means the content of the subgroup is extracted. You can switch to non-capturing with ?:, and then the content of the subgroup is no longer extracted.
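The difference shows up directly in the array returned by exec. In this sketch the input string "aaabbbccc" is an assumed example.

```javascript
const subject = "aaabbbccc";

// Both groups capture: the result holds the full match plus two subgroups.
const bothCaptured = /(aaa)bbb(ccc)/.exec(subject);
console.log(Array.from(bothCaptured));   // ["aaabbbccc", "aaa", "ccc"]

// (?:aaa) still has to match, but its content is not extracted.
const oneCaptured = /(?:aaa)bbb(ccc)/.exec(subject);
console.log(Array.from(oneCaptured));    // ["aaabbbccc", "ccc"]
```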
By now we are familiar with using the AST to understand regular expression syntax. Let’s look at something a bit more difficult:
/bbb(?=ccc)/
Regular expressions support the (?=xxx) syntax to express a lookahead assertion, which checks whether a certain string is followed by another string.
The AST shows that this syntax is called Assertion and its type is lookahead: it looks ahead and matches only when what follows satisfies the assertion.
What does this mean? Why write it this way? How does it differ from /bbb(ccc)/ and /bbb(?:ccc)/?
Let’s try:
It can be seen from the results:
/bbb(ccc)/ matches the ccc subgroup and extracts it, because subgroups capture by default.
/bbb(?:ccc)/ matches the ccc subgroup but does not extract it, because ?: makes the subgroup non-capturing.
/bbb(?=ccc)/ matches with ccc but does not extract it either, so it is also non-capturing. The difference from ?: is that ccc does not appear in the matched result at all.
This is the nature of a lookahead assertion: it asserts that a certain string is followed by another string, the corresponding subgroup is not captured, and the asserted string does not appear in the match result.
If the string is not followed by the asserted string, there is no match.
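The three patterns can be compared in one place. This is a sketch with assumed inputs ("bbbccc" and "bbbddd"); the variable names are illustrative.

```javascript
const followedByCcc = "bbbccc";

// Capturing group: ccc is part of the match and is extracted.
const capturedResult = Array.from(/bbb(ccc)/.exec(followedByCcc));
console.log(capturedResult);       // ["bbbccc", "ccc"]

// Non-capturing group: ccc is part of the match but is not extracted.
const nonCapturedResult = Array.from(/bbb(?:ccc)/.exec(followedByCcc));
console.log(nonCapturedResult);    // ["bbbccc"]

// Lookahead: ccc must follow, but it is neither matched nor extracted.
const lookaheadResult = Array.from(/bbb(?=ccc)/.exec(followedByCcc));
console.log(lookaheadResult);      // ["bbb"]

// When bbb is not followed by ccc, the lookahead fails.
const lookaheadMiss = /bbb(?=ccc)/.exec("bbbddd");
console.log(lookaheadMiss);        // null
```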
/bbb(?!ccc)/
After changing ?= to ?!, the meaning changes. Look at the AST:
It is still a lookahead assertion, but there is an additional negative attribute set to true.
The meaning is clear. Originally the assertion required that a certain string follows; once negated, it requires that the string does not follow.
The matching result is exactly the opposite.
Now it matches only if the string is not followed by the asserted string. This is a negative lookahead assertion.
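A brief sketch of the negated form, reusing the same assumed inputs as for the positive lookahead:

```javascript
// bbb is followed by ddd, not ccc, so the negative lookahead succeeds.
const negativeHit = /bbb(?!ccc)/.exec("bbbddd");
console.log(negativeHit[0]);   // "bbb"

// bbb is followed by ccc, so the negative lookahead rejects the match.
const negativeMiss = /bbb(?!ccc)/.exec("bbbccc");
console.log(negativeMiss);     // null
```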
/(?<=xxx)/
Where there is a lookahead assertion there is naturally a lookbehind assertion as well: it matches only if the text is preceded by a certain string.
Similarly, it can also be negated, with (?<!xxx).
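The original example for this section is not preserved, so the sketch below uses assumed patterns and inputs. It relies on the standard lookbehind forms (?<=xxx) and (?<!xxx), which require a JavaScript engine with ES2018 lookbehind support.

```javascript
// Positive lookbehind: bbb matches only when aaa comes right before it.
const behindHit = /(?<=aaa)bbb/.exec("aaabbb");
console.log(behindHit[0]);       // "bbb": aaa is not part of the match
console.log(behindHit.index);    // 3

const behindMiss = /(?<=aaa)bbb/.exec("cccbbb");
console.log(behindMiss);         // null

// Negative lookbehind: bbb matches only when aaa does NOT come before it.
const negBehindHit = /(?<!aaa)bbb/.exec("cccbbb");
console.log(negBehindHit[0]);    // "bbb"

const negBehindMiss = /(?<!aaa)bbb/.exec("aaabbb");
console.log(negBehindMiss);      // null
```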
To summarize the syntax covered above:
Repetition syntax (Repetition) is a character followed by a quantifier. The default is greedy matching (greedy is true), which keeps matching for as long as it can. Add a ? after the quantifier to switch to non-greedy matching, which stops at the shortest possible match.
Subgroup syntax (Group) is used to extract part of the matched string. The default is capturing (capturing is true), which means the content is extracted. You can switch to non-capturing with (?:xxx), which only matches without extracting.
Assertion syntax (Assertion) states that a certain string comes after or before the match. It is divided into lookahead and lookbehind assertions, written (?=xxx) and (?<=xxx), and each can be negated, as (?!xxx) and (?<!xxx).
Which understands the syntax more deeply: the documentation or the compiler? No need to ask, it must be the compiler! So learning syntax from the tree that the parser builds according to the grammar is naturally better than learning it from the documentation. This holds for regular expressions, and for learning other grammars as well. If you can learn a grammar through its AST, you don’t need to read the documentation.
