How to quickly master regular expressions? Learn regex syntax through the AST!


青灯夜游

Mar 09, 2022 pm 08:13 PM

Tags: ast, javascript, node.js, regular expression

A regular expression is a pattern that describes operations on strings. It is an important but complex technique for processing text data. So how can you master regular expressions quickly? This article recommends one learning method: through the AST. I hope it helps!

Regular expressions are mainly used to process strings. They are very convenient for string matching, extraction, replacement, and so on.

However, regular expressions are still somewhat difficult to learn. Concepts such as greedy matching, non-greedy matching, capturing subgroups, and non-capturing subgroups are hard for beginners to understand, and many people who have worked for several years do not understand them either.

How can you learn regular expressions better and master them quickly?

Here is a way of learning regular expressions that I think works very well: learn through the AST.

A regular expression is matched by parsing the pattern string into an AST and then using this AST to match the target string.

After parsing, all the information in the pattern string is stored in the AST. AST stands for abstract syntax tree. As the name suggests, it is a tree organized according to the grammatical structure, so from the structure of the AST you can easily see which syntax regular expressions support.

How to view the AST of a regular expression?

You can view it visually on the website astexplorer.net.

Switch the parser language to RegExp, and you get a visualization of the AST of a regular expression.

As mentioned before, the AST is a tree organized according to the grammar, so the various pieces of syntax can easily be sorted out from its structure.

So let's learn the syntax from the perspective of the AST:

/abc/

Let's start with a simple one. The regular expression /abc/ matches the string 'abc'.

Its AST contains three Char nodes with the values a, b, and c, each of type simple. Matching then traverses the AST and matches these three characters in turn.
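To make the idea of "traverse the AST and match each character" concrete, here is a minimal sketch. The node shape and the names abcAst, matchCharsAt, and findMatchIndex are hypothetical simplifications for illustration, not the exact output of any parser or the way a real engine works.

```javascript
// Hypothetical, simplified list of Char nodes for /abc/.
// Real parsers emit richer nodes (locations, parent nodes, and so on).
const abcAst = [
  { type: 'Char', value: 'a', kind: 'simple' },
  { type: 'Char', value: 'b', kind: 'simple' },
  { type: 'Char', value: 'c', kind: 'simple' },
];

// Check whether every Char node matches in order, starting at `start`.
function matchCharsAt(nodes, input, start) {
  return nodes.every((node, offset) => input[start + offset] === node.value);
}

// Scan the input for the first position where all Char nodes match.
function findMatchIndex(nodes, input) {
  for (let start = 0; start + nodes.length <= input.length; start++) {
    if (matchCharsAt(nodes, input, start)) return start;
  }
  return -1;
}

console.log(findMatchIndex(abcAst, 'xxabcxx')); // 2
console.log(findMatchIndex(abcAst, 'xxabxcx')); // -1
```

A real engine handles many more node types, but the basic loop of walking nodes and comparing against the input is the same idea.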

We can test it with the exec API.

In the result, element 0 is the matched string, index is the starting position of the match, and input is the input string.
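The fields described above can be checked with a short snippet. The input string 'xxabcxx' is an assumption chosen for illustration, not the input from the original screenshot.

```javascript
// exec returns an array-like match object, or null when nothing matches.
const abcMatch = /abc/.exec('xxabcxx');

console.log(abcMatch[0]);    // the matched string: 'abc'
console.log(abcMatch.index); // where the match starts: 2
console.log(abcMatch.input); // the original input: 'xxabcxx'

// No match at all yields null.
const abcMiss = /abc/.exec('xyz');
console.log(abcMiss); // null
```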

Next, let's try special characters:

/\d\d\d/

/\d\d\d/ matches three digits. \d is a metacharacter (meta char), a character with special meaning in regular expressions.

The AST shows this too: the nodes are still Char, but their type is meta.

The metacharacter \d matches any digit.

Which characters are meta chars and which are simple chars can be seen at a glance in the AST.
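A quick check of \d, using an input string that is only an illustrative assumption:

```javascript
// \d matches any single digit, so /\d\d\d/ needs three digits in a row.
const digitsMatch = /\d\d\d/.exec('tel: 12345');

console.log(digitsMatch[0]);    // '123'
console.log(digitsMatch.index); // 5

// Digits separated by other characters do not form a match.
const digitsSeparated = /\d\d\d/.test('a1b2c3');
console.log(digitsSeparated); // false
```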

/[abc]/

Regular expressions support specifying a set of characters with []; the pattern matches any one of those characters.

In the AST, the characters are wrapped in a CharacterClass node, a character class that matches any one of the characters it contains.

Testing confirms this.
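For instance, with an input chosen here purely for illustration:

```javascript
// [abc] matches one character that is a, b, or c.
const classMatch = /[abc]/.exec('xyzcba');

console.log(classMatch[0]);    // 'c', the first character from the class
console.log(classMatch.index); // 3

// A string with none of the listed characters does not match.
const classMiss = /[abc]/.test('xyz');
console.log(classMiss); // false
```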

/a{1,3}/

Regular expressions support specifying how many times a character is repeated, using the form {from,to}.

For example, /b{1,3}/ means the character b is repeated 1 to 3 times, and /[abc]{1,3}/ means the a/b/c character class is repeated 1 to 3 times.

As the AST shows, this syntax is called Repetition.

It has a quantifier attribute that represents the quantifier; here the type is range, from 1 to 3.

Regular expressions also support shorthand for some quantifiers: + for 1 or more times, * for 0 or more times, and ? for 0 or 1 time.

In the AST, these are different types of quantifier.
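The quantifiers above can be compared side by side. The inputs 'aaaaa', 'bbb', and 'cabbage' are assumptions picked for illustration.

```javascript
const repeatedInput = 'aaaaa';

// {1,3}: between 1 and 3 repetitions (takes as many as allowed).
const rangeResult = /a{1,3}/.exec(repeatedInput)[0];
// +: one or more repetitions.
const plusResult = /a+/.exec(repeatedInput)[0];
// *: zero or more repetitions, so an empty match is possible.
const starResult = /a*/.exec('bbb')[0];
// ?: zero or one repetition.
const optionalResult = /a?/.exec(repeatedInput)[0];
// A quantifier can also apply to a character class.
const classRangeResult = /[abc]{1,3}/.exec('cabbage')[0];

console.log(rangeResult);      // 'aaa'
console.log(plusResult);       // 'aaaaa'
console.log(starResult);       // ''
console.log(optionalResult);   // 'a'
console.log(classRangeResult); // 'cab'
```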

You may ask: what does the greedy attribute here mean?

greedy indicates whether this Repetition is a greedy match or a non-greedy match.

If you add a ? after the quantifier, greedy becomes false, which means the match has switched to non-greedy.

So what do greedy and non-greedy mean?

Let's look at an example.

By default, Repetition matching is greedy: it keeps matching as long as the conditions are met, so in the example the whole string acbac is matched.

Add a ? after the quantifier to switch to non-greedy, and only the shortest possible part is matched.

That is greedy and non-greedy matching. The AST makes it clear that greedy and non-greedy apply to the repetition syntax: the default is greedy matching, and adding a ? after the quantifier switches to non-greedy.
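Here is a small sketch of the difference. The target string acbac comes from the text above, but the patterns /a.*c/ and /a.*?c/ are assumptions, since the pattern used in the original screenshot is not shown.

```javascript
// Greedy: .* consumes as much as possible, then backs off just enough
// for the trailing c to match.
const greedyMatch = /a.*c/.exec('acbac');

// Non-greedy: .*? consumes as little as possible, so the match ends
// at the first c that completes the pattern.
const lazyMatch = /a.*?c/.exec('acbac');

console.log(greedyMatch[0]); // 'acbac'
console.log(lazyMatch[0]);   // 'ac'
```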

/(aaa)bbb(ccc)/

Regular expressions support capturing part of the matched string as a subgroup with ().

In the AST, the corresponding node is called Group.

You will also find that it has a capturing attribute, which defaults to true.

What does this mean?

This is the syntax for capturing subgroups.

If you don't want to capture the subgroup, you can write (?:aaa) instead, and capturing becomes false.

What is the difference between capturing and non-capturing?

Trying it out shows that the capturing attribute of a Group indicates whether its content is extracted.

The AST makes it clear that capturing applies to subgroups. The default is capturing, which means the content of the subgroup is extracted. You can switch to non-capturing with ?:, and then the content of the subgroup is no longer extracted.
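The difference shows up directly in the exec result. The input 'aaabbbccc' is an assumption for illustration.

```javascript
// Both groups capture, so both appear after the full match.
const capturingResult = [.../(aaa)bbb(ccc)/.exec('aaabbbccc')];

// (?:aaa) still has to match, but it is not extracted.
const nonCapturingResult = [.../(?:aaa)bbb(ccc)/.exec('aaabbbccc')];

console.log(capturingResult);    // [ 'aaabbbccc', 'aaa', 'ccc' ]
console.log(nonCapturingResult); // [ 'aaabbbccc', 'ccc' ]
```

Spreading the match into a plain array drops the extra index and input properties, which makes the captured groups easier to read.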

By now we are familiar with using the AST to understand regex syntax. Let's look at something a bit more difficult:

/bbb(?=ccc)/

Regular expressions support the (?=xxx) syntax for lookahead assertions, which check whether a certain string is followed by another string.

In the AST, this syntax is called Assertion, and its type is lookahead: it looks ahead at the text that follows, and only the part before it is matched.

What does this mean? Why write it this way? How does it differ from /bbb(ccc)/ and /bbb(?:ccc)/?

Let's try it. The results show:

/bbb(ccc)/ matches the ccc subgroup and extracts it, because subgroups are captured by default.

/bbb(?:ccc)/ matches the ccc subgroup but does not extract it, because ?: sets the subgroup to non-capturing.

/bbb(?=ccc)/ also does not extract the ccc subgroup, so it is non-capturing as well. The difference from ?: is that ccc does not appear in the matching result at all.

This is the nature of a lookahead assertion: it asserts that a certain string is followed by another string, the corresponding subgroup is not captured, and the asserted string does not appear in the matching result.

If the string is not followed by the asserted string, there is no match.
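The three patterns can be compared on the same input. The strings 'bbbccc' and 'bbbddd' are assumptions chosen for illustration.

```javascript
const lookaheadInput = 'bbbccc';

// Capturing group: ccc is matched and extracted.
const withCapture = [.../bbb(ccc)/.exec(lookaheadInput)];
// Non-capturing group: ccc is matched but not extracted.
const withNonCapture = [.../bbb(?:ccc)/.exec(lookaheadInput)];
// Lookahead: ccc must follow, but it is not part of the match.
const withLookahead = [.../bbb(?=ccc)/.exec(lookaheadInput)];
// Lookahead fails when bbb is followed by something else.
const lookaheadMiss = /bbb(?=ccc)/.exec('bbbddd');

console.log(withCapture);    // [ 'bbbccc', 'ccc' ]
console.log(withNonCapture); // [ 'bbbccc' ]
console.log(withLookahead);  // [ 'bbb' ]
console.log(lookaheadMiss);  // null
```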

/bbb(?!ccc)/

After changing ?= to ?!, the meaning changes. Take a look at the AST.

It is still a lookahead assertion, but it has an additional negative attribute set to true.

The meaning is clear. Originally, the assertion required that a certain string follows. After negation, it requires that the string does not follow.

The matching result is exactly the opposite.

Now the pattern matches only if it is not followed by the asserted string. This is a negative lookahead assertion.
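The reversed behavior is easy to confirm. The inputs are assumptions for illustration.

```javascript
// bbb followed by ccc: the negative lookahead rejects it.
const negativeMiss = /bbb(?!ccc)/.exec('bbbccc');

// bbb followed by anything else: the match succeeds, and only bbb
// is part of the result.
const negativeHit = /bbb(?!ccc)/.exec('bbbddd');

console.log(negativeMiss);   // null
console.log(negativeHit[0]); // 'bbb'
```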

/(?<=xxx)/

Where there is a lookahead assertion, there is naturally a lookbehind assertion: the pattern matches only if it is preceded by a certain string.

Similarly, it can also be negated, using the syntax (?<!xxx).
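A short sketch of both lookbehind forms follows. The patterns and inputs are assumptions for illustration, since the originals were lost from the text above, and lookbehind needs an engine with ES2018 support (for example a recent Node.js).

```javascript
// Lookbehind: bbb matches only when it is preceded by aaa.
const behindHit = /(?<=aaa)bbb/.exec('aaabbb');
const behindMiss = /(?<=aaa)bbb/.exec('cccbbb');

// Negative lookbehind: bbb matches only when it is NOT preceded by aaa.
const negBehindMiss = /(?<!aaa)bbb/.exec('aaabbb');
const negBehindHit = /(?<!aaa)bbb/.exec('cccbbb');

console.log(behindHit[0], behindHit.index); // 'bbb' 3
console.log(behindMiss);                    // null
console.log(negBehindMiss);                 // null
console.log(negBehindHit[0]);               // 'bbb'
```

As with lookahead, the asserted string (aaa here) is checked but never becomes part of the match.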

Lookahead and lookbehind assertions are the hardest regular expression syntax to understand. Isn't it much easier to understand when you learn it through the AST?

Summary

Regular expressions are a very convenient tool for processing strings, but they are still somewhat difficult to learn. Many people are confused by syntax such as greedy matching, non-greedy matching, capturing subgroups, non-capturing subgroups, lookahead assertions, and lookbehind assertions.

I recommend learning regular expressions through the AST. The AST is an object tree organized according to the syntax structure, and the names and attributes of the AST nodes make the various pieces of syntax easy to sort out.

For example, through the AST we have clarified the following:

Repetition syntax (Repetition) is a character followed by a quantifier. The default is greedy matching (greedy is true), which means matching continues until nothing more matches. Add a ? after the quantifier to switch to non-greedy matching, which stops as soon as the pattern is satisfied.

Subgroup syntax (Group) is used to extract part of a string. The default is capturing (capturing is true), which means the content is extracted. You can switch to non-capturing with (?:xxx), which only matches without extracting.

Assertion syntax (Assertion) states that a certain string comes before or after the match. It is divided into lookahead assertions and lookbehind assertions, with the syntax (?=xxx) and (?<=xxx). Both can be negated, with (?!xxx) and (?<!xxx).

Which understands the syntax more deeply: the various documents, or the compiler?

No need to ask, it must be the compiler!

So it is naturally better to learn syntax from the syntax tree the parser produces than from the documentation.

This is true for regular expressions, and it is true for learning other syntax as well. If you can learn the syntax from the AST, you don't need to read the documentation.

Statement

This article is reproduced from 掘金社区 (the Juejin community). If there is any infringement, please contact admin@php.cn to have it removed.
