
JavaScript Advanced Programming (3rd Edition) Study Notes 12: JS Regular Expressions

WBOY
2016-05-16 17:49:18

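Note that this article summarizes only the commonly used and relatively simple parts of regular expression syntax, not all of it. In my view, mastering this common syntax is enough for everyday work. Regular expressions are not limited to ECMAScript; they are also used in Java, .NET, Unix, and elsewhere. This summary is based on regular expressions in ECMAScript.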

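1. Regular expression basics

1. Ordinary characters: letters, digits, underscores, Chinese characters, and every other character with no special meaning, such as ABC123. In a match, each one matches the identical character.

2. Special characters: (escape with a backslash "\" when needed)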

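Character     Meaning
\a            Bell character = \x07
\f            Form feed = \x0C
\n            Line feed = \x0A
\r            Carriage return = \x0D
\t            Tab = \x09
\v            Vertical tab = \x0B
\e            ESC character = \x1B
\xXX          Two-digit hexadecimal form; matches the character with that code
\uXXXX        Four-digit hexadecimal form; matches the character with that code
\x{XXXXXX}    Hexadecimal form with any number of digits; matches the character with that code
^             Matches the start of the string
$             Matches the end of the string
()            Marks the start and end of a subexpression
[]            Defines a custom character set
{}            Specifies the number of repetitions
.             Matches any character except a newline
?             Matches 0 or 1 time
+             Matches 1 or more times
*             Matches 0 or more times
|             "Or" between the expressions on its left and right
\b            Matches the start or end of a word
\B            Matches a position that is not the start or end of a word
\d            Matches a digit
\D            Matches any character that is not a digit
\s            Matches any whitespace character
\S            Matches any non-whitespace character
\w            Matches a letter, digit, underscore, or Chinese character
\W            Matches any character that is not a letter, digit, underscore, or Chinese character
[^x]          Matches any character except x
[^aeiou]      Matches any character except a, e, i, o, u
{n}           Matches n times
{n,}          Matches at least n times
{n,m}         Matches n to m times
[0-9]         Matches any one digit from 0 to 9
[f-m]         Matches any one letter from f to m

Note: \a, \e and \x{XXXXXX} come from other regex flavors and are not recognized as such by ECMAScript's RegExp. In JavaScript, \w matches only [A-Za-z0-9_], so it does not match Chinese characters.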

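The special characters listed above can be roughly divided into:

(1) Hard-to-type characters: bell (\a), form feed (\f), line feed (\n), carriage return (\r), tab (\t), ESC (\e)

(2) Hexadecimal characters: two-digit (\x02), four-digit (\u012B), any number of digits (\x{A34D1})

(3) Position characters: start of string (^), end of string ($), start or end of a word (\b), inside a word (\B)

(4) Quantifier characters: 0 or 1 time (?), 1 or more times (+), 0 or more times (*), n times ({n}), at least n times ({n,}), n to m times ({n,m})

(5) Modifier characters: repetition count ({}), custom character set ([]), subexpression (())

(6) Negation characters:

  (A) Negation by letter case: \b and \B, \d and \D, \s and \S, \w and \W

  (B) Negation with [^]: [^x], [^aeiou]

  (C) Other special cases: \n and . are also opposites of each other

(7) Range characters: digit range ([0-9]), letter range ([f-m])

(8) Logical characters: "or" (|)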
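The categories above can be tried out directly in JavaScript. The following is a minimal sketch; the sample strings and variable names are chosen only for illustration and do not come from the original notes.

```javascript
// Position characters: ^ and $ anchor the whole string.
const isThreeDigits = /^\d{3}$/;            // exactly three digits, nothing else
const r1 = isThreeDigits.test("123");       // true
const r2 = isThreeDigits.test("1234");      // false: {3} allows only three digits

// Quantifier plus a custom range: one or more letters from f to m.
const r3 = "xxghijxx".match(/[f-m]+/)[0];   // "ghij"

// Negation: \D is the opposite of \d, and [^aeiou] excludes the listed vowels.
const r4 = "a1b22c".replace(/\D/g, "");         // "122"
const r5 = "regular".replace(/[^aeiou]/g, "");  // "eua"

// Logical "or": | chooses between the expressions on either side.
const r6 = /^(cat|dog)$/.test("dog");       // true

console.log(r1, r2, r3, r4, r5, r6);
```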

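3. Escaping

(1) Use a backslash "\" to escape a single character.

(2) Use "\Q...\E": every character between the two markers is treated as an ordinary character.

(3) Use "\U...\E": every character between the two markers is treated as an ordinary character, and lowercase letters are converted to uppercase for matching.

(4) Use "\L...\E": every character between the two markers is treated as an ordinary character, and uppercase letters are converted to lowercase for matching.

Note: forms (2) to (4) come from other regex flavors. ECMAScript's RegExp supports only the backslash form (1).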
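JavaScript's RegExp has no \Q...\E, so treating a whole piece of text literally means backslash-escaping each metacharacter. The sketch below shows single-character escaping and a hypothetical helper, escapeRegExp, which is an illustrative assumption and not a built-in function.

```javascript
// Escaping a single metacharacter with a backslash.
const dotLiteral = /1\.5/;
const escapedDotHit = dotLiteral.test("1.5");   // true
const escapedDotMiss = dotLiteral.test("1x5");  // false: the escaped dot matches only "."
const bareDotHit = /1.5/.test("1x5");           // true: an unescaped dot matches any character

// Hypothetical helper (not built in): put a backslash in front of every
// metacharacter so arbitrary text can be used as a literal pattern.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const escaped = escapeRegExp("a.b*c");          // the string a\.b\*c
const literalRe = new RegExp("^" + escaped + "$");
const literalHit = literalRe.test("a.b*c");     // true
const literalMiss = literalRe.test("aXbbc");    // false: "." and "*" are now literal

console.log(escaped, literalHit, literalMiss);
```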

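4. Greedy mode and lazy mode

   When a regular expression contains a quantifier, it normally matches as many characters as possible. For example, matching l.*n against linjisong yields linjison rather than lin. This is the greedy mode of regular expressions. Conversely, appending the character "?" to the quantifier switches it to lazy mode, which matches as few characters as possible. For example, *? means repeat 0 or more times, but as few times as possible.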
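The difference is easy to check with the linjisong example. This is a minimal sketch that uses the pattern l.*n (with a dot) and its lazy counterpart l.*?n; the variable names are illustrative.

```javascript
const text = "linjisong";

// Greedy: .* grabs as much as it can, then backs off to the last "n".
const greedy = text.match(/l.*n/)[0];   // "linjison"

// Lazy: .*? grabs as little as it can, stopping at the first "n".
const lazy = text.match(/l.*?n/)[0];    // "lin"

console.log(greedy, lazy);
```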

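5. Grouping and backreferences

(1) Wrapping an expression in parentheses (()) lets it be handled as a single unit, which is how grouping is achieved.

(2) By default, each group automatically receives a group number, counted from 1 in the order of the opening parentheses.

(3) While matching, the engine saves the text matched by the expression inside the parentheses so that it can be used later, either during the match or after it finishes. A backslash followed by the group number refers to this text; for example, \1 is the text matched by the first group.

(4) A group can also be given a custom name with the syntax (?<name>exp); a backreference can then also be written as \k<name>.

(5) A group can also be told not to save its matched text and not to take a group number, with the syntax (?:exp).

(6) Parentheses have some other special forms. A few are listed here without further discussion: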

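Category                Syntax          Description
Capturing               (exp)           Matches exp and captures the text into an automatically named group
                        (?<name>exp)    Matches exp and captures the text into a group called name; can also be written (?'name'exp)
                        (?:exp)         Matches exp without capturing the matched text and without assigning a group number
Zero-width assertions   (?=exp)         Matches the position before exp
                        (?<=exp)        Matches the position after exp
                        (?!exp)         Matches a position that is not followed by exp
                        (?<!exp)        Matches a position that is not preceded by exp
Comment                 (?#comment)     Has no effect on how the expression is processed; it only provides a comment for human readers

Note: (?'name'exp) and (?#comment) are not ECMAScript syntax. Named groups and the lookbehind assertions (?<=exp) and (?<!exp) were added to JavaScript only in ES2018.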
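The forms that ECMAScript has long supported (numbered groups, \1 backreferences, (?:exp), and lookahead) can be sketched as follows. The sample strings are assumptions made for illustration only.

```javascript
// Backreference: \1 must repeat exactly what group 1 captured.
const doubled = /(\w)\1/.exec("hello");
// doubled[0] is "ll" (whole match), doubled[1] is "l" (group 1)

// Non-capturing group: (?:ab) takes no group number, so (\d+) is group 1.
const nonCapturing = /(?:ab)+(\d+)/.exec("ababab42");
// nonCapturing[0] is "ababab42", nonCapturing[1] is "42"

// Positive lookahead: digits only if "USD" follows; "USD" is not consumed.
const amount = "price: 100USD".match(/\d+(?=USD)/)[0];   // "100"

// Negative lookahead: a word that does not start with "admin".
const notAdmin = /^(?!admin)\w+$/;
const blocked = notAdmin.test("administrator");  // false
const allowed = notAdmin.test("guest");          // true

console.log(doubled[0], nonCapturing[1], amount, blocked, allowed);
```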

That covers the commonly used regular expression syntax. To go further, refer to the 30-minute introductory tutorial on regular expressions. Next, let's look at how regular expressions are implemented in JavaScript.

2. The regular expression object RegExp in JavaScript

1. Create regular expression

(1) Use literals: syntax var exp = /pattern/flags;

A. pattern is any regular expression

B. flags can be any combination of three flags: g for global mode, i for ignoring case, and m for multiline mode

(2) Use the RegExp built-in constructor: syntax var exp = new RegExp(pattern, flags);

A. With the constructor, pattern and flags are both strings, so every backslash in the pattern must itself be escaped (doubled), for example:

Literal               Constructor string
/\[bc\]at/            "\\[bc\\]at"
/\.at/                "\\.at"
/name\/age/           "name\\/age"
/\d.\d{1,2}/          "\\d.\\d{1,2}"
/\w\\hello\\123/      "\\w\\\\hello\\\\123"

Note: In ECMAScript 3, a regular expression literal always shares the same RegExp instance, whereas new RegExp(pattern, flags) creates a new instance every time. ECMAScript 5 specifies that a literal must also create a new instance each time it is evaluated.
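The double-escaping rule can be verified by comparing a literal with its constructor equivalent. This is a minimal sketch with illustrative variable names; it also shows what happens when the backslashes are not doubled.

```javascript
const fromLiteral = /\[bc\]at/i;
const fromString = new RegExp("\\[bc\\]at", "i");

// Both describe the same pattern: a literal "[bc]at", case-insensitively.
const sameSource = fromLiteral.source === fromString.source;  // true
const matchesLiteralText = fromString.test("[BC]AT");          // true

// With single backslashes the string is just "[bc]at",
// so the pattern becomes the character class [bc] followed by "at".
const singleEscaped = new RegExp("\[bc\]at");
const matchesBat = singleEscaped.test("bat");          // true
const matchesBrackets = singleEscaped.test("[bc]at");  // false

console.log(sameSource, matchesLiteralText, matchesBat, matchesBrackets);
```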

2. Instance properties

(1) global: Boolean value, indicating whether the g flag is set.

(2) ignoreCase: Boolean value, indicating whether the i flag is set.

(3) multiline: Boolean value, indicating whether the m flag is set.

(4) lastIndex: integer, indicating the character position to start searching for the next match, starting from 0.

(5) source: string, the text of the pattern as it would appear in a literal, without the enclosing slashes and flags. Even when the instance is created with the constructor, source holds the pattern in literal form rather than the string passed in.
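A short sketch of reading these properties, assuming an instance created with the constructor and the g and i flags (the pattern and input string are illustrative):

```javascript
const re = new RegExp("\\d+", "gi");

const flags = [re.global, re.ignoreCase, re.multiline];  // [true, true, false]
const source = re.source;          // the literal form \d+ (one backslash)
const before = re.lastIndex;       // 0: nothing has been searched yet

re.test("ab12cd");                 // matches "12" at positions 2 and 3
const after = re.lastIndex;        // 4: the next search starts after "12"

console.log(flags, source, before, after);
```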

3. Instance methods

(1) exec() method

A. Takes one parameter, the string to apply the pattern to, and returns an array holding information about the first match. If there is no match, it returns null.

B. The returned array is an Array instance, but it also has input and index properties, which hold the string the regular expression was applied to and the position of the match within that string.

C. In the returned array, the first item is the string that matches the entire pattern, and the remaining items are the strings matched by the groups in the pattern (if the pattern has no groups, the array has only one item).

D. exec() returns only one match per call, even when g is set. The difference is that with g set, successive calls to exec() continue searching from where the previous match ended, whereas without g every call starts the search from the beginning of the string.
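The behavior described in A to D can be observed with a small sketch. The input string and patterns here are assumptions chosen for illustration.

```javascript
const input = "a1b22c333";

// With g: each call continues from lastIndex.
const globalRe = /[a-z](\d+)/g;
const first = globalRe.exec(input);     // ["a1", "1"], index 0
const afterFirst = globalRe.lastIndex;  // 2
const second = globalRe.exec(input);    // ["b22", "22"], index 2

// Without g: every call starts from the beginning again.
const plainRe = /[a-z](\d+)/;
const plainFirst = plainRe.exec(input);   // ["a1", "1"]
const plainSecond = plainRe.exec(input);  // ["a1", "1"] again

// Common loop: collect group 1 of every match until exec() returns null.
const numbers = [];
const loopRe = /[a-z](\d+)/g;
let m;
while ((m = loopRe.exec(input)) !== null) {
  numbers.push(m[1]);
}

console.log(first.index, second.index, first.input === input, numbers);
```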

(2) test() method

Accepts a string argument and returns true if the pattern matches it, false otherwise.
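A minimal sketch of test() follows. The six-digit rule is a made-up validation used only for illustration. The second half shows that test(), like exec(), advances lastIndex when g is set, which can make repeated calls on the same instance alternate between true and false.

```javascript
// Hypothetical validation rule: exactly six digits.
const sixDigits = /^\d{6}$/;
const ok = sixDigits.test("100000");   // true
const bad = sixDigits.test("10000a");  // false

// With g, test() shares lastIndex with exec().
const globalA = /a/g;
const firstCall = globalA.test("a");   // true, lastIndex becomes 1
const secondCall = globalA.test("a");  // false: search starts at index 1, then lastIndex resets to 0
const thirdCall = globalA.test("a");   // true again

console.log(ok, bad, firstCall, secondCall, thirdCall);
```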

3. Example Analysis

Let’s look at a regular expression used for formatting from the PhoneGap source code


var pattern = /(.*?)%(.)(.*)/; // group 1 is lazy
var str = 'lin%%jisong';
var match = pattern.exec(str);
console.info(match.join(',')); // lin%%jisong,lin,%,jisong

var pattern2 = /(.*)%(.)(.*)/; // group 1 is greedy
var match2 = pattern2.exec(str);
console.info(match2.join(',')); // lin%%jisong,lin%,j,isong


Analysis: pattern and pattern2 each contain three groups, and their 2nd and 3rd groups are identical: the 2nd group (.) matches any single non-newline character, and the 3rd group (.*) matches as many non-newline characters as possible (greedy mode). They differ only in the 1st group: in pattern, (.*?) matches as few non-newline characters as possible (lazy mode), while in pattern2, (.*) matches as many as possible (greedy mode). Because the whole pattern must still match, one % character has to be left for the literal % in the expression. The 1st group therefore matches lin in pattern and lin% in pattern2, which explains the output shown above.