
Basic introduction to regular expressions and learning examples

零下一度
2017-06-26 13:41:06

I often run into situations that call for regular expressions, and my habit has always been to search, copy, and paste. I used to know a little about regular expressions, but over time I have reached the point where I struggle even to read a complex one. So I decided to learn them properly. This article starts with the basics, records several common expressions, practices reading regular expressions, and briefly introduces the RegExp object in JavaScript.

1. Introduction to regular expressions:

Regular expression: a text pattern composed of ordinary characters and special characters. It describes a string-matching rule that can be used to check whether a string contains a certain substring, to replace a substring, or to extract a substring from a string. Regular expressions are built like mathematical expressions: small expressions are combined into larger ones.

Ordinary characters: all printable and non-printable characters that are not explicitly designated as metacharacters. This includes all uppercase and lowercase letters, all digits, all punctuation marks, and some other symbols.

Non-printing characters
Non-printing characters Description
\cx Matches the control character specified by x. For example \cM matches a Control-M or carriage return character. The value of x must be one of A-Z or a-z. Otherwise, c is treated as a literal 'c' character.
\f Matches a form feed character. Equivalent to \x0c and \cL.
\n Matches a newline character. Equivalent to \x0a and \cJ.
\r Matches a carriage return character. Equivalent to \x0d and \cM.
\s Matches any whitespace character, including spaces, tabs, form feeds, etc. Equivalent to [ \f\n\r\t\v].
\S Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t Matches a tab character. Equivalent to \x09 and \cI.
\v Matches a vertical tab character. Equivalent to \x0b and \cK.

Special characters: characters with a special meaning, such as the "*" wildcard. To search for one of these characters literally in a string, escape it with a backslash (\).

Special characters
Character Description
$ Matches the end of the input string.
( ) Mark the beginning and end of a subexpression.
* Matches the preceding subexpression zero or more times.
+ Matches the preceding subexpression one or more times.
. Matches any single character except the newline character \n.
[ Marks the beginning of a bracket expression.
? Matches the preceding subexpression zero or one time, or marks a qualifier as non-greedy.
\ Marks the next character as a special character, a literal character, a backreference, or an octal escape.
^ Matches the beginning of the input string, unless it is used inside a bracket expression, in which case it negates the character set.
{ Marks the beginning of a qualifier expression.
| Indicates a choice between two items.
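To check the escaping rule for myself, I find a small sketch like the following helpful. It compares an escaped and an unescaped version of the same pattern; the sample strings and variable names are my own and are only for illustration.

```javascript
// Unescaped, * and . keep their special meaning.
const unescapedStar = /^2*3$/; // zero or more "2", then "3"
const escapedStar = /^2\*3$/;  // the literal text 2*3
const unescapedDot = /^a.c$/;  // "a", any one character, "c"
const escapedDot = /^a\.c$/;   // the literal text a.c

const escapeResults = [
  unescapedStar.test("223"), // true
  unescapedStar.test("2*3"), // false: "*" is not a "2" or a "3"
  escapedStar.test("2*3"),   // true
  unescapedDot.test("abc"),  // true
  escapedDot.test("abc"),    // false
  escapedDot.test("a.c"),    // true
];
console.log(escapeResults);
```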

Qualifier: a qualifier specifies how many times a given component of the regular expression must appear for the match to succeed. There are six kinds: zero or more times, one or more times, zero or one time, exactly n times, at least n times, and between n and m times. The corresponding expressions are *, +, ?, {n}, {n,}, and {n,m}.

Qualifier
Qualifier Description
* Matches the preceding subexpression zero or more times.
+ Matches the preceding subexpression one or more times.
? Matches the preceding subexpression zero or one time.
{n} n is a non-negative integer. Matches exactly n times.
{n,} n is a non-negative integer. Matches at least n times.
{n,m} m and n are non-negative integers with n <= m. Matches at least n times and at most m times. Note that there must be no space between the comma and the two numbers.
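As a quick check of the qualifier table, here is a minimal sketch that tries each qualifier against a few strings. The patterns and test strings are made up for this illustration, and each pattern is anchored with ^ and $ so that the whole string must match.

```javascript
// One sample pattern per qualifier; inputs are illustrative only.
const zeroOrMore = /^ab*c$/;      // b may appear 0 or more times
const oneOrMore = /^ab+c$/;       // b must appear at least once
const zeroOrOne = /^colou?r$/;    // u may appear 0 or 1 time
const exactlyThree = /^\d{3}$/;   // exactly three digits
const atLeastTwo = /^\d{2,}$/;    // two or more digits
const twoToFour = /^\d{2,4}$/;    // between two and four digits

const qualifierResults = {
  zeroOrMore: [zeroOrMore.test("ac"), zeroOrMore.test("abbbc")],
  oneOrMore: [oneOrMore.test("ac"), oneOrMore.test("abc")],
  zeroOrOne: [zeroOrOne.test("color"), zeroOrOne.test("colouur")],
  exactlyThree: [exactlyThree.test("123"), exactlyThree.test("1234")],
  atLeastTwo: [atLeastTwo.test("1"), atLeastTwo.test("12345")],
  twoToFour: [twoToFour.test("123"), twoToFour.test("12345")],
};
console.log(qualifierResults);
```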

Locator: locators (anchors) let us fix a regular expression to the beginning or end of a line. They also make it possible to write expressions that match within a word, at the beginning of a word, or at the end of a word. Locators describe string or word boundaries.

Locator
Character Description
^ Matches the position at the beginning of the input string. If the RegExp object's multiline property is set, ^ also matches the position after \n or \r.
$ Matches the position at the end of the input string. If the RegExp object's multiline property is set, $ also matches the position before \n or \r.
\b Matches a word boundary, that is, the position between a word character and a non-word character such as a space.
\B Matches a non-word boundary.

Note: qualifiers cannot be applied to locators. Because there cannot be more than one position immediately before or after a newline or word boundary, expressions such as "^*" are not allowed.

To match text at the beginning of a line of text, ^ should be placed at the beginning of the regular expression; to match text at the end of a line of text, use $ at the end of the regular expression. Do not confuse this use of ^ or $ with the use inside bracket expressions.

eg:

/^Chapter [1-9][0-9]{0,1}/ matches chapter titles that begin with "Chapter" followed by a one- or two-digit number;

/^Chapter [1-9][0-9]{0,1}$/ anchors both the beginning and the end, so it matches only when the line contains nothing but the chapter title;

/\bCha/ matches "Cha" at a word boundary, that is, at the start of a word;

/ter\b/ matches "ter" at the end of a word;

/\Bapt/ matches "apt" at a non-word boundary: the apt in Chapter matches, but the apt in aptitude does not.
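The locator examples above can be tried out with a short sketch like this one. The test strings ("Chapter 12: Anchors" and so on) are my own inventions.

```javascript
const titleStart = /^Chapter [1-9][0-9]{0,1}/;
const titleOnly = /^Chapter [1-9][0-9]{0,1}$/;

const locatorResults = [
  titleStart.test("Chapter 12: Anchors"), // true: only the start is anchored
  titleOnly.test("Chapter 12: Anchors"),  // false: text follows the number
  titleOnly.test("Chapter 12"),           // true: nothing but the title
  /\bCha/.test("Chapter"),                // true: "Cha" starts the word
  /ter\b/.test("Chapter"),                // true: "ter" ends the word
  /\Bapt/.test("Chapter"),                // true: "apt" is inside the word
  /\Bapt/.test("aptitude"),               // false: "apt" starts the word
];
console.log(locatorResults);
```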

Selection: enclose all the alternatives in parentheses and separate adjacent alternatives with |. A side effect of the parentheses is that the match is cached (captured); placing ?: before the first alternative removes this side effect.

Non-capturing element:

?: Removes the unwanted capture side effect; place it before the first alternative in the selection.
?= Positive lookahead: matches at any position where the text that follows matches the pattern in the parentheses.
?! Negative lookahead: matches at any position where the text that follows does not match the pattern in the parentheses.
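Here is a minimal sketch of how the three non-capturing elements behave in JavaScript. The sample words ("industry", "Windows 2000", "Windows XP") are just test data I picked for the illustration.

```javascript
// A capturing group stores its match; (?:...) does not.
const withCapture = "industry".match(/industr(y|ies)/);
const withoutCapture = "industry".match(/industr(?:y|ies)/);

// Lookaheads check what follows without consuming it.
const positive = "Windows 2000".match(/Windows(?= 2000)/);
const negativeOn2000 = /Windows(?! 2000)/.test("Windows 2000");
const negativeOnXP = /Windows(?! 2000)/.test("Windows XP");

console.log(withCapture.length, withoutCapture.length); // group count differs
console.log(positive[0], negativeOn2000, negativeOnXP);
```

Note that the positive lookahead result contains only "Windows": the text inside the lookahead is checked but is not part of the match.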

Backreference: (for some reason this reminded me of recursion in C...) In plain terms, a backreference is a reference to a pattern that has already been cached, accessed with \n as described below. Putting parentheses around a pattern, or part of a pattern, stores the text it matches in a temporary buffer. (Recall that regular expressions, like mathematical expressions, are built by combining small expressions into large ones; these buffers can be thought of as the match results of the small expressions.) Captured submatches are stored in the order their groups appear from left to right in the expression. Buffer numbers start at 1, and up to 99 captured subexpressions can be stored. Each buffer is accessed with \n, where n is a one- or two-digit decimal number identifying the buffer. The non-capturing metacharacters ?:, ?=, and ?! can be used to override capturing so that the related match is not cached.

One of the simplest and most useful applications of backreferences is to provide the ability to find matches of two identical adjacent words in text:

eg1: Extract identical words.

var str = "Is this the cost of gasoline going up up?";
var patt1 = /\b([a-z]+) \1\b/;
document.write(str.match(patt1));
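Building on eg1, the sketch below finds every doubled word and then removes the duplicates. The sentence is my own variation of the one above (I added a second doubled word), and the g and i flags are my additions; $1 in the replacement string refers to the same buffer that \1 refers to inside the pattern.

```javascript
const sentence = "Is this the cost of of gasoline going up up?";

// \1 must repeat exactly what group 1 captured.
const doubledWord = /\b([a-z]+) \1\b/gi;

const doubledWords = sentence.match(doubledWord);
const deduplicated = sentence.replace(doubledWord, "$1");

console.log(doubledWords);  // the doubled pairs that were found
console.log(deduplicated);  // the sentence with each pair collapsed
```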

eg2: Split a URL. Judging from this example, a backreference is really a matter of quoting, or extracting, the sub-items of a large expression and using them. I still find backreferences a bit obscure. (I am leaving this here as a note for now and will revise it later.)

var patt1 = /(\w+):\/\/([^/:]+)(:\d*)?([^#]*)/;
arr = str.match(patt1); // str is the URL string to split

2. Practice reading regular expressions

1. /chapter [1-9][0-9]*/

This expression matches "chapter " followed by any positive integer, such as 1, 19, 109, or 10099.

The leading chapter is fixed, and [1-9] is a single required digit. [0-9] is then qualified by *, which means it can appear zero or more times, so the rest of the number can be empty, one digit, or many digits.
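To confirm this reading, a small sketch such as the following can be run. The input strings are ones I made up.

```javascript
const chapterPattern = /chapter [1-9][0-9]*/;

const chapterInputs = ["chapter 1", "chapter 19", "chapter 10099", "chapter 0", "chapter"];
const chapterResults = chapterInputs.map((input) => chapterPattern.test(input));

// match() returns the matched text, which shows how far [0-9]* extends.
const extracted = "see chapter 109 now".match(chapterPattern)[0];

console.log(chapterResults, extracted);
```

"chapter 0" fails because the first digit must come from [1-9], and "chapter" alone fails because the space and the first digit are both required.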

2. /[a-zA-Z]+:\/\/[^\s]*/

Matches a URL.

3. /\d{3}-\d{8}|\d{4}-\d{7,8}/

Matches a domestic (Chinese) phone number.

4. /[1-9][0-9]{4,}/

Matches a Tencent QQ number.

5. /[1-9]\d{5}(?!\d)/

Matches a domestic (Chinese) postal code.

6. /^(\d{6})(\d{4})(\d{2})(\d{2})(\d{3})([0-9]|

8. /^-[1-9]\d*$/

Matches negative integers.

9. /^-?[1-9]\d*$/

Matches nonzero integers.

10. /^(?:[1-9]\d*\.\d*|0\.\d*[1-9]\d*)$/

Matches positive floating-point numbers.

11. /^(?:-[1-9]\d*\.\d*|-0\.\d*[1-9]\d*)$/

Matches negative floating-point numbers.
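A good way to practice reading these expressions is to run them against a passing and a failing input. The sketch below does that for a few of them. All phone numbers, QQ numbers, and postal codes here are invented test values, and I have wrapped the floating-point alternatives in (?:...) so that ^ and $ apply to both branches, which is my own assumption about the intended behavior.

```javascript
// Patterns from the practice list; every sample input is invented.
const phonePattern = /\d{3}-\d{8}|\d{4}-\d{7,8}/;
const qqPattern = /[1-9][0-9]{4,}/;
const postalPattern = /[1-9]\d{5}(?!\d)/;
const integerPattern = /^-?[1-9]\d*$/;
const positiveFloat = /^(?:[1-9]\d*\.\d*|0\.\d*[1-9]\d*)$/;

// Each entry is [should match, should not match].
const practiceResults = {
  phone: [phonePattern.test("010-12345678"), phonePattern.test("12-345")],
  qq: [qqPattern.test("10000"), qqPattern.test("1234")],
  postal: [postalPattern.test("100000"), postalPattern.test("1000001")],
  integer: [integerPattern.test("-42"), integerPattern.test("0")],
  float: [positiveFloat.test("3.14"), positiveFloat.test("0.0")],
};
console.log(practiceResults);
```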

3. JavaScript RegExp object

1. Create a regular expression object:

new RegExp(pattern, attributes);

The pattern parameter is the regular expression, given as a string. attributes is an optional parameter made up of g, i, and m, which stand for global matching, case-insensitive matching, and multi-line matching respectively.
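As a small illustration, the sketch below builds the same expression with the constructor and with a literal. The variable names and the sample string are my own. One detail to watch: inside a string, the backslash itself has to be escaped, so \d is written "\\d".

```javascript
// Constructor form: the pattern is a string, so "\\d" means \d.
const fromConstructor = new RegExp("\\d+", "gi");

// Literal form of the same expression.
const fromLiteral = /\d+/gi;

const sameSource = fromConstructor.source === fromLiteral.source;
const digitRuns = "a1b22c333".match(fromConstructor);

console.log(sameSource, fromConstructor.flags, digitRuns);
```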

2. Properties of the RegExp object:

global: whether the RegExp object has the g flag; eg: if (patt1.global) { alert("Global property is set"); }

ignoreCase: whether the RegExp object has the i flag;

lastIndex: an integer giving the character position at which the next match starts;

multiline: whether the RegExp object has the m flag;

source: the source text of the regular expression.
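The properties can be inspected directly, as in this minimal sketch. The pattern /Cat/gi and the test string are arbitrary choices of mine.

```javascript
const propsDemo = /Cat/gi;

const flagsBefore = {
  global: propsDemo.global,
  ignoreCase: propsDemo.ignoreCase,
  multiline: propsDemo.multiline,
  source: propsDemo.source,
  lastIndex: propsDemo.lastIndex,
};

// With the g flag, a successful match advances lastIndex
// to the position just after the matched text.
propsDemo.test("cat and CAT");
const lastIndexAfterMatch = propsDemo.lastIndex;

console.log(flagsBefore, lastIndexAfterMatch);
```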

3. Methods of the RegExp object:

compile: compiles the regular expression.

compile() has two uses. The first is to compile a regular expression during script execution. The second is to change and recompile a regular expression;

eg: RegExpObject.compile(regexp, modifier);

The first parameter is the regular expression, and the second parameter specifies the match type;

var str = "Every man in the world! Every woman on earth!";
var patt = /man/g;
var str2 = str.replace(patt, "person");
document.write(str2 + "<br />");
patt = /(wo)?man/g;
patt.compile(patt); // the regular expression is changed here
str2 = str.replace(patt, "person");
document.write(str2);
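exec: searches a string for a specified value, returning the value found and determining its position.

eg: RegExpObject.exec(string); // returns an array holding the match result, or null if no match is found.

Description:

The exec() method is very powerful. It is a general-purpose method, and it is more complex to use than test() or the String methods that support regular expressions.

If exec() finds matching text, it returns a result array; otherwise it returns null. Element 0 of the array is the text that matched the regular expression, element 1 is the text that matched the first subexpression of RegExpObject (if any), element 2 is the text that matched the second subexpression (if any), and so on. Besides the array elements and the length property, the result has two more properties: index gives the position of the first character of the matched text, and input holds the searched string. When exec() is called on a non-global RegExp object, the returned array is the same as the one returned by String.match().

When RegExpObject is a global regular expression, however, exec() behaves in a slightly more complex way. It starts searching string at the character position given by the lastIndex property of RegExpObject. When exec() finds matching text, it sets lastIndex to the position just after the last character of the match. This means you can iterate over all the matches in a string by calling exec() repeatedly. When exec() finds no more matches, it returns null and resets lastIndex to 0.

Tips and notes

Important: if you start searching a new string after completing a pattern match in one string, you must manually reset the lastIndex property to 0.

Tip: whether or not RegExpObject is in global mode, exec() adds the complete details to the array it returns. This is where exec() differs from String.match(), which returns far less information in global mode. Calling exec() repeatedly in a loop therefore gives complete match information for a global pattern.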
var str = "Testing the exec method of regular expressions";
var patt = new RegExp("exec", "g");
var result;
while ((result = patt.exec(str)) != null) {
  document.write('result:' + result);
  document.write("<br />");
  document.write('patt.lastIndex:' + patt.lastIndex);
}

test: searches a string for a specified value and returns true or false.

eg: var result = patt1.test(str);
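Because test() on a global expression also uses lastIndex, the same reset rule that applies to exec() applies here. The following sketch, with strings I made up, shows what happens when one global RegExp is reused on a new string with and without a reset.

```javascript
const reusedPattern = /a/g;

// First string: the match ends at index 2, so lastIndex becomes 2.
const firstCall = reusedPattern.test("banana");
const indexAfterFirst = reusedPattern.lastIndex;

// New string without a reset: the search starts at index 2 and
// misses the "a" at index 0. The failed match resets lastIndex to 0.
const withoutReset = reusedPattern.test("ab");

// Same sequence again, this time resetting lastIndex by hand.
reusedPattern.test("banana");
reusedPattern.lastIndex = 0;
const withReset = reusedPattern.test("ab");

console.log(firstCall, indexAfterFirst, withoutReset, withReset);
```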

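4. String object methods that support regular expressions

search: searches for a value that matches the regular expression.

stringObj.search(regexp); // the parameter can be a substring or a RegExp object.

Note: search() does not perform a global match. It ignores the g flag and the regexp's lastIndex property and always searches from the beginning of the string, so its return value is always the position of the first match in stringObj. To ignore case, add the i flag.

document.write(str.search(/abc/i));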
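A short sketch of search(), using a sample string of my own, shows the returned position, the -1 result when nothing matches, and the fact that lastIndex is ignored.

```javascript
const haystack = "xyz ABC abc";

const firstPosition = haystack.search(/abc/i); // position of "ABC"
const notFound = haystack.search(/def/);       // -1 when nothing matches

// Setting lastIndex makes no difference to search().
const globalWithOffset = /abc/gi;
globalWithOffset.lastIndex = 6;
const ignoresLastIndex = haystack.search(globalWithOffset);

console.log(firstPosition, notFound, ignoresLastIndex);
```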

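match: finds one or more matches of a regular expression.

stringObj.match(searchValue); // the parameter is the string value to search for

stringObj.match(regexp); // a RegExp object specifying the pattern to match.

Returns an array holding the match results. The contents of the array depend on whether regexp has the global flag g;

Description

The match() method searches the string stringObject for one or more pieces of text that match regexp. Its behavior depends heavily on whether regexp has the g flag.

If regexp does not have the g flag, match() performs only one match in stringObject. If no match is found, match() returns null. Otherwise, it returns an array holding information about the match it found. Element 0 of the array holds the matched text, and the remaining elements hold the text matched by the subexpressions of the regular expression. Besides these regular array elements, the returned array has two object properties: index gives the position in stringObject of the first character of the matched text, and input is a reference to stringObject.

If regexp has the g flag, match() performs a global search and finds all matching substrings in stringObject. If no matching substring is found, it returns null. If one or more are found, it returns an array. The array returned by a global match is quite different, however: its elements are all the matching substrings in stringObject, and it has no index or input property.

Note: in global mode, match() provides neither the text matched by subexpressions nor the position of each matching substring. If you need that information for a global search, use RegExp.exec().

eg: document.write(str.match(/\d+/g));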
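The difference between the two modes of match() is easy to see side by side. This is a minimal sketch with a sample sentence of my own.

```javascript
const orderText = "Order 12 shipped on day 345";

// Without g: one match, plus subexpressions, index, and input.
const singleMatch = orderText.match(/(\d)(\d+)/);

// With g: every matching substring, but no groups and no index.
const allMatches = orderText.match(/\d+/g);

// No match at all: null in both modes.
const noMatch = orderText.match(/xyz/g);

console.log(Array.from(singleMatch), singleMatch.index);
console.log(allMatches, allMatches.index, noMatch);
```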

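replace: replaces substrings that match a regular expression.

stringObject.replace(regexp/substr, replacement);

regexp/substr: required. A regular expression or a substring.

replacement: a string value that specifies the replacement text, or a function that generates it.

Returns a new string in which the first match or all matches have been replaced with replacement. Note: with the global flag g, all matches are replaced; otherwise only the first match is replaced.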
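The sketch below illustrates the three points above: replacing only the first match, replacing all matches with g, and generating the replacement from a function. It reuses the "Every man" sentence from the compile() example; the other sample strings are my own, and the $1/$2 placeholders refer to captured subexpressions.

```javascript
const everySentence = "Every man in the world! Every woman on earth!";

// Without g, only the first match is replaced.
const firstOnly = everySentence.replace(/man/, "person");

// With g, every match is replaced.
const everyMatch = everySentence.replace(/(wo)?man/g, "person");

// $1 and $2 insert the text captured by groups 1 and 2.
const swappedName = "Doe, John".replace(/(\w+), (\w+)/, "$2 $1");

// A function receives each match and returns its replacement.
const doubled = "a1b22".replace(/\d+/g, (digits) => String(Number(digits) * 2));

console.log(firstOnly, everyMatch, swappedName, doubled);
```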

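split: splits a string into an array of strings.

stringObject.split(separator, howmany);

separator: required. A string or regular expression; the string is split wherever it occurs;

howmany: optional. The maximum length of the returned array. If set, no more substrings than this are returned; if not set, the whole string is split.

Returns an array of strings that does not include separator itself.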
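Here is a minimal sketch of split() with a regular expression separator and with the howmany limit. The input strings are invented. It also shows one case worth knowing about: when the separator expression contains a capturing group, the captured text does appear in the result.

```javascript
const messyList = "a, b;c ,d";

// Split on a comma or semicolon with optional surrounding spaces.
const parts = messyList.split(/\s*[,;]\s*/);

// The second argument caps the length of the returned array.
const limited = messyList.split(/\s*[,;]\s*/, 2);

// A capturing group in the separator keeps the captured text.
const withOperators = "1+2-3".split(/([+-])/);

console.log(parts, limited, withOperators);
```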

 
