
JavaScript Learning Summary (7): JS RegExp

WBOY, 2016-05-16 15:29:10

In JavaScript, a regular expression is represented by a RegExp object (RegExp is short for "regular expression"). The simplest pattern is a single character; more complex patterns contain more characters and can be used for parsing, format validation, substitution and so on. A RegExp object can be created either with the RegExp() constructor or with literal syntax.

1. Introduction to RegExp.

A regular expression (often abbreviated as regex, regexp, or RE/re/reg in code) uses a single string to describe a search pattern that matches a whole family of strings following certain syntax rules. Such patterns are used for text search and text replacement.

A regular expression is a search pattern formed by a sequence of characters. When you search for data in text, you use the pattern to describe what you are looking for. In other words, a regular expression is an object that describes a pattern of characters; it can be used to match, search and replace text, and it is a powerful tool for pattern matching on strings.

A regular expression can be understood simply as a rule or pattern: it states a rule that a computer can understand, even though the text of the expression is hard for ordinary people to read. It can be used for all kinds of text search and replacement; put simply, it is a tool for processing strings.

2. String methods.

 (1), charAt() Returns the character at a given position in the string.

 (2), split() Splits the string and returns an array.

 (3), search() Finds the position of the first match. It works best with regular expressions. The return value is a number; if nothing is found, -1 is returned.

 (4), match() Finds the specified text in the string and returns it. Without a regular expression (or with one that lacks the g modifier), only the first occurrence is returned and matching does not continue. With a regular expression and global matching, all matches are returned as an array. If nothing is found, null is returned.

 (5), replace() Replaces text and returns a new string. With a plain string pattern only the first occurrence is replaced; used with a global regular expression, it replaces all matches.

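 <script>
 var str='abcdefgca';
 //Return the character at a given position in the string.
 alert(str.charAt(3));  //returns: d
 //Find the position of the first occurrence.
 alert(str.search('z'));  //returns: -1 (not found)
 //Find the specified character.
 //Only the first 'c' is returned; matching does not continue.
 alert(str.match('c'));  //returns: c
 //Replace 'a' with 'i'.
 //Only the 'a' in the first position is replaced; matching does not continue.
 alert(str.replace('a', 'i'));  //returns: ibcdefgca
 //Split the string.
 var str='--aaa--cd';
 var arr=str.split('-');  //returns: ,,aaa,,cd
 alert(arr);
 </script>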

Example: find all numbers in a string without regular expressions

Implementation idea: finding the numbers is not hard. Walk through the string and test each character to see whether it is a digit. A number can span several consecutive digits, so an empty string is used as a buffer to collect them; when a non-digit ends the run, the buffered number is pushed into an array and the buffer is cleared. Finally the array is returned. Here is the implementation:

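 <script>
 var str='12 abc 34d 5aa6 78c 9zz 0-=-=s-100';
 var arr=[];
 var num='';
 //First, loop through the string
 for(var i=0;i<str.length;i++){
   //If the current character is >= '0' and <= '9', it is a digit
   if(str.charAt(i)>='0' && str.charAt(i)<='9'){
     //Append the current character to the buffer string
    num += str.charAt(i);
   }
   else{
     //If the buffer holds a value,
     if(num){
       //add the value to the array,
       arr.push(num);
       //then clear the buffer to avoid adding it twice.
       num='';
     }
   }
 }
 //The string may end with digits, so check the buffer once more after the loop.
 if(num){
   //If there is still a value, add it to the array.
   arr.push(num);
   //Then clear the buffer.
   num='';
 }
 //Done: the array now holds every number.
 alert(arr); //returns: 12,34,5,6,78,9,0,100
 </script>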

Although the ordinary method works and has a clear structure, the code is fairly long. With a regular expression, a single expression does all of this work, which is very convenient. Let's look at how to use regular expressions.

3. Using regular expressions.

Regular expression syntax: var re = new RegExp('pattern', 'modifier');

Here pattern is the body of the expression, and modifier specifies options such as global matching and case-insensitive matching. Together they form a complete regular expression.

This is the usual JS syntax for creating an object with new: it creates a new RegExp object. It works, but it is verbose, and because the pattern is written as a string, every backslash has to be doubled (new RegExp('\\d') for \d), which hides how concise regular expressions can be. So in practice the constructor style is used mainly when the pattern has to be built at run time; most of the time another style, the literal, is used instead:

Syntax: var re = /pattern/modifier;

This style is much more concise, although, like any regular expression, it can look cryptic to the uninitiated.
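Both styles produce the same kind of RegExp object; the constructor is mainly worth using when the pattern is only known at run time. A minimal sketch of that case, where `escapeRegExp` is a hypothetical helper (not part of the RegExp API) that escapes special characters so run-time text is matched literally:

```javascript
// The literal and the constructor produce equivalent RegExp objects.
const literal = /abc/i;
const constructed = new RegExp('abc', 'i');
console.log(literal.source === constructed.source); // true
console.log(literal.flags === constructed.flags);   // true

// In a string passed to the constructor, backslashes must be doubled:
// '\\d+' is the pattern \d+ (one or more digits).
const digits = new RegExp('\\d+', 'g');
console.log('a1b22c333'.match(digits)); // [ '1', '22', '333' ]

// Hypothetical helper: escape characters that have a special meaning in a
// pattern, so that text built at run time is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const userInput = 'a.b';
const exact = new RegExp(escapeRegExp(userInput), 'g');
console.log('a.b axb a.b'.match(exact)); // [ 'a.b', 'a.b' ] ("axb" is not matched)
```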

 (1), modifiers.

Modifiers control global matching and case sensitivity.

Ignore case: i (short for ignore)

Global matching: g (short for global)

Example: case-insensitive search for specified characters

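 <script>
 var str='AbCdEFgiX';
 //Constructor (JS-style) syntax:
 //This regex contains no special characters; it just stands for abc itself.
 var reg=new RegExp('abc', 'i');
 alert(str.match(reg));  //returns: AbC
 //Common (literal) syntax:
 var re=/efg/i;
 alert(str.match(re));  //returns: EFg
 </script>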
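The example above only uses i. A small sketch, on a sample string of our own, of how g and i change the result of match():

```javascript
const text = 'Abc abc ABC';

// No modifier: case-sensitive, and only the first match is returned.
console.log(text.match(/abc/)[0]);  // abc (found at index 4)

// i only: case-insensitive, but still just the first match.
console.log(text.match(/abc/i)[0]); // Abc

// g only: every match, case-sensitive.
console.log(text.match(/abc/g));    // [ 'abc' ]

// g and i together: every match, ignoring case.
console.log(text.match(/abc/gi));   // [ 'Abc', 'abc', 'ABC' ]
```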

4. Square brackets and metacharacters.

 (1), square brackets.

Square brackets match one character from a given set or range.
①. Any one of a set of characters
Expression: [abc]
Finds any one of the characters inside the square brackets.
Here [] means "or": whichever of the listed characters appears will match.

<script>
 var str='apc xpc ppc bpc spc opc';
 //[apx]pc: a, p or x may come first, i.e. apc ppc xpc
 var re=/[apx]pc/g;
 alert(str.match(re));  //returns the first three: apc,xpc,ppc
 </script>

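 ②. Range search.

Expression: [0-9] [a-z] [A-z] [A-Z]

[0-9] matches any digit from 0 to 9.

[a-z] matches any lowercase letter from a to z.

[A-z] matches any character from uppercase A to lowercase z. Note that this range also includes the six ASCII characters that sit between Z and a ([ \ ] ^ _ `), so [A-Za-z] is the safer way to match any letter.

[A-Z] matches any uppercase letter from A to Z.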
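The [A-z] caveat can be checked directly. A short sketch comparing [A-z] with [A-Za-z] on the six characters that lie between Z and a:

```javascript
// The characters that sit between 'Z' (code 90) and 'a' (code 97).
const between = '[\\]^_`';

// [A-z] covers every code from 'A' (65) to 'z' (122), so all six match.
console.log(between.match(/[A-z]/g)); // six matches: [ \ ] ^ _ `

// [A-Za-z] accepts letters only.
console.log(between.match(/[A-Za-z]/g));        // null
console.log('Hello_World'.match(/[A-Za-z]+/g)); // [ 'Hello', 'World' ]
```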

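 ③. Exclusion search.

Expression: [^abc] [^a-z] [^0-9]

[^abc] matches any character that is not inside the brackets.

[^a-z] matches any character except the lowercase letters a-z, including digits, symbols, uppercase letters, Chinese characters and other scripts.

[^0-9] matches any character except digits, including letters, symbols, Chinese characters and other scripts.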

 <script>
 var str='o2t out o.t o t o`t o3t o4t';
 //Between o and t, anything except a digit is allowed
 var re=/o[^0-9]t/g;
 alert(str.match(re));  //returns: out,o.t,o t,o`t
 </script>

 ④. Alternation.

Expression: (a|b|c)

Matches any one of the listed options: a or b or c.

 ⑤. Patterns can also be combined, for example: [a-z0-9A-Z] [^a-z0-9]

[a-z0-9A-Z] matches any uppercase or lowercase letter or digit.

 [^a-z0-9] matches any character except lowercase letters and digits.
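A short sketch of alternation and of combined character classes, on sample strings chosen for illustration:

```javascript
const s = 'cat dog bird cow';

// (cat|dog|cow): any one of the listed alternatives.
console.log(s.match(/(cat|dog|cow)/g)); // [ 'cat', 'dog', 'cow' ]

// Alternation chooses between whole sub-patterns; a character class
// chooses one character. c[ao][tw] also finds "cat" and "cow".
console.log(s.match(/c[ao][tw]/g)); // [ 'cat', 'cow' ]

// Combined ranges: runs of letters or digits (the underscore is excluded).
console.log('id_42, name: Bob7!'.match(/[a-z0-9A-Z]+/g)); // [ 'id', '42', 'name', 'Bob7' ]

// Negated combination: anything except lowercase letters and digits.
console.log('a1B-c'.match(/[^a-z0-9]/g)); // [ 'B', '-' ]
```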

  (2) Metacharacters.

Metacharacters are characters with special meanings. Most of them are written with a leading backslash (\d, \w and so on), which is why they are also called escape characters.

The following are some commonly used metacharacters:

| Metacharacter | Description | Example |
|---|---|---|
| . | Any single character except line terminators (newline, carriage return, etc.). Use with care: it matches more than you might expect and easily causes problems. | /./ |
| \w | Word character: a letter, digit or underscore; equivalent to [A-Za-z0-9_] | /\w/ |
| \W | Non-word character; equivalent to [^A-Za-z0-9_] | /\W/ |
| \d | Digit; equivalent to [0-9] | /\d/ |
| \D | Non-digit; equivalent to [^0-9] | /\D/ |
| \s | Whitespace character: space, tab, carriage return, line feed, form feed and other non-printing space characters | /\s/ |
| \S | Non-whitespace character | /\S/ |
| \b | Word boundary: the position at the beginning or end of a word (it matches a position, not a character) | /\b/ |
| \B | Non-word boundary: a position where the characters on both sides are of the same type (both word characters or both non-word characters); the start and end of the string count as non-word characters | /\B/ |
| \n | Line feed (newline) | /\n/ |
| \f | Form feed | /\f/ |
| \r | Carriage return | /\r/ |
| \t | Tab | /\t/ |
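A sketch exercising several metacharacters from the table on a made-up sample string:

```javascript
const text = 'Call 555-0199 at 9am,\tthen email bob_smith.';

console.log(text.match(/\d+/g)); // [ '555', '0199', '9' ]
console.log(text.match(/\w+/g)); // words made of letters, digits and underscore
console.log(/\s/.test('a\tb'));  // true: a tab counts as whitespace
console.log(text.search(/\n/));  // -1: the text contains no newline

// \b matches a position, not a character: "am" in "9am" is not a whole
// word, so \bam\b finds nothing, while \bat\b finds the word "at".
console.log(text.match(/\bam\b/));    // null
console.log(text.match(/\bat\b/)[0]); // at

// \B: "mail" inside "email" does not start at a word boundary.
console.log(text.match(/\Bmail/)[0]); // mail
```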

5. Quantifiers.

A quantifier specifies a count: in a regular expression it says how many times the preceding character or group may occur.

The following are some commonly used quantifiers:

| Quantifier | Description | Example |
|---|---|---|
| * | Zero or more times; equivalent to {0,} | Use sparingly: the range is broad and imprecise |
| ? | Zero or one time; equivalent to {0,1} | /10?/g finds a 1 followed by zero or one '0' |
| + | One or more times; equivalent to {1,} | /\w+/g finds runs of at least one word character, i.e. words |
| {n} | Exactly n times | /\d{4}/g finds runs of exactly four digits |
| {n,} | At least n times, no upper limit | /\d{3,}/g finds numbers with at least three digits |
| {n,m} | At least n and at most m times | /\d{3,4}/g finds runs of three or four digits |

The following are some commonly used matching patterns:

| Pattern | Description | Example |
|---|---|---|
| ^a | a at the start of the string (with the m modifier, at the start of each line) | /^\d/ starts with a digit; /^a/g finds an 'a' at the start |
| a$ | a at the end of the string (with the m modifier, at the end of each line) | /\d$/ ends with a digit; /z$/g finds a 'z' at the end |
| a(?=b) | a only if it is followed by b; b itself is not part of the match | /a(?=b)/g finds every 'a' followed by 'b' |
| a(?!b) | a only if it is not followed by b | /c(?!d)/g finds every 'c' not followed by 'd' |
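A sketch of the quantifiers, anchors and lookaheads from the two tables above, each on a small sample string:

```javascript
// Quantifiers
console.log('1 10 100'.match(/10?/g));              // [ '1', '10', '10' ]
console.log('7 42 123 2016 98765'.match(/\d{4}/g)); // [ '2016', '9876' ]
console.log('7 42 123 2016'.match(/\d{3,}/g));      // [ '123', '2016' ]

// Anchors: ^ and $ match the start and end of the string.
console.log(/^\d/.test('3 apples'));  // true
console.log(/\d$/.test('apples 3!')); // false: the string ends with '!'

// Lookaheads: the following character is checked but not consumed,
// so only the 'a' or 'c' itself is replaced.
console.log('ab ac ab'.replace(/a(?=b)/g, 'X')); // Xb ac Xb
console.log('cd ce cf'.replace(/c(?!d)/g, 'X')); // cd Xe Xf
```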

6. Using strings with regular expressions.

 (1) search() with regular expressions

Example: find the position of the first digit in a string

 <script>
 var str='asdf 123 zxcvbnm';
 //The metacharacter \d stands for a digit
 var re=/\d/;
 alert(str.search(re));  //returns: 5 (the first digit is at index 5)
 </script>

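 (2) match() with regular expressions

Nothing strictly requires regular expressions; they just make the work more convenient. Now let's settle the problem left over from earlier in this chapter: with a regular expression, a single line of code does what the ordinary method needed many lines for.

Before the example, let's look at how match() works with regular expressions.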

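<script>
 var str='asdf 123 456zxcv78m90';
 //Digits in the string can be found with the metacharacter \d
 var re=/\d/;
 //Nothing says how many digits to find, so matching stops at the first digit
 alert(str.match(re));  //returns: 1
 //So a global match is needed: use the modifier g
 var re=/\d/g;
 //Nothing says how many digits form a number, so every digit is returned on its own
 alert(str.match(re));  //returns: 1,2,3,4,5,6,7,8,9,0
 //Two metacharacters tell it to look for two-digit numbers
 var re=/\d\d/g;
 //Clearly not workable, because a number may have one digit or many
 alert(str.match(re));  //returns: 12,45,78,90
 //So the quantifier + is needed: + means "one or more", however many.
 var re=/\d+/g;
 //Now the result is correct.
 alert(str.match(re));  //returns: 123,456,78,90
 </script>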

Example: use a regular expression to find all numbers in a string

 <script>
 var str='12 abc 34d 5aa6 78c 9zz 0-=-=s-100';
 //alert(str.match(/\d+/g));
 //The metacharacter \d can also be written as [0-9]: any digit from 0 to 9 matches.
 alert(str.match(/[0-9]+/g));  //returns: 12,34,5,6,78,9,0,100
 </script>

Regular expressions are a powerful string-matching tool: this single line of code does the whole job.

 (3) replace() with regular expressions

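 <script>
 var str='abc zaaz deaxcaa';
 //Replace the a's in the string with the digit 0
 alert(str.replace('a', '0'));  //only the first a is replaced: 0bc zaaz deaxcaa
 //With a global regex, every a is matched and replaced
 var re=/a/g;
 alert(str.replace(re, '0'));  //every a becomes 0: 0bc z00z de0xc00
 </script>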

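Example: a simple sensitive-word filter

Sensitive words are words the law does not allow; any illegal word can be called a sensitive word, so the range is very wide. It covers words that endanger national security, oppose the basic principles established by the constitution, spread rumors, stir up public feeling, disturb social order or undermine social stability, as well as pornographic, violent, gambling-related, false, infringing, harassing, vulgar, obscene or otherwise morally offensive words, and words with any other content prohibited by law. The most common ones, used by most people, are the morally offensive words; put politely, the words people use to attack each other in an argument. Here a few popular internet slang words serve as examples.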

 <!DOCTYPE html>
 <html>
 <head>
   <meta charset="UTF-8">
   <title>JavaScript Example</title>
 <script>
 window.onload=function (){
   var oBtn=document.getElementById('btn');
   var oTxt1=document.getElementById('txt1');
   var oTxt2=document.getElementById('txt2');
   oBtn.onclick=function (){
     //In a regex, | means "or"
     var re=/元芳|萌萌哒|然并卵|毛线|二货|城会玩/g;
     //Put the text of the first box, with sensitive words replaced by ***, into the second box
     oTxt2.value=oTxt1.value.replace(re,'***');
   };
 };
 </script>
 </head>
 <body>
 <textarea id="txt1" rows="10" cols="40"></textarea><br>
 <input id="btn" type="button" value="Filter"><br>
 <textarea id="txt2" rows="10" cols="40"></textarea>
 </body>
 </html>

Type some text into the first text box, click the filter button, and check the filtered result.
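Outside the browser, the same filtering logic can be written as a plain function, which also makes it easy to test. A minimal sketch, assuming a hypothetical `filterWords(text, words)` helper: the word list is joined with | into one alternation, and each word is escaped first in case it contains regex special characters:

```javascript
// Hypothetical helper: mask every occurrence of any word in the list.
function filterWords(text, words, mask = '***') {
  // An empty list would build an empty pattern that matches everywhere.
  if (words.length === 0) return text;
  // Escape regex special characters so each word is matched literally.
  const escaped = words.map(w => w.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
  // word1|word2|... with g, so that every occurrence is replaced.
  const re = new RegExp(escaped.join('|'), 'g');
  return text.replace(re, mask);
}

const banned = ['元芳', '萌萌哒', '然并卵'];
console.log(filterWords('元芳你怎么看?然并卵', banned)); // ***你怎么看?***
console.log(filterWords('a.b axb', ['a.b']));           // *** axb
```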

Besides these, split() is another String method that supports regular expressions; it splits a string into an array of strings.
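A few sketches of split() with a regular expression as the separator:

```javascript
// Split on any run of commas, semicolons or whitespace.
console.log('a, b;c  d'.split(/[,;\s]+/)); // [ 'a', 'b', 'c', 'd' ]

// Split on every run of digits.
console.log('abc12def3ghi'.split(/\d+/));  // [ 'abc', 'def', 'ghi' ]

// A subexpression (parentheses) in the separator keeps the separators.
console.log('1+2-3'.split(/([+-])/));      // [ '1', '+', '2', '-', '3' ]
```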

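7. RegExp object methods.

In JS, the RegExp object is a regular-expression object with predefined properties and methods.

 (1) test()

The test() method checks whether a string matches a pattern, that is, whether the string contains a matching substring. It returns true if the string contains matching text, and false otherwise.

Syntax: RegExpObject.test(str)

Calling the test() method of a RegExp object re with the string str is equivalent to the expression (re.exec(str) != null).

Example: check whether a string contains a specified character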

 <script>
 var str='The best things in life are free, like hugs, smiles, friends, kisses, family, love and good memories.';
 var re=/i/;
 alert(re.test(str));  //returns: true
 var reg=/z/;
 alert(reg.test(str));  //returns: false
 //No variable is needed for the regex: it can be used directly, merging each pair of lines above into one.
 alert(/i/.test(str));
 alert(/z/.test(str));
 </script>
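One caveat worth knowing: when the regex has the g modifier, test() also uses and updates lastIndex (described under exec()), so repeated calls on the same string do not always give the same answer. A small sketch:

```javascript
const re = /o/g;
const str = 'foo';

// With g, test() starts at re.lastIndex and advances it after each match.
console.log(re.test(str), re.lastIndex); // true 2
console.log(re.test(str), re.lastIndex); // true 3
// No match from index 3, so test() returns false and resets lastIndex to 0.
console.log(re.test(str), re.lastIndex); // false 0

// Without g, lastIndex is ignored and every call searches from the start.
const once = /o/;
console.log(once.test(str), once.test(str)); // true true
```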

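 (2) exec()

The exec() method searches a string for a match of the regular expression and extracts the matching substring. It returns an array holding the match result, or null if no match is found. A loop can be used to extract all matches, or the match at a particular index.

Syntax: RegExpObject.exec(str)

exec() is very powerful and general. It can be seen as an upgraded test(): it not only detects a match but also returns the result directly. It is more complex to use than test() and than the String methods that support regular expressions.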

 <script>
 var str = 'good good study day day up';
 var re = /good/;
 var arr = re.exec(str);
 console.log(arr);  //console shows: ["good"]; expanded: 0: "good", index: 0, input: "good good study day day up"
 console.log(arr.index);  //console shows: 0
 console.log(arr.input);  //console shows: good good study day day up
 </script>

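The example above shows that if exec() finds matching text, it returns a result array; otherwise it returns null. Element 0 of this array is the text that matches the whole regular expression, element 1 is the text that matches the first subexpression of RegExpObject (if any), element 2 is the text that matches the second subexpression (if any), and so on.

Besides the array elements and the length property, the array returned by exec() has two more properties: index holds the position of the first character of the matched text, and input holds the string that was searched. When exec() is called on a non-global RegExp object, the returned array is the same as the array returned by String.match().

What is "the text that matches a subexpression"?

A subexpression is the part of a regular expression enclosed in parentheses. See the following example: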

 <script>
 var str = 'good good study day day up';
 var re = /g(o+)d/;
 var arr = re.exec(str);
 console.log(arr);  //shows: ["good", "oo"]; expanded: 0: "good", 1: "oo", index: 0, input: "good good study day day up"
 console.log(arr.length); //shows: 2
 var reg = /(o+)/;
 //var reg = /o+/;  would return only one "oo", with length 1
 var arr = reg.exec(str);
 console.log(arr);  //shows: ["oo", "oo"]; expanded: 0: "oo", 1: "oo", index: 1, input: "good good study day day up"
 console.log(arr.length); //shows: 2
 </script>

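The example shows that a subexpression is part of a larger expression and must be enclosed in (). An expression can contain several subexpressions, and they can be nested. Splitting an expression into subexpressions lets each one be treated as an independent unit: the result contains both the text matched by the whole expression and the text matched by each subexpression on its own. That is why both arrays above have length 2.

Subexpressions are used to extract matching substrings. There is one captured string for each pair of (), in the order in which the opening ( appear, and | can be used inside () to offer several alternatives. In other words, () groups characters and saves the text they match.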
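A sketch with several subexpressions, including one that uses |, on a made-up date string:

```javascript
// Four pairs of parentheses: year, month, day, and an am/pm alternative.
const re = /(\d{4})-(\d{2})-(\d{2}) (am|pm)/;
const arr = re.exec('Posted 2016-05-16 pm');

console.log(arr[0]);     // 2016-05-16 pm (the whole match)
console.log(arr[1]);     // 2016 (first pair of parentheses)
console.log(arr[2]);     // 05
console.log(arr[3]);     // 16
console.log(arr[4]);     // pm (the alternative that matched)
console.log(arr.length); // 5: the whole match plus four subexpressions
console.log(arr.index);  // 7
```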

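If the regular expression uses global matching, exec() finds the first match and stores the position after it; calling exec() again starts searching from that stored position (lastIndex), finds the next match, and stores its position in turn. lastIndex is a property of the RegExp object: an integer giving the character position at which the next match will start. See the following example: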

 <script>
 var str = 'good good study day day up';
 var re = /good/g;
 var arr;
 do{
   arr = re.exec(str);
   console.log(arr);
   console.log(re.lastIndex);
 }
 while(arr !== null);
 /*
 Output:
 shows: ["good"]; expanded: "good", index: 0, input: "good good study day day up"
 lastIndex: 4
 shows: ["good"]; expanded: "good", index: 5, input: "good good study day day up"
 lastIndex: 9
 
 shows: null
 lastIndex: 0
 */
 </script>

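When exec() is called on a non-global RegExp object, the returned array is the same as the array returned by String.match(). When RegExpObject is a global regular expression, however, exec() behaves a little differently. It starts searching the string at the character position given by the lastIndex property of RegExpObject. When exec() finds matching text, it sets lastIndex to the position just after the last character of the match. This means you can iterate over all matches in a string by calling exec() repeatedly. When exec() finds no more matching text, it returns null and resets lastIndex to 0.

The example shows that on the third iteration no further "good" is found, so exec() returns null and lastIndex becomes 0. After the first "good" is found, lastIndex is 4: the position just after the last character of the matched text.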

<script>
 var str = 'good good study day day up';
 var re = /good/g;
 var arr;
 while((arr = re.exec(str)) != null){
   console.log(arr);
   console.log(re.lastIndex);
 }
 /*
 Output:
 shows: ["good"]; expanded: "good", index: 0, input: "good good study day day up"
 lastIndex: 4
 
 shows: ["good"]; expanded: "good", index: 5, input: "good good study day day up"
 lastIndex: 9
 */
 </script>

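Note that after completing a match in one string, if you then start searching a new string with the same re, you must manually reset its lastIndex property to 0.

Whether or not RegExpObject is global, exec() adds full details to the array it returns. This is where exec() differs from String.match(), which returns much less information in global mode. So, at the time this was written, calling exec() repeatedly in a loop was the only way to obtain complete match information for a global pattern (ES2020 later added String.prototype.matchAll() for the same purpose).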
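For comparison, engines that support ES2020 (for example current Node.js releases) provide String.prototype.matchAll(), which yields one exec()-style result per match of a global regex. A short sketch contrasting it with match() and with the exec() loop:

```javascript
const str = 'good good study day day up';

// match() with g: only the matched strings, no index or subexpressions.
console.log(str.match(/go(o)d/g)); // [ 'good', 'good' ]

// matchAll(): a full exec()-style result for every match.
for (const m of str.matchAll(/go(o)d/g)) {
  console.log(m[0], m[1], m.index); // good o 0, then good o 5
}

// The same information from an exec() loop on a global regex.
const re = /go(o)d/g;
let arr;
while ((arr = re.exec(str)) !== null) {
  console.log(arr[0], arr[1], arr.index); // good o 0, then good o 5
}
```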

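 (3) compile()

The compile() method compiles a regular expression while the script is running; it can also be used to change and recompile a regular expression. Its main use is to change the pattern of the current regular expression (re). Note that compile() is now deprecated and kept only for compatibility with old web pages; assigning a new RegExp is the recommended alternative.

Syntax: RegExpObject.compile(pattern, modifiers)

pattern is the regular expression; modifiers specify the type of matching: g for global matching, i to ignore case, gi for global case-insensitive matching.

This method is used to change the matching pattern; in general there are few places where it is needed.

Example: globally search for "day", ignoring case, and replace it with "天"; then use compile() to change the regular expression and replace "Today" or "day" with "日".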

 <script>
 var str = 'Today is a beautiful day, Day day happy!';
 var re = /day/gi;
 //Store the result separately so that str keeps the original text
 var str1 = str.replace(re, '天');
 console.log(str1);  //output: To天 is a beautiful 天, 天 天 happy!
 //Change re in place: an optional "to" followed by "day"
 re.compile('(to)?day', 'gi');
 var str2 = str.replace(re, '日');
 console.log(str2);  //output: 日 is a beautiful 日, 日 日 happy!
 </script>
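Because compile() is deprecated, the usual modern equivalent is simply to assign a new RegExp to the variable. A sketch of the same example written that way:

```javascript
const str = 'Today is a beautiful day, Day day happy!';

let re = /day/gi;
console.log(str.replace(re, '天')); // To天 is a beautiful 天, 天 天 happy!

// Instead of re.compile(...), assign a new RegExp object.
re = new RegExp('(to)?day', 'gi');
console.log(str.replace(re, '日')); // 日 is a beautiful 日, 日 日 happy!
```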

8. Applying regular expressions.

A regular expression is also called a rule expression, so writing one follows the same process as writing JS: think first, then write. The most important thing is to understand the rule it has to express. First look carefully at what the target text looks like, that is, what format it takes; then write an expression that describes that format and check whether it does what you expect. Usually, describing the format directly will not give the expected result on the first try. Fortunately the main framework is already in place, so you only need to find out what went wrong and make small changes to it until the expression is right. For example, suppose you want a regular expression to validate a mobile phone number. Everyone knows a mobile number has 11 digits and starts with 1; the next 2 digits depend on the carrier, so many combinations are possible; the last 8 digits can be anything. So the pattern requires a leading 1, restricts the next 2 digits to the combinations each carrier provides, and ends with any 8 digits. That completes the main framework, but phone numbers have special cases too. For example, a number with 86 in front of it is still valid. Where necessary, take this case into account; otherwise a user who types 86 before their number and clicks submit gets a pop-up that effectively says "what on earth did you type?", because the system does not recognize it. That would be a joke, so simply adjust the framework to cover this case and you are done.
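The phone-number reasoning above can be sketched as a pattern. This is a simplified illustration, not a complete validator: it assumes the second digit is 3-9 as a stand-in for the real carrier prefix lists (which change over time) and accepts an optional 86 or +86 prefix. `phoneRe` and `isMobile` are hypothetical names.

```javascript
// ^            start of the string
// (?:\+?86)?   optional country code 86 or +86 (the special case above);
//              (?: ) groups without capturing
// 1            every mobile number starts with 1
// [3-9]\d      two "carrier" digits (simplified assumption: second digit 3-9)
// \d{8}        eight arbitrary digits
// $            end of the string
const phoneRe = /^(?:\+?86)?1[3-9]\d\d{8}$/;

// Hypothetical helper: ignore spaces the user may have typed.
function isMobile(input) {
  return phoneRe.test(input.replace(/\s+/g, ''));
}

console.log(isMobile('13812345678'));       // true
console.log(isMobile('8613812345678'));     // true
console.log(isMobile('+86 138 1234 5678')); // true
console.log(isMobile('1381234567'));        // false (only 10 digits)
console.log(isMobile('12812345678'));       // false (second digit not 3-9)
```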

So regular expressions look simple, but they are actually quite hard. Why are they so formidable? In the end, it is because they are hard for ordinary people to read: when I write one, I know exactly what it expresses, but when I look back after a while I think, "Oh my god, I don't recognize this." Getting good at them is simply a matter of practice. As one text put it: "A good memory is not as good as a worn pen" (writing things down beats relying on memory).
