To use regular expressions in Python, you need to import the re module ("re" is short for "regular expression"). This article introduces the main functions of the re module and the basic regular expression syntax.
Regular expressions are a way of processing strings. Strings often contain information we want to extract, and mastering these string-processing methods makes many everyday tasks easier.
Regular expressions are widely used in Python because file processing is very common, and files contain strings; to process those strings effectively, you need regular expressions. Let's look at the functions provided by the re module:
(1) match(pattern, string, flags=0)
def match(pattern, string, flags=0):
    """Try to apply the pattern at the start of the string, returning
    a match object, or None if no match was found."""
    return _compile(pattern, flags).match(string)
The docstring above says: "Try to apply the pattern at the start of the string, returning a match object, or None if no match was found." In other words, match() tries the pattern only at the beginning of the string and returns a match object; if the pattern does not match there, it returns None.
Key points: (1) matching starts at the beginning of the string; (2) None is returned if there is no match.
Let’s take a look at a few examples:
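import re
string = "abcdef"
m = re.match("abc", string)   # match "abc" and see what is returned
print(m)                      # (1)
print(m.group())              # (2)
n = re.match("abcf", string)  # the pattern does not occur in the string
print(n)                      # (3)
l = re.match("bcd", string)   # the pattern occurs, but in the middle of the string
print(l)                      # (4)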
The running results are as follows:
<_sre.SRE_Match object; span=(0, 3), match='abc'>   (1)
abc   (2)
None   (3)
None   (4)
As output (1) shows, match() returns a match object; to get the matched text, call group() on it, as in (2). If the pattern does not occur in the string, None is returned (3). Because match(pattern, string, flags) only matches at the start of the string, "bcd" is not found even though it appears in the middle of the string (4). (Python 3.7 and later print the match object as <re.Match object; ...> instead of <_sre.SRE_Match object; ...>.)
(2) fullmatch(pattern, string, flags=0)
def fullmatch(pattern, string, flags=0):
    """Try to apply the pattern to all of the string, returning
    a match object, or None if no match was found."""
    return _compile(pattern, flags).fullmatch(string)
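The docstring says: "Try to apply the pattern to all of the string, returning a match object, or None if no match was found." Unlike match(), fullmatch() succeeds only when the pattern matches the entire string, not just its beginning.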
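fullmatch() has no example in this article. The sketch below reuses the string "abcdef" from the match() example to contrast the two functions; the patterns are chosen only for illustration.

```python
import re

string = "abcdef"

# match() only needs the pattern to match at the beginning of the string
m = re.match("abc", string)
print(m.group())      # abc

# fullmatch() needs the pattern to cover the whole string, so "abc" alone fails
f = re.fullmatch("abc", string)
print(f)              # None

# a pattern that consumes every character succeeds
g = re.fullmatch("abc.*", string)
print(g.group())      # abcdef
```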
(3) search(pattern, string, flags=0)
def search(pattern, string, flags=0):
    """Scan through string looking for a match to the pattern, returning
    a match object, or None if no match was found."""
    return _compile(pattern, flags).search(string)
The docstring of search(pattern, string, flags=0) reads: "Scan through string looking for a match to the pattern, returning a match object, or None if no match was found." search() looks for the regular expression at any position in the string; if a match is found it returns a match object, otherwise it returns None.
Key points: (1) search() can match at any position in the string, unlike match(), which only matches at the beginning; (2) if nothing matches, None is returned.
import re
string = "ddafsadadfadfafdafdadfasfdafafda"
m = re.search("a", string)   # the match can start anywhere, not just at the beginning
print(m)                     # (1)
print(m.group())             # (2)
n = re.search("N", string)   # no match
print(n)                     # (3)
The running results are as follows:
<_sre.SRE_Match object; span=(2, 3), match='a'>   (1)
a   (2)
None   (3)
From the results above: (1) search(pattern, string, flags=0) can find a match at any position in the string, which widens its usefulness compared with match(), which can only match at the beginning; like match(), it returns a match object. (2) To display the matched text, call group() on the match object. (3) If nothing is found, None is returned.
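To make the difference from match() concrete, here is a small sketch that runs both functions on the same string (the "abcdef" string from the match() section):

```python
import re

string = "abcdef"

# match() only tries position 0, where "bcd" does not start
anchored = re.match("bcd", string)
print(anchored)                        # None

# search() scans forward and finds "bcd" starting at index 1
found = re.search("bcd", string)
print(found.group(), found.span())     # bcd (1, 4)
```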
(4)sub(pattern,repl,string,count=0,flags=0)
def sub(pattern, repl, string, count=0, flags=0):
    """Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl.  repl can be either a string or a callable;
    if a string, backslash escapes in it are processed.  If it is
    a callable, it's passed the match object and must return
    a replacement string to be used."""
    return _compile(pattern, flags).sub(repl, string, count)
sub(pattern, repl, string, count=0, flags=0) performs find-and-replace: it looks for matches of pattern in string and replaces each one with repl; count limits how many matches are replaced (the default 0 means all of them). Example:
import re
string = "ddafsadadfadfafdafdadfasfdafafda"
m = re.sub("a", "A", string)           # (1) no count: replace every match
print(m)
n = re.sub("a", "A", string, count=2)  # (2) replace only the first 2 matches
print(n)
l = re.sub("F", "B", string)           # (3) no match: the string is returned unchanged
print(l)
The running results are as follows:
ddAfsAdAdfAdfAfdAfdAdfAsfdAfAfdA   (1)
ddAfsAdadfadfafdafdadfasfdafafda   (2)
ddafsadadfadfafdafdadfasfdafafda   (3)
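The docstring also allows repl to be a callable that receives the match object. A minimal sketch, assuming we want to double every number in a made-up string; the helper double_number is hypothetical, written only for this example. It also passes count by keyword, which is the clearer form:

```python
import re

def double_number(match):
    # hypothetical helper: receives a match object, returns the replacement string
    return str(int(match.group()) * 2)

text = "a1b22c333"

# every run of digits is replaced by twice its value
result = re.sub(r"\d+", double_number, text)
print(result)     # a2b44c666

# count=1 limits the replacement to the first match
limited = re.sub(r"\d+", double_number, text, count=1)
print(limited)    # a2b22c333
```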
(5) subn(pattern, repl, string, count=0, flags=0)
def subn(pattern, repl, string, count=0, flags=0):
    """Return a 2-tuple containing (new_string, number).
    new_string is the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in the source
    string by the replacement repl.  number is the number of
    substitutions that were made. repl can be either a string or a
    callable; if a string, backslash escapes in it are processed.
    If it is a callable, it's passed the match object and must
    return a replacement string to be used."""
    return _compile(pattern, flags).subn(repl, string, count)
The docstring says subn() returns a 2-tuple (new_string, number): the string after the replacements, and the number of substitutions that were made.
import re
string = "ddafsadadfadfafdafdadfasfdafafda"
m = re.subn("a", "A", string)           # (1) replace every match
print(m)
n = re.subn("a", "A", string, count=3)  # (2) replace only the first 3 matches
print(n)
l = re.subn("F", "A", string)           # (3) the pattern does not occur in the string
print(l)
The running results are as follows:
('ddAfsAdAdfAdfAfdAfdAdfAsfdAfAfdA', 11)   (1)
('ddAfsAdAdfadfafdafdadfasfdafafda', 3)   (2)
('ddafsadadfadfafdafdadfasfdafafda', 0)   (3)
The output shows that sub() and subn(pattern, repl, string, count=0, flags=0) perform exactly the same substitution; only the return value differs: sub() returns a string, while subn() returns a tuple holding the new string and the number of replacements made.
(6)split(pattern,string,maxsplit=0,flags=0)
def split(pattern, string, maxsplit=0, flags=0):
    """Split the source string by the occurrences of the pattern,
    returning a list containing the resulting substrings.  If
    capturing parentheses are used in pattern, then the text of all
    groups in the pattern are also returned as part of the resulting
    list.  If maxsplit is nonzero, at most maxsplit splits occur,
    and the remainder of the string is returned as the final element
    of the list."""
    return _compile(pattern, flags).split(string, maxsplit)
split(pattern, string, maxsplit=0, flags=0) splits a string: it cuts the string at every match of pattern and returns "a list containing the resulting substrings". Example:
import re
string = "ddafsadadfadfafdafdadfasfdafafda"
m = re.split("a", string)              # (1) split at every "a"
print(m)
n = re.split("a", string, maxsplit=3)  # (2) perform at most 3 splits
print(n)
l = re.split("F", string)              # (3) the separator does not occur in the string
print(l)
The running results are as follows:
['dd', 'fs', 'd', 'df', 'df', 'fd', 'fd', 'df', 'sfd', 'f', 'fd', '']   (1)
['dd', 'fs', 'd', 'dfadfafdafdadfasfdafafda']   (2)
['ddafsadadfadfafdafdadfasfdafafda']   (3)
From (1) we can see that when the string starts or ends with the separator, the list gets an empty string "" at that end (here the string ends with "a", so the last element is ""); (2) shows that the number of splits can be limited; (3) shows that when the separator does not occur in the string, the original string is returned as the only element of the list.
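The split() docstring mentions capturing parentheses, which the examples above do not cover. A short sketch with made-up strings showing separators dropped versus kept, and the empty strings produced at the edges:

```python
import re

# without a group, the separators are dropped
plain = re.split(r"[,;]", "x,y;z")
print(plain)    # ['x', 'y', 'z']

# with a capturing group, each separator is kept in the result list
kept = re.split(r"([,;])", "x,y;z")
print(kept)     # ['x', ',', 'y', ';', 'z']

# a separator at the start or end of the string yields an empty string there
edges = re.split("a", "abca")
print(edges)    # ['', 'bc', '']
```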
(7) findall(pattern, string, flags=0)
def findall(pattern, string, flags=0):
    """Return a list of all non-overlapping matches in the string.
    If one or more capturing groups are present in the pattern, return
    a list of groups; this will be a list of tuples if the pattern
    has more than one group.
    Empty matches are included in the result."""
    return _compile(pattern, flags).findall(string)
findall(pattern, string, flags=0) returns a list containing all the matches. Example:
import re
string = "dd12a32d46465fad1648fa1564fda127fd11ad30fa02sfd58afafda"
m = re.findall("[a-z]", string)   # (1) match every letter; returns a list
print(m)
n = re.findall("[0-9]", string)   # (2) match every digit; returns a list
print(n)
l = re.findall("[ABC]", string)   # (3) no match
print(l)
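The running results are as follows:
['d', 'd', 'a', 'd', 'f', 'a', 'd', 'f', 'a', 'f', 'd', 'a', 'f', 'd', 'a', 'd', 'f', 'a', 's', 'f', 'd', 'a', 'f', 'a', 'f', 'd', 'a']   (1)
['1', '2', '3', '2', '4', '6', '4', '6', '5', '1', '6', '4', '8', '1', '5', '6', '4', '1', '2', '7', '1', '1', '3', '0', '0', '2', '5', '8']   (2)
[]   (3)
In the output above, (1) matches every letter in the string, one character per match; (2) matches the digits and returns them in a list; (3) is the no-match case, which returns an empty list.
Key points: (1) when nothing matches, an empty list is returned; (2) without a quantifier such as +, each match is a single character.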
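The findall() docstring says capturing groups change what is returned. A minimal sketch on a made-up key=value string:

```python
import re

text = "x=1, y=22, z=333"

# one group: findall returns only the text captured by that group
values = re.findall(r"=(\d+)", text)
print(values)   # ['1', '22', '333']

# two groups: findall returns a list of tuples, one tuple per match
pairs = re.findall(r"(\w)=(\d+)", text)
print(pairs)    # [('x', '1'), ('y', '22'), ('z', '333')]
```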
(8)finditer(pattern,string,flags=0)
def finditer(pattern, string, flags=0):
    """Return an iterator over all non-overlapping matches in the
    string.  For each match, the iterator returns a match object.
    Empty matches are included in the result."""
    return _compile(pattern, flags).finditer(string)
finditer(pattern, string) also searches for the pattern, but instead of a list it returns "an iterator over all non-overlapping matches in the string. For each match, the iterator returns a match object."
The code is as follows:
import re
string = "dd12a32d46465fad1648fa1564fda127fd11ad30fa02sfd58afafda"
m = re.finditer("[a-z]", string)
print(m)   # (1)
n = re.finditer("AB", string)
print(n)   # (2)
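The running results are as follows:
<callable_iterator object at 0x7fa126441898>   (1)
<callable_iterator object at 0x7fa124d6b710>   (2)
As the output shows, finditer(pattern, string, flags=0) returns an iterator object rather than a list of results; even when nothing matches (2), you still get an iterator, which simply yields no match objects.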
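Since finditer() returns an iterator, you normally loop over it to get the match objects. A sketch using the digit runs of the string from the example above:

```python
import re

string = "dd12a32d46465fad1648fa1564fda127fd11ad30fa02sfd58afafda"

# each item is a match object, so both the text and its position are available
for match in re.finditer(r"\d+", string):
    print(match.group(), match.span())

# collecting the texts gives the same list that findall() would return
numbers = [match.group() for match in re.finditer(r"\d+", string)]

# an iterator over a pattern that never matches is simply empty
no_match = list(re.finditer("AB", string))
```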
(9)compile(pattern,flags=0)
def compile(pattern, flags=0):
    "Compile a regular expression pattern, returning a pattern object."
    return _compile(pattern, flags)
(10) purge()
def purge():
    "Clear the regular expression caches"
    _cache.clear()
    _cache_repl.clear()
(11)template(pattern,flags=0)
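def template(pattern, flags=0):
    "Compile a template pattern, returning a pattern object"
    return _compile(pattern, flags|T)
Note: template() was undocumented; it was deprecated in Python 3.11 and removed in Python 3.13, so it should not be used in new code.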
Regular expressions:
Syntax:
import re
string = "dd12a32d46465fad1648fa1564fda127fd11ad30fa02sfd58afafda"
p = re.compile("[a-z]+")   # first compile the pattern with compile()
m = p.match(string)        # then match using the compiled pattern object
print(m.group())           # dd
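The compile() and match() steps above can also be combined into a single call:
m = re.match("[a-z]+", string)
The result is the same. The difference is that the first form compiles (parses) the pattern once in advance, so it does not need to be compiled again for each match, whereas the one-line form passes the pattern string on every call and re has to look it up in its internal cache (or recompile it if it is not cached) each time. So if you need to find every line that starts with a digit in a 50,000-line file, it is advisable to compile the pattern first and then match, which will be faster.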
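A sketch of the "lines starting with a digit" scenario, with a short in-memory list standing in for the 50,000-line file (the sample lines are made up): the pattern is compiled once and the pattern object is reused inside the loop.

```python
import re

# made-up sample data standing in for the lines of a large file
lines = ["123 apples", "pears 45", "7 plums", "no digits here"]

# compile once, outside the loop
starts_with_digit = re.compile(r"^[0-9]")

# reuse the compiled pattern object for every line
matching = [line for line in lines if starts_with_digit.match(line)]
print(matching)   # ['123 apples', '7 plums']
```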
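Pattern syntax:
(1) ^ matches the start of the string
import re
string = "dd12a32d41648f27fd11a0sfdda"
# ^ anchors the match at the start of the string; here search() checks what the string starts with
m = re.search("^[0-9]", string)    # (1) does the string start with a digit?
print(m)
n = re.search("^[a-z]+", string)   # (2) does it start with letters? anchored at the start, search() behaves much like match()
print(n.group())
The running results are as follows: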
None
dd
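At (1) we use ^ to match from the start of the string and check whether it begins with a digit; since the string begins with letters, the match fails and None is returned. At (2) we match letters at the start; since the string does begin with letters, the match succeeds and returns the correct result. So ^ makes search() behave much like match(), which also only matches at the beginning.
(2) $ matches the end of the string
import re
string = "15111252598"
# ^ anchors at the start and $ at the end, so the whole string must be exactly 11 digits
m = re.match("^[0-9]{11}$", string)
print(m.group())
The running results are as follows: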
15111252598
re.match("^[0-9]{11}$", string) matches a string that starts with a digit, consists of exactly 11 digits, and then ends.
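A sketch of what the anchors change, using the same pattern on a few test strings (the 12-digit and trailing-newline inputs are made up for illustration). Note that $ also matches just before a newline at the very end of the string, so re.fullmatch() is the stricter check:

```python
import re

pattern = r"^[0-9]{11}$"

ok = re.match(pattern, "15111252598")                  # exactly 11 digits
too_long = re.match(pattern, "151112525981")           # 12 digits: $ fails after the 11th
unanchored = re.match(r"^[0-9]{11}", "151112525981")   # without $, the first 11 digits match

# $ also accepts a single trailing newline; fullmatch() does not
trailing = re.match(pattern, "15111252598\n")
strict = re.fullmatch(r"[0-9]{11}", "15111252598\n")

print(ok.group())          # 15111252598
print(too_long)            # None
print(unanchored.group())  # 15111252598
print(strict)              # None
```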
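(3) The dot (.) matches any character except a newline. When the re.DOTALL flag is specified, it matches any character, including a newline.
import re
string = "1511\n1252598"
# . matches any character except a newline
m = re.match(".", string)    # (1) with no quantifier, . matches a single character
print(m.group())
n = re.match(".+", string)   # (2) .+ matches one or more characters, but stops at the newline
print(n.group())
The running results are as follows: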
1
1511
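From the output above: (1) the dot (.) matches any single character; (2) .+ matches as many characters as possible, but because the string contains a newline in the middle, only the part before the newline is matched and the rest is not.
Key points: (1) the dot matches any character except a newline; (2) .+ matches one or more of any characters other than a newline.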
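The re.DOTALL flag mentioned in the heading is not shown above. A minimal sketch on the same string:

```python
import re

string = "1511\n1252598"

# by default . stops at the newline
default = re.match(".+", string)
print(default.group())         # 1511

# with re.DOTALL, . also matches the newline, so the whole string is consumed
dotall = re.match(".+", string, flags=re.DOTALL)
print(repr(dotall.group()))    # '1511\n1252598'
```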
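(4) [...] e.g. [abc] matches "a", "b" or "c"
[...] matches any one of the characters inside the brackets. [A-Za-z0-9] matches any character in the ranges A-Z, a-z or 0-9.
import re
string = "1511\n125dadfadf2598"
# [] matches any single character listed inside the brackets
m = re.findall("[5fd]", string)   # match every 5, f or d in the string
print(m)
The running results are as follows: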
['5', '5', 'd', 'd', 'f', 'd', 'f', '5']
In the code above, we match every 5, f and d in the string and get them back as a list.
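The ranges from the [A-Za-z0-9] example can be sketched on a made-up string as follows; adding + collects runs of consecutive matching characters instead of single characters:

```python
import re

string = "a1-B2_c3!"

# ranges inside []: any lowercase letter, uppercase letter or digit
chars = re.findall("[A-Za-z0-9]", string)
print(chars)   # ['a', '1', 'B', '2', 'c', '3']

# with +, consecutive matching characters are returned as one match
runs = re.findall("[A-Za-z0-9]+", string)
print(runs)    # ['a1', 'B2', 'c3']
```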
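(5) [^...] e.g. [^abc] matches any character except a, b and c
import re
string = "1511\n125dadfadf2598"
# [^...] matches any single character NOT listed inside the brackets
m = re.findall("[^5fd]", string)   # match every character except 5, f and d
print(m)
The running results are as follows: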
['1', '1', '1', '\n', '1', '2', 'a', 'a', '2', '9', '8']
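In the code above, we match every character other than 5, f and d: [^...] matches any character that is not inside the brackets.
(6) * matches 0 or more repetitions of the preceding expression
import re
string = "1511\n125dadfadf2598"
# a raw string (r"...") keeps the backslash in \d from being treated as a string escape
m = re.findall(r"\d*", string)   # match 0 or more digits
print(m)
The running results are as follows: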
['1511', '', '125', '', '', '', '', '', '', '', '2598', '']
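The result shows that * matches zero or more occurrences of the expression; here we matched zero or more digits. At each position where no digit starts, the pattern still succeeds with an empty match, which is why empty strings ("") appear in the list, including one at the very end of the string.
(7) + matches 1 or more repetitions of the preceding expression
import re
string = "1511\n125dadfadf2598"
m = re.findall(r"\d+", string)   # match 1 or more digits
print(m)
The running results are as follows: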
['1511', '125', '2598']
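The plus sign (+) matches one or more repetitions: \d+ above matches runs of one or more digits, so every match contains at least one digit and no empty strings appear.
(8) ? matches 0 or 1 repetition of the preceding expression; placed after another quantifier (*?, +?, ??), it makes that quantifier non-greedy
import re
string = "1511\n125dadfadf2598"
m = re.findall(r"\d?", string)   # match 0 or 1 digit
print(m)
The running results are as follows: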
['1', '5', '1', '1', '', '1', '2', '5', '', '', '', '', '', '', '', '2', '5', '9', '8', '']
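The question mark (?) matches zero or one occurrence of the preceding expression: here each digit is matched on its own, and wherever there is no digit an empty string ("") is returned.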
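The non-greedy use of ? from the heading can be sketched with a made-up HTML-like string: .* grabs as much as it can, while .*? stops at the first point where the rest of the pattern can match.

```python
import re

html = "<b>bold</b><i>italic</i>"

# greedy: .* runs as far as possible, up to the last ">"
greedy = re.findall("<.*>", html)
print(greedy)   # ['<b>bold</b><i>italic</i>']

# non-greedy: .*? stops at the first ">" that completes a match
lazy = re.findall("<.*?>", html)
print(lazy)     # ['<b>', '</b>', '<i>', '</i>']
```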
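(9) {n} matches exactly n times: it specifies how many times the preceding expression must repeat
(10) {n,m} matches the preceding expression from n to m times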
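{n} and {n,m} have no examples above. A sketch using the same string as the other examples:

```python
import re

string = "1511\n125dadfadf2598"

# {3}: exactly three digits in a row
three = re.findall(r"\d{3}", string)
print(three)         # ['151', '125', '259']

# {2,4}: two to four digits; greedy, so up to four are taken when available
two_to_four = re.findall(r"\d{2,4}", string)
print(two_to_four)   # ['1511', '125', '2598']
```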
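(11) \w matches word characters (letters and digits, plus the underscore)
\w matches the letters and digits in a string (it also matches "_"). The code is as follows:
import re
string = "1511\n125dadfadf2598"
m = re.findall(r"\w", string)   # \w matches letters, digits and underscore
print(m)
The running results are as follows: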
['1', '5', '1', '1', '1', '2', '5', 'd', 'a', 'd', 'f', 'a', 'd', 'f', '2', '5', '9', '8']
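As the code shows, \w matches the letters and digits in the string; here we use it to extract all of them.
(12) \W: the uppercase \W matches any character that is not a letter, digit or underscore, the exact opposite of the lowercase \w
Example:
import re
string = "1511\n125dadfadf2598"
m = re.findall(r"\W", string)   # \W matches characters that are not letters, digits or underscore
print(m)
The running results are as follows: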
['\n']
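In the code above, \W matches non-word characters, so the only result is the newline.
(13) \s matches any whitespace character; for ASCII text this is equivalent to [ \t\n\r\f\v]
Example:
import re
string = "1511\n125d\ta\rdf\fadf2598"
# \s matches any whitespace character: space, \t, \n, \r, \f, \v
m = re.findall(r"\s", string)
print(m)
The running results are as follows: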
['\n', '\t', '\r', '\x0c']
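As the output shows, \s matches whitespace characters, and all the whitespace characters in the string were extracted (\f is displayed as '\x0c').
(14) \S matches any non-whitespace character
Example:
import re
string = "1511\n125d\ta\rdf\fadf2598"
m = re.findall(r"\S", string)   # \S matches any character that is not whitespace
print(m)
The running results are as follows: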
['1', '5', '1', '1', '1', '2', '5', 'd', 'a', 'd', 'f', 'a', 'd', 'f', '2', '5', '9', '8']
As the code shows, \S matches any non-whitespace character; the result contains every character of the string except the whitespace.
(15) \d matches any digit, equivalent to [0-9] for ASCII text
(16) \D matches any non-digit character
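\d and \D are not demonstrated above; a minimal sketch on the same string:

```python
import re

string = "1511\n125dadfadf2598"

# \d: every single digit
digits = re.findall(r"\d", string)
print(digits)       # ['1', '5', '1', '1', '1', '2', '5', '2', '5', '9', '8']

# \D+: runs of characters that are not digits (the newline counts as a non-digit)
non_digits = re.findall(r"\D+", string)
print(non_digits)   # ['\n', 'dadfadf']
```

For str patterns, \d also matches decimal digits from other Unicode scripts; passing flags=re.ASCII restricts it to [0-9].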
Summary: findall() and split() both return lists, and they are exact opposites: split() returns the pieces of the string between the matches (the matches act as delimiters), while findall() returns the matches themselves.
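The complementary relationship can be sketched on a shortened version of the earlier string: findall() returns the digit runs, split() returns what lies between them.

```python
import re

string = "dd12a32d46465"

# findall(): the matches themselves
numbers = re.findall(r"\d+", string)
print(numbers)   # ['12', '32', '46465']

# split(): the text between the matches (the string ends with digits, hence the final '')
pieces = re.split(r"\d+", string)
print(pieces)    # ['dd', 'a', 'd', '']
```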
The above is the detailed content of Regular expression knowledge (organized).
