
Regular expression (regular)

Author: 大家讲道理 (original)
Published: 2017-05-28 09:57:34

To use regular expressions in Python, you need to import the re module (re is short for "regular expression"). Regular expressions are used to process strings. Strings often contain a lot of information that we want to extract, and mastering these string-processing methods makes many tasks easier.

Regular expressions are a way of processing strings. Reference: http://www.cnblogs.com/alex3714/articles/5169958.html

Regular expressions come up often because file processing is very common in Python and files contain strings. When you need to search, split, or rewrite those strings by pattern, regular expressions are the usual tool, so they are well worth mastering. Let's take a look at the functions provided by the re module:

(1) match(pattern, string, flags=0)

def match(pattern, string, flags=0):
    """Try to apply the pattern at the start of the string, returning
    a match object, or None if no match was found."""
    return _compile(pattern, flags).match(string)

The docstring above says that match() tries to apply the pattern at the start of the string. It returns a match object if the pattern matches there, and None otherwise.

Key points: (1) Start searching from the beginning; (2) Return None if not found.

Let’s take a look at a few examples:

import re

string = "abcdef"

m = re.match("abc", string)    # (1) the pattern matches at the start
print(m)
print(m.group())               # (2) group() returns the matched text

n = re.match("abcf", string)   # (3) the pattern does not occur in the string
print(n)

l = re.match("bcd", string)    # (4) the pattern occurs, but only in the middle
print(l)

The running results are as follows:

<_sre.SRE_Match object; span=(0, 3), match='abc'>   (1)
abc   (2)
None   (3)
None   (4)

As output (1) shows, match() returns a match object rather than the matched text. To get the text itself, call group() on the match object, as in (2). If the pattern does not occur in the string, None is returned (3). Because match(pattern, string, flags) only matches at the beginning of the string, a pattern that occurs later in the string also yields None, as in (4). (The <_sre.SRE_Match ...> form is how older Python 3 versions print a match object; Python 3.7 and later print <re.Match object; ...>.)
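Because match() returns None on failure, calling group() on the result without checking can raise AttributeError. The following is a minimal sketch of a guard; the helper name `leading_match` is invented for this illustration and is not part of the re module.

```python
import re

def leading_match(pattern, text):
    """Return the text matched at the start of `text`, or None (hypothetical helper)."""
    m = re.match(pattern, text)
    # match() returns None when the pattern does not match at position 0
    return m.group() if m is not None else None

print(leading_match("abc", "abcdef"))   # abc
print(leading_match("bcd", "abcdef"))   # None
```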

(2)fullmatch(pattern, string, flags=0)

def fullmatch(pattern, string, flags=0):
    """Try to apply the pattern to all of the string, returning
    a match object, or None if no match was found."""
    return _compile(pattern, flags).fullmatch(string)

The docstring above says that fullmatch() tries to apply the pattern to the entire string. It returns a match object only if the whole string matches, and None otherwise.
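The article gives no example for fullmatch(), so here is a small sketch contrasting it with match() on the same "abcdef" string used earlier. The variable names are only for illustration.

```python
import re

text = "abcdef"

prefix_ok = re.match("abc", text) is not None         # "abc" matches a prefix
whole_ok = re.fullmatch("abc", text) is not None      # "def" is left over
class_ok = re.fullmatch("[a-f]+", text) is not None   # pattern covers the whole string

print(prefix_ok, whole_ok, class_ok)   # True False True
```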

(3)search(pattern,string,flags=0)

def search(pattern, string, flags=0):
    """Scan through string looking for a match to the pattern, returning
    a match object, or None if no match was found."""
    return _compile(pattern, flags).search(string)

The docstring of search(pattern, string, flags=0) says that it scans through the string looking for a match to the pattern. The pattern may match at any position in the string. A match object is returned if a match is found; otherwise None is returned.

Key points: (1) search() looks for a match at any position in the string, unlike match(), which only matches at the beginning; (2) if no match is found, None is returned.

import re

string = "ddafsadadfadfafdafdadfasfdafafda"

m = re.search("a", string)   # (1) the first match is in the middle of the string
print(m)
print(m.group())             # (2) group() returns the matched text

n = re.search("N", string)   # (3) the pattern cannot be matched
print(n)

The running results are as follows:

<_sre.SRE_Match object; span=(2, 3), match='a'>   (1)
a   (2)
None   (3)

As result (1) shows, search(pattern, string, flags=0) can match at any position in the string, which makes it more widely applicable than match(), which can only match at the beginning. When a match is found, a match object is returned. (2) To get the matched text from the match object, use the group() method. (3) If no match is found, None is returned.
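The difference between the two functions is easiest to see when both are run with the same pattern on the same string. This is a small sketch using the "abcdef" string from the match() section; the variable names are illustrative.

```python
import re

text = "abcdef"

at_start = re.match("cd", text)    # None: "cd" is not at position 0
anywhere = re.search("cd", text)   # found further into the string

print(at_start)                    # None
if anywhere is not None:
    # span() gives the (start, end) indices of the match
    print(anywhere.group(), anywhere.span())   # cd (2, 4)
```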

(4)sub(pattern,repl,string,count=0,flags=0)

def sub(pattern, repl, string, count=0, flags=0):
    """Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl.  repl can be either a string or a callable;
    if a string, backslash escapes in it are processed.  If it is
    a callable, it's passed the match object and must return
    a replacement string to be used."""
    return _compile(pattern, flags).sub(repl, string, count)

sub(pattern, repl, string, count=0, flags=0) is find-and-replace: pattern is the regular expression to look for in string, repl is what each match is replaced with, and count limits how many matches are replaced (0, the default, means all of them). The example is as follows:

import re

string = "ddafsadadfadfafdafdadfasfdafafda"

m = re.sub("a", "A", string)      # (1) number of replacements not specified
print(m)

n = re.sub("a", "A", string, 2)   # (2) number of replacements specified
print(n)

l = re.sub("F", "B", string)      # (3) the pattern cannot be matched
print(l)

The running results are as follows:

ddAfsAdAdfAdfAfdAfdAdfAsfdAfAfdA   (1)
ddAfsAdadfadfafdafdadfasfdafafda   (2)
ddafsadadfadfafdafdadfasfdafafda   (3)



In the code above, (1) does not specify the number of replacements, so every match is replaced by default; (2) specifies the number of replacements, so only that many matches are replaced; in (3) the pattern does not occur in the string, so the original string is returned unchanged.

Key points: (1) You can specify the number of replacements; if you do not, all matches are replaced; (2) if nothing matches, the original string is returned.
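The docstring also mentions two things the example above does not exercise: repl can be a callable, and backslash escapes in a string repl are processed. The sketch below illustrates both under assumed inputs; the function name `upper_repl` and the sample strings are made up for this illustration.

```python
import re

text = "ddafsadadf"

def upper_repl(match):
    # Called once per match; must return the replacement string
    return match.group().upper()

# Replace only the first two characters that are "a" or "f"
first_two = re.sub("[af]", upper_repl, text, count=2)
print(first_two)   # ddAFsadadf

# In a string repl, \1 and \2 refer to the captured groups
swapped = re.sub(r"(\d+)-(\d+)", r"\2-\1", "10-20")
print(swapped)     # 20-10
```

Passing count as a keyword argument keeps the call readable and avoids confusing it with flags.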

    (5)subn(pattern,repl,string,count=0,flags=0)

    def subn(pattern, repl, string, count=0, flags=0):
       """Return a 2-tuple containing (new_string, number).
       new_string is the string obtained by replacing the leftmost
       non-overlapping occurrences of the pattern in the source
       string by the replacement repl.  number is the number of
       substitutions that were made. repl can be either a string or a
       callable; if a string, backslash escapes in it are processed.
       If it is a callable, it's passed the match object and must
       return a replacement string to be used."""
       return _compile(pattern, flags).subn(repl, string, count)

The docstring above says "Return a 2-tuple containing (new_string, number)": subn() returns a tuple holding the new string produced by the substitution and the number of substitutions made.

import re

string = "ddafsadadfadfafdafdadfasfdafafda"

m = re.subn("a", "A", string)      # (1) replace all matches
print(m)

n = re.subn("a", "A", string, 3)   # (2) replace only some matches
print(n)

l = re.subn("F", "A", string)      # (3) the pattern does not occur in the string
print(l)

The running results are as follows:

('ddAfsAdAdfAdfAfdAfdAdfAsfdAfAfdA', 11)   (1)
('ddAfsAdAdfadfafdafdadfasfdafafda', 3)    (2)
('ddafsadadfadfafdafdadfasfdafafda', 0)    (3)

As the output above shows, sub() and subn(pattern, repl, string, count=0, flags=0) perform the same substitution; only the return value differs. sub() returns the new string, while subn() returns a tuple containing the new string and the number of replacements made.
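Since subn() returns a 2-tuple, it is convenient to unpack it directly. A minimal sketch, with illustrative variable names, that uses the count to detect whether anything changed:

```python
import re

text = "ddafsadadf"

new_text, n_replaced = re.subn("a", "A", text)
print(new_text, n_replaced)   # ddAfsAdAdf 3

unchanged, n_none = re.subn("F", "A", text)
if n_none == 0:
    print("nothing was replaced")
```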

(6)split(pattern,string,maxsplit=0,flags=0)

def split(pattern, string, maxsplit=0, flags=0):
    """Split the source string by the occurrences of the pattern,
    returning a list containing the resulting substrings.  If
    capturing parentheses are used in pattern, then the text of all
    groups in the pattern are also returned as part of the resulting
    list.  If maxsplit is nonzero, at most maxsplit splits occur,
    and the remainder of the string is returned as the final element
    of the list."""
    return _compile(pattern, flags).split(string, maxsplit)

split(pattern, string, maxsplit=0, flags=0) splits a string wherever the regular expression pattern matches and returns a list containing the resulting substrings. The example is as follows:

import re

string = "ddafsadadfadfafdafdadfasfdafafda"

m = re.split("a", string)      # (1) split the string
print(m)

n = re.split("a", string, 3)   # (2) specify the number of splits
print(n)

l = re.split("F", string)      # (3) the separator does not occur in the string
print(l)

The running results are as follows:

['dd', 'fs', 'd', 'df', 'df', 'fd', 'fd', 'df', 'sfd', 'f', 'fd', '']   (1)
['dd', 'fs', 'd', 'dfadfafdafdadfasfdafafda']   (2)
['ddafsadadfadfafdafdadfasfdafafda']   (3)

It can be seen from (1) that if the string begins or ends with the separator, the list contains an empty string "" at that end; in (2) maxsplit limits the number of splits, and the unsplit remainder becomes the last element; in (3) the separator does not occur in the string, so the list contains just the original string.
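The docstring notes that capturing parentheses in the pattern cause the separators themselves to be included in the result, which the example above does not show. A short sketch on a made-up string:

```python
import re

text = "a1b22c333d"

plain = re.split(r"\d+", text)                # separators are dropped
kept = re.split(r"(\d+)", text)               # captured separators are kept
limited = re.split(r"\d+", text, maxsplit=1)  # at most one split

print(plain)     # ['a', 'b', 'c', 'd']
print(kept)      # ['a', '1', 'b', '22', 'c', '333', 'd']
print(limited)   # ['a', 'b22c333d']
```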

(7)findall(pattern,string,flags=0)

def findall(pattern, string, flags=0):
    """Return a list of all non-overlapping matches in the string.

    If one or more capturing groups are present in the pattern, return
    a list of groups; this will be a list of tuples if the pattern
    has more than one group.

    Empty matches are included in the result."""
    return _compile(pattern, flags).findall(string)

findall(pattern, string, flags=0) returns a list containing every match found in the string. The example is as follows:

import re

string = "dd12a32d46465fad1648fa1564fda127fd11ad30fa02sfd58afafda"

m = re.findall("[a-z]", string)   # (1) match all letters and return a list
print(m)

n = re.findall("[0-9]", string)   # (2) match all digits and return a list
print(n)

l = re.findall("[ABC]", string)   # (3) the pattern cannot be matched
print(l)

The running results are as follows:

['d', 'd', 'a', 'd', 'f', 'a', 'd', 'f', 'a', 'f', 'd', 'a', 'f', 'd', 'a', 'd', 'f', 'a', 's', 'f', 'd', 'a', 'f', 'a', 'f', 'd', 'a']   (1)
['1', '2', '3', '2', '4', '6', '4', '6', '5', '1', '6', '4', '8', '1', '5', '6', '4', '1', '2', '7', '1', '1', '3', '0', '0', '2', '5', '8']   (2)
[]   (3)

In the results above, (1) matches every letter in the string, one character per match; (2) matches every digit in the string and returns them as a list; (3) matches nothing, so an empty list is returned.

Key points: (1) If no match is found, an empty list is returned; (2) a pattern without a quantifier, such as [a-z], matches one character at a time, so each list element is a single character.
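To illustrate key point (2) and the docstring's remark about capturing groups, the sketch below runs findall() on a shortened version of the string with and without a quantifier, and then with two groups. The shortened string and variable names are chosen only for this illustration.

```python
import re

text = "dd12a32d46465fad"

singles = re.findall("[0-9]", text)            # one digit per element
runs = re.findall("[0-9]+", text)              # one run of digits per element
pairs = re.findall("([a-z]+)([0-9]+)", text)   # two groups give a list of tuples

print(singles)   # ['1', '2', '3', '2', '4', '6', '4', '6', '5']
print(runs)      # ['12', '32', '46465']
print(pairs)     # [('dd', '12'), ('a', '32'), ('d', '46465')]
```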

    (8)finditer(pattern,string,flags=0)

    def finditer(pattern, string, flags=0):
       """Return an iterator over all non-overlapping matches in the
       string.  For each match, the iterator returns a match object.

       Empty matches are included in the result."""
       return _compile(pattern, flags).finditer(string)

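finditer(pattern, string) searches for the pattern and returns an iterator over all non-overlapping matches in the string. For each match, the iterator returns a match object.

The code is as follows:

import re

string = "dd12a32d46465fad1648fa1564fda127fd11ad30fa02sfd58afafda"

m = re.finditer("[a-z]", string)   # (1) the pattern matches
print(m)

n = re.finditer("AB", string)      # (2) the pattern does not match
print(n)

The running results are as follows:

<callable_iterator object at 0x...>   (1)
<callable_iterator object at 0x...>   (2)

As the output shows, finditer(pattern, string, flags=0) returns an iterator object (the memory addresses, elided here, vary from run to run), even in (2), where nothing matches.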
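Printing the iterator itself is not very informative; the matches only appear when it is iterated. A minimal sketch on a shortened string, with illustrative variable names:

```python
import re

text = "dd12a32d46465fad"

# Each item produced by the iterator is a match object
found = [(m.group(), m.span()) for m in re.finditer("[0-9]+", text)]
for value, span in found:
    print(value, span)
# 12 (2, 4)
# 32 (5, 7)
# 46465 (8, 13)

# An iterator with no matches is simply empty
empty = list(re.finditer("AB", text))
print(empty)   # []
```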

    (9)compile(pattern,flags=0)

    def compile(pattern, flags=0):
       "Compile a regular expression pattern, returning a pattern object."
       return _compile(pattern, flags)

    (10)purge()

    def purge():
       "Clear the regular expression caches"
       _cache.clear()
       _cache_repl.clear()
 

    (11)template(pattern,flags=0)

    def template(pattern, flags=0):
       "Compile a template pattern, returning a pattern object"
       return _compile(pattern, flags|T)

Regular expressions:

Syntax:

import re

string = "dd12a32d46465fad1648fa1564fda127fd11ad30fa02sfd58afafda"

p = re.compile("[a-z]+")   # first compile the pattern with compile(pattern)
m = p.match(string)        # then match with the compiled pattern
print(m.group())

The compile and match steps above can also be combined into a single call:

m = re.match("[a-z]+", string)

The effect is the same. The difference is that the first way compiles (parses) the pattern in advance, so later matches reuse the compiled pattern object directly. With the second, shorter form, the pattern string has to be processed on every call: the re module keeps a cache of compiled patterns, but each call still has to look the pattern up, and compile it again if it is no longer cached. Therefore, if you need to find all lines starting with a digit in a file with 50,000 lines, it is recommended to compile the pattern once before matching, which will be faster.
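The "lines starting with a digit" scenario can be sketched as follows. The list `lines` is a small stand-in for lines read from a file, and the variable names are assumptions made for this illustration.

```python
import re

# Stand-in for the lines of a large file
lines = ["14534Abc", "abc123", "9 lives", ""]

starts_with_digit = re.compile("^[0-9]")   # compiled once, outside the loop

# Reuse the compiled pattern for every line
matching = [line for line in lines if starts_with_digit.match(line)]
print(matching)   # ['14534Abc', '9 lives']
```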

Matching format:

(1) ^ Matches the beginning of the string

import re

string = "dd12a32d41648f27fd11a0sfdda"

# ^ matches the beginning of the string; here we use it with search()
m = re.search("^[0-9]", string)    # (1) match a string that starts with a digit
print(m)

n = re.search("^[a-z]+", string)   # (2) match a string that starts with letters
print(n.group())

The running results are as follows:

None
dd

In (1) we use ^ to anchor the match at the beginning of the string and check whether the string starts with a digit. Since the string starts with letters, not digits, the match fails and None is returned. In (2) we check whether the string starts with letters. Since it does, the match succeeds and the matched text is returned. Seen this way, search() with ^ behaves much like match(), which also starts from the beginning.
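By default ^ only matches at the very start of the string. The re.MULTILINE flag, which the article does not cover, makes ^ also match at the start of each line. A brief sketch on a made-up three-line string:

```python
import re

text = "dd12\n34ab\n56cd"

# Without the flag, ^ matches only at position 0, where there is a letter
start_only = re.findall("^[0-9]+", text)

# With re.MULTILINE, ^ also matches right after each newline
each_line = re.findall("^[0-9]+", text, re.MULTILINE)

print(start_only)   # []
print(each_line)    # ['34', '56']
```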

(2)$ Matches the end of the string

import re

string = "15111252598"

# ^ anchors the beginning and $ anchors the end of the string
m = re.match("^[0-9]{11}$", string)
print(m.group())

The running results are as follows:

15111252598

re.match("^[0-9]{11}$", string) matches a string that starts with a digit, is 11 characters long, and ends with a digit; in other words, a string made up of exactly 11 digits.
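The role of $ becomes clearer when the input is too long. In this sketch the helper name `is_eleven_digits` is hypothetical and the 12-digit input is made up for the comparison.

```python
import re

def is_eleven_digits(text):
    """Hypothetical helper: True if text is exactly 11 digits."""
    return re.match("^[0-9]{11}$", text) is not None

print(is_eleven_digits("15111252598"))    # True
print(is_eleven_digits("151112525981"))   # False: 12 digits

# Without $, matching the first 11 digits is enough
prefix_only = re.match("^[0-9]{11}", "151112525981") is not None
print(prefix_only)                        # True
```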

(3) Dot (.) matches any character except a newline. When the re.DOTALL flag is specified, it matches any character, including a newline.

import re

string = "1511\n1252598"

# Dot (.) matches any character except a newline
m = re.match(".", string)    # (1) without a quantifier, . matches a single character
print(m.group())

n = re.match(".+", string)   # (2) .+ matches one or more characters, except newlines
print(n.group())

The running results are as follows:

1
1511

It can be seen from the results above that (1) the dot (.) matches any single character; (2) .+ matches as many characters as it can, but because the string contains a newline, only the content before the newline is matched and the content after it is not.

Key points: (1) Dot (.) matches any character except a newline; (2) .+ matches one or more of any character except a newline.
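The heading mentions re.DOTALL, but the example does not use it. A small sketch on the same string showing the effect of the flag (variable names are illustrative):

```python
import re

text = "1511\n1252598"

default = re.match(".+", text).group()              # stops at the newline
dotall = re.match(".+", text, re.DOTALL).group()    # the dot also matches "\n"

print(default)        # 1511
print(repr(dotall))   # '1511\n1252598'
```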

(4)[...] For example, [abc] matches "a", "b" or "c"

[...] matches any one of the characters listed inside the brackets. [A-Za-z0-9] matches any character in the ranges A-Z, a-z, or 0-9.

import re

string = "1511\n125dadfadf2598"

# [] matches any of the characters inside the brackets
m = re.findall("[5fd]", string)   # match 5, f and d in the string
print(m)

The running results are as follows:

['5', '5', 'd', 'd', 'f', 'd', 'f', '5']

In the code above, we match every 5, f and d in the string and return them as a list.

(5) [^...] For example, [^abc] matches any character except a, b and c

import re

string = "1511\n125dadfadf2598"

# [^] matches any character not inside the brackets
m = re.findall("[^5fd]", string)   # match every character except 5, f and d
print(m)

The running results are as follows:

['1', '1', '1', '\n', '1', '2', 'a', 'a', '2', '9', '8']

In the code above, we match every character except 5, f and d; [^...] matches any character other than those listed inside the brackets.
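Ranges and negation can be combined with a quantifier to pull out whole runs of characters. A short sketch on the same string, with illustrative variable names:

```python
import re

text = "1511\n125dadfadf2598"

letters = re.findall("[a-z]+", text)       # runs of lowercase letters
others = re.findall("[^a-z\n]+", text)     # runs of anything except letters and newline

print(letters)   # ['dadfadf']
print(others)    # ['1511', '125', '2598']
```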

(6)* Matches 0 or more repetitions of the preceding expression

import re

string = "1511\n125dadfadf2598"

# * matches 0 or more repetitions of the preceding expression
m = re.findall(r"\d*", string)   # match 0 or more digits
print(m)

The running results are as follows:

['1511', '', '125', '', '', '', '', '', '', '', '2598', '']

It can be seen from the results above that * matches 0 or more repetitions of the preceding expression. Here we match 0 or more digits, so at every position where there is no digit an empty string ("") is returned, including one at the end of the string.

(7)+ Matches 1 or more repetitions of the preceding expression

import re

string = "1511\n125dadfadf2598"

# + matches 1 or more repetitions of the preceding expression
m = re.findall(r"\d+", string)   # match 1 or more digits
print(m)

The running results are as follows:

['1511', '125', '2598']

The plus sign (+) matches one or more repetitions of the preceding expression. \d+ above matches runs of one or more digits, so each match contains at least one digit.

(8)? Matches 0 or 1 repetitions of the preceding expression; placed after another quantifier (as in *? or +?), it makes that quantifier non-greedy

import re

string = "1511\n125dadfadf2598"

# ? matches 0 or 1 repetitions of the preceding expression
m = re.findall(r"\d?", string)   # match 0 or 1 digit
print(m)

The running results are as follows:

['1', '5', '1', '1', '', '1', '2', '5', '', '', '', '', '', '', '', '2', '5', '9', '8', '']

The question mark (?) matches 0 or 1 repetitions of the preceding expression. Here \d? matches at most one digit at a time; at positions where there is no digit, an empty string ("") is returned.
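The two roles of ? can be sketched as follows: on its own it makes the preceding item optional, and after * or + it switches that quantifier to non-greedy matching. The sample strings are made up for this illustration.

```python
import re

# ? as "optional": the "u" may or may not be present
optional = re.findall("colou?r", "color colour")
print(optional)   # ['color', 'colour']

text = "<b>one</b><b>two</b>"
greedy = re.findall("<b>.*</b>", text)    # .* grabs as much as possible
lazy = re.findall("<b>.*?</b>", text)     # .*? stops at the first closing tag

print(greedy)     # ['<b>one</b><b>two</b>']
print(lazy)       # ['<b>one</b>', '<b>two</b>']
```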

(9){n} Matches exactly n repetitions of the preceding expression

(10){n,m} Matches from n to m repetitions of the preceding expression
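The article gives no example for {n} and {n,m}. The sketch below uses fullmatch() so that the whole string must satisfy the repetition count; the sample strings and the `results` dictionary are chosen only for illustration.

```python
import re

samples = ["12", "123", "12345"]

# For each sample: (exactly 3 digits?, between 2 and 4 digits?)
results = {
    s: (bool(re.fullmatch("[0-9]{3}", s)), bool(re.fullmatch("[0-9]{2,4}", s)))
    for s in samples
}

for s, (exactly_three, two_to_four) in results.items():
    print(s, exactly_three, two_to_four)
# 12 False True
# 123 True True
# 12345 False False
```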

(11)\w Matches alphanumeric characters

\w matches letters and digits in the string (it also matches the underscore). The code is as follows:

import re

string = "1511\n125dadfadf2598"

# \w matches letters and digits
m = re.findall(r"\w", string)   # match every letter and digit
print(m)

The running results are as follows:

['1', '5', '1', '1', '1', '2', '5', 'd', 'a', 'd', 'f', 'a', 'd', 'f', '2', '5', '9', '8']

As can be seen from the code above, \w matches the letters and digits in the string; the newline is skipped.

(12) \W The uppercase \W matches characters that are not letters or digits, exactly the opposite of the lowercase \w

Examples are as follows:

import re

string = "1511\n125dadfadf2598"

# \W matches characters that are not letters or digits
m = re.findall(r"\W", string)
print(m)

The running results are as follows:

['\n']

In the code above, \W matches characters that are not letters or digits, and the only such character in the string is the newline.

(13)\s Matches any whitespace character, equivalent to [ \t\n\r\f\v]

Examples are as follows:

import re

string = "1511\n125d\ta\rdf\fadf2598"

# \s matches any whitespace character in the string
m = re.findall(r"\s", string)
print(m)

The running results are as follows:

['\n', '\t', '\r', '\x0c']

It can be seen from the results above that \s matches any whitespace character: the newline, tab, carriage return and form feed (\x0c) in the string are all matched.

(14) \S Matches any non-whitespace character

Examples are as follows:

import re

string = "1511\n125d\ta\rdf\fadf2598"

# \S matches any non-whitespace character
m = re.findall(r"\S", string)
print(m)

The running results are as follows:

['1', '5', '1', '1', '1', '2', '5', 'd', 'a', 'd', 'f', 'a', 'd', 'f', '2', '5', '9', '8']

As can be seen from the code above, \S matches any non-whitespace character, so the result contains every character of the string except the whitespace.

(15)\d Matches any number, equivalent to [0-9]

(16) \D Matches any non-number
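The article gives no example for \d and \D. A minimal sketch on the string used in the earlier sections, combining each with + to collect runs (variable names are illustrative):

```python
import re

text = "1511\n125dadfadf2598"

digits = re.findall(r"\d+", text)       # runs of digits
non_digits = re.findall(r"\D+", text)   # runs of everything else

print(digits)       # ['1511', '125', '2598']
print(non_digits)   # ['\n', 'dadfadf']
```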

Summary: findall() and split() both return lists. split() treats the pattern as a separator and returns the pieces between the matches, while findall() returns the matches themselves; in that sense the two are exact opposites.
