
Detailed explanation of regular expressions in Python

小云云 (original), 2017-12-18 15:05:40


A regular expression is a concise way to describe a set of strings. This article covers the regular expression operators and the main functions of Python's re module.

Operator   Description                                         Example
.          Any single character (except a newline by default)
[ ]        Character set: one character from the set           [abc] matches a, b, or c; [a-z] matches one letter from a to z
[^ ]       Negated character set: one character not in the set [^abc] matches any single character other than a, b, or c
*          Zero or more repetitions of the preceding character abc* matches ab, abc, abcc, abccc, ...
+          One or more repetitions of the preceding character  abc+ matches abc, abcc, abccc, ...
?          Zero or one repetition of the preceding character   abc? matches ab, abc
|          Alternation: the left or the right expression       abc|def matches abc or def
{m}        Exactly m repetitions of the preceding character    ab{2}c matches abbc
{m,n}      m to n repetitions (inclusive) of the preceding     ab{1,2}c matches abc, abbc
           character; no space is allowed inside the braces
^          Matches the beginning of the string                 ^abc matches abc at the beginning of a string
$          Matches the end of the string                       abc$ matches abc at the end of a string
( )        Grouping; a group can hold any sub-pattern and is   (abc|def) matches abc or def
           often combined with |
\d         A digit; equivalent to [0-9] for ASCII text
\w         A word character; equivalent to [A-Za-z0-9_] for ASCII text
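
The operator examples can be checked directly with Python's re module. The following is a minimal sketch: the CASES list and the check_cases helper are names invented for this illustration, and re.fullmatch is used so that each pattern has to match the whole test string.

```python
import re

# Each entry: (pattern, strings it should match in full, strings it should reject).
CASES = [
    (r"abc*", ["ab", "abc", "abccc"], ["a", "abcb"]),
    (r"abc+", ["abc", "abcc"], ["ab"]),
    (r"abc?", ["ab", "abc"], ["abcc"]),
    (r"ab{2}c", ["abbc"], ["abc", "abbbc"]),
    (r"ab{1,2}c", ["abc", "abbc"], ["ac", "abbbc"]),
    (r"abc|def", ["abc", "def"], ["abcdef"]),
    (r"[^abc]", ["d", "1"], ["a", "b", "c"]),
]


def check_cases(cases):
    """Return True if every pattern accepts and rejects the listed strings."""
    for pattern, accepted, rejected in cases:
        for text in accepted:
            if re.fullmatch(pattern, text) is None:
                return False
        for text in rejected:
            if re.fullmatch(pattern, text) is not None:
                return False
    return True


print(check_cases(CASES))
```

Note that the quantifiers * + ? {m} {m,n} apply only to the single character (or group) immediately before them, which is why ab{2}c matches abbc rather than abab.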


Once you are familiar with the operators above, the following examples are straightforward.

1. Only numbers can be entered: ^[0-9]*$

2. Only n-digit numbers can be entered: ^\d{n}$

3. Only numbers with at least n digits can be entered: ^\d{n,}$

4. Only numbers with m~n digits can be entered: ^\d{m,n}$

5. Only zero or a number that does not start with zero can be entered: ^(0|[1-9][0-9]*)$

6. Only positive real numbers with two decimal places can be entered: ^[0-9]+(\.[0-9]{2})?$

7. Only positive real numbers with 1 to 3 decimal places can be entered: ^[0-9]+(\.[0-9]{1,3})?$

8. Only non-zero positive integers can be entered: ^\+?[1-9][0-9]*$
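
As a rough illustration, the numbered patterns can be wrapped in a small validator. This is a sketch under stated assumptions: the dictionary keys and the is_valid helper are hypothetical names, n and m are filled in with sample values (3, and 2 to 4), and the literal dot and plus sign are written escaped as \. and \+ because an unescaped . matches any character and an unescaped + after ^ is rejected by Python's re module.

```python
import re

# Hypothetical labels for the numbered patterns; n=3 and m..n=2..4 are sample values.
PATTERNS = {
    "digits_only": r"^[0-9]*$",
    "exactly_3_digits": r"^\d{3}$",
    "at_least_3_digits": r"^\d{3,}$",
    "2_to_4_digits": r"^\d{2,4}$",
    "zero_or_no_leading_zero": r"^(0|[1-9][0-9]*)$",
    "two_decimals": r"^[0-9]+(\.[0-9]{2})?$",
    "one_to_three_decimals": r"^[0-9]+(\.[0-9]{1,3})?$",
    "nonzero_positive_int": r"^\+?[1-9][0-9]*$",
}


def is_valid(name, text):
    """Return True if text matches the named pattern from start to end."""
    # fullmatch also rejects a trailing newline, which "$" alone would tolerate.
    return re.fullmatch(PATTERNS[name], text) is not None


print(is_valid("two_decimals", "3.14"))             # True
print(is_valid("two_decimals", "3x14"))             # False
print(is_valid("zero_or_no_leading_zero", "01"))    # False
print(is_valid("nonzero_positive_int", "+12"))      # True
```

One detail worth knowing: because pattern 1 uses *, it also accepts the empty string; replacing * with + would require at least one digit.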


【Python3 Regular Expression】

Function        Description
re.match()      Matches the pattern at the starting position of the string. If there is no match at the starting position, match() returns None.
re.search()     Scans the entire string and returns the first successful match.
re.sub()        Replaces all substrings that match the regular expression and returns the resulting string.
re.findall()    Searches the string and returns all matching substrings as a list.
re.split()      Splits the string at each regular expression match and returns a list.
re.finditer()   Searches the string and returns an iterator; each element is a match object.
>>> import re
>>> match = re.findall(r'[1-9]\d{5}', '100081BIT  BIT10008676')
>>> print(match)
['100081', '100086']
>>> match = re.split(r'[1-9]\d{5}', '100081BIT  BIT10008676')
>>> match
['', 'BIT  BIT', '76']
>>> match = re.split(r'[1-9]\d{5}', '100081BIT  BIT10008676', maxsplit=1)
>>> match
['', 'BIT  BIT10008676']

>>> for m in re.finditer(r'[1-9]\d{5}', '100081BIT  BIT10008676'):
...     print(m.group(0))
...
100081
100086
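
re.sub() is listed in the function table but not demonstrated, so here is a minimal sketch using the same sample string. The replacement text ':zipcode' is an arbitrary placeholder chosen for this illustration; count is a standard re.sub parameter that limits the number of replacements.

```python
import re

text = '100081BIT  BIT10008676'
pattern = r'[1-9]\d{5}'

# Replace every six-digit match with a placeholder.
replaced = re.sub(pattern, ':zipcode', text)
print(replaced)        # :zipcodeBIT  BIT:zipcode76

# count=1 replaces only the first match.
replaced_once = re.sub(pattern, ':zipcode', text, count=1)
print(replaced_once)   # :zipcodeBIT  BIT10008676

# Match objects from finditer also report where each match was found.
spans = [m.span() for m in re.finditer(pattern, text)]
print(spans)           # [(0, 6), (14, 20)]
```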

The difference between re.match and re.search

re.match matches only at the beginning of the string: if the pattern does not match there, the match fails and the function returns None. re.search scans the whole string and returns the first match it finds.
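
The difference is easy to see on a string where the match does not sit at the start. This is a small sketch; the sample strings are made up for the illustration.

```python
import re

pattern = r'[1-9]\d{5}'
text = 'BIT 100081'

# match() looks only at position 0, where the text starts with "B".
at_start = re.match(pattern, text)
print(at_start)              # None

# search() scans forward and finds the digits later in the string.
anywhere = re.search(pattern, text)
print(anywhere.group(0))     # 100081
print(anywhere.start())      # 4

# When the match is at the beginning, both functions find it.
leading = re.match(pattern, '100081 BIT')
print(leading.group(0))      # 100081
```

Since both functions return None on failure, it is safest to test the result before calling group() on it.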


