The basic principle of regular expressions is to use a series of special characters and syntax to match and manipulate text data. A regular expression typically includes a pattern string that describes the text to be matched, plus one or more special characters and syntax elements that control how the match is performed and what result is returned. Regular expressions in Python are usually implemented using the re module.
A regular expression (often abbreviated as regex, regexp, or RE in code) is a concept in computer science. Regular expressions are often used to retrieve and replace text that matches a certain pattern. Many programming languages support string manipulation using regular expressions; for example, Perl has a powerful regular expression engine built in. The concept was originally popularized by Unix tools. A regular expression is a logical formula that operates on strings: it combines ordinary characters (for example, the letters a to z) with predefined special characters (called "metacharacters") to form a "rule string", and this rule string expresses a filtering logic for strings. In other words, a regular expression is a text pattern that describes one or more strings to match when searching text.
If the description above is still confusing, let's explain it through an example. We can use a regular expression testing tool or Python; either is fine. First, we enter a piece of text.
hello, my name is Tina, my phone number is 123456 and my web is http://tina.com.
Then we apply the following regular expression to it:
[a-zA-Z]+://[^\s]*
This gives us the web link, that is, the URL in the text. Isn't it amazing?
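The same experiment can be tried in Python. The following is a minimal sketch that uses the sample text from above and the URL pattern with its character class written as [a-zA-Z]; the variable names are only illustrative.

```python
import re

text = "hello, my name is Tina, my phone number is 123456 and my web is http://tina.com."

# One or more letters (the scheme), then "://", then every following
# non-whitespace character.
url_pattern = r"[a-zA-Z]+://[^\s]*"

urls = re.findall(url_pattern, text)
print(urls)  # ['http://tina.com.']
```

Note that the sentence-ending period is part of the match, because [^\s]* accepts any character that is not whitespace. A stricter pattern would be needed to exclude trailing punctuation.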
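This works because regular expressions have their own matching rules, some of which are as follows.
Pattern | Description |
. | Any single character (except a newline) |
* | 0 or more repetitions of the preceding expression |
+ | 1 or more repetitions of the preceding expression |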
You can check more matching rules by yourself.
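The three rules in the table (".", "*", and "+" for one or more repetitions) can be tried out with re.findall(). This is a small sketch; the sample strings are made up for illustration.

```python
import re

# "." matches any single character (except a newline)
dot_matches = re.findall(r"a.c", "abc a-c ac")

# "*" allows zero or more repetitions of the preceding expression
star_matches = re.findall(r"ab*", "a ab abbb")

# "+" requires at least one repetition of the preceding expression
plus_matches = re.findall(r"ab+", "a ab abbb")

print(dot_matches)   # ['abc', 'a-c']
print(star_matches)  # ['a', 'ab', 'abbb']
print(plus_matches)  # ['ab', 'abbb']
```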
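?, *, +, \d, and \w are all shorthand for longer equivalent expressions:
? is equivalent to matching length {0,1}
* is equivalent to matching length {0,}
+ is equivalent to matching length {1,}
\d is equivalent to [0-9]
\D is equivalent to [^0-9]
\w is equivalent to [A-Za-z_0-9]
\W is equivalent to [^A-Za-z_0-9]
In Python 3, \d and \w in str patterns also match non-ASCII digits and letters unless the re.ASCII flag is set, so the character-class equivalents above hold exactly for ASCII text.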
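The equivalences listed above can be checked directly. The sketch below compares each shorthand with its longer form on an invented ASCII sample string; re.ASCII is passed so that \d and \w are limited to the ASCII ranges shown.

```python
import re

sample = "tel: 123456, id: user_42"

# Each pair holds a shorthand pattern and its longer equivalent.
pairs = [
    (r"\d+", r"[0-9]+"),
    (r"\D+", r"[^0-9]+"),
    (r"\w+", r"[A-Za-z_0-9]+"),
    (r"\W+", r"[^A-Za-z_0-9]+"),
    (r"12?", r"12{0,1}"),
    (r"12*", r"12{0,}"),
    (r"12+", r"12{1,}"),
]

# True for a pair when both patterns find exactly the same matches.
results = {
    short: re.findall(short, sample, re.ASCII) == re.findall(expanded, sample, re.ASCII)
    for short, expanded in pairs
}
print(results)
```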
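# re.match() takes the pattern first and then the string to search; content stands for that string
res = re.match(r'hello\s(\d+)sword', content)
res = re.match(r'hello.*(\d+)sword', content)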
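The two re.match() calls above differ only in what sits before the capture group, and that changes which digits are captured. The sketch below assumes a sample string, 'hello 1234567sword', which is not from the original article, and adds a third, non-greedy variant for comparison.

```python
import re

# Hypothetical input chosen so that all three patterns match.
content = "hello 1234567sword"

# \s consumes only the single space, so the group captures every digit.
strict = re.match(r"hello\s(\d+)sword", content)

# .* is greedy: it consumes as much as it can and leaves the group
# only the one digit it needs in order to match.
greedy = re.match(r"hello.*(\d+)sword", content)

# .*? is non-greedy: it consumes as little as possible.
lazy = re.match(r"hello.*?(\d+)sword", content)

print(strict.group(1))  # 1234567
print(greedy.group(1))  # 7
print(lazy.group(1))    # 1234567
```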
import re

useData = input('Please enter string data: ')
# Match the digits in the string; + matches the preceding subexpression one or more times
digital = re.findall(r'\d+', useData)
print(digital)

Let's take a look at the running results. The findall() function returns all matching strings, and the data type of the return value is a list.

Common symbols

Let's talk about the common symbols of regular expressions.
The "." character matches any single character (except a newline). The "\" character is the escape character. "[…]" defines a character set. "(.*?)" is the construct most commonly used in Python crawlers: it is a non-greedy (lazy) capture group that matches any characters, but as few of them as possible. Let's look at a sample code below.
import re

a = 'xxixxjshdxxlovexxsfhxxpythonxx'
# Capture the shortest text between each pair of 'xx' markers
data = re.findall('xx(.*?)xx', a)
print(data)

Let's run it and see the effect.
Run results

['i', 'love', 'python']

Special characters

So-called special characters are characters with special meanings, such as the * in runoo*b, which means that the preceding character may be repeated zero or more times. If you want to find the * symbol itself in a string, you need to escape it, that is, add a \ before it: runo\*ob matches the string runo*ob. Many metacharacters require this special treatment when you try to match them literally: you must first "escape" the character by preceding it with the backslash character \. The following table lists the special characters in regular expressions:
Special character | Description |
$ | Matches the end of the input string. In multiline mode (the re.MULTILINE flag in Python), $ also matches at the end of each line, just before '\n'. To match the $ character itself, use \$. |
( ) | Marks the start and end of a subexpression. Subexpressions can be captured for later use. To match these characters, use \( and \). |
* | Matches the preceding subexpression zero or more times. To match the * character, use \*. |
+ | Matches the preceding subexpression one or more times. To match the + character, use \+. |
. | Matches any single character except the newline character \n. To match the . character, use \. |
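The escaping rules in the table can be sketched in Python as follows. The sample text is invented for illustration; re.escape() is the standard-library helper that adds the backslashes automatically.

```python
import re

# Hypothetical text containing several special characters.
text = "runoo*b costs $5 (approx.)"

# Unescaped, * is a quantifier, so the literal "*" in text is not matched.
unescaped = re.search(r"runoo*b", text)

# Escaped with a backslash, * is matched literally.
escaped = re.search(r"runoo\*b", text)

# The same applies to $ and to parentheses.
prices = re.findall(r"\$\d+", text)
in_parens = re.findall(r"\(.*?\)", text)

# re.escape() builds the escaped pattern from a literal string.
auto = re.search(re.escape("runoo*b"), text)

print(unescaped)        # None
print(escaped.group())  # runoo*b
print(prices)           # ['$5']
print(in_parens)        # ['(approx.)']
print(auto.group())     # runoo*b
```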