
Illustrated guide to using Python regular expressions

高洛峰
2017-03-23 17:46:09

Last time, many readers suggested using regular expressions to filter text. It is not that I refuse to use them (I rarely do: as anyone who has seen my earlier crawlers knows, I locate content directly through page tags with BeautifulSoup, because that is easy to understand and convenient). The problem is that regular expressions are hard to master. Anyone who has looked at a regex reference table knows how many rules hang on each symbol and how flexible they are. Readers who are new to programming can easily lose a lot of time on them. Today I will briefly introduce the regular expression features I use most often. Unless your case is unusual, these will cover most of what you need.

1. A brief introduction to regular expressions

First, import the regular expression module with import re. Regular expressions are a powerful tool for processing strings and have their own independent matching engine. They may be less efficient than str's built-in methods, but they are far more flexible. The workflow is: define a matching rule (the content you want plus regex syntax), pass in the string to search, and let the regex engine return the information you want.

2. Common ways to use findall

The basic structure is roughly: nojoke = re.findall(r'matching rule', 'string to search'). Here nojoke is the result returned by the regex, re is the regular expression module, findall finds all matches, and the r prefix marks a raw string, which also makes regex patterns easy to spot when there is a lot of code. A few examples make this clearer.


The pattern bi finds every occurrence of bi in the searched string and returns them as a list. This is often used to count how many times a given substring occurs.
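A minimal sketch of this usage. The sample string is hypothetical and chosen only for illustration:

```python
import re

# Hypothetical sample string, chosen only for illustration.
text = "abi bi cbi xyz"

# findall returns every non-overlapping match of the pattern as a list.
nojoke = re.findall(r"bi", text)

print(nojoke)       # ['bi', 'bi', 'bi']
print(len(nojoke))  # 3, the number of occurrences
```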


Adding ^ anchors the pattern to the start of the string: ^abi matches abi only when the string begins with it, so it can also be used to check whether a string starts with abi.
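A short sketch of the ^ anchor, with hypothetical input strings:

```python
import re

# Hypothetical sample strings.
starts_with_abi = re.findall(r"^abi", "abi and abi again")
not_at_start = re.findall(r"^abi", "xabi")

print(starts_with_abi)  # ['abi'], only the occurrence at the start counts
print(not_at_start)     # [], the string does not begin with abi
```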


The $ symbol anchors the pattern to the end of the string: gbi$ matches only when the string ends with gbi, so it can be used to check how a string ends.
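The same idea for the $ anchor, again with hypothetical input strings:

```python
import re

# Hypothetical sample strings.
ends_with_gbi = re.findall(r"gbi$", "abi ... gbi")
not_at_end = re.findall(r"gbi$", "gbi comes first")

print(ends_with_gbi)  # ['gbi']
print(not_at_end)     # [], the string does not end with gbi
```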


Here [...] matches any one of the characters listed inside the brackets, so the pattern matches a followed by f, b followed by f, or c followed by f, and the results are returned as a list.
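A sketch of a character class, assuming the pattern was something like [abc]f (the exact pattern and the input string are my assumptions):

```python
import re

# Assumed pattern: one of a, b or c, followed by f.
# The input string is hypothetical.
pairs = re.findall(r"[abc]f", "af bf cf df")

print(pairs)  # ['af', 'bf', 'cf'], 'df' is skipped because d is not in the class
```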


"\d" is a regular syntax rule used to match numbers between 0 and 9 and return a list. It should be noted that 11 will be treated as the strings '1' and ' 1' returns instead of returning the string '11'. Remember to use it incorrectly and there will be a big pitfall.


The fix is to write as many \d as you need digits: \d\d\d, for example, picks runs of three digits out of a string. This is one place where the flexibility of regular expressions shows.
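A sketch with a hypothetical input string. The \d{3} and \d+ forms are standard regex shorthand that I am adding for comparison; they are not taken from the original example:

```python
import re

# Hypothetical sample string.
text = "call 110 or 12345"

three_digits = re.findall(r"\d\d\d", text)  # exactly three digits per match
same_thing = re.findall(r"\d{3}", text)     # shorthand for \d\d\d
whole_runs = re.findall(r"\d+", text)       # one or more digits per match

print(three_digits)  # ['110', '123']
print(same_thing)    # ['110', '123']
print(whole_runs)    # ['110', '12345']
```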


Lowercase \d matches the digits 0-9, while uppercase \D matches the opposite: every character that is not a digit.
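A quick sketch of \D on a hypothetical input string:

```python
import re

# Hypothetical sample string mixing letters and digits.
non_digits = re.findall(r"\D", "a1b22c")

print(non_digits)  # ['a', 'b', 'c']
```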


"\w" in the regular expression represents matching from lowercase a to z, uppercase A to Z, and the numbers 0 to 9 include the first three types, as printed above .


"\W" in regular expressions means matching special symbols other than letters and numbers, but when using \slashes here, you should pay attention to the fact that \ is escaped in the string Please go to Baidu to learn the specific symbols.


Parentheses () mean that only the content matched inside them is returned. Here .* is the greedy matching syntax: it matches as much as possible, so the result spans the widest range that still satisfies the pattern.
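A sketch of greedy matching with a capture group. The HTML-like input string is my own assumption:

```python
import re

# Hypothetical sample string with two <p> elements.
text = "<p>one</p><p>two</p>"

# (.*) is greedy: it runs from the first <p> to the LAST </p>.
greedy = re.findall(r"<p>(.*)</p>", text)

print(greedy)  # ['one</p><p>two']
```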


Adding a question mark, .*?, stops the match from expanding to the widest range. This is called non-greedy matching. As a result, the contents of the two p's are matched and returned separately.
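A sketch of non-greedy matching on a hypothetical HTML-like input string:

```python
import re

# Hypothetical sample string with two <p> elements.
text = "<p>one</p><p>two</p>"

# (.*?) is non-greedy: each match stops at the NEAREST </p>.
lazy = re.findall(r"<p>(.*?)</p>", text)

print(lazy)  # ['one', 'two']
```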


Adding re.I (capital i) makes the match case-insensitive. Without it, a pattern whose case differs from the text finds nothing and returns an empty list.
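A sketch of the re.I flag with a hypothetical input string:

```python
import re

# Hypothetical sample string written in upper case.
text = "HELLO World"

case_sensitive = re.findall(r"hello", text)          # default: case matters
case_insensitive = re.findall(r"hello", text, re.I)  # re.I ignores case

print(case_sensitive)    # []
print(case_insensitive)  # ['HELLO']
```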


The trouble here is \n, the newline character. By default . does not match a newline, so a match that has to cross a line break fails. Adding re.S (capital S) makes . match every character, including newlines. Once you have learned the syntax and usage above, you can handle more than 70% of everyday matching tasks. There are many other features I have not listed (I rarely use them), and you can learn those on your own.
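A sketch of the re.S flag, assuming a hypothetical string whose content spans two lines:

```python
import re

# Hypothetical sample string with a newline inside the <p> element.
text = "<p>line one\nline two</p>"

without_flag = re.findall(r"<p>(.*?)</p>", text)     # . stops at the newline
with_flag = re.findall(r"<p>(.*?)</p>", text, re.S)  # . also matches \n

print(without_flag)  # []
print(with_flag)     # ['line one\nline two']
```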

2. Usage and difference between match and search:

re.match tries to match a pattern at the start of the string. If the pattern does not match at the start, match() returns None. re.search scans the whole string and returns the first successful match. The difference is easy to see in code.


Appending .span() to the result returns the position of the match as a tuple (start, end). One call is left out because it returns None, and calling .span() on None raises an error.
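A sketch of .span() with a hypothetical input string:

```python
import re

# Hypothetical sample string.
text = "say hello"

search_span = re.search(r"hello", text).span()  # found anywhere in the string
match_span = re.match(r"say", text).span()      # found at the very start
missing = re.match(r"hello", text)              # not at the start

print(search_span)  # (4, 9)
print(match_span)   # (0, 3)
print(missing)      # None, so missing.span() would raise AttributeError
```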


Clear at a glance, isn't it? match only matches at the beginning of the string and returns None if nothing is found there. I did not append .group() in that case because the return value is None, and calling it would raise an error. search scans the whole string and is not picky about position. You can of course combine both with the regex syntax shown earlier. I won't go into more detail here, so practice on your own.
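A sketch of one way to call .group() safely, by checking for None first. The input string and variable names are hypothetical:

```python
import re

# Hypothetical sample string.
text = "say hello"

m = re.match(r"hello", text)   # None: "hello" is not at the start
s = re.search(r"hello", text)  # match object: "hello" occurs later

# Guard against None before calling .group().
match_result = m.group() if m is not None else None
search_result = s.group() if s is not None else None

print(match_result)   # None
print(search_result)  # hello
```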

3. Replacing with sub: re.sub(r'matching rule', 'replacement string', string to search)



The result shows this very directly: the # sign and everything after it are replaced with the string you supply.
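A sketch of re.sub along these lines. The input string and the replacement text are hypothetical:

```python
import re

# Hypothetical sample string with a trailing # comment.
text = "price: 100 # old note"

# Replace the # sign and everything after it on the line.
result = re.sub(r"#.*", "# new note", text)

print(result)  # price: 100 # new note
```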

4. A final bonus

Before the bonus, I hope everyone will practice the usage and rules above. Only by making mistakes and reviewing them do you build experience. The bonus is a set of commonly used email matching rules:
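The original rules are not reproduced in the text, so the following is only a simplified sketch of an email pattern. The name EMAIL_RULE, the pattern itself, and the sample addresses are my assumptions, and the pattern is not a full validator for every valid address:

```python
import re

# Assumed, simplified rule: a local part, an @ sign, and a domain
# with at least one dot. Not a complete email validator.
EMAIL_RULE = r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"

# Hypothetical sample text.
text = "contact alice@example.com or bob.smith@mail.example.org today"

emails = re.findall(EMAIL_RULE, text)

print(emails)  # ['alice@example.com', 'bob.smith@mail.example.org']
```

The group is written as (?:...) so that findall returns the whole address rather than only the captured part.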


