
The basic principle is to use a series of special characters and syntax to match and manipulate text data. A regular expression typically consists of a pattern string that describes the text to be matched, plus special characters and syntax that control how matching works and what the result is. In Python, regular expressions are usually handled with the re module.

WBOY
2023-05-10 09:40:14

    What is a regular expression?

    A regular expression (often abbreviated as regex, regexp or RE in code) is a concept from computer science. Regular expressions are commonly used to search for and replace text that matches a given pattern. Many programming languages support string manipulation with regular expressions; Perl, for example, has a powerful regular expression engine built in. Regular expressions were first popularized by Unix tools. A regular expression is a logical formula for operating on strings: it combines ordinary characters (for example, the letters a to z) with predefined special characters called "metacharacters" to form a "rule string", and that rule string expresses the logic used to filter strings. In other words, a regular expression is a text pattern that describes one or more strings to match when searching text.

    1. An introductory example

    That may all still sound confusing, so let's explain it with an example. You can use a regular expression testing tool or Python; either works. First, enter a piece of text:

    hello, my name is Tina, my phone number is 123456 and my web is http://tina.com.

    [a-zA-Z]+://[^\s]*

    Matching this pattern against the text gives us the web link, that is, the URL in the text. Isn't that neat?
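The same extraction can be sketched in Python with re.search(), using the sample text from above. Note that [^\s]* keeps going until the next whitespace character, so the sentence's trailing period ends up inside the match.

```python
import re

text = "hello, my name is Tina, my phone number is 123456 and my web is http://tina.com."

# Scheme letters, then "://", then every character up to the next whitespace
match = re.search(r"[a-zA-Z]+://[^\s]*", text)
url = match.group()
print(url)  # http://tina.com.
```

Trimming the period would need a tighter pattern than [^\s]*, for example one that refuses to end on punctuation.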

    This works because regular expressions follow their own matching rules, some of which are listed below.

    Pattern   Description
    .         Any single character (except a newline)
    *         The preceding expression, zero or more times
    +         The preceding expression, one or more times

    You can look up the remaining matching rules yourself.

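    ?, *, +, \d and \w are all shorthand for longer forms:

    ? is equivalent to the repetition count {0,1}

    * is equivalent to {0,}

    + is equivalent to {1,}

    \d is equivalent to [0-9]

    \D is equivalent to [^0-9]

    \w is equivalent to [A-Za-z_0-9]

    \W is equivalent to [^A-Za-z_0-9]

    (The \d, \D, \w and \W equivalences hold for ASCII text. In Python 3, str patterns also match Unicode digits and letters unless the re.ASCII flag is set.)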
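As a quick check of these equivalences, the sketch below compares each shorthand quantifier with its {m,n} spelling on a few made-up strings, then shows how the re.ASCII flag narrows \d and \w in Python 3. The sample strings are assumptions chosen only for illustration.

```python
import re

# Each shorthand quantifier next to its explicit {m,n} form
pairs = [("ab?c", "ab{0,1}c"), ("ab*c", "ab{0,}c"), ("ab+c", "ab{1,}c")]
samples = ["ac", "abc", "abbc"]

for short, explicit in pairs:
    matched = [s for s in samples if re.fullmatch(short, s)]
    # The explicit form accepts exactly the same samples
    assert matched == [s for s in samples if re.fullmatch(explicit, s)]
    print(short, matched)

# \d and \w are Unicode-aware for str patterns; re.ASCII restricts them
text = "caf\u00e9_1 \u0663"  # "café_1", a space, ARABIC-INDIC DIGIT THREE
print(re.findall(r"\d", text))                   # ['1', '٣']
print(re.findall(r"\d", text, flags=re.ASCII))   # ['1']
print(re.findall(r"\w+", text, flags=re.ASCII))  # ['caf', '_1']
```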

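    2. match()

    Here is a commonly used matching method: match(). It checks whether a regular expression matches a string, starting at the very first character, and returns a match object on success or None otherwise.

    Matching a target: the parentheses form a group that captures the digits.

    import re
    res = re.match(r'hello\s(\d+)sword', 'hello 123456sword')  # sample string is made up

    Greedy matching: .* consumes as many characters as it can, so here (\d+) captures only the last digit, '6'.

    res = re.match(r'hello.*(\d+)sword', 'hello 123456sword')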
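To make the difference visible, here is a small sketch that prints the captured group for an exact pattern, a greedy one and a non-greedy one. The sample string "hello 123456 world" is hypothetical; the last two lines show that match() only tries position 0, whereas re.search() scans the whole string.

```python
import re

content = "hello 123456 world"  # hypothetical sample string

# Matching a target: the group (\d+) captures the run of digits
exact = re.match(r"hello\s(\d+)\sworld", content).group(1)

# Greedy: .* takes as much as it can, leaving a single digit for (\d+)
greedy = re.match(r"hello.*(\d+)\sworld", content).group(1)

# Non-greedy: .*? takes as little as it can, so (\d+) gets every digit
lazy = re.match(r"hello.*?(\d+)\sworld", content).group(1)

print(exact, greedy, lazy)  # 123456 6 123456

# match() only tries position 0, while search() scans the whole string
print(re.match(r"\d+", content))           # None
print(re.search(r"\d+", content).group())  # 123456
```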

    3. findall()

    This is the most commonly used method; let's see how it works.

    import re

    useData = input('Please enter a string: ')

    # Match the runs of digits in the string; + matches the preceding subexpression one or more times
    digital = re.findall(r'\d+', useData)

    print(digital)

    When run, the script prints a list of every run of digits found in the input.

    The findall() function returns all non-overlapping matches of the pattern, and its return value is a list.
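Because the script above waits for keyboard input, here is a non-interactive variant with a made-up sample string, so the list that findall() returns can be seen directly:

```python
import re

text = "order 12 shipped 3 days ago, 450 left"  # hypothetical input

# \d+ matches each maximal run of digits
digits = re.findall(r"\d+", text)
print(digits)        # ['12', '3', '450']
print(type(digits))  # <class 'list'>
```

The elements are strings, so they need int() before doing arithmetic on them.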

    Common symbols

    Let’s talk about the common symbols of regular expressions.

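    The "." character matches any single character except a newline.

    The "\" character is the escape character: it removes the special meaning of the character that follows it.

    "[...]" is a character set: it matches any one of the characters listed inside the brackets.

    "(.*?)" is the pattern most commonly used in Python crawlers. It is a non-greedy (lazy) match: it matches any characters, but as few as possible, and the parentheses capture what was matched.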
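The first three symbols can be compared side by side on one made-up string. This is only an illustrative sketch; the words in it are arbitrary.

```python
import re

text = "cat cot c.t cut"  # made-up sample string

# "." matches any single character (except a newline)
any_char = re.findall(r"c.t", text)
print(any_char)  # ['cat', 'cot', 'c.t', 'cut']

# "\." escapes the dot, so only a literal "." matches
escaped = re.findall(r"c\.t", text)
print(escaped)   # ['c.t']

# "[ao]" is a character set: exactly one of "a" or "o"
char_set = re.findall(r"c[ao]t", text)
print(char_set)  # ['cat', 'cot']
```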

    Let’s look at a sample code below.

    import re

    a = 'xxixxjshdxxlovexxsfhxxpythonxx'

    data = re.findall('xx(.*?)xx', a)

    print(data)

    Let’s run it and see the effect.

    Run results

    ['i', 'love', 'python']
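Removing the ? from the same pattern shows why the lazy form is preferred here. The greedy (.*) runs all the way to the last "xx", so findall() finds only one match:

```python
import re

a = "xxixxjshdxxlovexxsfhxxpythonxx"

# Lazy: (.*?) stops at the nearest following "xx"
lazy = re.findall("xx(.*?)xx", a)
print(lazy)    # ['i', 'love', 'python']

# Greedy: (.*) stretches to the last "xx" in the string
greedy = re.findall("xx(.*)xx", a)
print(greedy)  # ['ixxjshdxxlovexxsfhxxpython']
```

Because each pattern has one group, findall() returns the group's text instead of the whole match.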

    Special characters

    So-called special characters are characters with a special meaning, such as the * in runoo*b, which means the preceding character (here o) may appear zero or more times. If you want to find the * symbol itself in a string, you need to escape it by putting a \ in front of it: runo\*ob matches the string runo*ob.

    Many metacharacters require special treatment when trying to match them. To match these special characters, you must first "escape" the characters, that is, precede them with the backslash character \. The following table lists the special characters in regular expressions:

    Special character   Description
    ( )   Marks the start and end of a subexpression. The subexpression can be captured for later use. To match these characters literally, use \( and \).
    *     Matches the preceding subexpression zero or more times. To match the * character, use \*.
    +     Matches the preceding subexpression one or more times. To match the + character, use \+.
    .     Matches any single character except the newline character \n. To match a literal ., use \.
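The escaping rule can be checked in Python with the runoo*b example. The sample strings are made up; re.escape() is the standard-library helper that inserts the backslashes for you.

```python
import re

# Unescaped, "o*" repeats the "o": runoo*b matches runob, runoob, runooob, ...
repeated = re.findall(r"runoo*b", "runob runoob runooob")
print(repeated)  # ['runob', 'runoob', 'runooob']

# Escaped, "\*" matches a literal asterisk
literal = re.findall(r"runo\*ob", "runo*ob runoob")
print(literal)   # ['runo*ob']

# re.escape() backslash-escapes the special characters in a string
print(re.escape("runo*ob"))  # runo\*ob
```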

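    That was a lot of concepts, and you probably won't remember them all, so let me go straight to an example; it will make the rest clear.

    Here is part of the HTML from a certain website:

    <span class="price">§<i>123</i></span>
    <span class="price">§<i>133</i></span>
    <span class="price">§<i>156</i></span>
    <span class="price">§<i>189</i></span>

    You will notice that only the middle part of each line differs, and that differing data is exactly what we want. How do we extract it with a regular expression?

    <span class="price">§<i>(.*?)</i></span>

    That is all it takes. Let's look at the result: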

    123
    133
    156
    189
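A minimal Python sketch of this extraction, assuming the HTML fragment is already held in a string (how the page is downloaded is outside the scope of the example):

```python
import re

html = """
<span class="price">§<i>123</i></span>
<span class="price">§<i>133</i></span>
<span class="price">§<i>156</i></span>
<span class="price">§<i>189</i></span>
"""

# The lazy group (.*?) stops at the first </i>, capturing one price per span
prices = re.findall(r'<span class="price">§<i>(.*?)</i></span>', html)
for p in prices:
    print(p)
```

Each element of prices is a string, in the order the spans appear.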

    The $ character matches the end of the input string. In Python it also matches just before a newline at the very end of the string, and with the re.MULTILINE flag it matches before every newline. To match the $ character itself, use \$.
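The effect of re.MULTILINE on $ can be seen with a made-up two-line string. In the pattern below, \$ is a literal dollar sign and the final $ is the end anchor.

```python
import re

text = "price: 10$\ntotal: 20$"  # made-up sample string

# Without MULTILINE, "$" only anchors at the end of the whole string
plain_end = re.findall(r"\d+\$$", text)
print(plain_end)  # ['20$']

# With re.MULTILINE, "$" also matches right before each newline
multi_end = re.findall(r"\d+\$$", text, flags=re.MULTILINE)
print(multi_end)  # ['10$', '20$']
```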


    Statement:
    This article is reproduced from yisu.com. If there is any infringement, please contact admin@php.cn to have it deleted.