This article introduces regular expressions in PHP.
A regular expression is a pattern that describes a set of strings. It can be used to match, replace, and extract parts of a string. Most commonly used languages support regular expressions, for example JavaScript and Java. Once you understand how regular expressions work in one language, using them in another is relatively simple. Let's start writing patterns.
When a regular expression is matched against a string, two basic principles apply:
1. The leftmost principle: matching starts at the leftmost position of the target string and proceeds in order, until a part that satisfies the expression is found or the end of the target string is reached.
2. The longest principle: at the position where a match is found, the regular expression matches the longest part that satisfies it. This is the greedy mode, which is the default for quantifiers.
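The two principles can be sketched with a short example. The subject string here is illustrative and not from the original article; `+?` is the lazy form of `+`, shown only for contrast.

```php
<?php
// Illustrative subject: a short run of o's appears before a longer one.
$subject = 'foo boooo';

// Leftmost: the first position where a match is possible wins,
// even though a longer run of o's exists further to the right.
preg_match('/o+/', $subject, $leftmost, PREG_OFFSET_CAPTURE);
var_dump($leftmost[0][0]); // string(2) "oo"
var_dump($leftmost[0][1]); // int(1), the offset where the match starts

// Longest (greedy): at that position + took both o's.
// The lazy form +? stops as early as it can instead.
preg_match('/o+?/', $subject, $lazy);
var_dump($lazy[0]); // string(1) "o"
```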
A pattern starts with a delimiter. Common choices are /, # and ~; the same character marks the beginning and the end of the regular expression. For example: '/a.*a/'. When the delimiter character appears often inside the pattern and would need escaping each time, as the slash does in a URL, it is better to use # as the delimiter:
$str = 'http://baidu.com';
$pattern = '/http:\/\/.*com/'; // each / must be escaped
preg_match($pattern, $str, $match);
var_dump($match);

$str = 'http://baidu.com';
$pattern = '#http://.*com#'; // no need to escape /
preg_match($pattern, $str, $match);
var_dump($match);
Now that you know how to write the beginning and the end, the next step is the part in between. A regular expression is assembled from atoms and metacharacters, read from left to right.
For example, when a string consists of a fixed opening part, then zxcv, then a fixed closing part, a pattern of the form '/opening.*closing/' matches it, and .* stands for zxcv.
So what are the common atoms and metacharacters?
• \d Matches a numeric character. Equivalent to [0-9].
• \D Matches a non-numeric character. Equivalent to [^0-9].
• \f Matches a form feed character. Equivalent to \x0c and \cL.
• \n Matches a newline character. Equivalent to \x0a and \cJ.
• \r Matches a carriage return character. Equivalent to \x0d and \cM.
• \s Matches any whitespace character, including spaces, tabs, form feeds, etc. Equivalent to [ \f\n\r\t\v].
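• \S Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
• \t Matches a tab character. Equivalent to \x09 and \cI.
• \v Matches a vertical whitespace character. In PCRE, the engine behind PHP's preg functions, \v is a class that covers the vertical tab (\x0b) along with line feed, form feed, and carriage return.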
• \w Matches any word character including an underscore. Equivalent to '[A-Za-z0-9_]'.
• \W Matches any non-word character. Equivalent to '[^A-Za-z0-9_]'.
• \xn Matches n, where n is a hexadecimal escape value of at most two digits. For example, '\x41' matches "A", and '\x041' is equivalent to '\x04' followed by "1". This allows ASCII codes to be used in regular expressions.
• \nm Identifies an octal escape value or a backreference. If \nm is preceded by at least nm capturing subexpressions, nm is a backreference. If \nm is preceded by at least n capturing subexpressions, n is a backreference followed by the literal m. If neither condition holds and n and m are both octal digits (0-7), \nm matches the octal escape value nm.
• \nml If n is an octal digit (0-3) and m and l are both octal digits (0-7), matches the octal escape value nml.
• \x{n} Matches the Unicode character whose code point is the hexadecimal number n; the pattern needs the u modifier. For example, '/\x{00A9}/u' matches the copyright symbol (©). The \un form used in some other languages, such as JavaScript, is not supported by PHP's preg functions.
• . Matches any single character except "\n"
• ^ Matches the beginning of the input string. Inside a character class [], it means negation: '[^\w]' is equivalent to '\W', whereas '^\w' means the string starts with a word character.
• $ Matches the end position of the input string. For example '\w$' means ending with a word character.
• ? Matches the preceding subexpression zero or one time. Equivalent to {0,1}. For example, 'do(es)?' matches "do" or "does".
• * Matches the preceding subexpression zero or more times. Equivalent to {0,}. For example, 'zo*' matches "z", "zo", and "zoo".
• + Matches the preceding subexpression one or more times. Equivalent to {1,}. For example, 'zo+' matches "zo" and "zoo", but not "z".
• {n} n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the single o in "Bob", but it matches the two o's in "food". In an unanchored search it also matches the first two o's in "Booob".
• {n,} n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the o in "Bob", but it matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.
• {n,m} m and n are both non-negative integers, where n <= m. Matches at least n times and at most m times. For example, 'o{1,3}' matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that there must be no space between the comma and either number.
• [] Character set (character class). Matches any one of the characters it contains. For example, '[abc]' matches the 'a' in "plain".
• () Matches the content inside the parentheses and captures it. The captured text can be referenced later in the pattern with \n, where n is an integer of at least 1. For example, the pattern '(\w+)(:)\/\/.*\1' matches the string 'http://baidu.comhttp', and \1 stands for http.
• (?:) Matches but does not capture, so the result is not stored for later use. This is useful when combining parts of a pattern with the "or" character (|). For example, 'industr(?:y|ies)' is a shorter expression than 'industry|industries'. If the pattern above is changed to '(?:\w+)(:)\/\/.*\1', then \1 refers to the colon (:), because the first group no longer captures.
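• | x|y matches x or y. For example, 'z|food' matches "z" or "food". '(z|f)ood' matches "zood" or "food".
• [-] Character range. Matches any character in the specified range. For example, '[a-z]' matches any lowercase letter from 'a' to 'z'.
• (?=pattern) Positive lookahead. Matches the search string at any position where a string matching pattern begins. This is a non-capturing match, so the result is not stored for later use. For example, 'Windows (?=95|98|NT|2000)' matches "Windows" in "Windows 2000", but not "Windows" in "Windows 3.1". A lookahead does not consume characters: after a match, the search for the next match starts immediately after the last match, not after the characters that made up the lookahead.
• (?!pattern) Negative lookahead. Matches the search string at any position where a string not matching pattern begins. This is a non-capturing match, so the result is not stored for later use. For example, 'Windows (?!95|98|NT|2000)' matches "Windows" in "Windows 3.1", but not "Windows" in "Windows 2000". A lookahead does not consume characters: after a match, the search for the next match starts immediately after the last match, not after the characters that made up the lookahead.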
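Several of the entries above can be checked directly with preg_match(). The following is a small sketch that reuses the example strings from the list; the variable names are only illustrative.

```php
<?php
// Quantifier: {1,3} is greedy, so it takes three o's where more are available.
preg_match('/o{1,3}/', 'fooooood', $q);
var_dump($q[0]); // string(3) "ooo"

// Capturing groups and a backreference: \1 must repeat what (\w+) captured.
// Using # as the delimiter avoids escaping the slashes.
preg_match('#(\w+)(:)//.*\1#', 'http://baidu.comhttp', $cap);
var_dump($cap[1]); // string(4) "http"
var_dump($cap[2]); // string(1) ":"

// Non-capturing group: nothing is stored besides the full match.
preg_match('/industr(?:y|ies)/', 'industries', $nc);
var_dump(count($nc)); // int(1)

// Positive lookahead: the version is required but is not part of the match.
preg_match('/Windows (?=95|98|NT|2000)/', 'Windows 2000', $la);
var_dump($la[0]); // string(8) "Windows "

// Negative lookahead: matches only when none of the listed versions follows.
$neg2000 = preg_match('/Windows (?!95|98|NT|2000)/', 'Windows 2000');
$neg31   = preg_match('/Windows (?!95|98|NT|2000)/', 'Windows 3.1');
var_dump($neg2000, $neg31); // int(0) int(1)
```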
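Sometimes a letter follows the closing delimiter, as in '/as.*/i'. What is this i? It is a pattern modifier:
• i Matching against the pattern is case-insensitive.
• m Treats the subject as multiple lines, so ^ and $ match at the start and end of any line.
• s Treats the subject as a single line. Without this modifier, the metacharacter "." does not match a newline.
• x Whitespace in the pattern is ignored.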
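• e Could only be used with preg_replace(), where it caused the replacement string to be evaluated as PHP code. It was deprecated in PHP 5.5 and removed in PHP 7.0; preg_replace_callback() takes its place.
• A Anchors the pattern to the start of the subject string, which is equivalent to the metacharacter ^.
• D Makes $ match only at the very end of the subject string; without it, $ also matches just before a trailing newline. PHP has no Z modifier: the end-of-subject assertions \Z and \z are written inside the pattern.
• U Regular expressions are greedy by default; this modifier makes quantifiers non-greedy.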
Example:
$str = 'asddadsdasd';
$pattern = '/a.*d/';
preg_match($pattern, $str, $match);
var_dump($match); // asddadsdasd

$str = 'asddadsdasd';
$pattern = '/a.*d/U'; // equivalent to $pattern = '/a.*?d/';
preg_match($pattern, $str, $match);
var_dump($match); // asd
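The i, m, s and x modifiers can be sketched in the same way. The subject strings and the date pattern below are assumptions made for illustration only.

```php
<?php
// i: case-insensitive matching.
$ci = preg_match('/php/i', 'PHP');
var_dump($ci); // int(1)

// m: ^ and $ match at the start and end of every line.
$lines = "one\ntwo\nthree";
$perLine = preg_match_all('/^\w+$/m', $lines, $found);
var_dump($perLine); // int(3)

// s: without it the dot does not match the newline between the words.
$noS   = preg_match('/one.two/', $lines);
$withS = preg_match('/one.two/s', $lines);
var_dump($noS, $withS); // int(0) int(1)

// x: whitespace in the pattern is ignored and # starts a comment,
// which lets a longer pattern be laid out over several lines.
$pattern = '/
    (\d{4})  # year
    -
    (\d{2})  # month
/x';
preg_match($pattern, 'released 2018-04', $date);
var_dump($date[1], $date[2]); // string(4) "2018" string(2) "04"
```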
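Commonly used PHP regular expression functions:
Matching: preg_match() and preg_match_all()
1. preg_match($pattern, $subject [, array &$matches])
2. preg_match_all($pattern, $subject, array &$matches)
Function 1 stops after the first match, while function 2 finds every matching string and stores them all in the matches array. Both return an integer, the number of matches found (0 or 1 for preg_match), or false on error. With function 1, matches is a one-dimensional array; with function 2, it is a two-dimensional array.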
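A minimal sketch of the difference between the two functions, using an illustrative subject string:

```php
<?php
$subject = 'a1 b22 c333'; // illustrative input

// preg_match stops after the first match; $first is one-dimensional:
// index 0 is the full match, index 1 the first captured group.
$n1 = preg_match('/[a-z](\d+)/', $subject, $first);
var_dump($n1);   // int(1)
print_r($first); // [0] => a1, [1] => 1

// preg_match_all collects every match; $all is two-dimensional.
// In the default PREG_PATTERN_ORDER, $all[0] holds the full matches
// and $all[1] holds what the first group captured in each of them.
$n2 = preg_match_all('/[a-z](\d+)/', $subject, $all);
var_dump($n2);    // int(3)
print_r($all[0]); // a1, b22, c333
print_r($all[1]); // 1, 22, 333
```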
Replacing: preg_replace()
mixed preg_replace ( mixed $pattern , mixed $replacement , mixed $subject [, int $limit = -1 [, int &$count ]] )
Searches subject for parts that match pattern and replaces them with replacement.
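A short sketch of preg_replace() with its optional limit and count arguments, followed by preg_replace_callback() for cases where the replacement has to be computed. The input strings are illustrative.

```php
<?php
$subject = '2018-04-20 and 2019-01-05'; // illustrative input

// $1, $2 and $3 in the replacement refer to the captured groups.
$swapped = preg_replace('/(\d{4})-(\d{2})-(\d{2})/', '$3/$2/$1', $subject);
var_dump($swapped); // "20/04/2018 and 05/01/2019"

// $limit caps the number of replacements; $count reports how many were made.
$once = preg_replace('/\d{4}/', 'YYYY', $subject, 1, $count);
var_dump($once);  // "YYYY-04-20 and 2019-01-05"
var_dump($count); // int(1)

// When the replacement must be computed, preg_replace_callback() passes
// each match array to a function and inserts the value it returns.
$titled = preg_replace_callback(
    '/\b[a-z]/',
    function ($m) {
        return strtoupper($m[0]);
    },
    'hello world'
);
var_dump($titled); // "Hello World"
```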