A detailed summary of PHP regular expressions

不言
2019-02-15 12:00:20

This article is a detailed summary of regular expressions in PHP. It should be useful as a reference for anyone who needs it.

1. Regular expression basics

Line anchors (^ and $)

Line anchors describe the boundaries of a string. "^" matches the beginning of a line and "$" matches the end. For example, "^de" matches a string that starts with de, and "de$" matches a string that ends with de.
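
As a small illustration of the two anchors, the sketch below checks a few made-up sample strings with preg_match(), which is covered later in this article. The variable names are hypothetical.

```php
<?php
// Sample strings are invented purely to illustrate ^ and $.
$startsWithDe = preg_match('/^de/', 'decoy'); // "decoy" begins with "de"
$endsWithDe   = preg_match('/de$/', 'made');  // "made" ends with "de"
$noMatch      = preg_match('/^de/', 'made');  // "made" does not begin with "de"

var_dump($startsWithDe, $endsWithDe, $noMatch);
```

preg_match() returns 1 for a match and 0 for no match, so the first two calls succeed and the third does not.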

Word boundary

Suppose we are searching for a word, for example checking whether "an" occurs in the string "gril and body". A plain search for an succeeds, because an appears inside "and". How can we match whole words rather than parts of words? For this we use the word boundary \b.
The pattern \ban\b does not match "gril and body", because an never appears there as a standalone word.
There is also an uppercase \B, which means the opposite of \b: the position must not be a word boundary, so the matched text has to be part of a longer word or string. An example is \Ban\B.
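
The behaviour described here can be checked with a short sketch. The subject "hands" is an extra sample added for illustration and is not from the original text.

```php
<?php
$subject = 'gril and body';

$plain     = preg_match('/an/', $subject);      // "an" is found inside "and"
$wholeWord = preg_match('/\ban\b/', $subject);  // "an" is not a standalone word here
$andWord   = preg_match('/\band\b/', $subject); // "and" is a standalone word
$inside    = preg_match('/\Ban\B/', 'hands');   // "an" sits strictly inside "hands"

var_dump($plain, $wholeWord, $andWord, $inside);
```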

Alternation character (|), meaning "or"

The alternation character means "or". For example, Aa|aA matches Aa or aA. The difference between "[]" and "|" is that "[]" matches only a single character, while "|" can choose between strings of any length. "[]" is often combined with the range character "-", as in [a-d], which matches a, b, c, or d.
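
A minimal sketch of the difference between alternation and a character class follows. The anchors ^ and $ are added here only so that each pattern has to cover the whole sample string.

```php
<?php
$alternation = preg_match('/^(Aa|aA)$/', 'aA'); // alternation can match the two-character string
$singleChar  = preg_match('/^[Aa]$/', 'aA');    // a class matches exactly one character
$inRange     = preg_match('/^[a-d]$/', 'c');    // c lies in the range a-d
$outOfRange  = preg_match('/^[a-d]$/', 'e');    // e lies outside the range

var_dump($alternation, $singleChar, $inRange, $outOfRange);
```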

Excluding characters (negation)

Regular expressions use "^" to exclude characters; in this role ^ is placed at the start of a [] class. For example, [^1-5] matches any single character that is not a digit from 1 to 5.
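
The following sketch, using made-up single-character subjects, suggests how a negated class behaves. Note that it accepts any character outside the range, not only other digits.

```php
<?php
$otherDigit = preg_match('/^[^1-5]$/', '7'); // 7 is not in 1-5
$excluded   = preg_match('/^[^1-5]$/', '3'); // 3 is in 1-5, so it is rejected
$letter     = preg_match('/^[^1-5]$/', 'x'); // letters are outside 1-5 as well

var_dump($otherDigit, $excluded, $letter);
```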

Quantifiers (? * + {n,m})

Quantifiers limit how many times the preceding character or group may occur.

Quantifier Meaning
? Zero or one time
* Zero or more times
+ One or more times
{n} Exactly n times
{n,} At least n times
{n,m} n to m times

For example, (D+) matches one or more D.
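
To compare the quantifiers side by side, the sketch below filters one hypothetical list of subjects with preg_grep(), which is described later in this article. Each pattern varies only the quantifier applied to "b".

```php
<?php
// Hypothetical subjects with zero to three "b" characters between "a" and "c".
$subjects = ['ac', 'abc', 'abbc', 'abbbc'];

$optional = array_values(preg_grep('/^ab?c$/', $subjects));     // ?     zero or one b
$star     = array_values(preg_grep('/^ab*c$/', $subjects));     // *     zero or more b
$plus     = array_values(preg_grep('/^ab+c$/', $subjects));     // +     one or more b
$exact    = array_values(preg_grep('/^ab{2}c$/', $subjects));   // {2}   exactly two b
$atLeast  = array_values(preg_grep('/^ab{2,}c$/', $subjects));  // {2,}  at least two b
$between  = array_values(preg_grep('/^ab{1,2}c$/', $subjects)); // {1,2} one or two b

print_r(compact('optional', 'star', 'plus', 'exact', 'atLeast', 'between'));
```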

Dot operator

The dot (.) matches any single character except a newline.

The backslash (\) in expressions

The backslash has several meanings in an expression: escaping, specifying predefined character sets, defining assertions, and representing non-printable characters.

Escape characters

Escaping turns a special character into an ordinary one. Commonly escaped special characters include ".", "?", and "\".

Predefined character sets

Characters Meaning
\d Any decimal digit [0-9]
\D Any character that is not a decimal digit
\s Any whitespace character (space, line feed, form feed, carriage return, tab)
\S Any non-whitespace character
\w Any word character
\W Any non-word character
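
As a rough illustration of these shorthand classes, the sketch below applies \D, \w and \s to invented sample strings using preg_replace() and preg_match_all(), both covered later in this article.

```php
<?php
// Strip everything that is not a digit.
$digitsOnly = preg_replace('/\D/', '', 'Tel: 021-8788 8822');

// Count runs of word characters (letters, digits, underscore).
$wordCount = preg_match_all('/\w+/', 'foo_bar baz-42', $words);

// Collapse any run of whitespace into a single space.
$collapsed = preg_replace('/\s+/', ' ', "a \t\n b");

var_dump($digitsOnly, $wordCount, $words[0], $collapsed);
```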

Non-printable characters

Characters Meaning
\a Alert (bell)
\b Backspace (only inside a character class; elsewhere \b is a word boundary)
\f Form feed
\n Line feed
\r Carriage return
\t Tab

Parentheses ()

Parentheses serve two main purposes in regular expressions:

Changing the scope of operators such as |, *, and ^

For example, take (my|your)baby. Without the "()", | would match either my or yourbaby. With the parentheses, the pattern matches mybaby or yourbaby.

Grouping, so that the group can be back-referenced later
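
Both uses of parentheses can be sketched as follows. The subjects and the repeated-word pattern are illustrative assumptions, not taken from the original article.

```php
<?php
// Scope: with parentheses, "baby" is required after either alternative.
$grouped   = preg_match('/^(my|your)baby$/', 'yourbaby');
// Without them the alternatives are "^my" and "yourbaby$", so "my friend" matches too.
$ungrouped = preg_match('/^my|yourbaby$/', 'my friend');

// Back reference: \1 must repeat exactly what group 1 captured.
preg_match('/\b(\w+) \1\b/', 'this is is a test', $m);

var_dump($grouped, $ungrouped, $m);
```

Here $m[0] holds the doubled word "is is" and $m[1] holds the captured word "is".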

Pattern modifiers

A pattern modifier sets the mode, that is, how the regular expression is interpreted. The main modifiers in PHP are:

Modifier Description
i Ignore case
m Multi-line mode
s Single-line mode
x Ignore whitespace characters in the pattern
U Lazy mode (without it, matching is greedy by default)
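
The effect of each modifier can be seen in a short sketch. All subjects are made-up samples, and the comments describe the PCRE behaviour each line is meant to demonstrate.

```php
<?php
$ignoreCase = preg_match('/php/i', 'PHP');                // i: case is ignored
$multiLine  = preg_match_all('/^\w+$/m', "one\ntwo");     // m: ^ and $ match at every line
$dotAll     = preg_match('/a.b/s', "a\nb");               // s: the dot also matches a newline
$dotDefault = preg_match('/a.b/', "a\nb");                // without s the dot skips newlines
$extended   = preg_match('/\d{3} - \d{4}/x', '555-1234'); // x: spaces in the pattern are ignored

preg_match('/<b>.*<\/b>/U', '<b>a</b><b>b</b>', $lazy);   // U: quantifiers become lazy
preg_match('/<b>.*<\/b>/', '<b>a</b><b>b</b>', $greedy);  // default: greedy

var_dump($ignoreCase, $multiLine, $dotAll, $dotDefault, $extended, $lazy[0], $greedy[0]);
```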

2. Commonly used PHP regular expression functions and examples

a. preg_grep() function

The preg_grep function returns array entries that match a pattern.

Syntax

array preg_grep ( string $pattern , array $input [, int $flags = 0 ] )

Returns an array consisting of the elements of the input array that match the given pattern.

Parameter description:

  • $pattern: The pattern to be searched, in string form.

  • $input: input array.

  • $flags: If set to PREG_GREP_INVERT, this function returns the elements of the input array that do not match the given pattern.

Example

Return the specified matching elements in the array:

<?php
$array = array(1, 2, 3.4, 53, 7.9);
// Return all elements that contain floating point numbers
$fl_array = preg_grep("/^(\d+)?\.\d+$/", $array);
print_r($fl_array);
?>

The execution result is as follows:

Array
(
    [2] => 3.4
    [4] => 7.9
)

It can be seen that preg_grep only returns the floating point numbers in the array.
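
The PREG_GREP_INVERT flag described in the parameter list can be tried on the same array. The sketch below is a variation on the example above, with a hypothetical variable name for the result.

```php
<?php
$array = array(1, 2, 3.4, 53, 7.9);

// Invert the match: keep the elements that are NOT floating point numbers.
$non_floats = preg_grep('/^(\d+)?\.\d+$/', $array, PREG_GREP_INVERT);

print_r($non_floats);
```

As in the non-inverted case, the original array keys are preserved, so the result keeps keys 0, 1 and 3.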

b. preg_match() function

The preg_match function is used to perform a regular expression match.

Syntax

int preg_match ( string $pattern , string $subject [, array &$matches [, int $flags = 0 [, int $offset = 0 ]]] )

Search subject for a match with the regular expression given by pattern.

Parameter description:

$pattern: The pattern to be searched, in string form.

$subject: input string.

$matches: If provided, it is filled with the search results. $matches[0] contains the text matched by the full pattern, $matches[1] contains the text matched by the first captured subgroup, and so on.

$flags: can be set to the following flag value:

PREG_OFFSET_CAPTURE: If this flag is passed, the string offset (relative to the target string) is returned for every match. Note: this changes the array stored in matches so that each element becomes an array whose element 0 is the matched string and whose element 1 is the offset of that string in subject.

$offset: Normally the search starts at the beginning of the target string. The optional offset parameter specifies the position in the target string at which to start searching (in bytes).

Return value

Returns the number of times pattern matches. The value is 0 (no match) or 1, because preg_match() stops searching after the first match. preg_match_all(), by contrast, keeps searching until it reaches the end of subject. If an error occurs, preg_match() returns FALSE.
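
Because 0 and FALSE are both falsy, it can help to compare the return value strictly. The sketch below, which reuses the sample sentence from the examples in this section, also shows the shape of $matches under PREG_OFFSET_CAPTURE and the effect of the offset parameter. The variable names are hypothetical.

```php
<?php
$subject = 'PHP is the web scripting language of choice.';

// Ask for byte offsets as well as the matched text.
$result = preg_match('/web/', $subject, $matches, PREG_OFFSET_CAPTURE);

if ($result === false) {
    // An error occurred; preg_last_error() reports the error code.
    echo 'preg_match failed, error code: ' . preg_last_error() . "\n";
} elseif ($result === 0) {
    echo "No match.\n";
} else {
    // Each entry of $matches is array(matched text, byte offset).
    list($text, $position) = $matches[0];
    echo "Found '$text' at byte $position\n";
}

// Starting the search at byte 12 skips the only occurrence of "web".
$afterOffset = preg_match('/web/', $subject, $unused, 0, 12);
```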

Example

Find the text string "php":

<?php
// The "i" after the pattern delimiter marks a case-insensitive search
if (preg_match("/php/i", "PHP is the web scripting language of choice.")) {
    echo "A match for the string php was found.";
} else {
    echo "No match for the string php was found.";
}
?>

The execution result is as follows:

A match for the string php was found.

Find the word "web"

<?php
/* The \b in the pattern marks a word boundary, so only the standalone word "web" is matched,
 * not parts of words such as "webbing" or "cobweb" */
if (preg_match("/\bweb\b/i", "PHP is the web scripting language of choice.")) {
    echo "A match was found.\n";
} else {
    echo "No match was found.\n";
}
if (preg_match("/\bweb\b/i", "PHP is the website scripting language of choice.")) {
    echo "A match was found.\n";
} else {
    echo "No match was found.\n";
}
?>

The execution result is as follows:

A match was found.
No match was found.

Get the domain name in the URL

<?php
// Get the host name from the URL
preg_match('@^(?:http://)?([^/]+)@i', "http://www.runoob.com/index.html", $matches);
$host = $matches[1];
// Get the last two segments of the host name
preg_match('/[^.]+\.[^.]+$/', $host, $matches);
echo "domain name is: {$matches[0]}\n";
?>

The execution result is as follows:

domain name is: runoob.com

c. preg_match_all() function

The preg_match_all function performs a global regular expression match.

Syntax

int preg_match_all ( string $pattern , string $subject [, array &$matches [, int $flags = PREG_PATTERN_ORDER [, int $offset = 0 ]]] )

Searches subject for all matches of the given regular expression pattern and stores them in matches in the order specified by flags.

After the first match is found, each subsequent search continues from the end of the previous match.

Parameter description:

$pattern: The pattern to be searched, in string form.

$subject: input string

$matches: a multi-dimensional array that receives all matches as an output parameter; its ordering is determined by flags.

$flags: the following flags can be combined (note that PREG_PATTERN_ORDER and PREG_SET_ORDER cannot be used together):

PREG_PATTERN_ORDER: Results are ordered so that $matches[0] holds all matches of the full pattern, $matches[1] holds all matches of the first subgroup, and so on.

PREG_SET_ORDER: Results are ordered so that $matches[0] holds the first set of matches (the full match and its subgroups), $matches[1] holds the second set, and so on.

PREG_OFFSET_CAPTURE: If this flag is passed, each match is returned together with its offset in the target string.

$offset: Normally the search starts at the beginning of the target string. The optional offset parameter specifies the position in the target string at which to start searching (in bytes).

Return value

Returns the number of complete matches (possibly 0), or returns FALSE if an error occurs.

Example

Find the content enclosed in <b> and </b> tags (in practice I usually read $pat_array[1]):

<?php
$userinfo = "Name: <b>PHP</b> <br> Title: <b>Programming Language</b>";
preg_match_all ("/<b>(.*)<\/b>/U", $userinfo, $pat_array);
print_r($pat_array[0]);
?>

The execution result is as follows:

Array
(
    [0] => <b>PHP</b>
    [1] => <b>Programming Language</b>
)
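
To make the difference between the two ordering flags concrete, the sketch below runs the same pattern twice on the string from the example above. The result variable names are hypothetical.

```php
<?php
$userinfo = "Name: <b>PHP</b> <br> Title: <b>Programming Language</b>";
$pattern  = '/<b>(.*)<\/b>/U';

// Grouped by pattern part: [0] = all full matches, [1] = all first subgroups.
preg_match_all($pattern, $userinfo, $byPattern, PREG_PATTERN_ORDER);

// Grouped by match: [0] = first match with its subgroups, [1] = second match, ...
preg_match_all($pattern, $userinfo, $bySet, PREG_SET_ORDER);

print_r($byPattern[1]); // the captured tag contents
print_r($bySet[0]);     // full text and capture of the first match
```

With PREG_PATTERN_ORDER the tag contents are collected in $byPattern[1]; with PREG_SET_ORDER each match can be handled in a single foreach iteration.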

d. preg_replace() function

preg_replace function performs a regular expression search and replacement.

Syntax

mixed preg_replace ( mixed $pattern , mixed $replacement , mixed $subject [, int $limit = -1 [, int &$count ]] )

Search for the part of subject that matches pattern and replace it with replacement.

Parameter description:

$pattern: The pattern to be searched, which can be a string or a string array.

$replacement: String or array of strings used for replacement.

$subject: The target string or string array to be searched and replaced.

$limit: Optional, the maximum number of substitutions for each subject string per pattern. The default is -1 (no limit).

$count: Optional. Receives the number of replacements performed.

Return value

If subject is an array, preg_replace() returns an array; otherwise it returns a string.

If matches are found, the new subject is returned; otherwise subject is returned unchanged. If an error occurs, NULL is returned.

Example

Replace google with runoob

<?php
$string = 'google 123, 456';
$pattern = '/(\w+) (\d+), (\d+)/i';
$replacement = 'runoob ${2},$3';
echo preg_replace($pattern, $replacement, $string);
?>

The execution result is as follows:

runoob 123,456

Remove whitespace characters

<?php
$str = 'runo o b';
$str = preg_replace('/\s+/', '', $str);
// $str is now 'runoob'
echo $str;
?>

The execution result is as follows:

runoob

Search and replace using indexed arrays

<?php
$string = 'The quick brown fox jumped over the lazy dog.';
$patterns = array();
$patterns[0] = '/quick/';
$patterns[1] = '/brown/';
$patterns[2] = '/fox/';
$replacements = array();
$replacements[2] = 'bear';
$replacements[1] = 'black';
$replacements[0] = 'slow';
echo preg_replace($patterns, $replacements, $string);
?>

The execution result is as follows:

The bear black slow jumped over the lazy dog.

The replacements are paired with the patterns in the order the elements were added to each array, not by their numeric keys, which is why quick becomes bear here.
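
If the intent is to pair each pattern with the replacement that has the same key, one common approach is to sort both arrays by key before the call. The sketch below repeats the arrays from the example and adds ksort(); the $result variable is introduced only for illustration.

```php
<?php
$string = 'The quick brown fox jumped over the lazy dog.';

$patterns = array();
$patterns[0] = '/quick/';
$patterns[1] = '/brown/';
$patterns[2] = '/fox/';

$replacements = array();
$replacements[2] = 'bear';
$replacements[1] = 'black';
$replacements[0] = 'slow';

// Sort both arrays by key so that internal order and keys agree.
ksort($patterns);
ksort($replacements);

$result = preg_replace($patterns, $replacements, $string);
echo $result;
```

After sorting, quick is replaced by slow, brown by black, and fox by bear.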

Using the count parameter

<?php
$count = 0;
echo preg_replace(array('/\d/', '/\s/'), '*', 'xp 4 to', -1 , $count), "\n";
echo $count; //3
?>

The execution result is as follows:

xp***to
3

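e. preg_split() function

The preg_split function splits a string by a regular expression.

Syntax

array preg_split ( string $pattern , string $subject [, int $limit = -1 [, int $flags = 0 ]] )

Splits the given string by a regular expression.

Parameter description:

$pattern: The pattern to search for, in string form.

$subject: The input string.

$limit: Optional. If specified, at most limit substrings are returned, and the last substring contains the whole remainder of the string. A limit of -1, 0, or null means "no limit"; as is standard in PHP, you can pass null to skip to the flags parameter.

$flags: Optional. Any combination of the following flags (combined with the bitwise OR operator |):

PREG_SPLIT_NO_EMPTY: If this flag is set, preg_split() returns only the non-empty pieces.

PREG_SPLIT_DELIM_CAPTURE: If this flag is set, parenthesized expressions in the delimiter pattern are captured and returned as well.

PREG_SPLIT_OFFSET_CAPTURE: If this flag is set, the string offset is returned with every piece. Note: this changes the return value so that each element is an array whose element 0 is the substring and whose element 1 is the offset of that substring in subject.

Return value

Returns an array of the substrings obtained by splitting subject along the boundaries matched by pattern.

Example

Get the parts of a search string

<?php
// Split the phrase by commas or whitespace (including " ", \r, \t, \n, \f)
$keywords = preg_split("/[\s,]+/", "hypertext language, programming");
print_r($keywords);
?>

The execution result is as follows: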

Array
(
    [0] => hypertext
    [1] => language
    [2] => programming
)

Split a string into its individual characters

<?php
$str = 'runoob';
$chars = preg_split('//', $str, -1, PREG_SPLIT_NO_EMPTY);
print_r($chars);
?>

The execution result is as follows:

Array
(
    [0] => r
    [1] => u
    [2] => n
    [3] => o
    [4] => o
    [5] => b
)

Split a string and get the offset of each part

<?php
$str = 'hypertext language programming';
$chars = preg_split('/ /', $str, -1, PREG_SPLIT_OFFSET_CAPTURE);
print_r($chars);
?>

The execution result is as follows:

Array
(
    [0] => Array
        (
            [0] => hypertext
            [1] => 0
        )

    [1] => Array
        (
            [0] => language
            [1] => 10
        )

    [2] => Array
        (
            [0] => programming
            [1] => 19
        )

)
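
The examples above do not cover PREG_SPLIT_DELIM_CAPTURE, flag combinations, or a positive limit. The sketch below tries each one on made-up input; the variable names are hypothetical.

```php
<?php
// Wrapping the delimiter pattern in parentheses returns the delimiters as well.
$tokens = preg_split('/([-+])/', '3+4-5', -1, PREG_SPLIT_DELIM_CAPTURE);

// Flags are combined with |; here the empty piece before the leading "+" is dropped.
$clean = preg_split('/([-+])/', '+3+4', -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

// A limit of 2 leaves the unsplit remainder in the last element.
$firstAndRest = preg_split('/[\s,]+/', 'hypertext language, programming', 2);

print_r($tokens);
print_r($clean);
print_r($firstAndRest);
```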

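3. Common regular expressions (for reference)

I. Expressions for validating numbers

1 Digits:
^[0-9]*$
2 A number with exactly n digits:
^\d{n}$
3 A number with at least n digits:
^\d{n,}$
4 A number with m to n digits:
^\d{m,n}$
5 Zero, or a number that does not start with zero:
^(0|[1-9][0-9]*)$
6 A number that does not start with zero, with at most two decimal places:
^([1-9][0-9]*)+(\.[0-9]{1,2})?$
7 A positive or negative number with 1-2 decimal places:
^(\-)?\d+(\.\d{1,2})?$
8 Positive numbers, negative numbers, and decimals:
^(\-|\+)?\d+(\.\d+)?$
9 A positive real number with two decimal places:
^[0-9]+(\.[0-9]{2})?$
10 A positive real number with 1-3 decimal places:
^[0-9]+(\.[0-9]{1,3})?$
11 Non-zero positive integer:
^[1-9]\d*$ or ^([1-9][0-9]*){1,3}$ or ^\+?[1-9][0-9]*$
12 Non-zero negative integer:
^\-[1-9][0-9]*$ or ^-[1-9]\d*$
13 Non-negative integer:
^\d+$ or ^([1-9]\d*|0)$
14 Non-positive integer:
^(-[1-9]\d*|0)$ or ^((-\d+)|(0+))$
15 Non-negative floating point number:
^\d+(\.\d+)?$ or ^([1-9]\d*\.\d*|0\.\d*[1-9]\d*|0?\.0+|0)$
16 Non-positive floating point number:
^((-\d+(\.\d+)?)|(0+(\.0+)?))$ or ^((-([1-9]\d*\.\d*|0\.\d*[1-9]\d*))|0?\.0+|0)$
17 Positive floating point number:
^([1-9]\d*\.\d*|0\.\d*[1-9]\d*)$ or ^(([0-9]+\.[0-9]*[1-9][0-9]*)|([0-9]*[1-9][0-9]*\.[0-9]+)|([0-9]*[1-9][0-9]*))$
18 Negative floating point number:
^-([1-9]\d*\.\d*|0\.\d*[1-9]\d*)$ or ^(-(([0-9]+\.[0-9]*[1-9][0-9]*)|([0-9]*[1-9][0-9]*\.[0-9]+)|([0-9]*[1-9][0-9]*)))$
19 Floating point number:
^(-?\d+)(\.\d+)?$ or ^-?([1-9]\d*\.\d*|0\.\d*[1-9]\d*|0?\.0+|0)$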

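II. Expressions for validating characters

1 Chinese characters:
^[\u4e00-\u9fa5]{0,}$
2 English letters and digits:
^[A-Za-z0-9]+$ or ^[A-Za-z0-9]{4,40}$
3 Any characters, with a length of 3-20:
^.{3,20}$
4 A string made up of the 26 English letters:
^[A-Za-z]+$
5 A string made up of the 26 uppercase English letters:
^[A-Z]+$
6 A string made up of the 26 lowercase English letters:
^[a-z]+$
7 A string made up of digits and the 26 English letters:
^[A-Za-z0-9]+$
8 A string made up of digits, the 26 English letters, or underscores:
^\w+$ or ^\w{3,20}$
9 Chinese, English letters, and digits, including underscores:
^[\u4E00-\u9FA5A-Za-z0-9_]+$
10 Chinese, English letters, and digits, excluding underscores and other symbols:
^[\u4E00-\u9FA5A-Za-z0-9]+$ or ^[\u4E00-\u9FA5A-Za-z0-9]{2,20}$
11 Input that excludes the characters %&',;=?$\" (the class is negated):
[^%&',;=?$\x22]+
12 Input that excludes the character ~:
[^~\x22]+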
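
The \uXXXX notation in the patterns above comes from other regex flavours; PCRE, which PHP's preg_ functions use, does not accept it. A minimal sketch of the PHP form follows, assuming the source file and the subject strings are UTF-8 encoded: the range is written as \x{4e00}-\x{9fa5} and the u modifier is added. The sample strings are invented for illustration.

```php
<?php
// Chinese characters only (u modifier: treat pattern and subject as UTF-8).
$chineseOnly = '/^[\x{4e00}-\x{9fa5}]+$/u';

// Chinese characters, English letters, digits and underscore.
$mixedAllowed = '/^[\x{4e00}-\x{9fa5}A-Za-z0-9_]+$/u';

$pure     = preg_match($chineseOnly, '中文');
$rejected = preg_match($chineseOnly, '中文abc');
$accepted = preg_match($mixedAllowed, '中文_abc123');

var_dump($pure, $rejected, $accepted);
```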

III. Expressions for special requirements

1. Email address:
^\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$
2. Domain name:
[a-zA-Z0-9][-a-zA-Z0-9]{0,62}(\.[a-zA-Z0-9][-a-zA-Z0-9]{0,62})+\.?
3. Internet URL:
[a-zA-Z]+://[^\s]* or ^http://([\w-]+\.)+[\w-]+(/[\w./?%&=-]*)?$
4. Mobile phone number:
^(13[0-9]|14[57]|15[0-35-9]|18[0-35-9])\d{8}$
5. Phone number ("XXX-XXXXXXX", "XXXX-XXXXXXXX", "XXX-XXXXXXXX", "XXXXXXX" and "XXXXXXXX"):
^(\(\d{3,4}\)|\d{3,4}-)?\d{7,8}$
6. Domestic telephone number (0511-4405222, 021-87888822):
\d{3}-\d{8}|\d{4}-\d{7}
7. ID number:
15 or 18-digit ID number:
^(\d{15}|\d{18})$
15-digit ID card:
^[1-9]\d{7}((0\d)|(1[0-2]))(([012]\d)|3[0-1])\d{3}$
18-digit ID card:
^[1-9]\d{5}[1-9]\d{3}((0\d)|(1[0-2]))(([012]\d)|3[0-1])\d{4}$
8. Short ID number (digits, optionally ending with the letter x):
^([0-9]){7,18}(x|X)?$
or
^(\d{8,18}|[0-9x]{8,18}|[0-9X]{8,18})$
9. Whether an account name is legal (starts with a letter, 5-16 bytes, letters, digits and underscores allowed):
^[a-zA-Z][a-zA-Z0-9_]{4,15}$
10. Password (starts with a letter, length 6-18, letters, digits and underscores only):
^[a-zA-Z]\w{5,17}$
11. Strong password (must combine uppercase letters, lowercase letters and digits, no special characters, length 8-10):
^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])[a-zA-Z0-9]{8,10}$
12. Date format:
^\d{4}-\d{1,2}-\d{1,2}
13. The 12 months of the year (01-09 and 1-12):
^(0?[1-9]|1[0-2])$
14. The 31 days of a month (01-09 and 1-31):
^((0?[1-9])|((1|2)[0-9])|30|31)$
15. Input format for money:
16. There are four representations of money that we can accept: "10000.00" and "10,000.00", and "10000" and "10,000" without the "cents":
^[1-9][0-9]*$
17. This matches any number that does not start with 0. However, it also means that the single character "0" is rejected, so we use the following form:
^(0|[1-9][0-9]*)$
18. A 0, or a number that does not start with 0. We can also allow a negative sign at the beginning:
^(0|-?[1-9][0-9]*)$
19. This matches a 0, or a possibly negative number that does not start with 0. Let the user start with 0 after all, and drop the negative sign, because money cannot be negative. Next we add the optional decimal part:
^[0-9]+(\.[0-9]+)?$
20. Note that there must be at least one digit after the decimal point, so "10." is rejected, but "10" and "10.2" are accepted:
^[0-9]+(\.[0-9]{2})?$
21. This requires exactly two digits after the decimal point. If you think that is too strict, you can do this:
^[0-9]+(\.[0-9]{1,2})?$
22. This lets the user write just one decimal place. Next we should consider commas in numbers. We can do this:
^[0-9]{1,3}(,[0-9]{3})*(\.[0-9]{1,2})?$
23. One to three digits, followed by any number of groups of a comma and three digits. To make the commas optional rather than required:
^([0-9]+|[0-9]{1,3}(,[0-9]{3})*)(\.[0-9]{1,2})?$
24. Note: this is the final result. Don't forget that "+" can be replaced with "*" if you think an empty string is acceptable (strange, why?). Finally, don't forget to remove the backslash when using the function; this is where mistakes usually happen.
25. XML file:
^([a-zA-Z]+-?)+[a-zA-Z0-9]+\.[xX][mM][lL]$
26. Chinese characters:
[\u4e00-\u9fa5]
27. Double-byte characters:
[^\x00-\xff]
(includes Chinese characters; can be used to calculate the length of a string, counting a double-byte character as 2 and an ASCII character as 1)
28. Blank lines: \n\s*\r (can be used to delete blank lines)
29. HTML tags:
<(\S*?)[^>]*>.*?</\1>|<.*? /> (The versions circulating online are poor. This one is only partially effective and still cannot handle complex nested tags.)
30. Leading and trailing whitespace: ^\s*|\s*$ or (^\s*)|(\s*$) (can be used to delete whitespace at the beginning and end of a line, including spaces, tabs, form feeds, etc.; a very useful expression)
31. Tencent QQ number: [1-9][0-9]{4,} (Tencent QQ numbers start from 10000)
32. China postal code: [1-9]\d{5}(?!\d) (China postal codes have 6 digits)
33. IP address: \d+\.\d+\.\d+\.\d+ (useful when extracting an IP address)
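
To use patterns like these from PHP, each one needs delimiters and is then passed to preg_match(). The sketch below is one possible way to do that, not a definitive implementation: the $patterns map and the is_valid() helper are hypothetical names, the three patterns are the email, money and account expressions from the list with their quantifiers written out in full, and the sample inputs are invented.

```php
<?php
// Hypothetical map of rule names to delimited patterns from the list above.
$patterns = [
    'email'   => '/^\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$/',
    'money'   => '/^([0-9]+|[0-9]{1,3}(,[0-9]{3})*)(\.[0-9]{1,2})?$/',
    'account' => '/^[a-zA-Z][a-zA-Z0-9_]{4,15}$/',
];

// Hypothetical helper: true only when the whole value matches the pattern.
// The strict comparison treats both 0 (no match) and false (error) as invalid.
function is_valid(string $pattern, string $value): bool
{
    return preg_match($pattern, $value) === 1;
}

$checks = [
    'email ok'     => is_valid($patterns['email'], 'user@example.com'),
    'email bad'    => is_valid($patterns['email'], 'user@example'),
    'money commas' => is_valid($patterns['money'], '10,000.00'),
    'money bad'    => is_valid($patterns['money'], '10.'),
    'account ok'   => is_valid($patterns['account'], 'ab_12'),
    'account bad'  => is_valid($patterns['account'], '1abcd'),
];

var_dump($checks);
```

Patterns of this kind check format only; they do not confirm that an address, number or account actually exists.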

Statement:
This article is reproduced at:csdn.net. If there is any infringement, please contact admin@php.cn delete