Summary of regular expressions in Php (1)_PHP tutorial-PHP Tutorial-php.cn

Home

Backend Development

PHP Tutorial

Summary of regular expressions in Php (1)_PHP tutorial

WBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWB

Jul 20, 2016 am 11:13 AM

phponedelimiterCanexistletternumberslashyesconceptmodelregulargrammarremoveNo

1. Concept

The syntax pattern is similar to .

The

delimiter can be any non-alphanumeric, non-numeric, non-whitespace

If the delimiter is used in an expression, it needs to be escaped with a backslash .

Metacharacters

Basic components of a regular expression

/Atoms and metacharacters/Pattern modifiers / The power of regular expressions lies in their ability to include selections and loops within patterns. They are encoded in the schema using metacharacters, which do not represent themselves; they are parsed in some special way.

It is divided into two types depending on whether it is inside or outside the square brackets.

1. Metacharacters outside square brackets

Metacharacters (symbols)

元字符(符号)	说明
	一般用于转义字符
^	断言目标的开始位置(或在多行模式下是行首)
$	目标的结束位置(活在多行模式下行尾)
.	匹配除换行符外任何字符(默认时)
[,]	开始,结束字符类定义
\|	开始一个可选分支
( ，)	子组的开始，结尾标记
?	作为量词，表示 0 次或 1 次匹配。位于量词后面用于改变量词的贪婪特性
*	量词，0 次或多次匹配
+	量词，1 次或多次匹配
{ ,}	自定义量词开始标记，结束标记

Description

is generally used to escape characters

元字符	说明
	转义字符
^	仅在作为第一个字符时，表明字符类取反
-	标记字符范围

Assert the start position of the target (or the beginning of the line in multiline mode)

The end position of the target (live at the end of the line in multi-line mode)

matches any character except newline (default)

[,]

Start, end character class definition

Start an optional branch

( ,)

Start and end tag of subgroup

serves as a quantifier, indicating 0 or 1 matches. Located after the quantifier to change the greedy property of the quantifier

Quantifier, 0 or more matches

Quantifier, 1 or more matches

{ ,}

Customized quantifier start tag, end tag

2. The part in the square brackets in the pattern is called "character class"

Metacharacters Description

Escape character

^ Only when used as the first character, indicates that the character class is inverted

- Mark character range

Examples of metacharacter usage

1. Escape (backslash)

is followed by a non-alphanumeric character, canceling any special meaning that character might have. This applies within and outside character classes.

For non-numeric characters, you always need to add a backslash in front of them when matching the original text to indicate that they represent themselves.

When matching "*", because it has a special meaning, use "*" to cancel its special meaning

Match "." with "."

Match "" with "\"

But be careful:

Backslash has special meaning in single-quoted strings and double-quoted strings, so to match a backslash, the pattern must be written "\\" or '\'

2.The second use of backslash provides a means of controlling the visible encoding of non-printing characters. Except for the binary

Always a tab 113

Symbol	Description
a	Ring character (hex 07)
cx	"control-x", x is any character
e	Escape (hex 1B)
f	Page change (hex 0C)
n	Line break (hex 0A)
p{xx} (p	A character that matches the xx attribute
P{xx} (p	A character that does not match the xx attribute
r	Enter (hex 0D)
t	Horizontal tab (hex 09)
xhh	hh hexadecimal encoded character
ddd	ddd octal-encoded character, or backreference
	Another way to use spaces
40	Also considered spaces when less than 40 subgroups are provided.
7	Always a backreference
11	may be a backreference or a tab

A tab followed by a 3 (because only 3 octal digits are read at most at a time
The character represented by octal 113
377	The octal system 377 is the decimal system 255, so it represents a character with all ones
81	A backreference or a binary 0 followed by the two numbers 8 and 1 (because 8 is not an octal significant digit)

3. The third use of backslash, describing a specific character class

符号	说明
d	任意十进制数字
D	任意非十进制数字
h	任意水平空白字符
H	任意非水平空白字符
s	任意空白字符
S	任意空白字符
v	任意垂直空白字符(since PHP 5.2.4)
V	任意非垂直空白字符(since PHP 5.2.4)
w	任意单词字符
W	任意非单词字符

Symbol

Instructions

b	单词边界注意在字符类中是退格
B	非单词边界
A	目标的开始位置(独立于多行模式)
Z	目标的结束位置或结束处的换行符(独立于多行模式)
z	目标的结束位置(独立于多行模式)
G	在目标中首次匹配位置

d Any decimal number D Any non-decimal number h Any horizontal whitespace character span> H Any non-horizontal whitespace character s Any whitespace character S Any whitespace character v Any vertical whitespace character (since PHP 5.2.4) V Any non-vertical whitespace character (since PHP 5.2.4) w Any word character W Any non-word character Each pair of escape sequences above represents two disjoint parts of the complete character set. Any character will definitely match one of them and will not match the other. The fourth usage is simple assertion

Word boundary Note that it is backspace in the character class

Non-word boundaries

Start position of target (independent of multiline mode)

The end position of the target or the newline character at the end (independent of multiline mode)

The end position of the target (independent of multiline mode)

First matching position in target

A,Z,z

Because they always match the beginning and end of the target string and will not be restricted by pattern modifiers

The difference between Z and z is that when the end character of the string is a newline character, Z will regard it as a match at the end of the string, while z only matches the end of the string.

Code 1

<span $p</span>='#\A[a-z]{3}#m'<span ;
</span><span $str</span>='<span abc
defg
hijkl</span>'<span ;
</span><span preg_match_all</span>(<span $p</span>,<span $str</span>,<span $all</span><span );
</span><span print_r</span>(<span $all</span>);

I found that the result is the same with or without the mode modifier m

only matches

And the code

<span $p</span>='#^[a-z]{3}#m'<span ;
</span><span $str</span>='<span abc
defg
hijkl</span>'<span ;
</span><span preg_match_all</span>(<span $p</span>,<span $str</span>,<span $all</span><span );
</span><span print_r</span>(<span $all</span>);

Without m

After adding it, it matches

Sure enough,

Similarly,

can be compared

Code

<span <span $p</span>='#[a-z]\Z#'<span ;


</span><span $str</span>="a\n"<span ;


</span><span preg_match_all</span>(<span $p</span>,<span $str</span>,<span $all</span><span );


</span><span print_r</span>(<span $all</span>);</span>

When the

pattern is corrected to E

When the pattern is modified to

In a preg_match()() call with the $offset parameter specified, it is successful only when the current matching position is at the matching start point

When the value of $offset is not 0, it is different from A.

See php manual

That is, put the characters with special meaning between Q and E

Such as code 4

<span $p</span>='#\w+\Q.$.\E$#'<span ;
</span><span $str</span>="a.$."<span ;
</span><span preg_match_all</span>(<span $p</span>,<span $str</span>,<span $all</span><span );
</span><span print_r</span>(<span $all</span>);

matches a.$.

As of PHP 5.2.4. . For example, footKbar matches "footbar". But the matching result obtained is "bar". However, the use of K will not interfere with the content within the subgroup. For example, if (foot)Kbar matches "footbar", the result in the first subgroup will still be "foo". Translator's Note: The effect of placing K in the subgroup and outside the subgroup is the same.

p{Lu} matches uppercase letters

Period

Outside of character class

C can be used to match single bytes, which means that in UTF-8 mode, periods can match multi-byte characters

Character classes
alnum	Letters and numbers
alpha	letters
ascii	0 - 127 ascii characters
blank	Spaces and horizontal tabs
cntrl	Control characters
digit	Decimal number (same as d)
graph	Print characters, excluding spaces
lower	Lower case letters
print	Print characters, including spaces
punct	Print characters, excluding letters and numbers
space	White space characters (more vertical tabs than s)
upper	Capital letters
word	Word characters (same as w)
xdigit	Hexadecimal numbers

比如'#[[:upper:]]#'匹配大写字母

'#[[:alpha:]]#' 匹配字母

竖线字符用于分离模式中的可选路径。比如模式gilbert|Sullivan匹配 ”gilbert” 或者 ”sullivan”。竖线可以在模式中出现任意多个，并且允许有空的可选路径(匹配空字符串)。匹配的处理从左到右尝试每一个可选路径，并且使用第一个成功匹配的。如果可选路径在子组(下面定义)中，则”成功匹配”表示同时匹配了子模式中的分支以及主模式中的其他部分。

代码5

<span $p</span>='#p(hp|ython|erl)#'<span ;
</span><span $str</span>="php python perl"<span ;
</span><span preg_match_all</span>(<span $p</span>,<span $str</span>,<span $all</span><span );
</span><span print_r</span>(<span $all</span>);

子组通过圆括号分割界定，并且它们可以嵌套，主要有以下两种用法与功能

1.将可选分支局部化。比如

模式 p(hp|ython|erl) 匹配

2.将子组设定为捕获子组。

整个模式匹配后，左括号从左至右出现的次序就是对应子组的下标(从 1 开始)，可以通过这些下标数字来获取捕获子模式匹配结果。

代码6

<span $p</span>='#(\d)#'<span ;
</span><span $str</span>="abc123"<span ;
</span><span $r</span>=<span preg_replace</span>(<span $p</span>,'<font color=red>\1</font>',<span $str</span><span );
</span><span echo</span> <span $r</span>;

在子组定义的左括号后面紧跟字符串 ”?:” 会使得该子组不被单独捕获，并且不会对其后子组序号的计算产生影响

代码7：匹配数字把数字改为红色的

<span $p</span>='#.*(?:\d).*([a-z])#U'<span ;
</span><span $str</span>="3df5g"<span ;
</span><span $r</span>=<span preg_replace</span>(<span $p</span>,'<font color=red>\1</font>',<span $str</span><span );
</span><span echo</span> <span $r</span>;

如果匹配数字的模式不加?:

那么

为了方便简写，如果需要在非捕获子组开始位置设置选项，，比如：

上面两种写法实际上是相同的模式。因为可选分支会从左到右尝试每个分支，并且选项没有在子模式结束前被重置，并且由于选项的设置会穿透对后面的其他分支产生影响，因此，上面的模式都会匹配 ”SUNDAY” 以及 ”Saturday”。

在 PHP 4.3.3 中，pattern) 的语法进行命名。这个子模式将会在匹配结果中同时以其名称和顺序(数字下标)出现， PHP 5.2.2中又增加了两种味子组命名的语法： (?pattern) 和 (?’name’pattern)。

代码如下8：

<span $p</span>="#.*(?<alpha>[a-z]{3})(?'digit'\d{3}).*#"<span ;
</span><span $str</span>="abc123111def111g"<span ;
</span><span preg_match_all</span>(<span $p</span>,<span $str</span>,<span $arr</span><span );
</span><span print_r</span>(<span $arr</span>);

结果：

有时需要多个匹配可以在一个正则表达式中选用子组。为了让多个子组可以共用一个后向引用数字的问题， (?\语法允许复制数字。考虑下面的正则表达式匹配Sunday：

(?:(Sat)ur|(Sun))day

这里当后向引用 1 空时Sun 存储在后向引用 2 中. 当后向引用 2 不存在的时候 Sat 存储在后向引用 1中。使用 (?|修改模式来修复这个问题：

代码9：

<span $p</span>='#(?:(sat)ur|(sun))day#'<span ;
</span><span $str</span>="sunday saturday"<span ;
</span><span preg_match_all</span>(<span $p</span>,<span $str</span>,<span $arr</span><span );
</span><span print_r</span>(<span $arr</span>);

结果：

(?|(Sat)ur|(Sun))day

使用这个模式， Sun和Sat都会被存储到后向引用1中。

在看这个模式前先看以2个下代码

代码10-1

$p=<span '</span><span #(a|b)\d#</span><span '</span><span ;
$str</span>=<span "</span><span b2a1</span><span "</span><span ;
preg_match_all($p,$str,$arr);
print_r($arr);</span>

结果是：Array

<em id="__mceDel">(
    [0] => Array
        (
            [0] => b2
            [1] => a1
        )

    [1] => Array
        (
            [0] => b
            [1] => a
        )

)<br /><span <strong>代码10-2</strong></span><br /></em>

$p=<span '</span><span #((a)|b)\d#</span><span '</span><span ;
$str</span>=<span "</span><span b2a1</span><span "</span><span ;
preg_match_all($p,$str,$arr);
print_r($arr);</span>

结果：

<strong>Array
(
    [0] => Array
        (
            [0] => b2
            [1] => a1
        )

    [1] => Array
        (
            [0] => b
            [1] => a
        )

    [2] => Array
        (
            [0] => 
            [1] => a
        )</strong>

)<br /><strong>对10-2代码：<br /></strong>第一次完整匹配到的内容是b2,所以包括匹配内容b的括号即为其第一个子模式是即为b，第二个子模式由于(a)没有匹配，所以为空<br />第二次完整匹配到a1,其第一个子模式为a,第二次的由于((a)|b)是外层大括号里包含的<br /><strong>代码10-3：<br /></strong>

$p=<span '</span><span #((a)|(b))\d#</span><span '</span><span ;
$str</span>=<span "</span><span b2a1</span><span "</span><span ;
preg_match_all($p,$str,$arr);
print_r($arr);</span>

<strong> 结果：<br /></strong>

Array
(
    [0] => Array
        (
            [0] => b2
            [1] => a1
        )

    [1] => Array
        (
            [0] => b
            [1] => a
        )

    [2] => Array
        (
            [0] => 
            [1] => a
        )

    [3] => Array
        (
            [0] => b
            [1] => 
        )

)

代码10-4：

$p=<span '</span><span #(?:(a)|(b))\d#</span><span '</span><span ;
$str</span>=<span "</span><span b2a1</span><span "</span><span ;
preg_match_all($p,$str,$arr);
print_r($arr);</span>

结果：<br />Array
(
    [0] => Array
        (
            [0] => b2
            [1] => a1
        )

    [1] => Array
        (
            [0] => 
            [1] => a
        )

    [2] => Array
        (
            [0] => b
            [1] => 
        )

)

<strong> </strong>

代码10：

<span $p</span>='#(?|(sat)ur|(sun))day#'<span ;
</span><span $str</span>="sunday saturday"<span ;
</span><span preg_match_all</span>(<span $p</span>,<span $str</span>,<span $arr</span><span );
</span><span print_r</span>(<span $arr</span>);

结果

如果紧跟反斜线的数字小于 10，它总是一个后向引用。模式中的捕获数要大于等于后向引用的个数

后向引用会直接匹配被引用捕获组在目标字符串中实际捕获到的内容，而不是匹配子组模式的内容

(sens|respons)e and \1ibility将会匹配

<span $p</span>='#(sens|respons)e and \1ibility#'<span ;
</span><span $str</span>="sense and sensibility response and responsibility  sense and responsibility"<span ;
</span><span preg_match_all</span>(<span $p</span>,<span $str</span>,<span $arr</span><span );
</span><span print_r</span>(<span $arr</span>);

结果

ab(?i)c匹配abC

如果在后向引用时被强制进行了大小写敏感匹配

((?i)abc)\s+\1

匹配

ABC ABC

AbC AbC

只要两个一样不分大小写

但不匹配

这里其实要考虑的是后向引用期望得到的内容是和那个被引用的捕获子组得到的内容是完全一致的

代码12：

<span $p</span>='#((?i)abc)\s+\1#'<span ;
</span><span $str</span>="abc abc |ABC ABC  |AbC AbC |abc Abc "<span ;
</span><span preg_match_all</span>(<span $p</span>,<span $str</span>,<span $arr</span><span );
</span><span print_r</span>(<span $arr</span>);

结果

先看以下代码13

<span $p</span>='#(a|(bc))#'<span ;
</span><span $str</span>="abc "<span ;
</span><span preg_match_all</span>(<span $p</span>,<span $str</span>,<span $arr</span><span );
</span><span print_r</span>(<span $arr</span>);

完整匹配了2次

[0][0]是第一次完整的匹配

[1][0]是第一次匹配的第一个子模式

[2][0]是第一次匹配的第二个子模式

[0][1]第二次完整匹配

[1][1]第二次匹配的第一个子模式

[2][1]是第二次匹配的第二个子模式

从上面可以发现对于模式

(a|(bc))

最外面的括号是第一个匹配子模式

里面的括号里的是第二个子模式

所以对于以下代码14：

<span $p</span>='#(a|(bc))\2#'<span ;
</span><span $str</span>="aabcbc"<span ;
</span><span preg_match_all</span>(<span $p</span>,<span $str</span>,<span $arr</span><span );
</span><span print_r</span>(<span $arr</span>);

结果

当第一匹配

就无从

所以第一次完整匹配中必须得有让第二个子模式存在的机会即里面的括号里的内容必须被匹配到，所以必须得有

因为可能会有多达 99 个后向引用，所有紧跟反斜线后的数字都可能是一个潜在的后向引用计数。如果模式在后向引用之后紧接着还是一个数值字符，那么必须使用一些分隔符用于终结后向引用语法。

以下代码15为例：

<span $p</span>='#([a-z]{3})\1 5#x'<span ;
</span><span $str</span>="aaaaaa5"<span ;
</span><span preg_match_all</span>(<span $p</span>,<span $str</span>,<span $arr</span><span );
</span><span print_r</span>(<span $arr</span>);

模式后向引用\1

我们空下一格，然后在模式修正里忽略模式里的空格就能成功匹配

(a\1) 就不会得到任何匹配

而这种引用可以用于内部的子模式重复

(a|b\1)会匹配 ”a”但不会匹配b( 因为子组内部有一个可选路径，可选路径中有一条路能够完成匹配，在匹配完成后，后向引用就能够引用到内容了)。

代码16：

<span $p</span>='#(a|b\1)+#'<span ;
</span><span $str</span>="abba"<span ;
</span><span preg_match_all</span>(<span $p</span>,<span $str</span>,<span $arr</span><span );
</span><span print_r</span>(<span $arr</span>);

结果

在每次子模式的迭代过程中，后向引用匹配上一次迭代时这个子组匹配到的字符串。为了做这种工作，模式必须满足这样一个条件，模式在第一次迭代的时候，必须能够保证不需要匹配后向引用。这种条件可以像上面的例子用可选路径来实现，也可以通过使用最小值为 0 的量词修饰后向引用的方式来完成。

在 PHP 5.2.2之后， g转义序列可以用于子模式的绝对和相对引用。这个转义序列必须紧跟一个无符号数字或一个负数，可以选择性的使用括号对数字进行包裹。序列\1， \g1，\g{1} 之间是同义词关系。这种用法可以消除使用反斜线紧跟数值描述反向引用时候产生的歧义。这种转义序列有利于区分后向引用和八进制数字字符，也使得后向引用后面紧跟一个原文匹配数字变的更明了，比如 \g{2}1。

代码17：

<span $p</span>='#([a-z]{2})\g{1}5#'<span ;
</span><span $str</span>="abab5"<span ;
</span><span preg_match_all</span>(<span $p</span>,<span $str</span>,<span $arr</span><span );
</span><span print_r</span>(<span $arr</span>);

可与代码15对比

\g 转义序列紧跟一个负数代表一个相对的后向引用。比如： (foo)(bar)\g{-1} 可以匹配字符串 ”foobarbar”(foo)(bar)\g{-2} 可以匹配 ”foobarfoo”。这在长的模式中作为一个可选方案，用来保持对之前一个特定子组的引用的子组序号的追踪。

代码18

<span $p</span>='#(foo)(bar)\g{-1}#'<span ;
</span><span $p1</span>='#(foo)(bar)\g{-2}#'<span ;
</span><span $str</span>="foobarbar"<span ;
</span><span $str1</span>="foobarfoo"<span ;
</span><span preg_match_all</span>(<span $p</span>,<span $str</span>,<span $arr</span><span );
</span><span preg_match_all</span>(<span $p1</span>,<span $str1</span>,<span $arr1</span><span );
</span><span print_r</span>(<span $arr</span><span );
</span><span print_r</span>(<span $arr1</span>);

结果：

后向引用也支持使用子组名称的语法方式描述, 比如 (?P=name) 或者 PHP 5.2.2 开始可以实用\k 或 \k’name’。另外在 PHP 5.2.4 中加入了对\k{name} 和 \g{name} 的支持。

代码19：

<span $p</span>="#(?<span 'alpha'</span>[a-z]{2})(?<digt>[0-9]{3})\k<digt>(?<span P=alpha</span>)#"<span ;
</span><span $str</span>="aa123123aa"<span ;
</span><span preg_match_all</span>(<span $p</span>,<span $str</span>,<span $arr</span><span );
</span><span print_r</span>(<span $arr</span>);

结果：

可与代码8比较着看

注意标红的

Alpha

Statement

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

PHP's Current Status: A Look at Web Development TrendsApr 13, 2025 am 12:20 AM

PHP remains important in modern web development, especially in content management and e-commerce platforms. 1) PHP has a rich ecosystem and strong framework support, such as Laravel and Symfony. 2) Performance optimization can be achieved through OPcache and Nginx. 3) PHP8.0 introduces JIT compiler to improve performance. 4) Cloud-native applications are deployed through Docker and Kubernetes to improve flexibility and scalability.

PHP vs. Other Languages: A ComparisonApr 13, 2025 am 12:19 AM

PHP is suitable for web development, especially in rapid development and processing dynamic content, but is not good at data science and enterprise-level applications. Compared with Python, PHP has more advantages in web development, but is not as good as Python in the field of data science; compared with Java, PHP performs worse in enterprise-level applications, but is more flexible in web development; compared with JavaScript, PHP is more concise in back-end development, but is not as good as JavaScript in front-end development.

PHP vs. Python: Core Features and FunctionalityApr 13, 2025 am 12:16 AM

PHP and Python each have their own advantages and are suitable for different scenarios. 1.PHP is suitable for web development and provides built-in web servers and rich function libraries. 2. Python is suitable for data science and machine learning, with concise syntax and a powerful standard library. When choosing, it should be decided based on project requirements.

PHP: A Key Language for Web DevelopmentApr 13, 2025 am 12:08 AM

PHP is a scripting language widely used on the server side, especially suitable for web development. 1.PHP can embed HTML, process HTTP requests and responses, and supports a variety of databases. 2.PHP is used to generate dynamic web content, process form data, access databases, etc., with strong community support and open source resources. 3. PHP is an interpreted language, and the execution process includes lexical analysis, grammatical analysis, compilation and execution. 4.PHP can be combined with MySQL for advanced applications such as user registration systems. 5. When debugging PHP, you can use functions such as error_reporting() and var_dump(). 6. Optimize PHP code to use caching mechanisms, optimize database queries and use built-in functions. 7

PHP: The Foundation of Many WebsitesApr 13, 2025 am 12:07 AM

The reasons why PHP is the preferred technology stack for many websites include its ease of use, strong community support, and widespread use. 1) Easy to learn and use, suitable for beginners. 2) Have a huge developer community and rich resources. 3) Widely used in WordPress, Drupal and other platforms. 4) Integrate tightly with web servers to simplify development deployment.

Beyond the Hype: Assessing PHP's Role TodayApr 12, 2025 am 12:17 AM

PHP remains a powerful and widely used tool in modern programming, especially in the field of web development. 1) PHP is easy to use and seamlessly integrated with databases, and is the first choice for many developers. 2) It supports dynamic content generation and object-oriented programming, suitable for quickly creating and maintaining websites. 3) PHP's performance can be improved by caching and optimizing database queries, and its extensive community and rich ecosystem make it still important in today's technology stack.

What are Weak References in PHP and when are they useful?Apr 12, 2025 am 12:13 AM

In PHP, weak references are implemented through the WeakReference class and will not prevent the garbage collector from reclaiming objects. Weak references are suitable for scenarios such as caching systems and event listeners. It should be noted that it cannot guarantee the survival of objects and that garbage collection may be delayed.

Explain the __invoke magic method in PHP.Apr 12, 2025 am 12:07 AM

The \_\_invoke method allows objects to be called like functions. 1. Define the \_\_invoke method so that the object can be called. 2. When using the $obj(...) syntax, PHP will execute the \_\_invoke method. 3. Suitable for scenarios such as logging and calculator, improving code flexibility and readability.

See all articles