Pattern modifiers in PHP regular expressions

We have finished introducing regular expressions through metacharacters and atoms. A few special situations still need to be handled:

How do we match abc when it sits at the beginning of the second line?
Regular expressions match greedily by default. What if we want the match to cover only the shortest possible part?

For cases like these we use pattern modifiers, which extend what a regular expression can do.

Commonly used pattern modifiers are:

Pattern modifier	Function
i	Letters in the pattern match both uppercase and lowercase letters.
m	The target string is treated as multiple lines: ^ and $ also match at line boundaries.
s	The target string is treated as a single line: the dot (.) also matches newlines.
x	Whitespace in the pattern is ignored, unless it is escaped or inside a character class.
A	Forces the match to start at the beginning of the target string.
D	The dollar metacharacter ($) matches only at the very end of the target string.
U	Inverts quantifier greediness: quantifiers match as little as possible by default.

Pattern modifiers are used as follows:

/regular expression/modifiers

The modifiers are placed after the closing delimiter of the pattern. For example:

/\w+/s
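
Several modifiers can be combined by writing their letters one after another behind the closing delimiter. The following is a minimal sketch; the sample text and variable names are made up for illustration:

```php
<?php
// Hypothetical sample text: two lines, the second one starting with "ABC".
$text = "first line\nABC second line";

// No modifiers: ^ anchors at the start of the string and matching is case-sensitive.
$plain = preg_match('/^abc/', $text);

// i and m together: case-insensitive, and ^ also matches right after each newline.
$combined = preg_match('/^abc/im', $text, $matches);

var_dump($plain);      // int(0)
var_dump($combined);   // int(1)
var_dump($matches[0]); // string(3) "ABC"
```

The order of the letters does not matter here, so /im and /mi behave the same.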

Now that the format is clear, the most important thing is to understand and remember what each modifier does. We will use code to compare the behavior with and without each modifier.

i Case-insensitive

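<?php
// The i modifier is appended after the closing delimiter
$pattern = '/ABC/i';
$string = '8988abc12313';
$string1 = '11111ABC2222';
// Test both strings against the same pattern
foreach ([$string, $string1] as $subject) {
    if (preg_match($pattern, $subject, $matches)) {
        echo 'Matched, the result is: ';
        var_dump($matches);
    } else {
        echo 'No match';
    }
}
?>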

Conclusion: both $string and $string1 match. Adding i after the closing delimiter makes the match ignore letter case.

m Treat the string as multiple lines

By default, the target string is treated as a single line when matching.

The "start of line" metacharacter (^) matches only at the beginning of the string, and the "end of line" metacharacter ($) matches only at the end of the string (or just before a final newline).

When this modifier is set, ^ and $ match not only at the beginning and end of the entire string, but also immediately after and immediately before each newline inside it.

Note: if the target string contains no "\n" character, or the pattern contains no ^ or $, setting this modifier has no effect.

Let's verify this with code.

In the first attempt, without m, the match fails:

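<?php
$pattern = '/^a\d+/';
$string = "My future is in my own hands, and I need to keep working hard
a9 is a nice string of characters
What should I do? Just keep pushing forward";
if (preg_match($pattern, $string, $matches)) {
    echo 'Matched, the result is: ';
    var_dump($matches);
} else {
    echo 'No match';
}
?>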

In the second attempt, let's add m:

<?php
$pattern = '/^a\d+/m';
$string = "My future is in my own hands, and I need to keep working hard
a9 is a nice string of characters
What should I do? Just keep pushing forward";
if (preg_match($pattern, $string, $matches)) {
    echo 'Matched, the result is: ';
    var_dump($matches);
} else {
    echo 'No match';
}
?>

Result: the match succeeds. The pattern /^a\d+/ requires a9 to be at the beginning of a line. With m, the beginning of the second line qualifies, so the matched content is a9.
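
To collect every line that starts a certain way, the m modifier can be paired with preg_match_all(). A minimal sketch, assuming a made-up three-line input:

```php
<?php
// Hypothetical input: two of the three lines start with "a" followed by digits.
$text = "a1 first\nb2 second\na33 third";

// With m, ^ matches at the start of the string and right after every newline.
$count = preg_match_all('/^a\d+/m', $text, $matches);

var_dump($count);     // int(2)
print_r($matches[0]); // contains a1 and a33
```

Without m, the same call would find only a1, because ^ would match at the start of the string alone.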

s Treat the string as a single line

If this modifier is set, the dot metacharacter (.) in the pattern matches all characters, including newlines.

First, without the s modifier:

<?php

$pattern = '/new future.+\d+/';

$string = 'new future
987654321';

if (preg_match($pattern, $string, $matches)) {
   echo 'Matched, the result is: ';
   var_dump($matches);
} else {
   echo 'No match';
}

?>

Second, with the s modifier added after the pattern:

<?php

$pattern = '/new future.+\d+/s';

$string = "new future
987654321";

if (preg_match($pattern, $string, $matches)) {
   echo 'Matched, the result is: ';
   var_dump($matches);
} else {
   echo 'No match';
}

?>

The result: the second match succeeds.

Conclusion:

1. In the target string, a newline follows the first line of text.

2. By default, the dot (.) matches any character except a newline, so the first attempt fails.

3. In the second attempt the s modifier is added, so the dot matches every character, including the newline, and the match succeeds.
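
The difference can be reduced to a single dot. The sketch below uses a made-up two-word input and also shows [\s\S], a common way to match any character without relying on s:

```php
<?php
// Hypothetical input with a newline between the two words.
$text = "start\nend";

// Without s, the dot refuses to match the newline, so the match fails.
$withoutS = preg_match('/start.end/', $text);

// With s, the dot matches the newline as well.
$withS = preg_match('/start.end/s', $text);

// [\s\S] matches any character regardless of the s modifier.
$charClass = preg_match('/start[\s\S]end/', $text);

var_dump($withoutS, $withS, $charClass); // int(0) int(1) int(1)
```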

x Ignore whitespace characters

1. If this modifier is set, all whitespace characters in the pattern are ignored, except those that are escaped or inside a character class.

2. Characters from an unescaped # outside a character class up to the next newline are also ignored, so they can serve as comments.

Let's first try the whitespace-ignoring behavior:

<?php

$pattern = '/a b c /x';

$string = 'Learning English starts with abc';

if (preg_match($pattern, $string, $matches)) {
   echo 'Matched, the result is: ';
   var_dump($matches);
} else {
   echo 'No match';
}

?>

This also matches successfully.

The pattern in $pattern has a space after each of a, b, and c, while the abc in $string contains no spaces. The match still succeeds, so x does ignore whitespace in the pattern.

The second rule is harder to grasp from the description alone, so here is an example:

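<?php
// Pay close attention to the next two lines
$pattern = '/a b c # this is a comment inside the pattern
/x';

$string = 'Learning English starts with abc';

if (preg_match($pattern, $string, $matches)) {
   echo 'Matched, the result is: ';
   var_dump($matches);
} else {
   echo 'No match';
}

?>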

The result is again a successful match.

This shows the second property of x: the characters between the # character and the next newline are ignored.
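
Both properties of x are most useful together, when a longer pattern is spread over several lines with a comment on each part. The date pattern and sample sentence below are illustrative assumptions, not taken from the text above:

```php
<?php
// A date pattern laid out over several lines; x ignores the layout and the comments.
$pattern = '/
    (\d{4})   # year: four digits
    -         # literal hyphen
    (\d{2})   # month: two digits
    -
    (\d{2})   # day: two digits
    [ ]       # a literal space needs a character class or an escape
    at
/x';

$matched = preg_match($pattern, 'released 2016-11-14 at noon', $m);

var_dump($matched); // int(1)
var_dump($m[1]);    // string(4) "2016"
```

One caveat: the comments must not contain the delimiter character (here /), because PHP looks for the closing delimiter before the pattern is compiled.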

e Evaluate the replacement as code (removed), and back references

The e modifier is often confused with back references, but the two are different. With e, preg_replace() evaluated the replacement string as PHP code after substituting the back references. The modifier was deprecated in PHP 5.5 and removed in PHP 7.0; preg_replace_callback() is its replacement.

Back references themselves need no modifier. They take the content captured by the parentheses in the regular expression and put it into the replacement text. They are used with preg_replace().

mixed preg_replace(mixed $pattern, mixed $replacement, mixed $subject)

What preg_replace() does: it searches $subject for matches of $pattern and replaces each one with $replacement.

Before the formal explanation, let's review what we covered earlier. We deliberately put parentheses around each atom to be matched:

<?php
// Parentheses added around each part
$pattern = '/(\d+)([a-z]+)(\d+)/';

$string = '987abc321';

if (preg_match($pattern, $string, $match)) {
   echo 'Matched, the result is: ';
   var_dump($match);

} else {
   echo 'No match';
}
?>

Let's look at the result.

This is what we covered when we discussed parentheses: when parts of the pattern are wrapped in parentheses, the content each pair matches is also stored as an element of the array, here 987, abc, and 321.

Now let's look at back references in a replacement:

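<?php
$string = "{April 15, 2003}";

// \w matches a letter, digit, or underscore; \d matches one digit 0-9; the + metacharacter requires the preceding atom to appear one or more times in a row
$pattern = '/\{(\w+) (\d+), (\d+)\}/i';

// $2 refers to the text captured by the second pair of parentheses
$replacement = '$2';

// The match is replaced with the text captured by the n-th parenthesized subpattern
echo preg_replace($pattern, $replacement, $string);

?>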

The output is 15.

Conclusion:

In the above example, $2 refers to the second pair of parentheses, which is the first (\d+) in the regular expression. It captures 15, and because the replacement is $2, the whole match is replaced by that captured text.
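
Back references can also be reordered, and replacements that need real computation can be written with preg_replace_callback(), which is what PHP offers in place of the removed e modifier. The following is a sketch built on the same date string; the variable names and the "next day" calculation are illustrative assumptions:

```php
<?php
$string = '{April 15, 2003}';
$pattern = '/\{(\w+) (\d+), (\d+)\}/';

// Back references: $1 is the month, $2 the day, $3 the year.
$reordered = preg_replace($pattern, '$3 $1 $2', $string);

// Computed replacement: the callback receives the match array and returns the new text.
$nextDay = preg_replace_callback($pattern, function (array $m): string {
    return $m[1] . ' ' . ((int) $m[2] + 1) . ', ' . $m[3];
}, $string);

echo $reordered, "\n"; // 2003 April 15
echo $nextDay, "\n";   // April 16, 2003
```

The callback form keeps the replacement logic in ordinary PHP code rather than in a string that is evaluated at run time.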

U Greedy mode control

Regular expressions are greedy by default: each quantifier matches as much as possible.

Let's see how greedy a regular expression is:

<?php
$pattern = '/<div>.*<\/div>/';

$string = "<div>Hello</div><div>I am</div>";

if (preg_match($pattern, $string, $match)) {
   echo 'Matched, the result is: ';
   var_dump($match);
} else {
   echo 'No match';
}

?>

Looking at the result, we reach the following conclusion: the match runs from the first <div> all the way to the last </div>, so it covers both div elements, that is, the entire string. A maximal match was made.

Now let's add an uppercase U to the same piece of code and see the effect:

<?php
$pattern = '/<div>.*<\/div>/U';

$string = "<div>Hello</div><div>I am</div>";

if (preg_match($pattern, $string, $match)) {
   echo 'Matched, the result is: ';
   var_dump($match);
} else {
   echo 'No match';
}

?>

We find that the match now stops at the first </div>, so only the first div element is returned.

In this way, U cancels the greedy behavior of the regular expression and makes it stop at the nearest possible match.
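
The U modifier affects every quantifier in the pattern. When only one quantifier should be lazy, a ? after it has the same effect for that quantifier alone. A minimal sketch with a made-up HTML snippet:

```php
<?php
$html = '<div>one</div><div>two</div>';

// Greedy (default): .* runs to the last </div>.
preg_match('/<div>.*<\/div>/', $html, $greedy);

// U flips the default, so .* becomes lazy for the whole pattern.
preg_match('/<div>.*<\/div>/U', $html, $ungreedy);

// .*? makes only this one quantifier lazy; no modifier is needed.
preg_match('/<div>.*?<\/div>/', $html, $lazy);

// Under U, a trailing ? flips the quantifier back to greedy.
preg_match('/<div>.*?<\/div>/U', $html, $flipped);

echo $greedy[0], "\n";   // <div>one</div><div>two</div>
echo $ungreedy[0], "\n"; // <div>one</div>
echo $lazy[0], "\n";     // <div>one</div>
echo $flipped[0], "\n";  // <div>one</div><div>two</div>
```

Because U silently reverses the meaning of ?, the explicit .*? form is usually easier for other readers to follow.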

A Match from the beginning of the target string

With this modifier the pattern is anchored: it can only match starting at the beginning of the target string. The effect is similar to the ^ (circumflex) metacharacter.

<?php

$pattern = '/this/A';

$string = 'hello this is a ';
//$string1 = 'this is a ';

if (preg_match($pattern, $string, $match)) {
   echo 'Matched, the result is: ';
   var_dump($match);
} else {
   echo 'No match';
}

?>

Conclusion:

1. With the A modifier, $string cannot be matched; without it, $string can be matched.

2. With the A modifier, $string1 can be matched, because it starts with this and the match must begin at the start of the string.
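
A and ^ stop behaving alike once the m modifier is involved: ^ then matches at the start of every line, while A still anchors only at the start of the string. A minimal sketch, assuming a made-up two-line input:

```php
<?php
// Hypothetical input: "this" appears at the start of the second line only.
$text = "hello\nthis is a test";

// ^ with m matches at the start of any line, so line two qualifies.
$caretMultiline = preg_match('/^this/m', $text);

// A anchors at the start of the subject only; m does not change that.
$anchored = preg_match('/this/Am', $text);

// \A inside the pattern has the same start-of-subject meaning.
$escapeA = preg_match('/\Athis/m', $text);

var_dump($caretMultiline, $anchored, $escapeA); // int(1) int(0) int(0)
```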

D The $ metacharacter matches only at the very end of the string

If this modifier is set, the dollar metacharacter in the pattern matches only at the very end of the target string. Without it, the dollar sign also matches immediately before the final character if that character is a newline. The D modifier is ignored when m is set.

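<?php
$pattern = '/\w+this$/';
$pattern1 = '/\w+this$/D';
// The target string ends with a newline
$string = "hellothis\n";
// Swap in $pattern to see the match succeed without D
if (preg_match($pattern1, $string, $match)) {
    echo 'Matched, the result is: ';
    var_dump($match);
} else {
    echo 'No match';
}
?>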

Conclusion:

1. When $pattern (without D) is matched against $string, the newline after this in $string does not prevent the match: it succeeds without the D modifier.

2. When $pattern1 (with D) is matched against $string, the newline after this means $ is no longer at the very end of the string, and the match fails.
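
The two cases can be checked side by side. The sketch below assumes a target string that ends with a newline and also shows \z, which always means the very end of the string:

```php
<?php
// Hypothetical target string with a trailing newline.
$withNewline = "hellothis\n";

// Without D, $ also matches just before a trailing newline.
$loose = preg_match('/\w+this$/', $withNewline);

// With D, $ matches only at the very end, so the trailing newline blocks the match.
$strict = preg_match('/\w+this$/D', $withNewline);

// \z always means "very end of the subject", with or without D.
$escapeZ = preg_match('/\w+this\z/', $withNewline);

var_dump($loose, $strict, $escapeZ); // int(1) int(0) int(0)
```

This matters for input validation, where a pattern ending in a plain $ would accept a value with a trailing newline.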
