Detailed explanation of regular expression pattern modifiers in PHP
Modifiers are an important part of regular expressions in PHP. This article explains each of PHP's regular expression pattern modifiers in detail, for readers who want to understand them better.
Pattern modifiers are letters placed after the closing delimiter of a regular expression. They adjust how the expression is interpreted, extending what it can do in matching, replacement, and similar operations. Many explanations of them found elsewhere are wrong and can easily mislead readers, so this document collects accurate descriptions for reference.
Modifier   Description
i   Case-insensitive matching.
m   Treats the string as multiple lines. By default "^" matches only at the start of the string and "$" only at its end, treating the target as a single "line" even if it contains newline characters. With "m", "^" also matches at the start of each line and "$" at the end of each line.
s   The dot "." matches every character, including newlines, which it otherwise excludes.
x   Whitespace in the pattern is ignored unless it is escaped or inside a character class, and "#" starts a comment that runs to the end of the line.
e   Used only by preg_replace(): after back-references in the replacement string are substituted, the replacement is evaluated as PHP code and its result replaces the matched text. (Removed in PHP 7.0.)
A   The pattern must match at the start of the target string. For example, "/a/A" matches "abcd" but not "bacd".
D   "$" matches only at the very end of the target string. Without this modifier, "$" also matches immediately before a final newline character. D is ignored if m is set.
E   Not a PHP modifier: PHP's PCRE functions reject it as unknown. The behavior often attributed to it, "$" matching only at the absolute end of the string, is what D provides, and it is not on by default.
U   Ungreedy: quantifiers become lazy by default (the same effect as adding "?" after each one), and a "?" after a quantifier makes it greedy again. Without U, quantifiers are greedy and match as much as possible.
Greedy mode:
Suppose we want to match a string that starts with the letter "a" and ends with the letter "b", and the subject contains many "b"s after the "a", such as "a bbbbbbbbbbbbbbbbb". Will the regular expression stop at the first "b" or the last one? In greedy mode (the default) it matches up to the last "b"; in non-greedy (lazy) mode it stops at the first "b".
Examples of non-greedy matching in PHP regular expressions:
1. /a.+?b/
2. /a.+b/U
Compare with the default greedy pattern:
1. /a.+b/
The second non-greedy example uses the U modifier; see the modifier descriptions for details.
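A minimal PHP sketch of the comparison above, using preg_match() on a shortened subject ("a bbbbb" rather than the long string in the text); the variable names are illustrative only.

```php
<?php
// Shortened version of the subject described above: "a", a space, then several "b"s.
$subject = 'a bbbbb';

// Default greedy quantifier: .+ takes as much as it can, then backtracks to the last "b".
preg_match('/a.+b/', $subject, $greedy);

// Lazy quantifier: .+? takes as little as possible, so the match ends at the first "b".
preg_match('/a.+?b/', $subject, $lazy);

// U modifier: quantifiers are lazy by default, so this behaves like /a.+?b/.
preg_match('/a.+b/U', $subject, $ungreedy);

echo $greedy[0], "\n";   // a bbbbb
echo $lazy[0], "\n";     // a b
echo $ungreedy[0], "\n"; // a b
```

Note that .+? still needs at least one character, so the lazy match is "a b" (including the space), not "ab".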
Other information:
Pattern modifiers: Explain the modifiers used in regular expression patterns
Note: The modifiers currently available in PCRE are listed below, with their internal PCRE names in parentheses. Spaces and newlines among the modifiers are ignored; any other unrecognized character causes an error.
i (PCRE_CASELESS)
If this modifier is set, characters in the pattern will match both uppercase and lowercase letters.
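A small illustration of i with preg_match(), which returns 1 when the pattern matches and 0 when it does not; the subject string is an arbitrary example.

```php
<?php
$subject = 'Say HeLLo to PHP';

// Without i: the lowercase pattern does not match the mixed-case text.
$caseSensitive = preg_match('/hello/', $subject);

// With i: letters match regardless of case.
$caseInsensitive = preg_match('/hello/i', $subject, $m);

var_dump($caseSensitive);   // int(0)
var_dump($caseInsensitive); // int(1)
echo $m[0], "\n";           // HeLLo
```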
m (PCRE_MULTILINE)
By default, PCRE treats the target string as a single "line" of characters, even if it contains newlines. The "start of line" metacharacter (^) matches only at the beginning of the string, and the "end of line" metacharacter ($) matches only at the end of the string or before a newline that is its last character (unless the D modifier is set). This is the same as Perl.
When this modifier is set, "start of line" and "end of line" also match immediately after and immediately before each newline in the string, in addition to the very beginning and end. This is equivalent to Perl's /m modifier. If the target string contains no "\n" characters, or the pattern contains no ^ or $, setting this modifier has no effect.
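A sketch of the difference m makes for ^ and $, using preg_match_all() to collect every match; the three-line subject is made up for illustration.

```php
<?php
$text = "first line\nsecond line\nthird line";

// Without m: ^ matches only at the very start, so only "first" is found.
$single = preg_match_all('/^\w+/', $text, $withoutM);

// With m: ^ also matches right after each newline.
$multi = preg_match_all('/^\w+/m', $text, $withM);

// With m, $ also matches before each newline, not just at the end.
preg_match_all('/\w+$/m', $text, $lineEnds);

echo implode(', ', $withoutM[0]), "\n"; // first
echo implode(', ', $withM[0]), "\n";    // first, second, third
echo implode(', ', $lineEnds[0]), "\n"; // line, line, line
```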
s (PCRE_DOTALL)
If this modifier is set, the dot metacharacter (.) in the pattern matches every character, including newlines; without it, newlines are excluded. This is equivalent to Perl's /s modifier. Negated character classes such as [^a] always match newlines, regardless of whether this modifier is set.
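An illustrative check of s, including the point that a negated character class matches a newline even without it. The HTML fragment is a made-up subject; regular expressions are not a general-purpose HTML parser.

```php
<?php
$html = "<p>one\ntwo</p>";

// Without s: . does not match the newline, so the pattern fails.
$withoutS = preg_match('/<p>(.*)<\/p>/', $html);

// With s: . matches the newline too.
$withS = preg_match('/<p>(.*)<\/p>/s', $html, $m);

// A negated class such as [^<] matches newlines even without s.
$negated = preg_match('/<p>([^<]*)<\/p>/', $html, $n);

var_dump($withoutS); // int(0)
var_dump($withS);    // int(1)
var_dump($negated);  // int(1)
```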
x (PCRE_EXTENDED)
If this modifier is set, whitespace characters in the pattern are ignored entirely, except when escaped or inside a character class. All characters from an unescaped # outside a character class up to and including the next newline are also ignored. This is equivalent to Perl's /x modifier and makes it possible to add comments to complex patterns. Note, however, that this applies only to data characters. Whitespace may never appear inside special character sequences in a pattern, for example within the sequence (?( that introduces a conditional subpattern.
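A sketch of an x-mode pattern with comments, plus the consequence that a literal space must be escaped; the date format is an arbitrary example.

```php
<?php
// With x, whitespace and newlines in the pattern are ignored and # starts a comment.
$datePattern = '/
    (\d{4})  # year
    -
    (\d{2})  # month
    -
    (\d{2})  # day
/x';
preg_match($datePattern, 'Due: 2024-03-15', $m);
echo "$m[1] $m[2] $m[3]\n"; // 2024 03 15

// Under x a literal space must be escaped (or placed in a character class).
$unescaped = preg_match('/hello world/x', 'hello world');  // pattern means "helloworld": 0
$escaped   = preg_match('/hello\ world/x', 'hello world'); // 1
```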
e
If this modifier is set, preg_replace() substitutes the back-references in the replacement string as usual, then evaluates the result as PHP code and uses its value to replace the matched string.
Only preg_replace() uses this modifier; other PCRE functions ignore it.
Note: This modifier is not available in PHP 3. It was deprecated in PHP 5.5.0 and removed in PHP 7.0.0; use preg_replace_callback() instead.
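Since /e no longer exists in current PHP, here is a minimal sketch of the preg_replace_callback() alternative: each match is passed to a PHP function and its return value becomes the replacement, so no string is evaluated as code. Doubling numbers is only an illustrative task, and the return type declaration assumes PHP 7.0 or later.

```php
<?php
$text = 'apples: 3, pears: 12';

// The callback receives the match array; $m[0] is the whole match.
$doubled = preg_replace_callback(
    '/\d+/',
    function (array $m): string {
        // Convert the matched digits to an integer, double it, return it as text.
        return (string) ((int) $m[0] * 2);
    },
    $text
);

echo $doubled, "\n"; // apples: 6, pears: 24
```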
A (PCRE_ANCHORED)
If this modifier is set, the pattern is forced to be "anchored": it can match only at the start of the target string (more precisely, at the position where matching begins). The same effect can also be achieved with suitable constructs in the pattern itself, which is the only way to do it in Perl.
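An illustration of A. It also shows a detail worth knowing: when preg_match() is given a start offset (its fifth parameter), A anchors the match at that offset, whereas ^ still refers to the start of the whole string. The subjects are arbitrary examples.

```php
<?php
// A: the match must start where matching begins (offset 0 by default).
$atStart    = preg_match('/a/A', 'abcd'); // 1
$notAtStart = preg_match('/a/A', 'bacd'); // 0: "a" occurs, but not at the start

// An explicit ^ gives the same result here.
$caret = preg_match('/^a/', 'bacd');      // 0

// With a start offset of 1, A anchors at the offset, while ^ still means the string start.
$anchoredAtOffset = preg_match('/a/A', 'bacd', $m, 0, 1);  // 1
$caretAtOffset    = preg_match('/^a/', 'bacd', $m2, 0, 1); // 0
```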
D (PCRE_DOLLAR_ENDONLY)
If this modifier is set, the dollar metacharacter ($) in the pattern matches only at the end of the target string. Without it, if the last character is a newline, $ also matches immediately before that newline (but not before any other newline). This modifier is ignored if m is set. There is no equivalent modifier in Perl.
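A short sketch of how D changes $ for a subject ending in a newline, and how m overrides it; the subject is an arbitrary example.

```php
<?php
$subject = "abc\n"; // ends with a newline

// Default: $ also matches just before a final newline.
$default = preg_match('/abc$/', $subject);  // 1

// D: $ matches only at the absolute end, so the trailing "\n" prevents a match.
$endOnly = preg_match('/abc$/D', $subject); // 0

// m takes precedence over D: $ matches before any newline.
$withM = preg_match('/abc$/mD', $subject);  // 1
```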
S
When a pattern will be used several times, it is worth spending extra time analyzing it to speed up matching. If this modifier is set, that additional analysis is performed. Currently, the analysis is only useful for non-anchored patterns that do not have a single fixed starting character.
U (PCRE_UNGREEDY)
This modifier inverts the greediness of quantifiers: they are not greedy by default, but become greedy when followed by "?". This is not compatible with Perl. The same behavior can also be enabled with the (?U) setting inside the pattern, and individual quantifiers can be made lazy without U by following them with a question mark (e.g. .*?).
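An illustration of U on a made-up HTML fragment, showing that under U a trailing ? makes a quantifier greedy again, and that (?U) inside the pattern has the same effect as the modifier.

```php
<?php
$html = '<b>one</b> and <b>two</b>';

// Default: .* is greedy and runs to the last </b>.
preg_match('/<b>.*<\/b>/', $html, $greedy);

// U: .* becomes lazy and stops at the first </b>.
preg_match('/<b>.*<\/b>/U', $html, $lazy);

// Under U, a trailing "?" flips the quantifier back to greedy.
preg_match('/<b>.*?<\/b>/U', $html, $greedyAgain);

// (?U) inside the pattern behaves like the U modifier.
preg_match('/(?U)<b>.*<\/b>/', $html, $inline);

echo $greedy[0], "\n"; // <b>one</b> and <b>two</b>
echo $lazy[0], "\n";   // <b>one</b>
```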
X (PCRE_EXTRA)
This modifier enables an additional PCRE feature that is not compatible with Perl: any backslash in the pattern followed by a letter with no special meaning causes an error, reserving these combinations for future expansion. By default, as in Perl, a backslash followed by a letter with no special meaning is treated as the literal letter. No other features are currently controlled by this modifier.
u (PCRE_UTF8)
This modifier enables an additional PCRE feature that is not compatible with Perl: pattern and subject strings are treated as UTF-8. It is available since PHP 4.1.0 on Unix and since PHP 4.2.3 on win32. Since PHP 4.3.5, patterns are checked for UTF-8 validity.
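A sketch of what u changes, counting characters in the word "naïve" written as explicit UTF-8 bytes so the result does not depend on how the source file is saved. It assumes PHP 5.4 or later, where the matches argument of preg_match_all() is optional.

```php
<?php
// "naïve" spelled out as UTF-8 bytes: "ï" is the two bytes C3 AF.
$word = "na\xC3\xAFve";

// Without u: the pattern works on bytes, so each byte counts as one "character".
$bytes = preg_match_all('/./', $word);  // 6

// With u: the subject is read as UTF-8 characters.
$chars = preg_match_all('/./u', $word); // 5

// With u, an invalid UTF-8 subject makes the call return false.
$invalid = preg_match('/./u', "\xC3");
var_dump($invalid); // bool(false)
```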
Basic syntax of PHP regular expressions:
A regular expression is divided into three parts: delimiter, expression and modifier.
The delimiter can be any character other than an alphanumeric character, a backslash, or whitespace (for example "/" or "!"); the most commonly used delimiter is "/". The expression consists of special characters (see special characters below) and ordinary characters. For example, "[a-z0-9_-]+@[a-z0-9_.-]+" matches a simple email address. Modifiers turn a particular feature or mode on or off. Here is an example of a complete regular expression:
/hello.+?hello/is In this regular expression, "/" is the delimiter, the part between the two "/" characters is the expression, and the string "is" after the second "/" contains the modifiers.
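A sketch that puts delimiter, expression, and modifiers together. The email pattern is based on the simple one in the text, with the hyphen placed last in the second character class so it is read literally, anchored with ^ and $ and made case-insensitive with i. It is only a rough check, not a complete email validator, and the sample strings are illustrative.

```php
<?php
// Delimiter "/", expression in between, modifier "i" after the closing delimiter.
$emailPattern = '/^[a-z0-9_-]+@[a-z0-9_.-]+$/i';
$valid   = preg_match($emailPattern, 'John_Doe@example.com'); // 1
$invalid = preg_match($emailPattern, 'not an email');         // 0

// The complete example from the text: i ignores case, s lets . cross the newline.
preg_match('/hello.+?hello/is', "Hello\nworld hello", $m);
echo $m[0], "\n";
```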
If the delimiter appears in the expression, it must be escaped with a backslash "\", as in "/hello.+?\/hello/is". Besides escaping delimiters, the backslash also escapes special characters and forms special sequences: many of these consist of a backslash followed by a letter, such as "\d", which matches any digit.
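A sketch of delimiter escaping and backslash sequences. Choosing another delimiter such as "#" and using preg_quote() are alternatives not covered in the text above; the subjects are illustrative.

```php
<?php
// Escaping the delimiter with a backslash, as in the example above.
$escaped = preg_match('/hello.+?\/hello/is', 'hello there/hello'); // 1

// Alternative: pick a delimiter that does not occur in the expression.
$otherDelimiter = preg_match('#hello.+?/hello#is', 'hello there/hello'); // 1

// \d is a backslash sequence meaning "any digit".
$digitCount = preg_match_all('/\d/', 'a1b22c333'); // 6

// preg_quote() escapes regex special characters, plus the delimiter given as its second argument.
$quoted = preg_quote('1.5/2', '/');
echo $quoted, "\n"; // 1\.5\/2
```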