PHP: PCRE regular expression repetition/quantifiers
Nov 21, 2016 pm 05:18 PM

Repetition is specified with a quantifier, which can follow any of these elements:

A single character, possibly escaped

The . metacharacter

A character class

A backreference (see next section)

A parenthesized subgroup (unless it is an assertion)

The general repetition quantifier specifies a minimum and a maximum number of matches as two numbers wrapped in curly braces and separated by a comma. Both numbers must be less than 65536, and the first must be less than or equal to the second. For example, z{2,4} matches "zz", "zzz", or "zzzz". A closing curly brace on its own is not a special character.

If the second number is omitted but the comma is still present, there is no upper limit; if the second number and the comma are both omitted, the quantifier specifies an exact number of matches. For example, [aeiou]{3,} matches at least three consecutive vowels but can also match more, while \d{8} matches exactly 8 digits.

When an opening curly brace appears in a position where a quantifier is not allowed, or where the text that follows does not fit the quantifier syntax, it is treated as an ordinary character and matches itself literally. For example, in the PCRE versions described here, {,6} is not a quantifier and matches the four literal characters "{,6}".
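The three quantifier forms above can be tried directly with preg_match(). This is a minimal sketch; the subject strings and variable names are made up for illustration, and the patterns are the ones quoted in the text.

```php
<?php
// z{2,4}: between two and four consecutive "z" characters.
// The quantifier is greedy, so it takes four of the five available.
preg_match('/z{2,4}/', 'zzzzz', $m);
$bounded = $m[0];

// [aeiou]{3,}: three or more vowels, with no upper limit.
preg_match('/[aeiou]{3,}/', 'queueing', $m);
$vowels = $m[0];

// \d{8}: exactly eight digits (anchored so nothing else may surround them).
$exact = preg_match('/^\d{8}$/', '20161121');
$short = preg_match('/^\d{8}$/', '2016112');

var_dump($bounded, $vowels, $exact, $short);
```

Note that the pattern is written in a single-quoted PHP string so that the backslash in \d reaches PCRE unchanged.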

The quantifier {0} is permitted; the pattern then behaves as if the preceding item and the quantifier were not present.
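As a small illustration of {0}, the sketch below compares a pattern containing b{0} with the same pattern written without the b at all. The subject strings are hypothetical.

```php
<?php
// b{0} means "b zero times", so /^ab{0}c$/ behaves like /^ac$/.
$withZero = preg_match('/^ab{0}c$/', 'ac');   // matches
$withB    = preg_match('/^ab{0}c$/', 'abc');  // does not match: no "b" allowed
$plain    = preg_match('/^ac$/', 'ac');       // matches

var_dump($withZero, $withB, $plain);
```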

For convenience (and historical compatibility), the three most commonly used quantifiers have single-character abbreviations.

Single-character quantifiers

* is equivalent to {0,}

+ is equivalent to {1,}

? is equivalent to {0,1}
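The equivalences can be checked by running each shorthand and its brace form over the same inputs and comparing the results. A minimal sketch, with an arbitrary set of test subjects chosen for illustration:

```php
<?php
// Each pair holds a shorthand quantifier and its curly-brace equivalent.
$pairs = [
    ['/a*/', '/a{0,}/'],
    ['/a+/', '/a{1,}/'],
    ['/a?/', '/a{0,1}/'],
];
$subjects = ['', 'a', 'aaa', 'baab'];

$allEqual = true;
foreach ($pairs as [$short, $long]) {
    foreach ($subjects as $s) {
        $r1 = preg_match($short, $s, $m1);
        $r2 = preg_match($long, $s, $m2);
        // Both the return value and the matched text must agree.
        if ($r1 !== $r2 || $m1 !== $m2) {
            $allEqual = false;
        }
    }
}

var_dump($allEqual);
```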

It is possible to construct what looks like an infinite loop by following a subpattern that can match no characters with a quantifier that has no upper limit. For example: (a?)*

Earlier versions of Perl and PCRE reported an error at compile time for such patterns. However, because this can be useful in some cases, such patterns are now accepted; if any repetition of the subpattern does in fact match no characters, the loop is forcibly broken.
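A quick way to see that such a pattern is accepted and terminates is simply to run it. The sketch below uses two hypothetical subjects and checks preg_last_error() to confirm that no error was raised.

```php
<?php
// (a?)* repeats a group that is allowed to match the empty string.
// PCRE compiles it and stops looping once an iteration matches nothing.
$result = preg_match('/^(a?)*$/', 'aaa', $m);
$whole  = $m[0];

// The group may also match zero characters in total.
$onNoA = preg_match('/(a?)*b/', 'b');

$error = preg_last_error();

var_dump($result, $whole, $onNoA, $error === PREG_NO_ERROR);
```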

By default, quantifiers are "greedy": they match as many characters as possible (up to the maximum allowed number of repetitions) without causing the rest of the pattern to fail. The classic example of where this causes trouble is matching comments in C. Everything between /* and */ is a comment, and individual * and / characters may appear inside it. One attempt at matching C comments is the pattern /\*.*\*/. Applied to the string "/* first comment */ not comment /* second comment */", it produces the wrong result: it matches the entire string, because the greedy .* consumes as many characters as it can.

However, if a quantifier is immediately followed by a ? (question mark), it becomes lazy (non-greedy): it no longer matches as much as possible, but as little as possible. So the pattern /\*.*?\*/ matches C comments correctly. The meaning of the quantifier itself does not change; only the preferred number of repetitions does. Do not confuse this use of ? with its use as a quantifier in its own right. Because it has both uses, it can sometimes appear doubled. For example, \d??\d prefers to match one digit, but can match two digits if that is the only way for the whole pattern to match.

Annotation: take the pattern ^\w\d??\d\w$ (anchored so that it must cover the whole string) and the subject "a33a". Although \d?? is non-greedy and first tries to match nothing, that choice makes the whole pattern fail, so in the end it matches one digit.
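The difference between greedy and lazy matching can be reproduced with the C comment example. This is a sketch only: it uses ~ as the PHP pattern delimiter so that the slashes in the pattern need no escaping, and the capture groups around \d?? are added purely to show what the lazy quantifier ended up matching.

```php
<?php
$src = '/* first comment */ not comment /* second comment */';

// Greedy: .* runs to the last "*/" in the string.
preg_match('~/\*.*\*/~', $src, $greedy);

// Lazy: .*? stops at the first "*/" it can reach.
preg_match('~/\*.*?\*/~', $src, $lazy);

// With the lazy form, each comment is found separately.
preg_match_all('~/\*.*?\*/~', $src, $all);

// \d??\d prefers one digit: the lazy \d?? matches nothing.
preg_match('/(\d??)\d/', '33', $one);

// When the whole pattern requires it, \d?? gives in and takes a digit.
preg_match('/^\w(\d??)\d\w$/', 'a33a', $forced);

var_dump($greedy[0], $lazy[0], $all[0], $one, $forced);
```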

If the PCRE_UNGREEDY option is set (an option not available in Perl), quantifiers are non-greedy by default, and an individual quantifier can be made greedy by following it with a ?. In other words, PCRE_UNGREEDY inverts the default greediness.
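In PHP's preg functions, PCRE_UNGREEDY is selected with the U pattern modifier. The sketch below runs the same pattern with and without it; the HTML-like subject string is invented for the example.

```php
<?php
$s = '<b>one</b><b>two</b>';

// Default behaviour: .* is greedy and spans both elements.
preg_match('~<b>.*</b>~', $s, $default);

// With U, the same .* becomes lazy and stops at the first </b>.
preg_match('~<b>.*</b>~U', $s, $ungreedy);

// With U, adding ? flips the quantifier back to greedy.
preg_match('~<b>.*?</b>~U', $s, $flipped);

var_dump($default[0], $ungreedy[0], $flipped[0]);
```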

A quantifier followed by + is "possessive". It consumes as many characters as possible and never gives any of them back to let the rest of the pattern match. For example, .*abc matches "aabc", but .*+abc does not, because .*+ consumes the entire string and leaves nothing for the remaining pattern to match. Since PHP 4.3.3, the possessive modifier (+) can be applied to quantifiers to improve speed.
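The example from the text can be run as is. The second pair of patterns is an additional illustration, not from the original, showing the same effect with \d.

```php
<?php
// Ordinary greedy .* backtracks to let "abc" match.
$normal = preg_match('/.*abc/', 'aabc');

// Possessive .*+ keeps everything it consumed, so "abc" can never match.
$possessive = preg_match('/.*+abc/', 'aabc');

// Same idea with digits: \d+ gives back one digit, \d++ does not.
$digitsGreedy     = preg_match('/^\d+\d$/', '123');
$digitsPossessive = preg_match('/^\d++\d$/', '123');

var_dump($normal, $possessive, $digitsGreedy, $digitsPossessive);
```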

When a subgroup is quantified with a minimum count greater than 1 or with a limited maximum, the compiled pattern requires more storage, in proportion to the size of the minimum or maximum.

If a pattern starts with .* or .{0,} and the PCRE_DOTALL option is set (equivalent to Perl's /s), so that . also matches newlines, the pattern is implicitly anchored. Whatever follows will be tried against every character position in the subject string anyway, so there is no point in retrying the overall match at any position after the first. PCRE treats such a pattern as though it were preceded by \A. When the subject string is known to contain no newlines, it is worth setting PCRE_DOTALL for a pattern that starts with .* in order to obtain this optimization, or alternatively using ^ to anchor the pattern explicitly.

Annotation: the optimization is that, once the pattern fails at the first position, the engine does not move on to try the next position. For example, if PCRE_DOTALL is not set and the first character of the subject is a newline, the attempt at the first character fails, and the engine tries the pattern again starting from the second character. With PCRE_DOTALL set, .* is guaranteed to match from the first position. Similarly, when ^ or \A is used, a failed attempt can exit immediately instead of restarting the whole pattern at the next position of the subject string.
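The effect of the newline case in the annotation can be observed through match offsets, although the optimization itself is internal to PCRE and not directly visible. A sketch, assuming a subject that begins with a newline; PREG_OFFSET_CAPTURE makes preg_match() report where the match started, and PCRE_DOTALL is selected with the s modifier.

```php
<?php
$subject = "\nfoo";

// Without s, . cannot match the newline, so the attempt at offset 0 fails
// and the match is found by retrying at offset 1.
preg_match('/.*foo/', $subject, $plain, PREG_OFFSET_CAPTURE);

// With s, .* consumes the newline too, so the match starts at offset 0.
preg_match('/.*foo/s', $subject, $dotall, PREG_OFFSET_CAPTURE);

var_dump($plain[0], $dotall[0]);
```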

When a capturing subgroup is repeated, the value captured is the substring matched by the final iteration. For example, after (tweedle[dume]{3}\s*)+ has matched the string "tweedledum tweedledee", the captured value of the subgroup is "tweedledee". However, with nested capturing subgroups, the captured values may have been set in earlier iterations. For example, after /(a|(b))+/ matches the string "aba", the value of the second captured subgroup is "b".

Translator's note: if the part after "however" is unclear, consider that "b" is the last value the second subgroup captured. The final iteration matched "a" through the first alternative and did not touch the second subgroup, so its value is still "b", which is consistent with the rule stated before "however".
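Both examples from the paragraph above can be checked by inspecting the match array that preg_match() fills in. A minimal sketch; the variable names are illustrative.

```php
<?php
// The repeated group keeps only what its last iteration matched.
preg_match('/(tweedle[dume]{3}\s*)+/', 'tweedledum tweedledee', $m);
$wholeMatch = $m[0];
$lastIter   = $m[1];

// Nested groups: the outer group ends on "a" (third iteration), while the
// inner group still holds "b" from the second iteration.
preg_match('/(a|(b))+/', 'aba', $n);

var_dump($wholeMatch, $lastIter, $n);
```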

