PHP: PCRE regular expression subgroups (subpatterns)
Subgroups are delimited by parentheses, and they can be nested. Marking a part of a pattern as a subgroup (subpattern) mainly does two things:
Localize a set of alternatives. For example, the pattern cat(aract|erpillar|) matches one of "cat", "cataract", or "caterpillar". Without the parentheses, it would match "cataract", "erpillar", or the empty string.
Set up the subgroup as a capturing subgroup. When the whole pattern matches, the part of the subject string that matched the subgroup is passed back to the caller through the ovector argument of pcre_exec(). Opening parentheses are counted from left to right (starting from 1) to obtain the numbers of the capturing subgroups, and the captured results can be retrieved by those numbers.
For example, if the string "the red king" is matched against the pattern ((red|white) (king|queen)), the result is array("red king", "red king", "red", "king"). Element 0 is the match of the entire pattern, and the next three elements are the matches of the three subgroups, numbered 1, 2, and 3 respectively.
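The two roles of parentheses described above can be checked with preg_match(). This is a minimal sketch, not taken from the original article; the variable names ($matches, $grouped, $ungrouped) are chosen here only for illustration.

```php
<?php
// Capturing: every opening parenthesis gets the next number, left to right.
preg_match('/((red|white) (king|queen))/', 'the red king', $matches);
// $matches[0] is the whole match; [1], [2], [3] are the three subgroups.
print_r($matches);

// Localized alternation: the branches apply only to the suffix after "cat".
preg_match('/cat(aract|erpillar|)/', 'caterpillar', $grouped);

// Without parentheses the alternatives are "cataract", "erpillar", and the
// empty string, so the empty branch matches at offset 0 of "caterpillar".
preg_match('/cataract|erpillar|/', 'caterpillar', $ungrouped);

var_dump($grouped, $ungrouped);
```

With the grouped pattern the whole word "caterpillar" is matched and "erpillar" is captured as subgroup 1, while the ungrouped pattern only produces an empty match.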
In practice, it is not always helpful for parentheses to do both jobs. Often a subgroup is needed for grouping only, without capturing. If the opening parenthesis is immediately followed by "?:", the subgroup does no capturing and is not counted when numbering any subsequent capturing subgroups. For example, if the string "the white queen" is matched against the pattern ((?:red|white) (king|queen)), the result is array("white queen", "white queen", "queen"): the captured substrings are "white queen" and "queen", numbered 1 and 2. In older PCRE versions the maximum number of capturing subgroups is 99, and the maximum number of all subgroups (capturing and non-capturing) is 200; later versions raise the capture limit to 65535.
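As a quick check of how "?:" affects numbering, the following sketch runs the pattern from the paragraph above against "the white queen". The variable name $matches is an illustrative choice, not part of the original text.

```php
<?php
// (?:red|white) groups the colour alternatives without capturing them,
// so (king|queen) becomes subgroup 2 instead of subgroup 3.
preg_match('/((?:red|white) (king|queen))/', 'the white queen', $matches);

// $matches[0] => "white queen" (whole match)
// $matches[1] => "white queen" (outer capturing subgroup)
// $matches[2] => "queen"       (inner capturing subgroup)
print_r($matches);
```

The colour is still required for the match to succeed; it is simply not reported as a separate element.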
As a convenient shorthand, if option settings are needed at the start of a non-capturing subgroup, the option letters may be placed between the ? and the :, for example:
(?i:saturday|sunday)
(?:(?i)saturday|sunday)
The two patterns above match exactly the same set of strings. Because alternative branches are tried from left to right, and options are not reset until the end of the subgroup is reached, an option set in one branch also affects the branches that follow it. Both patterns therefore match "SUNDAY" as well as "Saturday".
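The equivalence of the two spellings can be sketched as follows. The ^ and $ anchors, the test subjects, and the $results array are additions made here for illustration so that each subject must match as a whole.

```php
<?php
// Two spellings of the same case-insensitive, non-capturing subgroup.
$patterns = [
    '/^(?i:saturday|sunday)$/',
    '/^(?:(?i)saturday|sunday)$/',
];
$subjects = ['SUNDAY', 'Saturday', 'sunday', 'Monday'];

$results = [];
foreach ($patterns as $index => $pattern) {
    foreach ($subjects as $subject) {
        // preg_match() returns 1 on a match and 0 otherwise.
        $results[$index][$subject] = preg_match($pattern, $subject) === 1;
    }
}

var_dump($results);
```

Both patterns should report the same outcome for every subject, including "SUNDAY", which is handled by the second branch even though (?i) was written in the first.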
Since PHP 4.3.3, a subgroup can be named using the syntax (?P<name>pattern). The subgroup then appears in the match results under both its name and its numeric position. PHP 5.2.2 added two alternative syntaxes for naming subgroups: (?<name>pattern) and (?'name'pattern).
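The three naming syntaxes can be tried side by side. The date pattern, the group names (year, month, day), and the subject string below are hypothetical examples chosen for this sketch, not taken from the original article.

```php
<?php
// One pattern using all three naming syntaxes. The pattern is written in
// double quotes so that the single quotes around 'day' need no escaping.
$pattern = "/(?P<year>\d{4})-(?<month>\d{2})-(?'day'\d{2})/";

preg_match($pattern, 'released on 2024-03-15', $matches);

// Each named subgroup is available both by name and by number.
echo $matches['year'], ' ', $matches[1], "\n";
echo $matches['month'], ' ', $matches[2], "\n";
echo $matches['day'], ' ', $matches[3], "\n";
```

Access by name tends to stay readable when a pattern is later edited and the numeric positions shift.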
Sometimes a regular expression needs several alternative subgroups, only one of which can ever match. Normally each of them would get its own backreference number. To let such subgroups share a number, the (?| syntax allows duplicate numbers. Consider the following regular expression matched against the string Sunday:
(?:(Sat)ur|(Sun))day
Here Sun is stored in backreference 2, while backreference 1 is empty. Matching Saturday instead stores Sat in backreference 1, while backreference 2 does not exist. Changing the pattern to use (?| fixes this problem:
(?|(Sat)ur|(Sun))day
Using this pattern, both Sun and Sat will be stored in backreference 1.
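The difference between the two patterns can be observed directly with preg_match(). This is a small sketch; the variable names ($plainSat, $plainSun, $resetSat, $resetSun) are illustrative assumptions.

```php
<?php
$plain = '/(?:(Sat)ur|(Sun))day/';  // each alternative has its own number
$reset = '/(?|(Sat)ur|(Sun))day/';  // both alternatives share number 1

preg_match($plain, 'Saturday', $plainSat);
preg_match($plain, 'Sunday', $plainSun);
preg_match($reset, 'Saturday', $resetSat);
preg_match($reset, 'Sunday', $resetSun);

// Plain grouping: "Sun" lands in element 2 and element 1 is an empty string;
// for "Saturday" the unmatched trailing subgroup 2 is left out entirely.
var_dump($plainSat, $plainSun);

// With (?| the captured prefix is always element 1.
var_dump($resetSat, $resetSun);
```

With the (?| form, code that reads the result can always use element 1, whichever branch matched.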