Perl regular expressions
A regular expression describes a pattern for matching strings. It can be used to check whether a string contains a certain substring, to replace a matching substring, or to extract substrings that meet a given condition.
Perl's regular expression support is very powerful, among the strongest of the commonly used languages. Many languages have modeled their own regular expression support on Perl's.
Perl regular expressions come in three forms: matching, replacement, and conversion (transliteration):
Matching: m// (can be abbreviated to //)
Replacement: s///
Conversion: tr///
These three forms are generally used together with =~ or !~, where =~ means "matches" and !~ means "does not match".
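As a minimal sketch of the three operators bound with =~ and !~ (the variable names and the sample string are made up for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample string, chosen only for illustration.
my $text = "Hello World";

# m// with =~ is true when the pattern matches.
my $has_world = ($text =~ m/World/) ? 1 : 0;

# !~ is true when the pattern does NOT match.
my $no_perl = ($text !~ m/Perl/) ? 1 : 0;

# s/// replaces the first match; here it works on a copy of $text.
(my $replaced = $text) =~ s/World/Perl/;

# tr/// maps characters one-for-one; here lowercase to uppercase.
(my $upper = $text) =~ tr/a-z/A-Z/;

print "has_world=$has_world no_perl=$no_perl\n";
print "$replaced\n";
print "$upper\n";
```

The `(my $copy = $text) =~ ...` idiom copies the string first, so the original `$text` is left unchanged.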
Matching operator
The matching operator m// tests a string against a regular expression. For example, to match "run" in the scalar $bar, the code is as follows:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # The pattern /run/ matches anywhere inside the string.
    my $bar = "I am running a php site. welcome to php site.";
    if ($bar =~ /run/) {
        print "First match\n";
    } else {
        print "First no match\n";
    }

    # It also matches when the string is exactly "run".
    $bar = "run";
    if ($bar =~ /run/) {
        print "Second match\n";
    } else {
        print "Second no match\n";
    }

Running the above program produces the following output:

    First match
    Second match
Pattern matching modifiers
Pattern matching has some commonly used modifiers, as shown in the following table:
Modifier | Description |
---|---|
i | Ignore case in the pattern |
m | Multi-line mode: "^" and "$" match at the start and end of each line |
o | Compile the pattern only once |
s | Single-line mode: "." also matches "\n" (by default it does not) |
x | Ignore whitespace in the pattern |
g | Global matching: find all matches |
cg | Keep the current search position after a global match fails, so that matching can resume from there |
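The effect of the i, m, s, x, and g modifiers can be sketched as follows. This is only an illustration; the two-line sample string and the variable names are assumptions, not part of the original text:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical two-line sample text.
my $text = "First line\nsecond LINE";

# /i ignores case; /g in list context returns every match.
my @all   = ($text =~ /line/gi);
my $count = scalar @all;

# Without /m, ^ anchors only at the start of the whole string.
my $plain = ($text =~ /^second/)  ? 1 : 0;
# With /m, ^ also matches right after each newline.
my $multi = ($text =~ /^second/m) ? 1 : 0;

# Without /s, "." stops at a newline; with /s it crosses it.
my $dot_plain = ($text =~ /First.*LINE/)  ? 1 : 0;
my $dot_s     = ($text =~ /First.*LINE/s) ? 1 : 0;

# /x lets the pattern contain layout whitespace and comments.
my $spaced = ($text =~ / First \s line   # two words separated by whitespace
                       /x) ? 1 : 0;

print "count=$count plain=$plain multi=$multi ";
print "dot_plain=$dot_plain dot_s=$dot_s spaced=$spaced\n";
# prints: count=2 plain=0 multi=1 dot_plain=0 dot_s=1 spaced=1
```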
Substitution operator
The following are the modifiers related to the substitution operator s///:
Modifier | Description |
---|---|
i | Makes the match case-insensitive, so "a" and "A" are treated as the same. |
m | By default, "^" and "$" anchor only at the start and end of the whole string. With "m", they also match at the start and end of each line within the string. |
o | The pattern is compiled only once. |
s | By default, "." matches any character except a newline. With "s", "." matches any character, including newlines. |
x | Whitespace in the pattern is ignored unless it is escaped. |
g | Replace all matching substrings. |
e | Evaluate the replacement string as an expression. |
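A short sketch of how the g, i, and e modifiers change a substitution may help here. The sample strings and variable names are invented for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample string.
my $text = "cat bat cat";

# Without /g, only the first match is replaced.
(my $first = $text) =~ s/cat/dog/;

# With /g, every match is replaced.
(my $all = $text) =~ s/cat/dog/g;

# /i makes the pattern case-insensitive.
(my $ci = "Cat cat") =~ s/cat/dog/gi;

# /e evaluates the replacement as Perl code; here each number is doubled.
(my $doubled = "3 apples and 5 pears") =~ s/(\d+)/$1 * 2/ge;

# s/// returns the number of substitutions it made.
my $copy = $text;
my $n = ($copy =~ s/cat/dog/g);

print "$first\n";    # dog bat cat
print "$all\n";      # dog bat dog
print "$ci\n";       # dog dog
print "$doubled\n";  # 6 apples and 10 pears
print "$n\n";        # 2
```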
Conversion operator
The following are the modifiers related to the conversion (transliteration) operator tr///:
Modifier | Description |
---|---|
c | Convert all characters that are not specified in the search list |
d | Delete specified characters that have no corresponding replacement character |
s | Squeeze runs of identical output characters into one |
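The c, d, and s modifiers of tr/// can be illustrated with a small sketch. The sample string and variable names are hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample: "hello", three spaces, "world", one space, "2024".
my $text = "hello   world 2024";

# Plain tr/// maps each character in the first list to the second.
(my $upper = $text) =~ tr/a-z/A-Z/;

# With an empty replacement list and no modifiers, tr/// only counts.
my $digits = ($text =~ tr/0-9//);

# /d deletes matched characters that have no replacement.
(my $no_digits = $text) =~ tr/0-9//d;

# /s squeezes each run of the same output character into one.
(my $squeezed = $text) =~ tr/ //s;

# /c complements the search list: everything that is NOT a-z becomes "*".
(my $masked = $text) =~ tr/a-z/*/c;

print "$upper\n";        # HELLO   WORLD 2024
print "$digits\n";       # 4
print "[$no_digits]\n";  # [hello   world ]
print "$squeezed\n";     # hello world 2024
print "$masked\n";       # hello***world*****
```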
The following table lists common regular expression syntax (character-class equivalents are given for ASCII text):
Expression | Description |
---|---|
. | Matches any character except a newline |
x? | Matches x zero or one time |
x* | Matches x zero or more times, as many times as possible (append ? as in x*? to match as few times as possible) |
x+ | Matches x one or more times, as many times as possible (append ? as in x+? to match as few times as possible) |
.* | Matches any character zero or more times |
.+ | Matches any character one or more times |
{m} | Matches the preceding item exactly m times |
{m,n} | Matches the preceding item at least m and at most n times |
{m,} | Matches the preceding item at least m times |
[] | Matches any one of the characters listed inside [] |
[^] | Matches any one character that is not listed inside [] |
[0-9] | Matches any digit |
[a-z] | Matches any lowercase letter |
[^0-9] | Matches any non-digit character |
[^a-z] | Matches any character that is not a lowercase letter |
^ | Matches at the start of the string |
$ | Matches at the end of the string (or before a final newline) |
\d | Matches a digit, same as [0-9] |
\d+ | Matches one or more digits, same as [0-9]+ |
\D | Matches a non-digit, same as [^0-9] |
\D+ | Matches one or more non-digits, same as [^0-9]+ |
\w | Matches a letter, digit, or underscore, same as [a-zA-Z0-9_] |
\w+ | Matches one or more word characters, same as [a-zA-Z0-9_]+ |
\W | Matches a non-word character, same as [^a-zA-Z0-9_] |
\W+ | Matches one or more non-word characters, same as [^a-zA-Z0-9_]+ |
\s | Matches a whitespace character, same as [ \n\t\r\f] |
\s+ | Matches one or more whitespace characters, same as [ \n\t\r\f]+ |
\S | Matches a non-whitespace character, same as [^ \n\t\r\f] |
\S+ | Matches one or more non-whitespace characters, same as [^ \n\t\r\f]+ |
\b | Matches at a word boundary (between a \w character and a non-\w character) |
\B | Matches at any position that is not a word boundary |
a\|b\|c | Alternation: matches a, b, or c |
abc | Matches the literal string abc |
/pattern/i | The i modifier makes the match case-insensitive |
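Several of these constructs can be combined in one pattern. The following is a minimal sketch; the log line, the HTML fragment, and all variable names are made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical log line.
my $line = "2024-03-15 ERROR disk usage at 91%";

# ^, \d, and {m} with capture groups: pull out the date parts.
my ($year, $month, $day) = $line =~ /^(\d{4})-(\d{2})-(\d{2})/;

# Alternation between word boundaries.
my ($level) = $line =~ /\b(INFO|WARN|ERROR)\b/;

# \d+ followed by a literal % at the end of the string.
my ($percent) = $line =~ /(\d+)%$/;

# A character class with + and /g collects every lowercase word.
my @words = $line =~ /\b[a-z]+\b/g;

# Greedy .+ takes as much as it can; .+? takes as little as it can.
my $html = "<b>bold</b>";
my ($greedy) = $html =~ /<(.+)>/;
my ($lazy)   = $html =~ /<(.+?)>/;

print "$year $month $day\n";  # 2024 03 15
print "$level\n";             # ERROR
print "$percent\n";           # 91
print "@words\n";             # disk usage at
print "$greedy\n";            # b>bold</b
print "$lazy\n";              # b
```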