Perl regular expressions


A regular expression describes a string-matching pattern. It can be used to check whether a string contains a certain substring, to replace a matching substring, or to extract the substrings that meet a given condition.

Perl's regular expression support is very powerful, among the strongest of the commonly used languages. Many languages refer to Perl's regular expressions when designing their own regular expression support.

Perl has three forms of regular expression operators: matching, replacement, and conversion:

  • Matching: m//

  • Replacement: s///

  • Conversion: tr///

These three forms are generally used together with =~ or !~. =~ means "matches" and !~ means "does not match".
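
As a minimal sketch of the two binding operators, the snippet below tests one string with =~ and with !~. The sample string and the variable names are assumptions chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data, chosen only for illustration.
my $text = "welcome to php site.";

# =~ is true when the pattern matches somewhere in the string.
my $has_php = ($text =~ /php/) ? 1 : 0;

# !~ is true when the pattern does NOT match anywhere in the string.
my $lacks_digits = ($text !~ /[0-9]/) ? 1 : 0;

print "contains 'php': $has_php\n";
print "has no digits: $lacks_digits\n";
```

Both flags end up as 1 here, because the string contains "php" and contains no digit.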


Matching operator

The matching operator m// tests a string against a regular expression. When / is used as the delimiter, the leading m may be omitted. For example, to match "run" in the scalar $bar:

#!/usr/bin/perl
use strict;
use warnings;

my $bar = "I am runoob site. welcome to runoob site.";
if ($bar =~ /run/) {
   print "First match succeeded\n";
} else {
   print "First match failed\n";
}

$bar = "run";
if ($bar =~ /run/) {
   print "Second match succeeded\n";
} else {
   print "Second match failed\n";
}

Running the program above produces:

First match succeeded
Second match succeeded

Pattern matching modifiers

Pattern matching has some commonly used modifiers, as shown in the following table:

Modifier   Description
i          Ignore case in the pattern
m          Multi-line mode
o          Evaluate the pattern only once
s          Single-line mode: "." also matches "\n" (by default it does not)
x          Ignore whitespace in the pattern
g          Global matching
cg         Keep the search position after a global match fails, so matching can resume from there
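
The modifiers above can be sketched on a small two-line string. The sample text and all variable names here are assumptions made for illustration, not part of the original tutorial.

```perl
use strict;
use warnings;

# Hypothetical sample text, chosen only for illustration.
my $text = "Perl is fun\nperl is old\n";

# i: ignore case, so PERL matches "Perl".
my $i_match = ($text =~ /PERL/i) ? 1 : 0;

# Without m, ^ matches only at the very start of the string.
my $no_m = ($text =~ /^perl/) ? 1 : 0;

# m: ^ also matches at the start of each line.
my $m_match = ($text =~ /^perl/m) ? 1 : 0;

# s: "." also matches "\n", so the pattern can span both lines.
my $s_match = ($text =~ /fun.perl/s) ? 1 : 0;

# x: whitespace inside the pattern is ignored.
my $x_match = ($text =~ / is \s+ old /x) ? 1 : 0;

# g in list context returns every match.
my @all = ($text =~ /perl/gi);

# cg: a failed global match keeps the current search position.
my $scan = "aXb";
$scan =~ /a/g;               # position is now just after "a"
$scan =~ /z/gc;              # fails, but c keeps the position
my $pos_after = pos($scan);

print "i=$i_match no_m=$no_m m=$m_match s=$s_match x=$x_match\n";
print "matches found with g: ", scalar(@all), "\n";
print "position kept by cg: $pos_after\n";
```

Only the match without m fails, since the lowercase "perl" starts the second line rather than the string.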

Regular expression variables

After a successful match, Perl sets three special variables:

  • $`: The part of the string before the match

  • $&: The matched string

  • $': The part of the string after the match

Concatenating these three variables gives back the original string.

For example:

#!/usr/bin/perl
use strict;
use warnings;

my $string = "welcome to runoob site.";
$string =~ m/run/;
print "Text before the match: $`\n";
print "Matched text: $&\n";
print "Text after the match: $'\n";

Running the program above produces:

Text before the match: welcome to 
Matched text: run
Text after the match: oob site.
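
The claim that the three variables rebuild the original string can be checked with a short sketch. The sample string and variable names are assumptions for illustration. The variables are copied only after a successful match, since they are meaningful only then.

```perl
use strict;
use warnings;

# Hypothetical sample string, chosen only for illustration.
my $string = "The quick brown fox";

my ($before, $matched, $after) = ("", "", "");
if ($string =~ /quick/) {
    # Copy the special variables right away; the next match overwrites them.
    ($before, $matched, $after) = ($`, $&, $');
}

# Joining the three parts gives back the original string.
my $rebuilt = $before . $matched . $after;
print "rebuilt equals original\n" if $rebuilt eq $string;
```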


Replacement operator

The replacement operator s/// is an extension of the matching operator: it replaces the matched text with a new string. The basic format is as follows:

s/PATTERN/REPLACEMENT/;

PATTERN is the matching pattern, and REPLACEMENT is the replacement string.

For example, we replace "google" in the following string with "php":

#!/usr/bin/perl
use strict;
use warnings;

my $string = "welcome to google site.";
$string =~ s/google/php/;

print "$string\n";

Running the program above produces:

welcome to php site.

Replacement operation modifiers

The replacement operation modifiers are shown in the following table:

Modifier   Description
i          Makes the match case-insensitive, so "a" and "A" are treated as the same.
m          By default "^" and "$" match only at the start and end of the whole string. With m they also match at the start and end of each line.
o          The pattern is evaluated only once.
s          By default "." matches any character except a newline. With s it matches any character, including a newline.
x          Whitespace in the pattern is ignored unless it is escaped.
g          Replaces all matches, not just the first.
e          Evaluates the replacement string as an expression.
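
A minimal sketch of the g, i, and e modifiers follows. The sample strings and variable names are assumptions for illustration. Each substitution works on a copy so the inputs stay unchanged.

```perl
use strict;
use warnings;

# Hypothetical sample strings, chosen only for illustration.
my $text   = "Cat cat CAT";
my $prices = "3 apples and 5 pears";

# No modifiers: only the first case-sensitive match is replaced.
(my $first_only = $text) =~ s/cat/dog/;

# g and i together: replace every match, ignoring case.
(my $all_ignoring_case = $text) =~ s/cat/dog/gi;

# e: the replacement is evaluated as Perl code, here doubling each number.
(my $doubled = $prices) =~ s/(\d+)/$1 * 2/ge;

print "$first_only\n";
print "$all_ignoring_case\n";
print "$doubled\n";
```

Running this sketch prints "Cat dog CAT", "dog dog dog", and "6 apples and 10 pears".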

Conversion operator

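The conversion operator tr/// replaces each character in its search list with the corresponding character in its replacement list. The following are the modifiers related to the conversion operator:

Modifier   Description
c          Convert all characters that are not in the search list
d          Delete all specified characters
s          Squeeze runs of identical output characters into one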

The following example converts all lowercase letters in the variable $string to uppercase:

#!/usr/bin/perl
use strict;
use warnings;

my $string = 'welcome to php site.';
$string =~ tr/a-z/A-Z/;

print "$string\n";

Running the program above produces:

WELCOME TO PHP SITE.

The following example uses the s modifier to squeeze runs of repeated characters in the variable $string:

#!/usr/bin/perl
use strict;
use warnings;

my $string = 'runoob';
$string =~ tr/a-z/a-z/s;

print "$string\n";

Running the program above produces:

runob

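More examples. Note that tr/// does not understand regular expression character classes such as \d, so a digit range must be written as 0-9:

$string =~ tr/0-9/ /c;    # replace every non-digit character with a space
$string =~ tr/\t //d;     # delete tabs and spaces
$string =~ tr/0-9/ /cs;   # replace each run of non-digit characters with a single space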
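
To make the effect of these modifiers concrete, here is a small sketch that applies each one to the same input. The input string and variable names are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical input, chosen only for illustration.
my $input = "a1b22  c333";

# c: complement the search list, so every non-digit becomes a space.
(my $c_only = $input) =~ tr/0-9/ /c;

# cs: same conversion, but each run of converted characters is squeezed to one.
(my $c_squeezed = $input) =~ tr/0-9/ /cs;

# d: characters in the search list with no replacement are deleted.
(my $no_blanks = $input) =~ tr/\t //d;

print "[$c_only]\n";
print "[$c_squeezed]\n";
print "[$no_blanks]\n";
```

This prints "[ 1 22   333]", "[ 1 22 333]", and "[a1b22c333]".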

More regular expression rules

Expression   Description
.            Matches any character except a newline
x?           Matches x zero or one time
x*           Matches x zero or more times, as many times as possible (x*? matches as few times as possible)
x+           Matches x one or more times, as many times as possible (x+? matches as few times as possible)
.*           Matches any character zero or more times
.+           Matches any character one or more times
{m}          Matches exactly m occurrences of the preceding item
{m,n}        Matches at least m and at most n occurrences of the preceding item
{m,}         Matches at least m occurrences of the preceding item
[]           Matches any one of the characters inside []
[^]          Matches any one character that is not inside []
[0-9]        Matches any digit
[a-z]        Matches any lowercase letter
[^0-9]       Matches any non-digit character
[^a-z]       Matches any character that is not a lowercase letter
^            Matches at the beginning of the string
$            Matches at the end of the string
\d           Matches a digit, the same as [0-9]
\d+          Matches one or more digits, the same as [0-9]+
\D           Matches a non-digit, the same as [^0-9]
\D+          Matches one or more non-digits, the same as [^0-9]+
\w           Matches a letter, digit, or underscore, the same as [a-zA-Z0-9_] for ASCII text
\w+          The same as [a-zA-Z0-9_]+
\W           Matches a character that is not a letter, digit, or underscore, the same as [^a-zA-Z0-9_]
\W+          The same as [^a-zA-Z0-9_]+
\s           Matches a whitespace character, the same as [ \n\t\r\f]
\s+          The same as [ \n\t\r\f]+
\S           Matches a non-whitespace character, the same as [^ \n\t\r\f]
\S+          The same as [^ \n\t\r\f]+
\b           Matches a word boundary, the position between a \w character and a \W character
\B           Matches a position that is not a word boundary
a|b|c        Matches a, b, or c
abc          Matches the string abc
(pattern)    () remembers the text it matched, which is very practical. The text matched by the first () becomes $1 (or \1 inside the pattern), the text matched by the second () becomes $2 or \2, and so on.
/pattern/i   The i modifier ignores case, so upper and lower case letters are treated as the same when matching.
\            To match a special character literally, such as "*", put \ before it so that it loses its special meaning.
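
A few of the rules above, sketched in code: capture groups, a backreference, greedy versus minimal matching, a bounded quantifier, and escaping. All sample strings and variable names are assumptions for illustration.

```perl
use strict;
use warnings;

# Capture groups: $1 and $2 hold the text matched by the first and second ().
my $date = "2024-05";
my ($year, $month) = ("", "");
if ($date =~ /(\d+)-(\d+)/) {
    ($year, $month) = ($1, $2);
}

# \1 inside the pattern refers back to the first group: find a doubled word.
my $repeated = ("this is is a test" =~ /\b(\w+) \1\b/) ? $1 : "";

# Greedy .+ takes as much as possible; .+? takes as little as possible.
my $tag = "<b>bold</b>";
my ($greedy) = $tag =~ /<(.+)>/;
my ($lazy)   = $tag =~ /<(.+?)>/;

# {m,n}: at least m and at most n repetitions.
my $three_ok = ("aaa"  =~ /^a{2,3}$/) ? 1 : 0;
my $four_ok  = ("aaaa" =~ /^a{2,3}$/) ? 1 : 0;

# Escaping: \* matches a literal asterisk.
my $has_star = ("2 * 3" =~ /\*/) ? 1 : 0;

print "year=$year month=$month repeated=$repeated\n";
print "greedy=$greedy lazy=$lazy\n";
print "three=$three_ok four=$four_ok star=$has_star\n";
```

The greedy capture is "b>bold</b" while the minimal one is just "b", which is why the minimal form is usually the one wanted when matching up to the first closing delimiter.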