Perl-Compatible Regular Expressions in PHP
1 Preface
PHP is widely used for server-side (CGI) web development. A script usually takes the data a user submits and produces some result from it. If the user enters invalid data, however, problems follow: someone might give their birthday as "February 30"! So how should we check whether such data is valid? PHP supports regular expressions, which make this kind of data matching very convenient.
2 What is a regular expression:
Simply put, regular expressions are a powerful tool for pattern matching and replacement. They show up in almost every Unix/Linux-based software tool, as well as in scripting languages such as Perl and PHP. The client-side scripting language JavaScript also supports regular expressions. Regular expressions have become a common concept and tool, widely used by all kinds of technical people.
There is something like this on a Linux website: "If you ask a Linux enthusiast what he likes most, he will probably answer regular expressions; if you ask him what he is most afraid of, besides tedious installation and configuration, he will definitely say regular expressions. "
As the quote suggests, regular expressions look complicated and intimidating, and many PHP beginners skip them and move on. Regular expressions let PHP use pattern matching to find substrings that meet given conditions. They can also check whether a string meets those conditions and replace the matching parts with another string. It would be a pity not to learn such powerful features.
3 Basic syntax of regular expressions:
A regular expression is divided into three parts: delimiter, expression and modifier.
The delimiter can be any character that is not alphanumeric, not a backslash and not whitespace (such as "/" or "!"). The most common delimiter is "/". The expression consists of special characters (see the special characters below) and ordinary characters. For example, "[a-z0-9_-]+@[a-z0-9_.-]+" matches a simple email address. Modifiers turn a particular feature or mode on or off. Here is an example of a complete regular expression:
/hello.+?hello/is
In the regular expression above, "/" is the delimiter. The text between the two "/" is the expression, and the string "is" after the second "/" is the modifier.
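The sketch below runs that pattern through preg_match() to show the three parts working together. The subject string is made up for illustration. The "i" modifier lets "Hello" and "HELLO" match "hello", and "s" lets "." cross the line break.

```php
<?php
// Delimiter "/", expression "hello.+?hello", modifiers "is".
$pattern = '/hello.+?hello/is';

// Made-up subject: mixed case and a line break between the two "hello"s.
$subject = "Hello\nworld, HELLO again";

$found = preg_match($pattern, $subject, $m);   // 1 if found, 0 if not
echo $found, "\n";                             // 1
echo json_encode($m[0]), "\n";                 // "Hello\nworld, HELLO"

// Without "s", "." cannot match the newline, so nothing is found.
$foundWithoutS = preg_match('/hello.+?hello/i', $subject);
echo $foundWithoutS, "\n";                     // 0
```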
If the delimiter appears inside the expression, it must be escaped with the escape character "\", as in "/hello.+?\/hello/is". The escape character has a second job besides escaping delimiters: a "\" in front of certain letters gives them a special meaning. For example, "\d" matches any digit.
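Here is a short sketch of escaping, using made-up subject strings. The first pattern escapes the "/" delimiter by hand. The second uses PHP's preg_quote(), which escapes every special character plus the delimiter passed as its second argument.

```php
<?php
// "/" is the delimiter, so a literal "/" inside the expression is written "\/".
$dateLike = preg_match('/^\d+\/\d+$/', '12/31');   // 1: "\d" matches any digit

// preg_quote() escapes special characters; passing '/' also escapes the delimiter.
$literal = preg_quote('1/2 + 3', '/');            // 1\/2 \+ 3
$hasLiteral = preg_match('/' . $literal . '/', 'x = 1/2 + 3');   // 1

echo $dateLike, ' ', $literal, ' ', $hasLiteral, "\n";
```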
4 Special characters for regular expressions:
Special characters in regular expressions are divided into metacharacters, positioning characters, etc.
Metacharacters are a type of characters with special meaning in regular expressions that are used to describe the way in which their leading characters (i.e., the characters before the metacharacters) appear in the matched object. Metacharacters themselves are single characters, but different or identical metacharacters can be combined to form large metacharacters.
Metacharacters:
Braces: Braces specify exactly how many times the preceding character may occur. For example, "/pre{1,5}/" matches "pre", "pree", "preeeee" and so on: "pr" followed by 1 to 5 "e"s. To allow the "e" to appear between 0 and 5 times, write "/pre{0,5}/". Older PCRE versions do not accept "{,5}" as a quantifier and treat it as literal text.
Plus sign: The "+" character is used to match the character before the metacharacter appearing one or more times. For example, "/ac+/" means that the matched object can be "act", "account", "acccc" and other strings with one or more "c" appearing after "a". "+" is equivalent to "{1,}".
Asterisk: The "*" character is used to match zero or more occurrences of the character before the metacharacter. For example, "/ac*/" means that the matched object can be "app", "acp", "accp" and other strings with zero or more "c" appearing after "a". "*" is equivalent to "{0,}".
Question mark: The "?" character matches zero or one occurrence of the preceding character. For example, "/ac?/" matches "a", "acp" or "acwp", that is, "a" followed by zero or one "c". "?" has a second important role. Placed after another quantifier, it switches that quantifier from greedy to non-greedy (lazy) matching (see "Greedy Mode" below).
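The sketch below compares the quantifiers above side by side. firstMatch() is a hypothetical helper written only for this example. It returns the text matched by preg_match(), or null when nothing matches. The example uses "{0,5}" rather than "{,5}" so that it works on any PCRE version.

```php
<?php
// Hypothetical helper: returns the matched text, or null when there is no match.
function firstMatch(string $pattern, string $subject): ?string {
    return preg_match($pattern, $subject, $m) ? $m[0] : null;
}

var_dump(firstMatch('/pre{1,5}/', 'preeeeeeee')); // "preeeee": at most five "e"
var_dump(firstMatch('/pre{0,5}/', 'prx'));        // "pr": zero "e" is allowed
var_dump(firstMatch('/ac+/', 'account'));         // "acc"
var_dump(firstMatch('/ac*/', 'app'));             // "a"
var_dump(firstMatch('/ac?/', 'accp'));            // "ac": at most one "c"
var_dump(firstMatch('/ac+/', 'app'));             // NULL: "+" needs at least one "c"
```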
There are two other very important special characters: "[ ]". They can match characters that appear in "[]". For example, "/[az]/" can match a single character "a" or "z"; if the above expression is changed to "/[a-z]/" , you can match any single lowercase letter, such as "a", "b", etc.
If "^" is the first character inside "[]", the expression matches any character that is NOT listed. For example, "/[^a-z]/" matches any character except a lowercase letter. There are also several predefined POSIX character classes. They are only valid inside a bracket expression, as in "/[[:alpha:]]/":
[:alpha:]: matches any letter
[:alnum:]: matches any letters and numbers
[:digit:]: matches any number
[:space:]: matches any whitespace character (space, tab, newline, etc.)
[:upper:]: matches any uppercase letter
[:lower:]: matches any lowercase letter
[:punct:]: matches any punctuation mark
[:xdigit:]: matches any hexadecimal digit
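The sketch below applies bracket expressions to made-up strings. It uses preg_match_all(), which returns the number of matches. Note the doubled brackets: the POSIX class name goes inside an ordinary "[]".

```php
<?php
// Square brackets match one character from the listed set.
$az       = preg_match_all('/[az]/', 'abcz');      // 2 ("a" and "z")
$lower    = preg_match_all('/[a-z]/', 'aB3z');     // 2 ("a" and "z")
$notLower = preg_match_all('/[^a-z]/', 'aB3z');    // 2 ("B" and "3")

// POSIX classes are only valid inside a bracket expression: "[[:digit:]]".
$digits = preg_match_all('/[[:digit:]]/', 'a1b22');   // 3
$hex    = preg_match('/^[[:xdigit:]]+$/', 'Ff09');    // 1

echo "$az $lower $notLower $digits $hex\n";           // 2 2 2 3 1
```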
The following letters also take on special meanings when preceded by the escape character "\":
\s: matches a single whitespace character.
\S: matches any single character that is not whitespace.
\d: matches a digit from 0 to 9, equivalent to "/[0-9]/".
\w: matches a letter, digit or underscore, equivalent to "/[a-zA-Z0-9_]/".
\W: matches any character that \w does not match, equivalent to "/[^a-zA-Z0-9_]/".
\D: matches any character that is not a decimal digit.
.: matches any character except a newline. With the "s" modifier, "." matches any character, including newlines.
These special characters make otherwise cumbersome patterns easy to express. For example, "/\d0000/" matches a digit followed by four zeros, such as "10000", "20000" or "90000".
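The sketch below checks the "/\d0000/" example, with anchors added so that the whole string must match. It also shows one possible pattern for the full range from 10000 to 99999. That pattern is an illustration, not the only way to write it.

```php
<?php
// "\d0000" = one digit followed by four literal zeros; "^...$" requires the whole string.
$pattern = '/^\d0000$/';
$m10000 = preg_match($pattern, '10000');   // 1
$m90000 = preg_match($pattern, '90000');   // 1
$m12345 = preg_match($pattern, '12345');   // 0: not a whole ten-thousand

// One possible pattern for every integer from 10000 to 99999.
$range = '/^[1-9]\d{4}$/';
$in  = preg_match($range, '12345');        // 1
$out = preg_match($range, '09999');        // 0: leading zero rejected

// \w, \s and \D together: word characters, one whitespace, then non-digits.
$mixed = preg_match('/^\w+\s\D+$/', 'user_1 abc');   // 1

echo "$m10000 $m90000 $m12345 $in $out $mixed\n";    // 1 1 0 1 0 1
```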
Anchor characters:
Anchor (positioning) characters are another very important type of character in regular expressions. Their main function is to describe where in the subject string a match may occur.
^: the pattern must appear at the beginning of the subject (this is different from "^" inside "[]")
$: the pattern must appear at the end of the subject
\b: the pattern must appear at a word boundary, that is, at the start or end of a word
For example:
"/^he/": matches strings that start with "he", such as "hello" and "height";
"/he$/": matches strings that end with "he", such as "she";
"/\bhe/": matches "he" at the start of a word, as in "hello" or "say hello", but not the "he" inside "the";
"/he\b/": matches "he" at the end of a word, as in "she" or "the end";
"/^he$/": matches only the string "he".
Parentheses:
Besides matching, parentheses "()" record (capture) parts of the match. The recorded text is stored so that you can refer to it later. For example:
/^([a-zA-Z0-9_-]+)@([a-zA-Z0-9_-]+)(\.[a-zA-Z0-9_-]+)$/
This records the user name and the server address of an email address of the form username@server.com. To refer to a recorded string later, write the escape character followed by the group's number:
"\1" refers to the first group, "([a-zA-Z0-9_-]+)".
"\2" refers to the second group, "([a-zA-Z0-9_-]+)".
"\3" refers to the third group, "(\.[a-zA-Z0-9_-]+)".
Because "\" is itself a special character in PHP strings, write "\1" as "\\1" in a PHP string literal.
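The sketch below shows both uses of parentheses. It first reads the recorded groups from preg_match()'s third argument, then uses a backreference inside the pattern itself. The repeated-word pattern is only an illustration. It is written in a double-quoted string, so every "\" is doubled.

```php
<?php
$pattern = '/^([a-zA-Z0-9_-]+)@([a-zA-Z0-9_-]+)(\.[a-zA-Z0-9_-]+)$/';
preg_match($pattern, 'username@server.com', $m);
// $m[0] is the whole match; $m[1], $m[2] and $m[3] are the three groups.
echo $m[1], ' | ', $m[2], ' | ', $m[3], "\n";   // username | server | .com

// "\\1" in a double-quoted PHP string becomes the backreference \1;
// this pattern finds a word that is immediately repeated.
$hasRepeat = preg_match("/\\b(\\w+) \\1\\b/", 'this is is a test', $r);
echo $r[1], "\n";                                // is
```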
Other special symbols:
"|": The or symbol "|" is the same as the or in PHP, but it is just one "|" instead of two "||" in PHP! This means that it can be a certain character or another string. For example, "/abcd|dcba/" may match "abcd" or "dcba".
5 Greedy Mode:
As mentioned in the section on metacharacters, "?" also plays an important role in controlling "greedy mode". What is greedy mode?
Suppose we want to match a string that starts with "a" and ends with "b", but the subject contains many "b"s after the "a", such as "a bbbbbbbbbbbbbbbbb". Will the match end at the first "b" or the last one? In greedy mode, which is the default, the match extends to the last "b". In non-greedy (lazy) mode, it stops at the first "b".
The following expressions use non-greedy mode:
/a.+?b/
/a.+b/U
The following uses greedy mode:
/a.+b/
A modifier U is used above, see the section below for details.
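The sketch below compares the three expressions above on the example string from the text.

```php
<?php
$subject = 'a bbbbbbbbbbbbbbbbb';

preg_match('/a.+b/',  $subject, $greedy);    // greedy (default): runs to the last "b"
preg_match('/a.+?b/', $subject, $lazy);      // lazy: stops at the first possible "b"
preg_match('/a.+b/U', $subject, $ungreedy);  // "U" makes "+" lazy by default

var_dump($greedy[0], $lazy[0], $ungreedy[0]);
// the whole subject, then "a b", then "a b"
```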
6 Modifiers:
Modifiers change how a regular expression behaves so that it better fits your needs. Note that modifiers are case-sensitive, so "e" is not the same as "E". The modifiers are as follows:
i: With "i", matching becomes case-insensitive, so "a" and "A" are treated as the same.
m: By default, "^" and "$" match only at the start and end of the whole subject string. With "m", they also match at the start and end of every line.
s: By default, "." matches any character except a newline. With "s", "." matches any character, including newlines.
x: With this modifier, whitespace in the expression is ignored unless it is escaped or inside a character class, and "#" starts a comment that runs to the end of the line.
e: This modifier was only meaningful for preg_replace(), where it made the replacement string be evaluated as PHP code. It was deprecated in PHP 5.5 and removed in PHP 7.0; use preg_replace_callback() instead.
A: With this modifier, the pattern is anchored and must match at the beginning of the subject string. For example, "/a/A" matches "abcd" but not "bcda".
D: With this modifier, "$" matches only at the absolute end of the subject string, not before a final newline. It is off by default and is ignored when "m" is set.
U: This modifier inverts greediness. Quantifiers become non-greedy by default, and a following "?" makes them greedy again. As a result, "/a.+b/U" behaves like "/a.+?b/".
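The sketch below applies several modifiers to one made-up two-line string. Each variable holds the return value of preg_match() or preg_match_all(), and the comments give the expected result and the reason. "D" is PHP's name for the end-only "$" behavior.

```php
<?php
$text = "first line\nsecond line\n";

$i        = preg_match('/FIRST/i', $text);               // 1: case-insensitive
$noM      = preg_match_all('/^\w+/', $text);             // 1: "^" only at string start
$withM    = preg_match_all('/^\w+/m', $text);            // 2: "^" at every line start
$dotNoS   = preg_match('/line.second/', $text);          // 0: "." skips "\n"
$dotS     = preg_match('/line.second/s', $text);         // 1: "." matches "\n"
$x        = preg_match('/ first \s line  # comment /x', $text); // 1: spaces, comment ignored
$dollar   = preg_match('/line$/', $text);                // 1: "$" matches before final "\n"
$dollarD  = preg_match('/line$/D', $text);               // 0: "$" only at the very end
$anchored = preg_match('/second/A', $text);              // 0: must match at position 0
```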
7 PCRE related regular expression functions:
PHP provides several functions for Perl-compatible regular expressions, covering pattern matching, replacement, splitting and filtering:
1. preg_match:
Function format: int preg_match(string pattern, string subject, array [matches]);
This function searches subject for a match to pattern. If matches is given, $matches[0] holds the text that matched the whole pattern. $matches[1] holds the text recorded by the first pair of parentheses "()", $matches[2] the text recorded by the second, and so on. preg_match() returns 1 if the pattern matches, 0 if it does not, and false if an error occurs. It stops after the first match.
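The sketch below shows preg_match()'s return value and matches array. The subject string is invented for the example.

```php
<?php
$subject = 'Order 42 shipped on 2024-05-17';

// Each pair of parentheses fills one more entry of $matches.
$r = preg_match('/(\d{4})-(\d{2})-(\d{2})/', $subject, $matches);
echo $r, "\n";        // 1
print_r($matches);    // [0] => 2024-05-17, [1] => 2024, [2] => 05, [3] => 17

// Only the first match is reported: "42", not the digits of the date.
preg_match('/\d+/', $subject, $first);

// No match gives 0 (false is returned only on an error).
$none = preg_match('/\d{5}/', $subject);
```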
2. preg_replace:
Function format: mixed preg_replace(mixed pattern, mixed replacement, mixed subject);
This function replaces every part of subject that matches pattern with replacement. If the replacement needs to reuse part of the matched text, record that part with "()" in the pattern. Then refer to it in the replacement as "\1" (or "$1"); in a double-quoted PHP string, write "\\1".
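The replacement sketches below use made-up strings. They show a "$1"-style reference in a single-quoted replacement, "\\1" in a double-quoted one, and the fact that every match is replaced.

```php
<?php
// Reorder an ISO date using the recorded groups.
$us = preg_replace('/(\d{4})-(\d{2})-(\d{2})/', '$2/$3/$1', 'Due: 2024-05-17');
echo $us, "\n";          // Due: 05/17/2024

// Every match is replaced, not just the first one.
$squeezed = preg_replace('/\s+/', ' ', "a  b \t c");
echo $squeezed, "\n";    // a b c

// In double quotes the backreference must be written "\\1".
$bold = preg_replace("/(\\w+)!/", "<b>\\1</b>", 'hi there!');
echo $bold, "\n";        // hi <b>there</b>
```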
3. preg_split:
Function format: array preg_split(string pattern, string subject, int [limit]);
This function works like the old split() function, which belonged to the POSIX ereg family and was removed in PHP 7.0. The difference is that split() used simple POSIX regular expressions, while preg_split() uses fully Perl-compatible regular expressions. If the third parameter limit is given, at most limit substrings are returned, and the last one contains the rest of the subject.
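The sketch below splits a made-up list on commas and whitespace, first without limit and then with it.

```php
<?php
// Split on any run of commas and/or whitespace.
$parts = preg_split('/[\s,]+/', "apple, banana  cherry,date");
print_r($parts);   // apple, banana, cherry, date

// With limit 2, at most two pieces come back; the last holds the rest.
$two = preg_split('/[\s,]+/', "apple, banana  cherry,date", 2);
print_r($two);     // apple, "banana  cherry,date"
```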
4. preg_grep:
Function format: array preg_grep(string pattern, array input);
This function is similar to preg_match(), but it applies the pattern to every element of the array input. It returns a new array containing only the elements that match, and the original keys are preserved.
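The sketch below runs preg_grep() on a hypothetical list of file names. It also shows the optional PREG_GREP_INVERT flag, which returns the non-matching elements instead.

```php
<?php
$files = ['a.php', 'b.txt', 'c.PHP', 'd.php.bak'];

// Keep the elements ending in ".php" (case-insensitively); keys are preserved.
$php = preg_grep('/\.php$/i', $files);
print_r($php);     // [0] => a.php, [2] => c.PHP

// PREG_GREP_INVERT returns the elements that do NOT match.
$other = preg_grep('/\.php$/i', $files, PREG_GREP_INVERT);
print_r($other);   // [1] => b.txt, [3] => d.php.bak
```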
Here is an example that checks whether an email address is well-formed:
<?php
function emailIsRight($email) {
    // preg_match() needs delimiters ("/.../"); "\." matches a literal dot
    if (preg_match('/^[_.0-9a-z-]+@([0-9a-z][0-9a-z-]+\.)+[a-z]{2,3}$/', $email)) {
        return 1;
    }
    return 0;
}
if (emailIsRight('y10k@963.net')) echo "Correct\n";
if (!emailIsRight('y10k@fffff')) echo "Incorrect\n";
?>
The above program prints "Correct" and then "Incorrect", each on its own line.
8. The difference between Perl-compatible regular expressions and Perl/Ereg regular expressions in PHP:
Although it is called "Perl-compatible regular expressions", PHP's implementation still differs from Perl's in some ways. For example, Perl's "g" modifier means "match globally", that is, find all matches. PHP does not support this modifier. Use preg_match_all() to find all matches instead; preg_replace() already replaces all matches by default.
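Where Perl code would use the "g" modifier, PHP code typically calls preg_match_all(). The minimal sketch below uses a made-up subject. PREG_SET_ORDER groups each match's captures together.

```php
<?php
$subject = 'x=1, y=22, z=333';

// Perl would write m/(\w)=(\d+)/g; PHP uses preg_match_all() instead.
$n = preg_match_all('/(\w)=(\d+)/', $subject, $m, PREG_SET_ORDER);
echo $n, "\n";                            // 3
foreach ($m as $set) {
    echo $set[1], ' -> ', $set[2], "\n";  // x -> 1, y -> 22, z -> 333
}
```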
There are also differences from the ereg family of functions. ereg is the POSIX regular expression API that PHP also used to provide; it was deprecated in PHP 5.3 and removed in PHP 7.0. It is much weaker than preg:
1. ereg patterns take no delimiters and support no modifiers, which makes ereg much less flexible than preg.
2. Regarding ".": The dot in a regular expression is generally all characters except newline characters, but the "." in ereg is any character, including newline characters! If you want "." to include a newline character in preg, you can add "s" to the modifier.
3. ereg uses greedy mode by default and cannot be modified. This brings trouble to many replacements and matchings.
4. Speed: This may be what many people care about most. Does preg pay for its power with speed? Don't worry: preg is much faster than ereg. I ran a simple test program:
Timing test:
PHP code:
<?php
// Each loop runs the same replacement 100,000 times.
// time() has one-second resolution, so the results below are whole seconds.
echo "Preg_replace used time:";
$start = time();
for ($i = 1; $i <= 100000; $i++) {
    $str = "ssssssssssssssssssssssssssss";
    preg_replace("/s/", "", $str);
}
$ended = time() - $start;
echo $ended;

// ereg_replace() was removed in PHP 7.0, so this part only runs on older PHP versions.
echo "\nereg_replace used time:";
$start = time();
for ($i = 1; $i <= 100000; $i++) {
    $str = "ssssssssssssssssssssssssssss";
    ereg_replace("s", "", $str);
}
$ended = time() - $start;
echo $ended;

echo "\nstr_replace used time:";
$start = time();
for ($i = 1; $i <= 100000; $i++) {
    $str = "ssssssssssssssssssssssssssss";
    str_replace("s", "", $str);
}
$ended = time() - $start;
echo $ended;
?>
Result:
Preg_replace used time:5
ereg_replace used time:15
str_replace used time:2
str_replace is the fastest because it looks for a plain string and needs no regular expression engine. preg_replace is much faster than ereg_replace.
9. Regarding PHP3.0’s support for preg:
Preg support was added by default in PHP 4.0, but not in 3.0. If you want to use the preg function in 3.0, you must load the php3_pcre.dll file. Just add "extension = php3_pcre.dll" to the extension section of php.ini and then restart PHP!
In fact, regular expressions are often used to implement UBB code (BBCode). Many PHP forums use this approach (such as zForum zphp.com or VB vbullent.com), but their actual code is rather long.
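Forum code is much longer, but the core idea fits in a few lines. The sketch below is a hypothetical ubbToHtml() helper, not the code of any forum named above. It handles only [b], [i] and [url] and escapes HTML first. It uses the lazy "+?" from the greedy-mode section so that two [b] pairs are not merged into one.

```php
<?php
// Hypothetical helper: converts a few UBB/BBCode tags to HTML.
function ubbToHtml(string $text): string {
    // Escape HTML first so user input cannot inject tags of its own.
    $text = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    $rules = [
        '/\[b\](.+?)\[\/b\]/is' => '<b>$1</b>',   // lazy "+?" keeps each pair separate
        '/\[i\](.+?)\[\/i\]/is' => '<i>$1</i>',
        '/\[url\](https?:\/\/[^\s\[\]"<>]+)\[\/url\]/i' => '<a href="$1">$1</a>',
    ];
    return preg_replace(array_keys($rules), array_values($rules), $text);
}

echo ubbToHtml('[b]one[/b] and [b]two[/b]'), "\n";
// <b>one</b> and <b>two</b>
```

With a greedy "(.+)" instead, the first pattern would match from the first [b] to the last [/b] and produce a single bold span.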