Detailed study of PHP regular expressions

WBOY (Original)
2016-07-14 10:07:10

1. Introduction

Simply put, a regular expression is a powerful tool for pattern matching and replacement. Regular expressions appear in almost every tool on UNIX systems, such as the vi editor, the Perl and PHP scripting languages, and the awk and sed utilities. Client-side scripting languages such as JavaScript support them as well. Regular expressions have thus outgrown any single language or system and become a widely accepted concept and feature.
Regular expressions let users build a matching pattern from a series of special characters and then compare that pattern against target objects such as data files, program input, and form input on web pages. The program then takes the appropriate action depending on whether the target object contains the pattern.
For example, one of the most common uses of regular expressions is checking whether an email address entered by a user is correctly formatted. If the address matches the regular expression, the form data is processed normally; if it does not, a message prompts the user to enter a correct email address. Regular expressions therefore play an important role in the decision logic of web applications.

2. Basic syntax

Now that we have a first idea of what regular expressions do, let's take a closer look at their syntax.
A regular expression generally looks like this:
/love/
The part between the "/" delimiters is the pattern to be matched in the target object. Users simply place the pattern they want to find between the "/" delimiters. To let users build patterns more flexibly, regular expressions provide special "metacharacters": characters with a special meaning that specify how the character immediately before them (the leading character) may occur in the target object.
The most commonly used metacharacters are "+", "*", and "?". The "+" metacharacter requires its leading character to appear one or more times in a row in the target object, the "*" metacharacter requires it to appear zero or more times in a row, and the "?" metacharacter requires it to appear zero times or once.
Next, let us take a look at the specific applications of regular expression metacharacters.
/fo+/
Because the above regular expression contains the "+" metacharacter, it matches strings such as "fool", "fo", or "football" in the target object, in which the letter f is followed by one or more consecutive letters o.
/eg*/
Because the above regular expression contains the "*" metacharacter, it matches strings such as "easy", "ego", or "egg" in the target object, in which the letter e is followed by zero or more consecutive letters g.
/Wil?/
Because the above regular expression contains the "?" metacharacter, it matches strings such as "Win" or "Wilson" in the target object, in which the letter i is followed by zero or one letter l.
Besides these metacharacters, users can specify exactly how many times a pattern must appear in the matched object. For example,
/jim{2,6}/
The above regular expression requires the character m to appear 2 to 6 times in a row in the matched object. It therefore matches strings such as jimmy or jimmmmmy.
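The quantifiers above can be tried out directly in PHP. The following is a minimal sketch, assuming the PCRE function preg_match(); the words "f" and "jim" are not from the text and are added only to show a non-match.

```php
<?php
// Each pattern is tested against a few sample words.
$tests = [
    '/fo+/'      => ['fool', 'fo', 'football', 'f'],
    '/eg*/'      => ['easy', 'ego', 'egg'],
    '/Wil?/'     => ['Win', 'Wilson'],
    '/jim{2,6}/' => ['jimmy', 'jimmmmmy', 'jim'],
];

$results = [];
foreach ($tests as $pattern => $words) {
    foreach ($words as $word) {
        // preg_match() returns 1 on a match and 0 otherwise.
        $results[$pattern][$word] = preg_match($pattern, $word) === 1;
    }
}

print_r($results);
```

Only "f" (no letter o after the f) and "jim" (a single m) fail to match.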
After having a preliminary understanding of how to use regular expressions, let's look at how to use several other important metacharacters.
\s: matches a single whitespace character, including spaces, tabs, and newlines;
\S: matches any single character that is not whitespace;
\d: matches a digit from 0 to 9;
\w: matches a letter, digit, or underscore;
\W: matches any character that \w does not match;
. : matches any character except a newline.
(Note: \s and \S, and \w and \W, can be thought of as opposites of each other.)
Below, we look at how to use these metacharacters in regular expressions through examples.
/\s+/
The above regular expression matches one or more whitespace characters in the target object.
/\d000/
If we have a complex financial statement in hand, the above regular expression makes it easy to find every amount in the thousands.
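As a rough illustration of the whitespace and digit metacharacters, the sketch below uses PHP's preg_split() and preg_match_all(). The one-line "statement" is made-up sample data, not something from the original article.

```php
<?php
// Hypothetical statement text, used only for illustration.
$statement = "rent 2000\tfood 450\nsalary 5000 bonus 300";

// \s+ splits on any run of spaces, tabs, or newlines.
$fields = preg_split('/\s+/', $statement);

// \d000 finds a digit followed by three zeros.
preg_match_all('/\d000/', $statement, $matches);
$thousands = $matches[0];

print_r($fields);     // rent, 2000, food, 450, salary, 5000, bonus, 300
print_r($thousands);  // 2000, 5000
```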

In addition to the metacharacters introduced above, regular expressions have another kind of special character: the locator (also called an anchor). Locators specify where the matching pattern must appear in the target object.
The most commonly used locators are "^", "$", "\b", and "\B". The "^" locator requires the pattern to appear at the beginning of the target string, and the "$" locator requires it to appear at the end. The "\b" locator requires the pattern to appear at a word boundary, that is, at the beginning or end of a word. The "\B" locator requires the opposite: the position it marks must not be a word boundary, so the match lies inside a word rather than at its beginning or end. We can likewise regard "^" and "$", and "\b" and "\B", as two complementary pairs of locators. For example:
/^hell/
Because the above regular expression begins with the "^" locator, it matches target strings that start with "hell", such as "hell", "hello", or "hellhound".
/ar$/
Because the above regular expression ends with the "$" locator, it matches target strings that end with "ar", such as "car", "bar", or "ar".

/\bbom/
Because the above regular expression pattern starts with the "\b" locator, it matches words in the target object that start with "bom", such as "bomb" or "bom".
/man\b/
Because the above regular expression pattern ends with the "\b" locator, it matches words in the target object that end with "man", such as "human", "woman", or "man".
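The difference between anchoring to the whole string and anchoring to a word boundary is easier to see side by side. This is a small sketch using PHP's preg_match(); the sample sentences are invented for the example.

```php
<?php
// ^ and $ anchor to the start and end of the whole string.
$startsWithHell = preg_match('/^hell/', 'hellhound');  // 1
$endsWithAr     = preg_match('/ar$/', 'my car');       // 1
$notAtStart     = preg_match('/^hell/', 'shell');      // 0

// \b anchors to a word boundary anywhere inside the string.
$wordStart = preg_match('/\bbom/', 'a time bomb');     // 1: "bomb" starts a word
$wordEnd   = preg_match('/man\b/', 'a woman spoke');   // 1: "woman" ends in "man"
$midWord   = preg_match('/man\b/', 'many people');     // 0: "man" is followed by "y"

var_dump($startsWithHell, $endsWithAr, $notAtStart, $wordStart, $wordEnd, $midWord);
```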
To let users build patterns more flexibly, regular expressions allow a range of characters to be specified in the pattern instead of one particular character. For example:
/[A-Z]/
The above regular expression matches any uppercase letter from A to Z.
/[a-z]/
The above regular expression matches any lowercase letter from a to z.
/[0-9]/
The above regular expression matches any digit from 0 to 9.
/([a-z][A-Z][0-9])+/
The above regular expression matches a lowercase letter, an uppercase letter, and a digit in that order, repeated one or more times, such as "aB0" or "aB0cD1". Note that "()" can be used in a regular expression to group parts of a pattern together; everything inside the "()" must then appear in the target object, in order. The above regular expression therefore does not match a string such as "abc": its second character is not an uppercase letter and its last character is a letter rather than a digit.
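A quick way to check how the group repeats is to run the pattern in PHP. In this sketch the ^ and $ anchors are an addition of mine, so that the whole string has to fit the pattern; the test strings other than "aB0" and "abc" are also illustrative.

```php
<?php
// Anchored version of the pattern, so the entire string must match.
$pattern = '/^([a-z][A-Z][0-9])+$/';

$once  = preg_match($pattern, 'aB0');     // 1: the group matches once
$twice = preg_match($pattern, 'aB0xY9');  // 1: the group repeats twice
$abc   = preg_match($pattern, 'abc');     // 0: "b" is not uppercase, "c" is not a digit

var_dump($once, $twice, $abc);
```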
To express something like the "OR" operation of programming logic and match any one of several alternative patterns, use the pipe character "|". For example:
/to|too|2/
The above regular expression will match "to", "too", or "2" in the target object.
Another commonly used operator is the negation operator "[^]". Unlike the "^" locator introduced earlier, the negation operator "[^]" matches any single character that is not listed inside the brackets. For example:
/[^A-C]/
The above pattern matches any character in the target object other than A, B, and C. In general, when "^" appears as the first character inside "[]", it is the negation operator; when "^" appears outside "[]", or there is no "[]", it is a locator.
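Alternation and negation can both be observed with a couple of calls. The sketch below assumes PHP's PCRE functions, where alternatives are tried from left to right; the input strings are made up for the example.

```php
<?php
// Alternatives are tried left to right, so "to" is found before "too" is tried.
preg_match('/to|too|2/', 'too late', $m);
$firstAlternative = $m[0];   // "to"

// Listing the longer alternative first lets it match in full.
preg_match('/too|to|2/', 'too late', $m);
$longerFirst = $m[0];        // "too"

// [^A-C] matches any single character other than A, B, or C.
preg_match_all('/[^A-C]/', 'ABCDE', $all);
$notAtoC = $all[0];          // ["D", "E"]

var_dump($firstAlternative, $longerFirst, $notAtoC);
```

With PCRE, then, the order of the alternatives matters when one alternative is a prefix of another.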
Finally, when a metacharacter itself needs to be matched literally in the target object, put the escape character "\" in front of it. For example:
/Th\*/
The above regular expression matches "Th*" in the target object, not "The" and the like.
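Escaping can be done by hand or with PHP's preg_quote() function. This is a brief sketch using the PCRE functions; the sample strings are illustrative only.

```php
<?php
// The escaped star matches the literal text "Th*".
$literal = preg_match('/Th\*/', 'Th* is literal');  // 1
$noStar  = preg_match('/Th\*/', 'The');             // 0

// preg_quote() adds the backslashes for us; "/" is the delimiter in use.
$quoted = preg_quote('Th*', '/');                   // Th\*

var_dump($literal, $noStar, $quoted);
```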

3. Usage examples

① In PHP, the ereg() function performs pattern matching. Its usage is as follows:

ereg(pattern, string)
Here, pattern is the regular expression pattern and string is the target object to be searched. To verify an email address as before, the PHP code is as follows:

<?php
// ereg() exists only in PHP versions before 7.0.
if (ereg('^([a-zA-Z0-9_-])+@([a-zA-Z0-9_-])+(\.[a-zA-Z0-9_-])+', $email)) {
    echo "Your email address is correct!";
} else {
    echo "Please try again!";
}
?>

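Because ereg() was deprecated in PHP 5.3 and removed in PHP 7.0, the same check on a current PHP version needs a PCRE function. The following is a rough equivalent with preg_match(), not the article's original code; the helper name looksLikeEmail is hypothetical, and the pattern is the one above with "/" delimiters added and the dot escaped.

```php
<?php
// Hypothetical helper wrapping the pattern from the ereg() example.
// Like the original, the pattern has no end anchor.
function looksLikeEmail(string $email): bool
{
    $pattern = '/^([a-zA-Z0-9_-])+@([a-zA-Z0-9_-])+(\.[a-zA-Z0-9_-])+/';
    return preg_match($pattern, $email) === 1;
}

echo looksLikeEmail('user@example.com') ? "Your email address is correct!\n" : "Please try again!\n";
echo looksLikeEmail('not-an-email') ? "Your email address is correct!\n" : "Please try again!\n";
```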

② JavaScript 1.2 comes with a powerful RegExp object that can be used for regular expression matching. Its test() method checks whether the target object contains the matching pattern and returns true or false accordingly.
In JavaScript, the email address entered by a user can be validated in the same way with the test() method.

Regular expressions give many people a headache. Here I draw on my own understanding and on articles from the Internet to share what I have learned in terms that everyone can follow.
Let's begin with ^ and $. They match the beginning and the end of a string, respectively. Some examples:

"^The": matches a string that begins with "The";
"of despair$": matches a string that ends with "of despair";

So,
"^abc$": matches a string that both begins and ends with "abc"; in fact, only "abc" itself matches;
"notice": matches any string that contains "notice";

As the last example shows, if you use neither of these two characters, the pattern (regular expression) can appear anywhere in the string being checked; it is not tied to either end.

Next, let's talk about '*', '+', and '?'.
They specify how many times a character may appear in a row. They mean, respectively:
'*': "zero or more", equivalent to {0,}
'+': "one or more", equivalent to {1,}
'?': "zero or one", equivalent to {0,1}

Here are some examples:

"ab*": same as ab{0,}; matches an a followed by zero or more b's ("a", "ab", "abbb", etc.);
"ab+": same as ab{1,}; as above, but there must be at least one b ("ab", "abbb", etc.);
"ab?": same as ab{0,1}; the b may be absent or appear once;
"a?b+$": matches a string that ends with an optional a followed by one or more b's.

Key point: '*', '+', and '?' apply only to the character immediately before them.

You can also give the number of repetitions in curly braces, for example:

"ab{2}": a must be followed by exactly two b's ("abb");
"ab{2,}": a must be followed by two or more b's ("abb", "abbbb", etc.);
"ab{3,5}": a must be followed by three to five b's ("abbb", "abbbb", or "abbbbb").

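The brace counts can be checked with a few calls to PHP's preg_match(). In this sketch the ^ and $ anchors are my addition so that each count applies to the whole string; the variable names are illustrative.

```php
<?php
$exactlyTwo  = preg_match('/^ab{2}$/', 'abb');      // 1
$tooMany     = preg_match('/^ab{2}$/', 'abbb');     // 0: three b's
$twoOrMore   = preg_match('/^ab{2,}$/', 'abbbb');   // 1
$threeToFive = preg_match('/^ab{3,5}$/', 'abbbb');  // 1: four b's
$onlyTwo     = preg_match('/^ab{3,5}$/', 'abb');    // 0: fewer than three b's

var_dump($exactlyTwo, $tooMany, $twoOrMore, $threeToFive, $onlyTwo);
```
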
Now let's group several characters in parentheses, for example:

"a(bc)*": matches an a followed by zero or more "bc";
"a(bc){1,5}": matches an a followed by one to five "bc";

There is also the character '|', which acts as an OR operator:

"hi|hello": matches a string that contains "hi" or "hello";
"(b|cd)ef": matches a string that contains "bef" or "cdef";
"(a|b)*c": matches a string that contains zero or more a's and b's, in any order, followed by a c;

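Grouping and alternation combine naturally with the quantifiers. Here is a small sketch with PHP's preg_match(); the anchors in some patterns and all the test strings are assumptions made for the example.

```php
<?php
$bcTwice  = preg_match('/^a(bc)*$/', 'abcbc');     // 1: "bc" appears twice
$justA    = preg_match('/^a(bc)*$/', 'a');         // 1: "bc" appears zero times
$needsBc  = preg_match('/^a(bc){1,5}$/', 'a');     // 0: at least one "bc" is required
$either   = preg_match('/(b|cd)ef/', 'abcdef');    // 1: the string contains "cdef"
$abThenC  = preg_match('/^(a|b)*c$/', 'abbac');    // 1: a's and b's, then a c

var_dump($bcTwice, $justA, $needsBc, $either, $abThenC);
```
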
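A dot ('.') stands for any single character except the newline "\n".

What if you want to match any single character, including "\n"?

Use the pattern '(.|\n)'. A bracket expression such as '[\n.]' does not do this, because inside square brackets the dot is only a literal dot.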
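In PHP's PCRE functions there are two common ways to let a pattern cross a newline: an explicit alternative, or the s modifier, which makes the dot match newlines too. A minimal sketch, with an invented two-line string:

```php
<?php
$text = "a\nb";

$plainDot    = preg_match('/a.b/', $text);       // 0: "." stops at the newline
$dotAll      = preg_match('/a.b/s', $text);      // 1: the s modifier lets "." match "\n"
$alternative = preg_match('/a(.|\n)b/', $text);  // 1: explicit "any character or newline"

var_dump($plainDot, $dotAll, $alternative);
```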

"a.[0-9]": an a, then any character, then a digit from 0 to 9;
"^.{3}$": a string of exactly three arbitrary characters.

A bracket expression matches exactly one character:

"[ab]": matches a single a or b (same as "a|b");
"[a-d]": matches a single character from 'a' to 'd' (same as "a|b|c|d" and "[abcd]");

We generally use [a-zA-Z] to mean any uppercase or lowercase English letter:

"^[a-zA-Z]": matches a string that begins with a letter;
"[0-9]%": matches a string that contains a digit followed by a percent sign;
",[a-zA-Z0-9]$": matches a string that ends with a comma followed by a digit or letter;

You can also list the characters you do not want inside square brackets; just put '^' at the start of the bracket expression. For example, "%[^a-zA-Z]%" matches two percent signs with a single non-letter character between them.

Key point: when '^' is the first character inside square brackets, it excludes the characters listed in the brackets.
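The bracket patterns from this part can be checked one by one. This is a short sketch with PHP's preg_match(); all of the test strings are made up for illustration.

```php
<?php
$startsWithLetter = preg_match('/^[a-zA-Z]/', 'Hello');     // 1
$percentage       = preg_match('/[0-9]%/', 'up 5% today');  // 1: digit followed by "%"
$endsCommaAlnum   = preg_match('/,[a-zA-Z0-9]$/', 'a,b');   // 1
$nonLetterBetween = preg_match('/%[^a-zA-Z]%/', '%4%');     // 1: "4" is not a letter
$letterBetween    = preg_match('/%[^a-zA-Z]%/', '%a%');     // 0: "a" is excluded

var_dump($startsWithLetter, $percentage, $endsCommaAlnum, $nonLetterBetween, $letterBetween);
```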

For a special character to be taken literally, you must escape it by putting a backslash ('\') in front of it; this applies to characters such as ^ . [ $ ( ) | * + ? and {.

Don't forget that characters inside square brackets are an exception to this rule: inside square brackets, all special characters, including the backslash ('\'), lose their special meaning. For example, "[*+?{}.]" matches a string containing any one of these characters.

Also, as the regex manual tells us: "If the list contains ']', it is best to make it the first character of the list (possibly after a '^'). If it contains '-', it is best to put it first or last, or to make it the second endpoint of a range; the '-' in the middle of [a-d-0-9] is then taken literally."

After the examples above, {n,m} should be clear. Note that neither n nor m can be negative, and n must not be greater than m. The pattern then matches at least n and at most m times. For example, "p{1,5}" matches from one to five consecutive p's: in "pvpppppp" it first matches the single p before the v, and within the run of six p's that follows, one match covers at most five of them.
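Listing every match of "p{1,5}" in that string shows how the upper bound caps each individual match. The sketch uses PHP's preg_match_all(), which is my choice for the demonstration rather than a function from the article.

```php
<?php
// Collect all non-overlapping matches of one to five consecutive p's.
preg_match_all('/p{1,5}/', 'pvpppppp', $m);
$matches = $m[0];

print_r($matches);  // "p", then "ppppp", then the one remaining "p"
```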

Now let's talk about the sequences that start with a backslash.

\b is described in the books as matching a word boundary. For example, 've\b' matches the "ve" in "love" but not the "ve" in "very".

\B is exactly the opposite of \b, so I won't give an example.
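For readers who do want to see the two side by side, here is a minimal sketch with PHP's preg_match(), using the same two words:

```php
<?php
$inLove = preg_match('/ve\b/', 'love');  // 1: "ve" ends the word
$inVery = preg_match('/ve\b/', 'very');  // 0: "ve" is followed by "r"
$notEnd = preg_match('/ve\B/', 'very');  // 1: \B requires a non-boundary after "ve"
$atEnd  = preg_match('/ve\B/', 'love');  // 0: "ve" sits at a word boundary

var_dump($inLove, $inVery, $notEnd, $atEnd);
```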

You can also go to http://www.phpv.net/article.php/251 to see the other sequences that start with a backslash.

Okay, let's work through an application: building a pattern that matches a currency amount.

We want to check whether an input is a number representing an amount of money. We assume there are four ways to write the amount: "10000.00" and "10,000.00", or, without a decimal part, "10000" and "10,000". Let's start building the pattern:

^[1-9][0-9]*$

This requires every value to start with a non-zero digit. But it also means that a single "0" fails the test. Here is the fix:

^(0|[1-9][0-9]*)$

Now only "0" and numbers that do not start with 0 match. We can also allow a minus sign in front of the number:

^(0|-?[1-9][0-9]*)$

That is: "0", or a number that does not start with 0, possibly preceded by a minus sign. Now let's be less strict and allow a leading 0. Let's also drop the minus sign, since we don't need it for money. We now add a pattern for the decimal part:

^[0-9]+(\.[0-9]+)?$

This means the string must start with at least one digit. Note that "10." does not match the above pattern; only "10" and "10.2" are acceptable. Do you know why? The decimal point belongs to the optional group, and inside that group it must be followed by at least one digit.

^[0-9]+(\.[0-9]{2})?$

Here we require exactly two digits after the decimal point. If you think that is too strict, change it to:

^[0-9]+(\.[0-9]{1,2})?$

This allows one or two digits after the decimal point. Now let's add commas (every three digits) for readability:

^[0-9]{1,3}(,[0-9]{3})*(\.[0-9]{1,2})?$

Don't forget that '+' can be replaced by '*' if you want to accept an empty string, and remember that the backslash ('\') can cause errors inside PHP strings (a very common mistake).

Once the string has been validated, we can remove all the commas with str_replace(",", "", $money), treat the result as a double, and do arithmetic with it.
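Putting validation and conversion together might look like the sketch below. It assumes preg_match() in place of ereg(), and the helper name parseMoney is hypothetical; the pattern is the comma-grouped one from above.

```php
<?php
// Hypothetical helper: validate the amount, strip the commas, convert to float.
function parseMoney(string $money): ?float
{
    $pattern = '/^[0-9]{1,3}(,[0-9]{3})*(\.[0-9]{1,2})?$/';
    if (preg_match($pattern, $money) !== 1) {
        return null;  // not a valid amount under this pattern
    }
    return (float) str_replace(',', '', $money);
}

var_dump(parseMoney('10,000.00'));  // float(10000)
var_dump(parseMoney('1,234.5'));    // float(1234.5)
var_dump(parseMoney('10.'));        // NULL
var_dump(parseMoney('10000'));      // NULL: more than three digits without a comma
```

As the last call shows, this particular pattern accepts only the comma-grouped form once the number has more than three digits; accepting "10000" as well would need an alternative such as the earlier ^[0-9]+ form.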

One more:

Build a regular expression for checking an email address

A complete email address has three parts:

1. Username (everything to the left of '@')
2. '@'
3. Server name (the remaining part)

The username can contain uppercase and lowercase letters, digits, periods ('.'), minus signs ('-'), and underscores ('_'). The server name follows the same rule, except that underscores are not allowed.

Usernames cannot start or end with a period, and the same applies to server names. Two periods also cannot appear in a row; there must be at least one character between them. So let's look at how to write a matching pattern for the username:

^[_a-zA-Z0-9-]+$

This does not allow periods yet. Let's add them:

^[_a-zA-Z0-9-]+(\.[_a-zA-Z0-9-]+)*$

This means: at least one valid character (other than '.') at the start, followed by zero or more groups, each consisting of a period and at least one valid character.

To simplify a little, we can use eregi() instead of ereg(). eregi() is case-insensitive, so we don't need both ranges "a-z" and "A-Z"; one is enough:

^[_a-z0-9-]+(\.[_a-z0-9-]+)*$

The server name that follows is the same, but without the underscore:

^[a-z0-9-]+(\.[a-z0-9-]+)*$

Okay. Now just join the two parts with "@":

^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$

This is the complete email validation pattern. You only need to call:

eregi('^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$', $email)

to find out whether the string is an email address.
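On PHP 7.0 and later, where eregi() no longer exists, the same pattern can be used with preg_match() and the i modifier, which takes over the case-insensitive behaviour. The following is only a sketch; the helper name isEmail and the test addresses are hypothetical.

```php
<?php
// Hypothetical helper using the pattern built above with "/" delimiters.
function isEmail(string $email): bool
{
    $pattern = '/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$/i';
    return preg_match($pattern, $email) === 1;
}

var_dump(isEmail('John.Doe@Example.com'));   // bool(true)
var_dump(isEmail('.john@example.com'));      // bool(false): starts with a period
var_dump(isEmail('john..doe@example.com'));  // bool(false): two periods in a row
```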

Other uses of regular expressions

Extracting strings

ereg() and eregi() can also extract part of a string through a regular expression (see the manual for details). For example, to extract the file name from a path or URL, the following code does the job:

ereg("([^\/]*)$", $pathOrUrl, $regs);
echo $regs[1];
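The same extraction can be sketched with preg_match(), which also fills an array with the captured groups. The URL below is an illustrative value, and "~" is used as the delimiter only to avoid escaping the slash.

```php
<?php
$pathOrUrl = 'http://example.com/docs/manual.html';  // illustrative value
$fileName  = null;

// Capture everything after the last slash or backslash.
if (preg_match('~([^\\\\/]*)$~', $pathOrUrl, $regs)) {
    $fileName = $regs[1];
}

echo $fileName, "\n";  // manual.html
```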

Advanced Substitution

ereg_replace() and eregi_replace() are also very useful. Suppose we want to replace every run of separator characters (spaces, newlines, carriage returns, and tabs) with a comma:

ereg_replace("[ \n\r\t]+", ",", trim($str));
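With the PCRE functions the equivalent call is preg_replace(). A minimal sketch, assuming an invented input string:

```php
<?php
$str = "  apple banana\tcherry\r\ngrape  ";  // illustrative input

// Replace each run of spaces, newlines, carriage returns, or tabs with one comma.
$csv = preg_replace('/[ \n\r\t]+/', ',', trim($str));

echo $csv, "\n";  // apple,banana,cherry,grape
```

trim() runs first so that leading and trailing whitespace does not turn into stray commas.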

Finally, here is another regular expression for checking email addresses, left for readers to analyze:

"^[-!#$%&'*+\./0-9=?A-Z^_`a-z{|}~]+'.'@'.'[-!#$%&'*+\/0-9=?A-Z^_`a-z{|}~]+\.'.'[-!#$%&'*+\./0-9=?A-Z^_`a-z{|}~]+$"

If you can read it without difficulty, this article has served its purpose.
