PHP study notes: POSIX regular expressions
1 Basic knowledge
A regular expression is a way of describing a pattern in a piece of text. The exact (literal) matching we have used so far is also a kind of regular expression. For example, earlier we searched for regular expression terms such as "shop" and "delivery".
In PHP, matching a regular expression is more like a strstr() match than an equality comparison, because the pattern is matched somewhere within a string (unless you specify otherwise, it may match anywhere in the string). For example, the string "shop" matches the regular expression "shop". It also matches the regular expressions "h", "ho", and so on.
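As a rough illustration of this substring-style matching, the sketch below uses preg_match() from PHP's PCRE extension. That choice is an assumption on our part: the POSIX functions (ereg() and friends) were deprecated in PHP 5.3 and removed in PHP 7.0, and the simple patterns in these notes behave the same way once wrapped in / delimiters. The variable names are illustrative only.

```php
<?php
// preg_match() returns 1 when the pattern is found anywhere in the subject.
$subject = 'shop';
$results = [
    'shop'  => preg_match('/shop/', $subject) === 1,   // the whole string
    'ho'    => preg_match('/ho/', $subject) === 1,     // found in the middle
    'h'     => preg_match('/h/', $subject) === 1,      // a single character
    'shops' => preg_match('/shops/', $subject) === 1,  // longer than the subject
];
var_dump($results);

// The comparable plain-string search: strstr() returns false when nothing is found.
var_dump(strstr($subject, 'ho') !== false);
```

Only the last pattern, "shops", fails to match, because the subject does not contain it.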
In addition to characters that match themselves exactly, you can use special characters to give a pattern a meta-meaning. For example, with special characters you can specify that a pattern must occur at the beginning or end of a string, that part of a pattern may be repeated, or that the characters in a pattern must be of a specific type. You can also match literal occurrences of the special characters themselves. We'll discuss each of these variations in turn.
2 Character sets and classes
Character sets immediately give regular expressions more power than exact matching. A character set can be used to match any character of a particular type; in effect, it is a kind of wildcard.
First, you can use the period (.) as a wildcard for any single character except the newline character (\n). For example, the regular expression:
.at
matches "cat", "sat", and "mat", among others. This kind of wildcard matching is often used for file name matching in operating systems.
However, using regular expressions, you can be more specific about the type of characters you want to match, and you can specify a set to which the characters belong. In the previous example, the regular expression matched "cat" and "mat", but it could also match "#at". If you want to limit it to characters between a and z, you can specify it like this:
[a-z]at
Anything enclosed in square brackets ([ and ]) is a character class: a set of characters to which the matched character must belong. Note that the expression in square brackets matches only a single character.
We can list a collection, for example:
[aeiou]
can be used to represent a vowel.
You can also describe a range, as before with a hyphen, or a set of ranges:
[a-zA-Z]
This set of ranges represents any uppercase or lowercase letter.
In addition, you can also use sets to indicate that characters do not belong to a certain set. For example:
[^a-z]
can be used to match any character that is not between a and z. When the caret (^) appears at the start of a set inside square brackets, it means "not". Outside square brackets it has a different meaning, which we will look at in detail later.
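To see how these sets behave, here is a small sketch that runs the patterns from this section against a few sample strings. It assumes preg_match() (PCRE) as a stand-in for the POSIX functions, which are no longer available in PHP 7 and later; the sample strings and the $checks and $results names are made up for illustration.

```php
<?php
// Each entry pairs a pattern from this section with a sample subject.
$checks = [
    ['/.at/',     'cat'],   // "." accepts any character
    ['/.at/',     '#at'],   // ... including "#"
    ['/[a-z]at/', 'cat'],   // first character must be a lowercase letter
    ['/[a-z]at/', '#at'],   // "#" is not in a-z
    ['/[aeiou]/', 'sky'],   // no vowel anywhere in "sky"
    ['/[^a-z]/',  'abc1'],  // "1" is a character outside a-z
];

$results = [];
foreach ($checks as [$pattern, $subject]) {
    $matched = preg_match($pattern, $subject) === 1;
    $results[] = $matched;
    printf("%-12s %-6s %s\n", $pattern, $subject, $matched ? 'match' : 'no match');
}
```

The fourth and fifth checks are the only ones that report "no match".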
3 Repetition
Often, you will want to specify that a particular string or character class may occur more than once. You can do this with two special characters in a regular expression. The symbol "*" means that the pattern can be repeated 0 or more times, and the symbol "+" means that the pattern can be repeated 1 or more times. Each symbol is placed directly after the part of the expression it applies to.
For example:
[[:alnum:]]+
means "at least one alphanumeric character".
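The difference between the two repetition symbols shows up most clearly on an empty string. The sketch below is a minimal illustration, assuming preg_match() (PCRE) in place of the removed POSIX functions. The ^ and $ anchors are added here so that the whole string has to fit the pattern; the variable names are our own.

```php
<?php
// "+" requires at least one alphanumeric character; "*" also accepts none.
$plus = '/^[[:alnum:]]+$/';
$star = '/^[[:alnum:]]*$/';

$plusOnText  = preg_match($plus, 'abc123') === 1;   // true
$plusOnEmpty = preg_match($plus, '') === 1;         // false: "+" needs one or more
$starOnEmpty = preg_match($star, '') === 1;         // true: "*" allows zero
$plusOnSpace = preg_match($plus, 'abc 123') === 1;  // false: a space is not alphanumeric

var_dump($plusOnText, $plusOnEmpty, $starOnEmpty, $plusOnSpace);
```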
4 Subexpressions
It is often useful to split an expression into subexpressions so that you can say, for example, "at least one of these strings followed by exactly one of those". You can do this with parentheses, in the same way as in an arithmetic expression.
For example:
(very )*large
matches "large", "very large", "very very large", and so on.
5 Counted subexpressions
You can use a numeric expression in curly braces ({}) to specify how many times something may be repeated. You can give an exact number of repetitions ({3} means exactly 3 repetitions), a range of repetitions ({2,4} means 2 to 4 repetitions), or an open-ended range ({2,} means at least 2 repetitions).
For example:
(very( )?){1,3}
matches "very", "very very", and "very very very".
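Grouping and counting can be tried out together. In this sketch, which again assumes preg_match() (PCRE) rather than the removed POSIX functions, the group contains "very" plus a trailing space, and both patterns are anchored and end in "large" so that the number of repetitions actually matters. Those details, and the variable names, are our own additions for illustration.

```php
<?php
// Zero or more "very " followed by "large"; anchored so the whole string must fit.
$any = '/^(very )*large$/';
// Between one and three "very " followed by "large".
$counted = '/^(very ){1,3}large$/';

$anyZero      = preg_match($any, 'large') === 1;                          // true
$anyTwo       = preg_match($any, 'very very large') === 1;                // true
$countedZero  = preg_match($counted, 'large') === 1;                      // false: needs at least one
$countedThree = preg_match($counted, 'very very very large') === 1;       // true
$countedFour  = preg_match($counted, 'very very very very large') === 1;  // false: four is too many

var_dump($anyZero, $anyTwo, $countedZero, $countedThree, $countedFour);
```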
6 Anchoring to the beginning or end of a string
The pattern [a-z] will match any string that contains a lowercase alphabetic character. It does not matter whether the string is a single character long or is a longer string containing just one matching character.
You can also specify whether a particular subexpression must appear at the start of the string, at the end, or both. This is useful when you want to make sure that only the term you are looking for, and nothing else, appears in the string.
The caret (^) is used at the start of a regular expression to indicate that the pattern must appear at the start of the searched string. The character "$" is used at the end of a regular expression to indicate that the pattern must appear at the end of the string.
For example, the following matches bob at the beginning of the string:
^bob
This pattern will match strings where com appears at the end of the string:
com$
Finally, this pattern will match strings that consist of exactly one character between a and z:
^[a-z]$
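The three anchored patterns can be checked with a few sample strings. This is a minimal sketch that assumes preg_match() (PCRE) in place of the removed POSIX functions; the sample strings and the $anchored name are made up for illustration.

```php
<?php
$anchored = [
    'start hit'  => preg_match('/^bob/', 'bob@example.com') === 1,  // "bob" is at the start
    'start miss' => preg_match('/^bob/', 'jimbob') === 1,           // "bob" is present, but not at the start
    'end hit'    => preg_match('/com$/', 'example.com') === 1,      // "com" is at the end
    'end miss'   => preg_match('/com$/', 'example.com.au') === 1,   // "com" is present, but not at the end
    'one char'   => preg_match('/^[a-z]$/', 'q') === 1,             // exactly one lowercase letter
    'two chars'  => preg_match('/^[a-z]$/', 'qq') === 1,            // too long for ^[a-z]$
    'unanchored' => preg_match('/[a-z]/', 'QQqQQ') === 1,           // one lowercase letter anywhere is enough
];
var_dump($anchored);
```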
7 Branching
You can use a vertical bar in a regular expression to represent a selection. For example, if you want to match com, edu, or net, you can use an expression like this:
com|edu|net
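A short sketch of this alternation follows, assuming preg_match() (PCRE) as a stand-in for the removed POSIX functions. The host names and the $found array are hypothetical; the third argument to preg_match() receives the text that matched.

```php
<?php
$pattern = '/com|edu|net/';
$hosts = ['example.com', 'example.edu', 'example.org'];

$found = [];
foreach ($hosts as $host) {
    // $m[0] holds whichever alternative matched; null records "no match".
    $found[$host] = preg_match($pattern, $host, $m) === 1 ? $m[0] : null;
}
var_dump($found);
```

Because the pattern is unanchored, a host such as "commerce.org" would also match. Grouping the alternatives and anchoring them, as in (com|edu|net)$, is one way to tighten this.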
8 Matching special characters
If you want to match one of the special characters mentioned earlier in this section, such as ., {, or $, you must put a backslash (\) in front of it. If you want to match a backslash, you must write it as two backslashes (\\).
In PHP, it is best to enclose regular expression patterns in single-quoted strings. Using double-quoted strings introduces unnecessary complexity, because PHP itself also uses the backslash to escape special characters, including the backslash.
If you want to match a backslash in a pattern, you must use two backslashes to indicate that it is a literal backslash and not an escape character.
For the same reason, if you want a literal backslash in a PHP string, you must also write two backslashes. The somewhat confusing result is that a PHP string representing a regular expression that matches a literal backslash needs 4 backslashes. The PHP interpreter reduces these 4 backslashes to 2, and the regular expression interpreter then reads those 2 as one literal backslash.
The $ symbol is also special both in double-quoted PHP strings and in regular expressions. To match a literal $ with a pattern in a double-quoted string, you need "\\\$". Because the string is in double quotes, the PHP interpreter parses it as \$, which the regular expression interpreter then matches against a $ character.
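The two layers of escaping are easier to follow when the intermediate strings are inspected. The sketch below is an illustration under the assumption that preg_match() (PCRE) stands in for the removed POSIX functions; the variable names and sample subjects are hypothetical.

```php
<?php
// Four backslashes in PHP source become two in the string PHP actually stores.
$backslash = '\\\\';
$storedLength = strlen($backslash);  // 2

// The regex engine reads those two as "one literal backslash".
$subject = 'C:\\temp';               // stored as C:\temp
$matchesBackslash = preg_match('/' . $backslash . '/', $subject) === 1;

// A literal dollar sign: single quotes need only \$, double quotes need \\\$.
$dollarSingle = '\$';
$dollarDouble = "\\\$";
$sameString = $dollarSingle === $dollarDouble;  // both are stored as \$
$matchesDollar = preg_match('/' . $dollarSingle . '/', 'cost: $5') === 1;

var_dump($storedLength, $matchesBackslash, $sameString, $matchesDollar);
```

The single-quoted form of the dollar pattern is shorter and reads the same as what the regular expression interpreter receives.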
9 Application in the smart form
Regular expressions have at least two uses in the smart form application. The first is to find specific terms in customer feedback, which regular expressions let you do a little more cleverly. With a string function, matching "shop", "customer service", or "retail" takes 3 separate searches. With a regular expression, you can match all 3 at once:
shop|customer service|retail
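The two approaches can be compared side by side. This is a sketch only: the feedback text and variable names are invented, and preg_match() (PCRE) is assumed in place of the POSIX functions, which were removed in PHP 7.

```php
<?php
$feedback = 'The customer service at your store was slow.';

// String-function approach: one search per term.
$viaStrstr = strstr($feedback, 'shop') !== false
    || strstr($feedback, 'customer service') !== false
    || strstr($feedback, 'retail') !== false;

// Regular-expression approach: one search covers all three terms.
$viaRegex = preg_match('/shop|customer service|retail/', $feedback, $m) === 1;

var_dump($viaStrstr, $viaRegex);
echo $m[0], "\n";  // the term that was actually found
```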
The second use is to validate a user's email address, which requires encoding the standard format of an email address as a regular expression. The format consists of some alphanumeric or punctuation characters, followed by the symbol "@", then a string of alphanumeric characters and hyphens, followed by a period ("."), followed by more alphanumeric characters and hyphens and possibly more periods, up to the end of the string. This is encoded as follows:
^[a-zA-Z0-9_\-\.]+@[a-zA-Z0-9\-]+\.[a-zA-Z0-9\-\.]+$
The subexpression ^[a-zA-Z0-9_\-\.]+ means "start the string with at least one letter, number, underscore, hyphen, or period, or some combination of these". Note that inside a character class the period loses its special wildcard meaning and is just a literal period.
The symbol "@" matches the character "@".
The subexpression [a-zA-Z0-9\-]+ matches the first part of the hostname, made up of alphanumeric characters and hyphens. Note that the hyphen is escaped with a backslash because it is a special character inside square brackets.
The character combination "\." matches a literal "." character. Here the period is used outside a character class, so it must be escaped in order to match only a literal period.
The subexpression [a-zA-Z0-9\-\.]+$ matches the rest of the domain name, which contains letters, numbers, hyphens, and further periods if necessary, up to the end of the string.
It is not hard to see that some invalid email addresses will still match this regular expression. Catching every invalid address is almost impossible, but a little analysis improves the situation. The expression can be refined in many ways; for example, you could list all the valid top-level domains (TLDs). Be careful when making a check more restrictive, though, because a validation function that rejects 1% of valid data is far more troublesome than one that lets through 10% of invalid data.
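A minimal sketch of such a check is shown here. It assumes the pattern reads with a + after each character class and backslash escapes on the hyphens and periods, as the explanation implies, and it uses preg_match() (PCRE) because the POSIX functions were removed in PHP 7. The function name looks_like_email() and the sample addresses are hypothetical.

```php
<?php
// Hypothetical helper: true if $address fits the simple format described above.
function looks_like_email(string $address): bool
{
    // Single quotes keep the backslashes intact for the regex engine.
    $pattern = '/^[a-zA-Z0-9_\-\.]+@[a-zA-Z0-9\-]+\.[a-zA-Z0-9\-\.]+$/';
    return preg_match($pattern, $address) === 1;
}

var_dump(looks_like_email('bob@example.com'));               // bool(true)
var_dump(looks_like_email('bob.smith@mail.example.co.uk'));  // bool(true)
var_dump(looks_like_email('bob@example'));                   // bool(false): no period after the host
var_dump(looks_like_email('bob example.com'));               // bool(false): no "@"
var_dump(looks_like_email('a@-.-'));                         // bool(true): invalid, yet it fits the format
```

The last call shows the limitation just described: the address is clearly unusable, but it satisfies the pattern.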