PHP POSIX regular expressions

藏色散人
2019-12-04 10:31:31

1 Basic knowledge

A regular expression is a way of describing a pattern in a piece of text. The exact (literal) matches we have used so far are themselves a form of regular expression. For example, earlier we searched for terms such as "shop" and "delivery".

In PHP, matching a regular expression works more like a strstr() match than an equality comparison, because you are matching a string somewhere within another string. (Unless you specify otherwise, the match can be anywhere in the string.) For example, the string "shop" matches the regular expression "shop". It also matches the regular expressions "h", "ho", and so on.
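The substring behaviour described above can be sketched as follows. This is only an illustration: it assumes a current PHP release, where the POSIX ereg() functions have been removed (as of PHP 7.0), so it uses preg_match() with / delimiters instead. The variable names are made up for the example.

```php
<?php
$subject = 'shop';

// preg_match() returns 1 as soon as the pattern is found anywhere in $subject.
$whole  = preg_match('/shop/', $subject); // the full word
$single = preg_match('/h/', $subject);    // one letter inside it
$pair   = preg_match('/ho/', $subject);   // a two-letter substring

// strstr() does the same job for literal text: it returns the tail of the
// string starting at the first occurrence, or false if there is none.
$tail = strstr($subject, 'ho');

printf("%d %d %d %s\n", $whole, $single, $pair, $tail); // 1 1 1 hop
```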

In addition to matching characters exactly, you can use special characters to give parts of the expression a meta-meaning. For example, with special characters you can specify that a pattern must occur at the start or end of a string, that part of a pattern can be repeated, or that characters in a pattern must be of a particular type. You can also match literal occurrences of the special characters themselves. We will look at each of these variations in turn.

2 Character sets and classes

Using character sets immediately gives regular expressions more power than exact matching. A character set can be used to match any character of a particular type; in effect, it is a kind of wildcard.

First of all, you can use the period (.) as a wildcard that stands for any single character except the newline character (\n). For example, the regular expression:

.at

matches "cat", "sat", "mat", and so on. This kind of wildcard matching is often used for filename matching in operating systems.

However, using regular expressions, you can be more specific about the type of characters you want to match, and you can specify a set to which the characters belong. In the previous example, the regular expression matched "cat" and "mat", but it could also match "#at". If you want to limit it to characters between a and z, you can specify it like this:

[a-z]at

Anything enclosed in square brackets ([]) is a character class: a set of characters to which the matched character must belong. Note that the expression in square brackets matches only a single character.
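To see the difference between the wildcard and the class, here is a small sketch. It assumes preg_match() (PCRE) as a stand-in for the removed ereg() functions, and the word list is invented for the example.

```php
<?php
$words = ['cat', 'sat', '#at', 'Mat'];

$dot   = []; // words matched by .at
$lower = []; // words matched by [a-z]at
foreach ($words as $word) {
    if (preg_match('/.at/', $word)) {
        $dot[] = $word;
    }
    if (preg_match('/[a-z]at/', $word)) {
        $lower[] = $word;
    }
}

echo implode(' ', $dot), "\n";   // cat sat #at Mat
echo implode(' ', $lower), "\n"; // cat sat
```

The class rejects "#at" and "Mat" because neither "#" nor the uppercase "M" falls in the range a to z.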

We can list a set, for example:

[aeiou]

can be used to represent vowels.

You can also describe a range, as before with a hyphen, or a set of ranges:

[a-zA-Z]

This range set represents any uppercase or lowercase letter.

In addition, you can also use sets to indicate that characters do not belong to a certain set. For example:

[^a-z]

matches any character that is not between a and z. When the caret (^) appears inside square brackets, it means "not". Outside square brackets it has a different meaning, which we will cover shortly.
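The listed set, the range set, and the negated set can each be tried out with a short sketch like the one below. It assumes preg_match() in place of the removed ereg() functions; the subject strings and variable names are illustrative only.

```php
<?php
// [aeiou]: is there at least one vowel?
$vowelInRhythm = preg_match('/[aeiou]/', 'rhythm'); // 0
$vowelInShop   = preg_match('/[aeiou]/', 'shop');   // 1

// [a-zA-Z]: is there at least one letter of either case?
$letterInDigits = preg_match('/[a-zA-Z]/', '12345'); // 0
$letterInMixed  = preg_match('/[a-zA-Z]/', '1234x'); // 1

// [^a-z]: is there at least one character outside a to z?
$outsideInLower = preg_match('/[^a-z]/', 'abc'); // 0
$outsideInMixed = preg_match('/[^a-z]/', 'abC'); // 1
```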

3 Repetition

Often, you will want to specify that a particular string or character class may occur more than once. Two special characters in regular expressions express this. The * symbol means that the pattern can be repeated zero or more times, and the + symbol means that the pattern can be repeated one or more times. Each symbol is placed directly after the part of the expression it applies to.

For example:

[[:alnum:]]+

means "at least one alphanumeric character". Here [[:alnum:]] is a predefined POSIX character class that covers letters and digits.
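The difference between + and * shows up clearly when the subject does not start with a matching character. The following sketch assumes preg_match(), whose PCRE engine also understands the [[:alnum:]] class; the subject string is invented.

```php
<?php
$subject = '#A42!';

// + needs at least one alphanumeric character, so the engine skips "#".
preg_match('/[[:alnum:]]+/', $subject, $plus);

// * is satisfied by zero characters, so it succeeds at once with an
// empty match at the very start of the string.
preg_match('/[[:alnum:]]*/', $subject, $star);

var_dump($plus[0]); // string(3) "A42"
var_dump($star[0]); // string(0) ""
```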

4 Subexpressions

It is often useful to split an expression into subexpressions so that you can, for example, express "at least one of these strings followed by exactly one of those". You do this with parentheses, in the same way as in an arithmetic expression.

For example:

(very )*large

matches "large", "very large", "very very large", and so on.
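A grouped, repeated subexpression can be checked with a sketch such as this one. It assumes preg_match() rather than the removed ereg() functions. The group contains a trailing space so that repeated words stay separated, and ^ and $ (anchors, covered later in the article) pin the pattern to the whole string so the repetition is visible.

```php
<?php
$pattern = '/^(very )*large$/';

$plain   = preg_match($pattern, 'large');           // 1: zero repetitions
$twice   = preg_match($pattern, 'very very large'); // 1: two repetitions
$noSpace = preg_match($pattern, 'verylarge');       // 0: the group needs its space
```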

5 Counted subexpressions

You can specify how many times something may be repeated by using a numerical expression in curly braces ({}). You can give an exact number of repetitions ({3} means exactly 3 repetitions), a range of repetitions ({2,4} means from 2 to 4 repetitions), or an open-ended range ({2,} means at least 2 repetitions).

For example:

(very ){1,3}

matches "very ", "very very ", and "very very very ".
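The three forms of count can be compared side by side. This is a sketch under the same assumption as before, namely preg_match() standing in for ereg(), with ^ and $ added so that the whole subject has to fit the count. The group includes a trailing space to separate the repeated words.

```php
<?php
$three = 'very very very ';
$four  = 'very very very very ';

$exact      = preg_match('/^(very ){3}$/', $three);  // 1: exactly 3
$inRange    = preg_match('/^(very ){1,3}$/', $three); // 1: between 1 and 3
$overRange  = preg_match('/^(very ){1,3}$/', $four);  // 0: 4 is too many
$openEnded  = preg_match('/^(very ){2,}$/', $four);   // 1: at least 2
```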

6 Anchoring to the beginning or end of a string

The pattern [a-z] will match any string that contains a lowercase letter. It does not matter whether the string is a single character long or contains just one matching character somewhere in a longer string.

You can also specify whether a particular subexpression should appear at the start of the string, at the end, or both. This is useful when you want to make sure that only the term you are looking for, and nothing else, appears in the string.

The caret (^) at the start of a regular expression indicates that the match must appear at the start of the searched string, and the dollar sign ($) at the end of a regular expression indicates that the match must appear at the end of the string.

For example, the following matches bob at the beginning of the string:

^bob

This pattern will match strings where com appears at the end of the string:

com$

Finally, this pattern will match a string that consists of exactly one character from a to z:

^[a-z]$
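The three anchored patterns behave as follows in a quick sketch. It assumes preg_match() as the matching function, since ereg() is not available in PHP 7 and later; the subject strings are invented.

```php
<?php
$atStart    = preg_match('/^bob/', 'bob@example.com'); // 1: starts with bob
$notAtStart = preg_match('/^bob/', 'jim and bob');     // 0: bob is not at the start

$atEnd    = preg_match('/com$/', 'bob@example.com');   // 1: ends with com
$notAtEnd = preg_match('/com$/', 'example.com/shop');  // 0: com is not at the end

$oneLetter  = preg_match('/^[a-z]$/', 'q');  // 1: exactly one lowercase letter
$twoLetters = preg_match('/^[a-z]$/', 'qq'); // 0: one character too many
```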

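7 Branching

You can represent a choice in a regular expression with a vertical bar. For example, if you want to match com, edu, or net, you can use the following expression: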

com|edu|net
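As a small sketch of branching, the code below uses preg_match() (assumed here in place of the removed ereg() functions) and captures which alternative was found. The host names are made up for the example.

```php
<?php
// The third argument receives the text that matched.
$found   = preg_match('/com|edu|net/', 'www.example.edu', $hit);
$missing = preg_match('/com|edu|net/', 'www.example.org');

echo $hit[0], "\n"; // edu
```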

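8 Matching special characters

If you want to match one of the special characters mentioned earlier in this section, such as ., {, or $, you must put a backslash (\) in front of it. If you want to match a backslash, you must write it as two backslashes (\\).

In PHP, you should enclose regular expression patterns in single-quoted strings. Patterns in double-quoted strings bring unnecessary complications. PHP also uses the backslash to escape special characters, including the backslash itself.

If you want to match a backslash in a pattern, you must use two backslashes to indicate that it is a literal backslash and not an escape character.

For the same reason, if you want a backslash character in a double-quoted PHP string, you must also write two backslashes. The somewhat confusing result is that a PHP string representing a regular expression that contains a literal backslash needs four backslashes. The PHP interpreter parses these four backslashes as two, and the regular expression interpreter then parses those two as one.

The $ symbol is also a special character both in double-quoted PHP strings and in regular expressions. To match a literal $ in a pattern, you need to write "\\\$". Because this string is double-quoted, the PHP interpreter parses it as \$, which the regular expression interpreter then parses as a single $ character.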
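The two layers of escaping can be observed directly. The sketch below assumes preg_match() and / delimiters instead of the removed ereg() functions; the subject strings and variable names are illustrative.

```php
<?php
// A literal backslash: the regex engine needs \\ , so the PHP string
// literal has to spell it with four backslashes.
$path = 'C:\\temp';                          // the 7-character string C:\temp
$hasBackslash = preg_match('/\\\\/', $path);

// A literal dollar sign: single quotes need one backslash,
// double quotes need three.
$single = '/\$/';
$double = "/\\\$/";
$samePattern = ($single === $double);
$hasDollar   = preg_match($single, 'Total: $5');

// A literal dot versus the wildcard dot.
$literalDot  = preg_match('/\./', 'abc'); // 0: no period in the subject
$wildcardDot = preg_match('/./', 'abc');  // 1: any character will do
```

Both quoting styles produce the same pattern text here; the single-quoted form simply needs fewer backslashes for $.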

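9 Using regular expressions in the Smart Form

There are at least two possible uses for regular expressions in the Smart Form application. The first is to detect particular terms in customer feedback, where regular expressions let you be a little smarter. With a string function, you would have to run three separate searches to match "shop", "customer service", or "retail". With a regular expression, you can match all three at once: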

shop|customer service|retail
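The comparison between three string searches and one regular expression might look like this. It is a sketch only: the feedback text is invented, strpos() is chosen as a representative string function, and preg_match() is assumed in place of the removed ereg() functions.

```php
<?php
$feedback = 'I phoned customer service twice about my order.';

// With plain string functions, each term needs its own search.
$viaStrings = strpos($feedback, 'shop') !== false
    || strpos($feedback, 'customer service') !== false
    || strpos($feedback, 'retail') !== false;

// With a regular expression, one call covers all three terms.
$viaRegex = preg_match('/shop|customer service|retail/', $feedback, $hit);

echo $hit[0], "\n"; // customer service
```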

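The second use is to validate customer email addresses in the application by encoding the standard format of an email address in a regular expression. The format consists of some alphanumeric or punctuation characters, followed by an @ symbol, followed by a string of alphanumeric characters and hyphens, followed by a dot, followed by more alphanumeric characters and hyphens and possibly more dots, up to the end of the string. It is encoded as follows: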

^[a-zA-Z0-9_\-.]+@[a-zA-Z0-9\-]+\.[a-zA-Z0-9\-.]+$

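The subexpression ^[a-zA-Z0-9_\-.]+ means "start the string with at least one letter, digit, underscore, hyphen, or dot, or some combination of these". Note that when a dot appears inside a character class (here, at its end), it loses its special wildcard meaning and is just a literal dot.

The @ symbol matches a literal @.

The subexpression [a-zA-Z0-9\-]+ matches the host name, which consists of alphanumeric characters and hyphens. Note that the hyphen is escaped with a backslash because it is a special character inside square brackets.

The \. combination matches a literal . character. Because this dot is outside a character class, it has to be escaped so that it matches only a dot.

The subexpression [a-zA-Z0-9\-.]+$ matches the rest of the domain name, which contains letters, digits, and hyphens, plus more dots if required, up to the end of the string.

It is not hard to see that an invalid email address can sometimes still match this regular expression. Catching every invalid address is almost impossible, but the check does improve the situation somewhat. You can refine the expression in many ways; for example, you could list all valid top-level domains (TLDs). Be careful when making the check more restrictive, though, because a validation function that rejects 1% of valid data is far more troublesome than one that lets through 10% of invalid data.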
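A minimal sketch of this check is shown below. The function name looksLikeEmail is hypothetical, and preg_match() with / delimiters is assumed because the ereg() functions are no longer available in current PHP; the pattern itself is the one given above.

```php
<?php
// Hypothetical helper: returns true when $address fits the article's pattern.
function looksLikeEmail(string $address): bool
{
    $pattern = '/^[a-zA-Z0-9_\-.]+@[a-zA-Z0-9\-]+\.[a-zA-Z0-9\-.]+$/';
    return preg_match($pattern, $address) === 1;
}

$ok        = looksLikeEmail('bob@example.com');  // true
$noAt      = looksLikeEmail('bob.example.com');  // false: no @
$noDot     = looksLikeEmail('bob@example');      // false: no dot after the host
$doubleDot = looksLikeEmail('bob@example..com'); // true, although the address is invalid
```

The last case illustrates the limitation: the final character class allows dots anywhere, so two consecutive dots slip through.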


Statement:
This article is reproduced from aliyun.com. If there is any infringement, please contact admin@php.cn to have it removed.