Guidance on the correct use of regular expressions in PHP
Regular expressions provide a powerful way to process text. Using regular expressions, you can perform complex validation of user input, parse user input and file contents, and reformat strings. PHP provides users with an easy way to use POSIX and PCRE regular expressions. This tutorial discusses the differences between POSIX and PCRE and explains how to use regular expressions with PHP V5.
Before you begin
Learn what this tutorial covers and how to get the most out of it.
About this tutorial
Regular expressions provide a powerful way to process text. Using regular expressions, you can perform complex validation of user input, parse user input and text content, and reformat strings.
Objective
This tutorial focuses on simple ways to use POSIX and PCRE regular expressions, with the aim of making you proficient with regular expressions in PHP. We'll explore the differences between POSIX and PCRE, and also cover how to use regular expressions with PHP V5. By following this tutorial, you'll learn how, when, and why to use regular expressions.
System Requirements
You can complete this tutorial on any Microsoft® Windows® or UNIX® system on which PHP is installed, including Mac OS X and Linux®. Because everything covered here is built into PHP, you only need PHP itself; there is no need to install other software.
Start
What is a regular expression?
A few years ago, I had an interesting experience with input fields in a web form. Users entered a phone number into the form, and that number was then printed on their ad exactly as they had typed it. According to the requirements, U.S. phone numbers could be entered in several ways: either (555) 555-5555 or 555-555-5555 was fine, but 555-5555 was not accepted.
You may be wondering, why don't we throw away all non-numeric characters and only ensure that the total number of remaining characters is 10? This approach does work, but it doesn't prevent the user from typing something like !555?333-3333.
From a web developer's perspective, this situation presents an interesting challenge. I could write routines to check for each of the different formats, but I wanted a solution flexible enough to cope if a format like 555.555.5555 later became acceptable.
This is exactly where regular expressions (regex for short) come in. I had cut and pasted them into applications before, but I had never really understood their cryptic syntax. A regex reads much like a mathematical expression. When you see an expression of the form 2x2=4, you usually think "2 times 2 equals 4." Regular expressions are very similar. After reading this tutorial, when you see a regular expression like ^b$, you will tell yourself: "the beginning of a line, then b, then the end of the line." Not only that, you will see how easy it is to use regular expressions in PHP.
When to use regex
You should use regex for search and replace operations when there are rules to follow but the exact characters to find or replace are not known in advance. In the phone number example above, the rules define the format of the phone number, but not the digits it contains. The same applies to many other kinds of user input. U.S. state abbreviations, for example, are limited to two uppercase letters from A to Z. Regular expressions also let you restrict text in a form or other user input to letters of the alphabet, regardless of case or length.
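As a minimal sketch of the state-abbreviation rule, the check below uses `preg_match()`. The function name `looksLikeStateAbbreviation` is a hypothetical helper introduced for illustration, and the type declarations assume PHP 7 or later. It checks the format only, not whether the two letters are a real state.

```php
<?php
// Hypothetical helper: true when $input is exactly two uppercase letters A-Z.
function looksLikeStateAbbreviation(string $input): bool
{
    // ^ and $ anchor the match; [A-Z]{2} requires exactly two uppercase letters.
    return preg_match('/^[A-Z]{2}$/', $input) === 1;
}

var_dump(looksLikeStateAbbreviation('NY'));   // bool(true)
var_dump(looksLikeStateAbbreviation('ny'));   // bool(false)
var_dump(looksLikeStateAbbreviation('NYC'));  // bool(false)
```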
When not to use regex
Regular expressions are powerful, but they also have some drawbacks. One is that reading and writing them takes skill. If you decide to include regular expressions in your application, comment them thoroughly. That way, if someone else needs to change an expression later, they can do so without breaking functionality. In addition, if you are not familiar with regular expressions, you may find them difficult to debug.
To avoid these difficulties, don't use regular expressions when simpler built-in functions solve the problem well enough.
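To illustrate that advice, here is a small sketch of tasks that PHP's built-in string functions handle without any regex. The sample strings are made up for illustration; `strpos()`, `str_replace()`, and `ctype_digit()` are standard PHP functions.

```php
<?php
$haystack = 'The quick brown fox';

// Looking for a fixed substring: strpos() returns the offset, or false if absent.
$hasFox = strpos($haystack, 'fox') !== false;

// Replacing a literal string: str_replace() needs no pattern at all.
$replaced = str_replace('quick', 'slow', $haystack);

// Checking that a non-empty string contains only decimal digits.
$allDigits = ctype_digit('5555555555');

var_dump($hasFox);     // bool(true)
var_dump($replaced);   // string(18) "The slow brown fox"
var_dump($allDigits);  // bool(true)
```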
POSIX and PCRE
PHP supports two regular expression implementations: Portable Operating System Interface (POSIX) and Perl-Compatible Regular Expressions (PCRE). The two implementations offer different features, but they are equally simple to use in PHP. Which style you use will depend on your past experience and habits with regex. There is some evidence that PCRE expressions are slightly faster than POSIX expressions, but in the vast majority of applications the difference is not significant.
In the examples in this tutorial, the syntax of each regex function is included in the comments. In the function syntax, regex is the regular expression parameter and string is the string being searched. Parameters in square brackets are optional. Because this tutorial covers the basics, it does not describe every optional parameter.
Regular Expression Syntax
Although the POSIX and PCRE implementations differ in their support for certain features and character classes, they share the same basic syntax. Each regular expression is composed of one or more characters, special characters (sometimes called metacharacters), character classes, and character groups.
POSIX and PCRE use the same wildcard, which in regex means "anything here." The wildcard character is the period, or dot (.). To find a literal period, escape it with the escape character, a backslash (\): \. The same goes for the other special characters discussed below, such as line anchors and qualifiers. If a character has a special meaning in a regular expression, it must be escaped to express its literal meaning.
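The difference between the wildcard and an escaped period can be sketched as follows. This is an illustration using PHP's PCRE function `preg_match()`, which returns 1 on a match and 0 otherwise; the sample strings are made up.

```php
<?php
// Unescaped dot: matches any single character (except a newline by default).
$anyChar = preg_match('/5.5/', '5x5');

// Escaped dot: matches only a literal period, so "5x5" no longer matches.
$literalMiss = preg_match('/5\.5/', '5x5');
$literalHit  = preg_match('/5\.5/', '5.5');

var_dump($anyChar);      // int(1)
var_dump($literalMiss);  // int(0)
var_dump($literalHit);   // int(1)
```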
Line anchors are special metacharacters that match the beginning and end of a line but do not capture any text (see Table 1). For example, if a line begins with the letter a, the line anchor in the expression ^a does not capture the letter a but instead matches the beginning of the line.
Table 1. Line anchors
Anchor | Description |
---|---|
^ | Matches the beginning of a line |
$ | Matches the end of a line |
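A short sketch of the anchors in action, using `preg_match()` on made-up strings. Note one assumption about the engine: without the `m` modifier, PCRE treats the whole subject string as a single line, so ^ and $ refer to the start and end of the string here.

```php
<?php
// ^b$ : the beginning, then b, then the end, so only "b" itself matches.
$onlyB    = preg_match('/^b$/', 'b');
$bInside  = preg_match('/^b$/', 'abc');

// ^a matches because the string starts with a; c$ because it ends with c.
$startsA  = preg_match('/^a/', 'abc');
$endsC    = preg_match('/c$/', 'abc');

// b is present, but not at the beginning, so ^b fails.
$startsB  = preg_match('/^b/', 'abc');

var_dump($onlyB, $bInside, $startsA, $endsC, $startsB);
// int(1) int(0) int(1) int(1) int(0)
```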
A qualifier applies to the expression immediately preceding it (see Table 2). Using qualifiers, you can specify how many times an expression must occur in a search. For example, the expression a+ finds the letter a one or more times.
Table 2. Qualifiers
Qualifier | Description |
---|---|
? | The expression before the qualifier can be found 0 or 1 times |
+ | The expression before the qualifier can be found one or more times |
* | The expression before the qualifier can be found any number of times (including 0 times) |
{n} | The expression before the qualifier must be found exactly n times |
{n,m} | The expression before the qualifier can be found between n and m times |
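The qualifiers in Table 2 can be tried out with a small sketch like the one below. The patterns and test strings are invented for illustration; each pattern is anchored with ^ and $ so that the qualifier on the letter b alone decides the outcome.

```php
<?php
// Each entry is [pattern, subject].
$cases = [
    ['/^ab?c$/',     'ac'],      // ?     : zero or one b      -> match
    ['/^ab?c$/',     'abbc'],    // ?     : two b's is too many -> no match
    ['/^ab+c$/',     'ac'],      // +     : needs at least one  -> no match
    ['/^ab+c$/',     'abbc'],    // +     : one or more         -> match
    ['/^ab*c$/',     'ac'],      // *     : zero is allowed     -> match
    ['/^ab{2}c$/',   'abc'],     // {2}   : exactly two         -> no match
    ['/^ab{2}c$/',   'abbc'],    // {2}   : exactly two         -> match
    ['/^ab{2,3}c$/', 'abbbc'],   // {2,3} : two to three        -> match
    ['/^ab{2,3}c$/', 'abbbbc'],  // {2,3} : four is too many    -> no match
];

$results = [];
foreach ($cases as $case) {
    list($pattern, $subject) = $case;
    $matched = preg_match($pattern, $subject);
    $results[] = $matched;
    printf("%-14s %-7s %s\n", $pattern, $subject, $matched ? 'match' : 'no match');
}
```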
Table 3. Grouping and capturing
Character | Description |
---|---|
() | Groups characters and can capture the matched text |
Table 4. POSIX character classes
Character class | Description |
---|---|
[:alpha:] | Matches any letter |
[:digit:] | Matches any digit |
[:space:] | Matches any whitespace character |
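Grouping, capturing, and the character classes above can be combined in one sketch. It relies on the fact that PCRE also accepts POSIX class names inside a bracket expression (for example `[[:digit:]]`), so the illustration uses `preg_match()`; the subject string "Room 101" is made up.

```php
<?php
$matches = [];

// Group 1 captures a run of letters, group 2 a run of digits,
// separated by one or more whitespace characters.
$found = preg_match(
    '/^([[:alpha:]]+)[[:space:]]+([[:digit:]]+)$/',
    'Room 101',
    $matches
);

var_dump($found);       // int(1)
var_dump($matches[0]);  // string(8) "Room 101"  (the whole match)
var_dump($matches[1]);  // string(4) "Room"      (first group)
var_dump($matches[2]);  // string(3) "101"       (second group)
```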
POSIX Match
Two functions search strings using POSIX regular expressions: ereg() and eregi().
ereg()
The ereg() function searches a string for a match to a regular expression. If no match is found, it returns FALSE, so you can write a test like this:
Listing 1. ereg() method
<?php
$phonenbr = "555-555-5555";

// Syntax is ereg( regex, string [, out_captures_array] )
if (ereg("[-[:digit:]]{12}", $phonenbr)) {
    print("Found match!\n");
} else {
    print("No match found!\n");
}
?>
The regular expression [-[:digit:]]{12} finds 12 characters that are digits or hyphens. This is a bit crude for handling phone numbers, so you could rewrite it like this: ^[0-9]{3}-[0-9]{3}-[0-9]{4}$. (In regex, [0-9] and [:digit:] are effectively the same; you might prefer [0-9] because it is shorter.) This alternative expression is clearly more precise. It looks for the beginning of a line (^), followed by a set of 3 digits ([0-9]{3}), a hyphen (-), another set of 3 digits, another hyphen, a set of 4 digits, and then the end of the line ($). Spelling an expression out in longhand like this gives you an idea of the complexity of the problem the regular expression is handling, which can help you anticipate the kinds of data you will be searching for or replacing with it.
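As a sketch of how the stricter expression could be extended to the two formats from the introduction, the check below accepts (555) 555-5555 and 555-555-5555 and rejects everything else. It uses `preg_match()` rather than `ereg()`, because the ereg family was deprecated in PHP 5.3 and removed in PHP 7. The function name `isUsPhoneNumber` is a hypothetical helper, not part of the original tutorial.

```php
<?php
// Hypothetical helper: true for "(555) 555-5555" or "555-555-5555".
function isUsPhoneNumber(string $input): bool
{
    // The group offers two alternatives for the area code:
    //   \([0-9]{3}\)<space>   e.g. "(555) "
    //   [0-9]{3}-             e.g. "555-"
    // followed by 3 digits, a hyphen, and 4 digits.
    $pattern = '/^(\([0-9]{3}\) |[0-9]{3}-)[0-9]{3}-[0-9]{4}$/';

    return preg_match($pattern, $input) === 1;
}

var_dump(isUsPhoneNumber('(555) 555-5555'));  // bool(true)
var_dump(isUsPhoneNumber('555-555-5555'));    // bool(true)
var_dump(isUsPhoneNumber('555-5555'));        // bool(false)
var_dump(isUsPhoneNumber('!555?333-3333'));   // bool(false)
```

Accepting a further format such as 555.555.5555 would then be a matter of changing the pattern, not the surrounding code.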
eregi()
The eregi() function is similar to ereg(), except that it is not case sensitive. It returns an integer containing the length of the match found, but you will most likely use it in a conditional statement like this:
Listing 2. eregi() method
This example prints Found match! because hello was found in a case-insensitive search. If you used ereg() instead, the search would fail.
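The same contrast can be sketched with PCRE, where the `i` modifier after the closing delimiter plays the role of `eregi()`. This is an illustration, not the tutorial's own listing; the subject string is an assumption chosen so that only the case differs.

```php
<?php
$text = 'Hello, World!';

// Case-sensitive search: lowercase "hello" does not occur in $text.
$caseSensitive = preg_match('/hello/', $text);

// Same pattern with the i modifier: case is ignored, so "Hello" matches.
$caseInsensitive = preg_match('/hello/i', $text);

if ($caseInsensitive) {
    print("Found match!\n");
} else {
    print("No match found!\n");
}
```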
POSIX Replacement
ereg_replace() and eregi_replace()
These two functions perform replacements in text using POSIX regular expressions.
ereg_replace()
You can use the ereg_replace() function to perform case-sensitive replacement with POSIX regular expression syntax. The following example shows how to replace an email address within a string with a hyperlink:
Listing 3. ereg_replace() method
The difference from ordinary replacement functions such as str_replace() is that, with regular expressions, you define rules for the search instead of literal characters.
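A rule-based replacement of this kind might look like the sketch below. It uses `preg_replace()` because `ereg_replace()` is not available in PHP 7 and later. The sample sentence and the deliberately simplified email pattern are assumptions for illustration; the pattern is not a complete email validator.

```php
<?php
$origstr = 'Questions? Write to info@example.com today.';

// Simplified rule for an address: a local part, "@", then a domain
// made of labels separated by dots. Group 1 captures the whole address.
$pattern = '/([A-Za-z0-9._-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+)/';

// $1 in the replacement refers to the text captured by group 1.
$newstr = preg_replace($pattern, '<a href="mailto:$1">$1</a>', $origstr);

print("New string is: '$newstr'\n");
```

A literal function such as `str_replace()` could only do this if the exact address were known in advance; the pattern works for any address that fits the rule.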
eregi_replace()
The eregi_replace() function is exactly the same as ereg_replace(), except that it ignores case:
Listing 4. eregi_replace() function
<?php
$origstr = "1 BANANA, 2 banana, 3 Banana";

// Syntax is: eregi_replace( regex, replacestr, string )
$newstr = eregi_replace("banana", "pear", $origstr);
print("New string is: '$newstr'\n");
?>
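This example replaces banana with pear, ignoring case.
PCRE character classes
Because PCRE syntax supports shorter character classes and more features, it is more powerful than POSIX syntax. Table 5 lists some of the character classes that PCRE supports and POSIX expressions do not.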
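PCRE matching
The PCRE matching functions in PHP are similar to the POSIX matching functions, but one of their features may trip you up if you are used to POSIX expressions: PCRE functions require the expression to begin and end with a delimiter. In the vast majority of examples, the delimiter is a …
After the final delimiter in PCRE, you can add a modifier to change the behavior of the regular expression. For example, …
preg_grep()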
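As a stand-in sketch for this function (the array contents are made up), the code below filters an array with `preg_grep()`, which returns the elements that match the pattern and preserves their keys. It assumes the forward slash as the delimiter, a common choice in PHP code, and uses the `i` modifier after the closing delimiter to ignore case. A different delimiter such as `#` behaves the same way.

```php
<?php
$fruits = ['Apple', 'banana', 'BANANA bread', 'cherry'];

// Slashes delimit the pattern; the trailing i makes it case-insensitive.
$withModifier = preg_grep('/banana/i', $fruits);

// Without the modifier, only the lowercase entry matches.
$withoutModifier = preg_grep('/banana/', $fruits);

// Any matching pair of non-alphanumeric delimiters works, for example #.
$hashDelimited = preg_grep('#banana#i', $fruits);

print_r($withModifier);     // keys 1 and 2: "banana", "BANANA bread"
print_r($withoutModifier);  // key 1 only: "banana"
```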
In this example, the regular expression …
preg_match()
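A minimal sketch of `preg_match()` follows, reusing the phone number from Listing 1. The function returns 1 when the pattern matches and 0 when it does not, and fills the optional third argument with the captured groups. The pattern uses `\d`, PCRE's shorthand for a digit; the grouping into three parts is an illustrative choice.

```php
<?php
$phonenbr = '555-555-5555';
$matches = [];

// Three groups capture the area code, the exchange, and the line number.
$found = preg_match('/^(\d{3})-(\d{3})-(\d{4})$/', $phonenbr, $matches);

if ($found === 1) {
    print("Found match!\n");
    print("Area code: {$matches[1]}\n");
} else {
    print("No match found!\n");
}
```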
This example uses the regular expression …
preg_match_all()
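Where `preg_match()` stops at the first match, `preg_match_all()` collects every match in the subject and returns how many it found. The sketch below is an illustration with a made-up sentence; with the default flags, `$matches[0]` holds the full text of each match.

```php
<?php
$text = 'Call 555-555-5555 or 555-333-3333.';
$matches = [];

// Find every phone-number-shaped run of digits and hyphens in $text.
$count = preg_match_all('/\d{3}-\d{3}-\d{4}/', $text, $matches);

print("Found $count numbers\n");
foreach ($matches[0] as $number) {
    print("$number\n");
}
```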
The regular expression …
PCRE replacement
Performing PCRE replacement in PHP is similar to POSIX replacement, except that the function used is …
preg_replace()
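A sketch of `preg_replace()` with captured groups is shown below. The pattern and strings are assumptions chosen for illustration: three groups capture the parts of a number written as (555) 555-5555, and `$1`, `$2`, and `$3` in the replacement refer back to them to rewrite the number as 555-555-5555.

```php
<?php
$origstr = 'Call (555) 555-5555 today';

// Groups: 1 = area code, 2 = exchange, 3 = line number.
// The literal parentheses around the area code are escaped.
$pattern = '/\((\d{3})\) (\d{3})-(\d{4})/';

// Rebuild the number from the captured groups.
$newstr = preg_replace($pattern, '$1-$2-$3', $origstr);

print("New string is: '$newstr'\n");
```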
This example quickly demonstrates how to capture part of the text and use a backreference, such as … In the example, you can use … If the string can be in the form …
Conclusion
PHP provides two syntaxes for regular expressions: POSIX and PCRE. This tutorial provides a high-level overview of the main functions in PHP that support POSIX and PCRE regular expressions. Using regular expressions, you can define rules that make search and replace operations far more powerful than plain text search and replace.