
Guidance on the correct use of regular expressions in PHP

Posted by 巴扎黑, 2017-08-09 14:35:26
Regular expressions provide a powerful way to process text. Using regular expressions, you can perform complex validation of user input, parse user input and file contents, and reformat strings. PHP provides users with an easy way to use POSIX and PCRE regular expressions. This tutorial discusses the differences between POSIX and PCRE and explains how to use regular expressions with PHP V5.

Before you begin

Learn what to expect from this tutorial and how to get the most out of it.

About this tutorial

Regular expressions provide a powerful way to process text. Using regular expressions, you can perform complex validation of user input, parse user input and text content, and reformat strings.

Objective

This tutorial focuses on simple ways to use POSIX and PCRE regular expressions, so that you become comfortable with regular expressions in PHP. It explores the differences between POSIX and PCRE and covers how to use regular expressions with PHP V5. By following this tutorial, you'll learn how, when, and why to use regular expressions.

System requirements

You can complete this tutorial on any Microsoft® Windows® or UNIX® system, including Mac OS X and Linux®. Because everything covered here is built into PHP, you only need PHP installed on your system; no other software is required.

Start

What is a regular expression?

A few years ago, I ran some interesting experiments with input fields on Web forms. Users entered a phone number into a form, and the number was then printed on the user's ad exactly as it was typed. Per the requirements, U.S. phone numbers could be entered in either of two formats, (555) 555-5555 or 555-555-5555, but 555-5555 was not accepted.

You may be wondering, why don't we throw away all non-numeric characters and only ensure that the total number of remaining characters is 10? This approach does work, but it doesn't prevent the user from typing something like !555?333-3333.

From a Web developer's perspective, this situation presents an interesting challenge. I could write routines to check for each of the different formats, but I wanted a solution flexible enough to cope if the users later decided to accept a format such as 555.555.5555.

This is where regular expressions (regexes for short) come in handy. I had cut and pasted them into applications before, but had always found their syntax incomprehensible. Regexes look a lot like mathematical expressions. When you see an expression such as 2x2=4, you usually think "2 times 2 equals 4." Regular expressions work much the same way. After reading this tutorial, when you see a regular expression like ^b$, you will read it as "the beginning of a line, then b, then the end of the line." You will also see how easy it is to use regular expressions in PHP.

When to use regex

Use regexes for search and replace operations when the text follows rules but you don't necessarily know the exact characters to find or replace. In the phone number example above, the requirements define rules for the format of the phone number, but not for the digits it contains. The same holds for many other kinds of user input. U.S. state abbreviations, for example, are limited to two uppercase letters from A to Z. A regular expression handles that rule, and it can just as easily restrict the text in a form or other user input to letters of the alphabet of any case or length.
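The state-abbreviation rule maps directly onto a pattern. The following is a minimal sketch using preg_match(), the PCRE matching function covered later in this tutorial; the helper name isStateAbbreviation() is hypothetical, and the check covers the format only, not whether the two letters name a real state. It assumes PHP 7 or later for the type declarations.

```php
<?php
// Hypothetical helper: true when the input is two uppercase letters A-Z.
// This validates the format only; it does not consult a list of real states.
function isStateAbbreviation(string $input): bool
{
    return preg_match('/^[A-Z]{2}$/', $input) === 1;
}

foreach (['MN', 'mn', 'MNX', 'M1'] as $candidate) {
    printf("%-4s %s\n", $candidate, isStateAbbreviation($candidate) ? 'valid' : 'invalid');
}
```

Only MN is reported as valid; lowercase letters, a third letter, or a digit all fail the rule.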

When not to use regex

Regular expressions are powerful, but they also have drawbacks. One is that reading and writing them takes skill. If you decide to include regular expressions in your application, comment them thoroughly, so that anyone who needs to change an expression later can do so without breaking functionality. Also, if you are not familiar with regular expressions, you may find them difficult to debug.
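As a sketch of what a fully commented expression can look like, PCRE's x (extended) modifier tells the engine to ignore unescaped whitespace in the pattern and to treat everything after an unescaped # as a comment. The example below assumes the phone number format from earlier in this tutorial. A literal / inside the comments would end the pattern early, because PHP looks for the closing delimiter before PCRE ever sees the comments.

```php
<?php
// The x modifier lets each part of the pattern carry its own comment.
$phonePattern = '/
    ^           # start of the string
    [0-9]{3}    # area code: three digits
    -           # literal hyphen
    [0-9]{3}    # exchange: three digits
    -           # literal hyphen
    [0-9]{4}    # line number: four digits
    $           # end of the string
/x';

var_dump(preg_match($phonePattern, '555-555-5555')); // int(1)
var_dump(preg_match($phonePattern, '555-5555'));     // int(0)
```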

To avoid these difficulties, don't use regular expressions when simpler built-in functions solve the problem well enough.
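For instance, checking that a string contains only digits, or that it contains an @ sign, does not need a regex at all. The sketch below contrasts preg_match() with the built-in ctype_digit() and strpos() functions; it assumes the ctype extension is available, as it is in most PHP builds.

```php
<?php
$input = '20170809';

// Regex approach: one or more digits, anchored at both ends.
$viaRegex = preg_match('/^[0-9]+$/', $input) === 1;

// Built-in approach: ctype_digit() is true when every character of a
// non-empty string is a decimal digit.
$viaCtype = ctype_digit($input);

// For a plain "does it contain this?" test, strpos() needs no regex.
$hasAt = strpos('first.last@example.com', '@') !== false;

var_dump($viaRegex, $viaCtype, $hasAt); // bool(true), three times
```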

POSIX and PCRE

PHP supports two regular expression implementations: Portable Operating System Interface (POSIX) and Perl-Compatible Regular Expressions (PCRE). The two implementations offer different features, but both are equally easy to use in PHP. Which style you use depends on your past experience and habits with regexes. There is some evidence that PCRE expressions are slightly faster than POSIX expressions, but in the vast majority of applications the difference is not significant.

In this tutorial's examples, the syntax of each regex function is included in a comment. In the function syntax, regex is the regular expression parameter and string is the string being searched. Parameters in square brackets are optional. Because this tutorial covers the basics, it does not describe every optional parameter.

Regular Expression Syntax

Although the POSIX and PCRE implementations differ in their support for certain features and character classes, their syntax is the same. Each regular expression is made up of one or more literal characters, special characters (sometimes called metacharacters), character classes, and character groups.

POSIX and PCRE use the same wildcard, which in a regex means "any single character here." The wildcard is the period (.). To match a literal period, escape it with a backslash (\.). The same applies to the other special characters discussed below, such as line anchors and qualifiers: if a character has a special meaning in a regular expression, you must escape it to match the character itself.
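The difference between the wildcard and an escaped period is easy to see with preg_match(), which returns 1 when the pattern matches and 0 when it does not. This small sketch uses PCRE syntax, where the pattern sits between / delimiters; note that by default the PCRE wildcard does not match a newline.

```php
<?php
$results = [
    preg_match('/a.c/', 'abc'),   // 1: "." matches the "b"
    preg_match('/a.c/', 'a.c'),   // 1: "." also matches a literal "."
    preg_match('/a\.c/', 'abc'),  // 0: "\." requires a literal period
    preg_match('/a\.c/', 'a.c'),  // 1
];
print_r($results);
```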

Line anchors are special metacharacters that match the beginning and end of a line without capturing any text (see Table 1). For example, if a line begins with the letter a, the anchor ^ in the expression ^a does not match the letter a itself; it matches the position at the beginning of the line, and the a in the expression then matches the letter.


Table 1. Line anchors

Anchor Description
^ Matches the beginning of a line
$ Matches the end of a line
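To see the anchors at work, the sketch below runs the expression ^b$ from the introduction against a few made-up subjects with preg_match(). Only a subject consisting of exactly b matches; without the anchors, a b anywhere in the subject is enough.

```php
<?php
// ^b$ reads as: the beginning of the line, then b, then the end of the line.
$results = [];
foreach (['b', 'ab', 'bc', 'b b'] as $subject) {
    $results[$subject] = preg_match('/^b$/', $subject);
    printf("%-4s => %d\n", $subject, $results[$subject]);
}

// Without anchors, a "b" anywhere in the subject is enough.
$unanchored = preg_match('/b/', 'ab');
printf("unanchored /b/ on 'ab' => %d\n", $unanchored);
```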

A qualifier applies to the expression immediately before it (see Table 2). Qualifiers let you specify how many times that expression must occur for the search to match. For example, the expression a+ matches the letter a one or more times.


Table 2. Qualifiers

Qualifier Description
? The expression before the qualifier can be found 0 or 1 times
+ The expression before the qualifier can be found one or more times
* The expression before the qualifier can be found any number of times, including 0
{n} The expression before the qualifier must be found exactly n times
{n,m} The expression before the qualifier can be found between n and m times
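The following sketch checks one subject against each qualifier in Table 2 using preg_match(); the patterns and subjects are illustrative only. Each pattern is anchored with ^ and $ so that the whole subject must satisfy the qualifier. It assumes PHP 7.1 or later for the [$pattern, $subject] destructuring in foreach.

```php
<?php
// [pattern, subject] pairs; anchoring forces the whole subject to fit.
$cases = [
    ['/^colou?r$/',    'color'],   // ?     zero or one "u"
    ['/^colou?r$/',    'colour'],
    ['/^a+$/',         'aaa'],     // +     one or more "a"
    ['/^a+$/',         ''],        //       fails: at least one "a" is needed
    ['/^ba*$/',        'b'],       // *     any number of "a", including none
    ['/^[0-9]{3}$/',   '555'],     // {n}   exactly three digits
    ['/^[0-9]{2,4}$/', '12345'],   // {n,m} two to four digits; fails with five
];

$results = [];
foreach ($cases as [$pattern, $subject]) {
    $matched = preg_match($pattern, $subject);
    $results[] = $matched;
    printf("%-16s %-9s %d\n", $pattern, "'$subject'", $matched);
}
```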
Capturing text and referring to it in search and replace operations is one of the most useful features of regexes (see Table 3). With captures, you can search for repeated words and for matching closing HTML and XML tags. When replacing, you can insert the captured text into the replacement string. An example later in this tutorial shows how to replace an email address with a hyperlink.
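As a sketch of the repeated-word search, the pattern below captures a word with (\w+) and then uses the backreference \1 to require the same word again after some whitespace. It uses PCRE syntax with preg_match(), and the sample sentence is made up for illustration.

```php
<?php
$text = 'Paris in the the spring';

// (\w+) captures a word; \1 must then match that same word again.
if (preg_match('/\b(\w+)\s+\1\b/', $text, $m)) {
    printf("Repeated word: '%s' (full match: '%s')\n", $m[1], $m[0]);
}
```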


Table 3. Grouping and capturing

Metacharacters Description
() Groups characters and captures the matched text

POSIX character classes

POSIX regular expressions follow a standard, which makes them usable by many regex implementations (see Table 4). For example, a POSIX regular expression that you write for PHP can also be used with the grep command and in many editors that support regular expressions. The class names are written inside a bracket expression, as in [[:digit:]].


Table 4. POSIX character classes

Character class Description
[:alpha:] Matches any letter
[:digit:] Matches any digit
[:space:] Matches any whitespace character
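Because the ereg functions no longer exist in PHP 7 and later, it is worth knowing that PCRE also accepts these POSIX class names, but only inside a bracket expression: [[:alpha:]] works, while a bare [:alpha:] outside brackets does not mean the same thing. A short sketch with preg_match():

```php
<?php
// POSIX class names work in PCRE when placed inside brackets.
$checks = [
    'letters only' => preg_match('/^[[:alpha:]]+$/', 'Hello'),    // 1
    'digits only'  => preg_match('/^[[:digit:]]+$/', '2017'),     // 1
    'has a space'  => preg_match('/[[:space:]]/', "tab\there"),   // 1: the tab counts
    'mixed'        => preg_match('/^[[:alpha:]]+$/', 'Hello1'),   // 0: "1" is not a letter
];
print_r($checks);
```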

POSIX matching

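PHP provides two functions for searching strings with POSIX regular expressions: ereg() and eregi(). Note that the ereg family of functions was deprecated in PHP 5.3.0 and removed in PHP 7.0.0, so the POSIX examples in this tutorial run only on PHP 5.x; on newer versions, use the PCRE (preg_*) functions described later.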

ereg()

The ereg() function searches a string for a match of the given regular expression. It returns false if no match is found, so you can use it in a test like this:


Listing 1. ereg() method

<?php
// Requires PHP 5.x: ereg() was removed in PHP 7.0
$phonenbr="555-555-5555";
// Syntax is ereg( regex, string [, out_captures_array])
if (ereg("[-[:digit:]]{12}", $phonenbr)) {
    print("Found match!\n");
} else {
    print("No match found!\n");
}
?>

The regular expression [-[:digit:]]{12} looks for 12 characters that are digits or hyphens. That is a crude way to handle phone numbers; you could instead write ^[0-9]{3}-[0-9]{3}-[0-9]{4}$. (In regex, [0-9] and [[:digit:]] are effectively the same; you might prefer [0-9] because it is shorter.) This alternative expression is clearly more precise. It looks for the beginning of a line (^), followed by a group of 3 digits ([0-9]{3}), a hyphen (-), another group of 3 digits, another hyphen, a group of 4 digits, and then the end of the line ($). Writing an expression out by hand like this gives you a sense of how complex the problem the regular expression is solving really is, which helps you predict the kinds of data you will be searching for or replacing with it.
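On PHP 7 and later, where ereg() is unavailable, the same two expressions can be tried with preg_match() by wrapping them in / delimiters. The sketch below substitutes [-0-9] for [-[:digit:]] in the crude pattern and shows that it also accepts strings that are clearly not phone numbers, while the stricter pattern does not; the test inputs are made up.

```php
<?php
$crude  = '/[-0-9]{12}/';                    // any 12 digits or hyphens in a row
$strict = '/^[0-9]{3}-[0-9]{3}-[0-9]{4}$/';  // 3 digits, hyphen, 3 digits, hyphen, 4 digits

foreach (['555-555-5555', str_repeat('-', 12), '5555-555-555'] as $input) {
    printf("%-14s crude=%d strict=%d\n",
        $input, preg_match($crude, $input), preg_match($strict, $input));
}
```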

eregi()

The eregi() function is similar to ereg(), except that it is case-insensitive. It returns an integer when it finds a match (and false otherwise), but you will most likely use it in a conditional statement like this:


Listing 2. eregi() method

<?php
$str="Hello World!";
// Syntax is eregi( regex, string [, out_captures_array])
if (eregi("hello", $str)) {
    print("Found match!\n");
} else {
    print("No match found!\n");
}
?>

When you run this example, it prints Found match! because hello is found in a case-insensitive search. With ereg(), the search would fail.

POSIX replacement

The ereg_replace() and eregi_replace() functions replace text that matches a POSIX regular expression.

ereg_replace()

You can use the ereg_replace() method to perform case-sensitive replacement with POSIX regular expression syntax. The following example shows how to replace an email address within a string with a hyperlink:


Listing 3. ereg_replace() method

<?php
$origstr = "My e-mail address is: first.last@example.com";
// Syntax is: ereg_replace( regex, replacestr, string )
// \\1 in the replacement inserts the text captured by the first group
$newstr = ereg_replace("([.[:alpha:][:digit:]]+@[.[:alpha:][:digit:]]+)",
    "<a href=\"mailto:\\1\">\\1</a>", $origstr);
print("$newstr\n");
?>
This is not a complete regular expression for matching email addresses, but it demonstrates the power of ereg_replace() compared with ordinary replacement functions such as str_replace(). With regular expressions, you define rules for the search instead of literal characters.
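For PHP 7 and later, where ereg_replace() has been removed, a minimal port of Listing 3 to preg_replace() might look like this. The pattern is unchanged apart from the / delimiters (PCRE accepts the POSIX class names inside brackets), and the replacement uses $1, which preg_replace() accepts as an alternative to the \1 form.

```php
<?php
$origstr = "My e-mail address is: first.last@example.com";

// $1 refers to the text captured by the first parenthesized group.
$newstr = preg_replace(
    '/([.[:alpha:][:digit:]]+@[.[:alpha:][:digit:]]+)/',
    '<a href="mailto:$1">$1</a>',
    $origstr
);
print("$newstr\n");
```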

eregi_replace()

The eregi_replace() function is exactly the same as ereg_replace(), except that it ignores case:

Listing 4. eregi_replace() function

<?php
$origstr = "1 BANANA, 2 banana, 3 Banana";
// Syntax is: eregi_replace( regex, replacestr, string )
$newstr = eregi_replace("banana", "pear", $origstr);
print("New string is: '$newstr'\n");
?>

This example replaces banana with pear, ignoring case.
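On PHP 7 and later, the same case-insensitive replacement can be sketched with preg_replace() and the i modifier (PCRE modifiers are described in the PCRE matching section of this tutorial):

```php
<?php
$origstr = "1 BANANA, 2 banana, 3 Banana";
// The i modifier after the closing delimiter makes the match case-insensitive.
$newstr = preg_replace('/banana/i', 'pear', $origstr);
print("New string is: '$newstr'\n"); // New string is: '1 pear, 2 pear, 3 pear'
```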

PCRE character classes

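Because PCRE syntax supports shorter character classes and more features, it is more powerful than POSIX syntax. Table 5 lists some of the character classes that PCRE supports but POSIX expressions do not.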


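Table 5. PCRE character classes

Character class Description
\b Word boundary; matches at the start or end of a word
\d Matches any digit
\s Matches any whitespace character, such as a tab or space
\t Matches a tab character
\w Matches any word character (a letter, digit, or underscore)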
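The sketch below counts matches for several of these classes with preg_match_all(), which returns the number of matches found; the matches array argument is optional. The sample string is made up, and it is double-quoted so that \t becomes a real tab character.

```php
<?php
$text = "Order 66\tshipped to room_12";

$digits = preg_match_all('/\d/', $text);     // 4: 6, 6, 1, 2
$blanks = preg_match_all('/\s/', $text);     // 4: three spaces and one tab
$tabs   = preg_match_all('/\t/', $text);     // 1
preg_match_all('/\b\w+\b/', $text, $words);  // \w includes the underscore

print_r($words[0]); // Order, 66, shipped, to, room_12
```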

PCRE matching

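The PCRE matching functions in PHP are similar to the POSIX matching functions, but if you are used to POSIX expressions, one feature of the PCRE functions may trip you up: PCRE functions require the expression to begin and end with a delimiter. In most examples the delimiter is a /, which you can see at the beginning and end of the expression inside the quotes. Keep in mind that the delimiter is not part of the expression.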
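The delimiter does not have to be /. According to the PHP manual, any non-alphanumeric, non-backslash, non-whitespace character can be used, which is handy when the pattern itself contains slashes. A small sketch with a made-up URL:

```php
<?php
$url = 'http://www.example.com/php/';

// With / as the delimiter, every / inside the pattern must be escaped.
$withSlash = preg_match('/^https?:\/\//', $url);

// With # as the delimiter, the slashes need no escaping.
$withHash = preg_match('#^https?://#', $url);

var_dump($withSlash, $withHash); // int(1), twice
```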

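After the final delimiter in a PCRE expression, you can add modifiers that change the behavior of the regular expression. For example, the i modifier makes the regular expression case-insensitive. This is an important difference from POSIX, where you call a different function depending on whether you need a case-sensitive match.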
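A short sketch of two common modifiers: i makes the match case-insensitive, and m makes ^ and $ match at the start and end of every line rather than only at the start and end of the whole string. The subject string is made up.

```php
<?php
$text = "Hello\nworld\nHELLO";

// No modifiers: case-sensitive, and ^ and $ apply to the whole string.
$plain = preg_match_all('/^hello$/', $text, $m1);

// i: ignore case; m: ^ and $ also match at each line break.
$flagged = preg_match_all('/^hello$/im', $text, $m2);

printf("plain: %d, with /im: %d\n", $plain, $flagged); // plain: 0, with /im: 2
print_r($m2[0]); // the lines "Hello" and "HELLO"
```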

preg_grep()

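The preg_grep() method returns an array containing every item of another array for which the regular expression finds a match. It is useful when you have a large set of values and want to search it for matches. Here is an example: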


Listing 5. preg_grep() method

<?php
$array = array( "1", "3", "ABC", "XYZ", "42" );
// Syntax is preg_grep( regex, inputarray );
$grep_array = preg_grep('/^\d+$/', $array);
print_r($grep_array);
?>

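In this example, the regular expression ^\d+$ finds every element of the array that contains one or more digits (\d+) between the beginning (^) and the end ($) of the line.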
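Two details of preg_grep() are worth knowing. It keeps the original array keys of the matching elements, so in Listing 5 the element "42" keeps key 4; and the optional PREG_GREP_INVERT flag returns the elements that do not match instead. A brief sketch:

```php
<?php
$array = array("1", "3", "ABC", "XYZ", "42");

// Matching elements keep their original keys.
$numbers = preg_grep('/^\d+$/', $array);
// $numbers is [0 => "1", 1 => "3", 4 => "42"]

// PREG_GREP_INVERT returns the elements that do NOT match.
$others = preg_grep('/^\d+$/', $array, PREG_GREP_INVERT);
// $others is [2 => "ABC", 3 => "XYZ"]

// array_values() renumbers the keys if you need a plain list.
print_r(array_values($numbers));
```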

preg_match()

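The preg_match() function uses PCRE to find a match in a string. It takes two required parameters: the regex and the string. Optionally, you can supply an array to be filled with the matches, flags that modify the matching behavior, and an offset, which is the position in the string at which to start looking for a match. Here is an example: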


Listing 6. preg_match() function

<?php
$string = "abcdefgh";
$regex = "/^[a-z]+$/i";
// Syntax is preg_match( regex, string [, out_matches [, flags [, offset]]] );
if (preg_match($regex, $string)) {
    printf("Pattern '%s' found in string '%s'\n", $regex, $string);
} else {
    printf("No match found in string '%s'!\n", $string);
}
?>

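This example uses the regular expression ^[a-z]+$ to look for one or more occurrences ([a-z]+) of any letter from a to z between the beginning (^) and end ($) of the line. Because of the i modifier, uppercase letters match as well.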
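The optional parameters can be sketched as follows. The third argument receives the matches: $m[0] is the whole match and $m[1], $m[2], and so on hold the parenthesized groups. The fifth argument is the offset, in bytes, at which to start searching (the fourth, flags, is left as 0 here). The subject strings are illustrative.

```php
<?php
// out_matches: $m[0] is the whole match; $m[1], $m[2], ... are the groups.
$subject = 'Posted 2017-08-09 14:35:26';
if (preg_match('/(\d{4})-(\d{2})-(\d{2})/', $subject, $m)) {
    printf("year=%s month=%s day=%s\n", $m[1], $m[2], $m[3]);
}

// offset: start searching at byte 2 of 'a1b22', so the "1" at byte 1 is skipped.
preg_match('/\d+/', 'a1b22', $m2, 0, 2);
echo $m2[0], "\n"; // 22
```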

preg_match_all()

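The preg_match_all() function builds an array of all the matches it finds in a string. The following example builds an array containing every word in a sentence: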


Listing 7. preg_match_all() function

<?php
$string = "The quick red fox jumped over the lazy brown dog";
$re = '/\b\w+\b/';
// Syntax is preg_match_all( regex, string, return_array [, flags [, offset]])
preg_match_all($re, $string, $arrayout);
print_r($arrayout);
?>

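The regular expression \b\w+\b finds one or more word characters (\w+) between word boundaries (\b). Each word becomes an element of $arrayout[0], the array of full matches inside the output array $arrayout.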
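preg_match_all() also returns the number of full matches it found, and the optional flags argument changes how the output array is organized. In the sketch below, PREG_SET_ORDER groups each match together with its own captured groups, instead of the default layout (PREG_PATTERN_ORDER), which groups results by capture index; the key=value input is made up.

```php
<?php
$string = "The quick red fox jumped over the lazy brown dog";

// The return value is the number of full matches.
$count = preg_match_all('/\b\w+\b/', $string, $arrayout);
// $count is 10; $arrayout[0] holds the words in order.

// PREG_SET_ORDER: one entry per match, each holding [full, group1, group2].
preg_match_all('/(\w+)=(\d+)/', 'a=1, b=2', $pairs, PREG_SET_ORDER);
foreach ($pairs as $pair) {
    printf("%s -> %s\n", $pair[1], $pair[2]); // a -> 1, then b -> 2
}
```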

PCRE replacement

PCRE replacement in PHP is similar to POSIX replacement, except that you use preg_replace() instead of ereg_replace() and eregi_replace().

preg_replace()

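The preg_replace() function uses PCRE to perform replacements. It takes these parameters: the regular expression, the replacement expression, and the original string. Optionally, you can also supply the maximum number of replacements to make and a variable that is filled with the number of replacements performed. Here is an example: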


Listing 8. preg_replace() function

<?php
$orig_string = "5555555555";
printf("Original string is '%s'\n", $orig_string);
$re = '/^(\d{3})(\d{3})(\d{4})$/';
// Syntax is preg_replace( regex, replacement, string [, limit [, out_count]] );
$new_string = preg_replace($re, '(\1) \2-\3', $orig_string);
printf("New string is '%s'\n", $new_string);
?>

This example shows how to capture parts of the text and refer to them with backreferences such as \1. Each backreference inserts the text matched by the corresponding parenthesized group; here, \1 inserts the text matched by group 1, (\d{3}).
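preg_replace() also accepts the optional limit and count parameters listed in the syntax comment of Listing 8. A minimal sketch with a made-up string: the limit of 2 stops after two replacements, and $count reports how many were made.

```php
<?php
// Wrap each digit in brackets, but stop after two replacements.
$result = preg_replace('/(\d)/', '[\1]', 'a1b2c3', 2, $count);
echo $result, "\n"; // a[1]b[2]c3
echo $count, "\n";  // 2
```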

You could also split this phone number with substr(), but if the string changes even slightly, it becomes harder to rely on substr() to capture the correct text.
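The sketch below makes that concrete with two hypothetical helpers, formatWithSubstr() and formatWithRegex(). Both handle ten bare digits, but only the regex version, which skips any non-digit separators with \D*, still produces the right result when the digits are separated by spaces. It assumes PHP 7 or later for the type declarations.

```php
<?php
// Hypothetical helper: assumes exactly ten digits with no separators.
function formatWithSubstr(string $s): string
{
    return '(' . substr($s, 0, 3) . ') ' . substr($s, 3, 3) . '-' . substr($s, 6);
}

// Hypothetical helper: \D* skips non-digit separators between the groups.
function formatWithRegex(string $s): string
{
    return preg_replace('/^\D*(\d{3})\D*(\d{3})\D*(\d{4})$/', '(\1) \2-\3', $s);
}

foreach (['5555555555', '555 555 5555'] as $input) {
    printf("%-14s substr: %-18s regex: %s\n",
        $input, formatWithSubstr($input), formatWithRegex($input));
}
```

For '555 555 5555', the substr() version produces '(555)  55-5 5555' because its fixed positions no longer line up with the digits.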

If the string might also take the form (555)5555555, you can change the expression to ^\(?(\d{3})\)?(\d{3})(\d{4})$ so that it accepts optional parentheses around the area code.
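A quick check of that expression with preg_replace(), reusing the replacement from Listing 8; both made-up inputs come out in the same format:

```php
<?php
// \(? and \)? match an optional literal opening and closing parenthesis.
$re = '/^\(?(\d{3})\)?(\d{3})(\d{4})$/';

$out = [];
foreach (['5555555555', '(555)5555555'] as $input) {
    $out[] = preg_replace($re, '(\1) \2-\3', $input);
}
print_r($out); // both entries are "(555) 555-5555"
```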

Conclusion

PHP provides two syntaxes for regular expressions: POSIX and PCRE. This tutorial provides a high-level overview of the main functions in PHP that support POSIX and PCRE regular expressions.

With regular expressions, you define rules that drive search and replace operations far more powerful than searching for and replacing literal text.


About the author

Nathan A. Good is an author, software engineer, and systems administrator in the Twin Cities, Minnesota. His books include PHP 5 Recipes: A Problem-Solution Approach (Apress, 2005), Regular Expression Recipes for Windows Developers: A Problem-Solution Approach (Apress, 2005), Regular Expressions: A Problem-Solution Approach (Apress, 2005), and Professional Red Hat Enterprise Linux 3 (Wrox Publishing, 2004), written with Kapil Sharma et al.
