Regular Expressions (39)

WBOY

Aug 08, 2016 am 09:23 AM

Introduction to regular expressions:

A regular expression is a grammatical rule that describes how characters are arranged and which patterns should match. It is mainly used to split, match, search and replace strings. The exact (literal) text matching we have used so far is also a regular expression, just the simplest kind.
In PHP, a regular expression is generally a description of a text pattern built from ordinary characters combined with some special characters (similar to wildcards).

In PHP, regular expressions serve three purposes:

Matching, which is often used to extract information from strings.
Replacing matched text with new text.
Splitting a string into a set of smaller chunks of information.
A regular expression contains at least one atom.
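
The three uses listed above can be sketched with the PCRE functions preg_match(), preg_replace() and preg_split(). This is a minimal illustration only; the sample text and variable names are assumptions made for the example.

```php
<?php
// Hypothetical sample text, chosen only for illustration.
$text = "apple, pear; banana";

// 1. Matching: preg_match() returns 1 if the pattern is found, 0 otherwise.
$found = preg_match('/pear/', $text);

// 2. Replacing: every match of the pattern is replaced with the new text.
$replaced = preg_replace('/pear/', 'lemon', $text);

// 3. Splitting: cut the string at each "," or ";" and an optional space.
$parts = preg_split('/[,;] ?/', $text);

var_dump($found);    // int(1)
var_dump($replaced); // string "apple, lemon; banana"
var_dump($parts);    // array with "apple", "pear", "banana"
```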

PHP has two sets of regular expression functions. Their features are similar, but their execution efficiency differs slightly:

One set is provided by the PCRE (Perl Compatible Regular Expressions) library; its functions are named with the prefix "preg_".
The other set is provided by the POSIX (Portable Operating System Interface of Unix) extension; most of its functions are named with the prefix "ereg" (for example ereg() and ereg_replace()). The POSIX functions were deprecated in PHP 5.3 and removed in PHP 7.0, so new code should use the "preg_" functions.
One reason to use regular expressions is that a typical search-and-replace operation can only match exact text, so searching for dynamic text is difficult or even impossible.

Grammar rules for regular expressions

PCRE regular expression:
PCRE stands for Perl Compatible Regular Expressions.
PCRE derives from the Perl language, one of the most powerful languages for string handling. PHP's own syntax was also influenced by Perl.
PCRE syntax supports more features and is more powerful than POSIX syntax; where both can do the same job, the PCRE library also has a slight performance advantage. The two still have a lot in common.
In PCRE, the pattern expression (i.e. the regular expression) is usually enclosed between two forward slashes "/", such as "/apple/". You only need to put the pattern to be matched between the delimiters. The delimiter is not limited to "/": any character other than a letter, a digit, a backslash "\" or whitespace can be used, such as "#", "|" or "!".
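
As a small sketch of the delimiter rule, the same pattern can be written with several delimiters and behaves identically. The subject strings below are assumptions chosen for the example.

```php
<?php
// Hypothetical subject string for illustration.
$subject = "I ate an apple";

// The same pattern with three different delimiters.
$a = preg_match('/apple/', $subject);
$b = preg_match('#apple#', $subject);
$c = preg_match('!apple!', $subject);

// A non-slash delimiter is convenient when the pattern itself contains "/",
// because the slashes then need no escaping.
$d = preg_match('#^/usr/local#', '/usr/local/bin');

var_dump($a, $b, $c, $d); // int(1) four times
```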

Atom

An atom is the basic unit that makes up a regular expression. When analyzing a regular expression, each atom should be treated as a whole.
Atoms include all English letters, digits, punctuation marks and other symbols. Atoms also include the following.
Single characters and digits, such as a-z, A-Z, 0-9.
Pattern units such as (ABC), which can be understood as large atoms composed of several atoms.
Atom tables, such as [ABC].
Reused pattern units, such as \1.
Common escape characters, such as \d, \D, \w.
Escaped metacharacters, such as \*, \.

Common escape characters

Atom    Description
--------------------------------------------------------------------------------
\d      Matches a digit; equivalent to [0-9]
\D      Matches any character except a digit; equivalent to [^0-9]
\w      Matches an English letter, digit or underscore; equivalent to [0-9a-zA-Z_]
\W      Matches any character except English letters, digits and underscores; equivalent to [^0-9a-zA-Z_]
\s      Matches a whitespace character; equivalent to [ \f\n\r\t\x0b]
\S      Matches any character except whitespace; equivalent to [^ \f\n\r\t\x0b]
\f      Matches a form feed; equivalent to \x0c or \cL
\n      Matches a newline; equivalent to \x0a or \cJ
\r      Matches a carriage return; equivalent to \x0d or \cM
\t      Matches a tab; equivalent to \x09 or \cI
\v      Matches a vertical whitespace character; in PCRE this covers the vertical tab (\x0b or \cK) as well as the newline, form feed and carriage return, among others
\0NN    Matches the character with octal code NN
\xNN    Matches the character with hexadecimal code NN
\cX     Matches the control character Ctrl+X, where X is a letter (for example, \cJ is a newline)
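
A few of the escape characters from the table can be tried out as follows. This is only a sketch; the sample string is an assumption made for the example.

```php
<?php
// Hypothetical sample: a word, a space, two digits, a tab, then an identifier.
$s = "Room 42\tis_open";

preg_match('/\d/', $s, $digit);    // first digit
preg_match('/\D/', $s, $nonDigit); // first non-digit
preg_match('/\w+/', $s, $word);    // first run of letters, digits or underscores
preg_match('/\s/', $s, $space);    // first whitespace character
preg_match('/\x34/', $s, $hex);    // the character with hexadecimal code 34, i.e. "4"

var_dump($digit[0], $nonDigit[0], $word[0], $space[0], $hex[0]);
// "4", "R", "Room", " ", "4"
```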

Metacharacter

Metacharacters are characters with a special meaning that are used to construct regular expressions. To include a metacharacter itself in a regular expression, put a backslash "\" before it to escape it.
Metacharacter   Description
--------------------------------------------------------------------------------
*               Matches the preceding atom 0, 1 or more times
+               Matches the preceding atom 1 or more times
?               Matches the preceding atom 0 or 1 times
|               Matches one of two or more alternatives
^ or \A         Matches at the beginning of the string
$ or \Z         Matches at the end of the string
\b              Matches a word boundary
\B              Matches anywhere except at a word boundary
[]              Matches any one of the atoms in the square brackets
[^]             Matches any character except the atoms in the square brackets
{m}             The preceding atom appears exactly m times
{m,n}           The preceding atom appears at least m times and at most n times (n>m)
{m,}            The preceding atom appears at least m times
()              Treats the enclosed expression as a single atom
.               Matches any character except newline
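
The counting forms {m}, {m,n} and {m,}, and the escaping of a metacharacter, can be sketched like this. The test strings are assumptions chosen for the example.

```php
<?php
// {m}: exactly four digits.
$exact4    = preg_match('/^\d{4}$/', '2016'); // 1
$tooShort  = preg_match('/^\d{4}$/', '216');  // 0

// {m,n}: between two and three "a" characters.
$inRange   = preg_match('/^a{2,3}$/', 'aaa');  // 1
$overRange = preg_match('/^a{2,3}$/', 'aaaa'); // 0

// {m,}: two or more "a" characters.
$atLeast2  = preg_match('/^a{2,}$/', 'aaaaaa'); // 1

// Escaping: "\." matches a literal dot instead of "any character".
$literalDot = preg_match('/^3\.14$/', '3.14'); // 1
$notADot    = preg_match('/^3\.14$/', '3x14'); // 0

var_dump($exact4, $tooShort, $inRange, $overRange, $atLeast2, $literalDot, $notADot);
```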

String boundary restrictions

In some cases, the matching range needs to be limited to obtain more accurate results. "^" and "$" specify the start and the end of the string respectively.
For example, in the string "Tom and Jerry chased each other in the house until tom's uncle came in":
The metacharacter "^" (or "\A") is placed at the beginning of the pattern to ensure that the match occurs at the beginning of the string:
/^Tom/
The metacharacter "$" (or "\Z") is placed at the end of the pattern to ensure that the match occurs at the end of the string:
/in$/
If you do not add boundary metacharacters, you will get more matching results.
/^Tom$/ is an exact match: the whole string must be "Tom".
/Tom/ is a fuzzy match: "Tom" may appear anywhere in the string.
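
The boundary patterns above can be checked with preg_match(). This is a sketch that assumes the sample sentence ends in "in"; the variable names are illustrative.

```php
<?php
// Sample sentence (assumed wording) that starts with "Tom" and ends with "in".
$s = "Tom and Jerry chased each other in the house until tom's uncle came in";

$startsWithTom = preg_match('/^Tom/', $s);   // 1: "Tom" is at the start
$startsWithA   = preg_match('/\ATom/', $s);  // 1: "\A" also anchors at the start
$endsWithIn    = preg_match('/in$/', $s);    // 1: "in" is at the end
$exactSentence = preg_match('/^Tom$/', $s);  // 0: the string is longer than "Tom"
$exactTom      = preg_match('/^Tom$/', 'Tom'); // 1: the whole string is "Tom"
$fuzzy         = preg_match('/Tom/', $s);    // 1: "Tom" occurs somewhere

var_dump($startsWithTom, $startsWithA, $endsWithIn, $exactSentence, $exactTom, $fuzzy);
```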

Word Boundary Limitation

When using the search function of an editor, you can get more accurate results by selecting "Find whole words only". Regular expressions offer similar functionality.
For example, in the string "This island is a beautiful land":
The metacharacter "\b" matches a word boundary.
/\bis\b/ matches the word "is", but does not match inside "This" or "island".
/\bis/ matches the word "is" and the "is" at the start of "island", but does not match inside "This".
The metacharacter "\B" matches anywhere that is not a word boundary.
/\Bis\B/ explicitly requires that neither the left nor the right side is a word boundary, so it only matches inside a word. In this example there is no result.
/\Bis/ matches the "is" in the word "This".
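
The four word-boundary patterns can be compared by counting their matches with preg_match_all(). A minimal sketch using the sample string from this section:

```php
<?php
$s = "This island is a beautiful land";

// preg_match_all() returns the number of matches found.
$wholeWord  = preg_match_all('/\bis\b/', $s, $m1); // only the word "is"
$wordStart  = preg_match_all('/\bis/', $s, $m2);   // "is" in "island" and the word "is"
$insideOnly = preg_match_all('/\Bis\B/', $s, $m3); // no "is" is strictly inside a word
$notAtStart = preg_match_all('/\Bis/', $s, $m4);   // the "is" in "This"

var_dump($wholeWord, $wordStart, $insideOnly, $notAtStart); // 1, 2, 0, 1
```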

Repeated matching

Some metacharacters in regular expressions are used to match an atom repeatedly: "?", "*" and "+". The main difference between them is the number of repetitions.
Metacharacter "?": matches the atom immediately preceding it 0 or 1 times.
For example: /colou?r/ matches "colour" or "color".
Metacharacter "*": matches the atom immediately preceding it 0, 1 or more times.
For example: /zo*/ matches "z", "zo", "zoo" and so on.
Metacharacter "+": matches the atom immediately preceding it 1 or more times.
For example: /go+gle/ matches "gogle", "google", "gooogle" and other strings with one or more o's in the middle.
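
The difference between the three repetition metacharacters can be sketched as follows. The patterns are anchored with "^" and "$" here (an addition for the example) so that the whole test string must match.

```php
<?php
// "?": the "u" may appear 0 or 1 times.
$color    = preg_match('/^colou?r$/', 'color');   // 1
$colour   = preg_match('/^colou?r$/', 'colour');  // 1
$colouur  = preg_match('/^colou?r$/', 'colouur'); // 0: two u's

// "*": the "o" may appear 0 or more times.
$z        = preg_match('/^zo*$/', 'z');    // 1
$zooo     = preg_match('/^zo*$/', 'zooo'); // 1

// "+": the "o" must appear at least once.
$ggle     = preg_match('/^go+gle$/', 'ggle');   // 0: no "o" at all
$google   = preg_match('/^go+gle$/', 'google'); // 1

var_dump($color, $colour, $colouur, $z, $zooo, $ggle, $google);
```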

Any character

The metacharacter "." matches any character except newline.
It is roughly equivalent to [^\n] (Unix newline) or [^\r\n] (Windows newline).
For example: /pr.y/ matches the strings "prey", "pray", "pr%y" and so on.
The combination ".*" is commonly used to match any run of characters other than newlines. Some books also call it "full match" or "single-inclusive match".
For example:
/^a.*z$/ matches any string that starts with the letter "a", ends with the letter "z" and contains no newline character.
/.+/ does a similar job, except that it must match at least one character.
/^a.+z$/ matches "a%z" but does not match the string "az".
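
A short sketch of the difference between ".*" and ".+", and of the dot refusing to match a newline. The test strings are assumptions chosen for the example.

```php
<?php
$anyChar   = preg_match('/^pr.y$/', 'pr%y');  // 1: "." matches "%"

$starEmpty = preg_match('/^a.*z$/', 'az');    // 1: ".*" may match nothing
$plusEmpty = preg_match('/^a.+z$/', 'az');    // 0: ".+" needs at least one character
$plusOne   = preg_match('/^a.+z$/', 'a%z');   // 1

// Double quotes so that "\n" is a real newline: "." will not match it.
$newline   = preg_match('/^a.*z$/', "a\nz");  // 0

var_dump($anyChar, $starEmpty, $plusEmpty, $plusOne, $newline);
```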

Atomic table - square bracket expression

The atom table "[]" holds a group of atoms that are all equal alternatives; it matches exactly one of them. To match an "a" or an "e", use [ae].
For example: /Pr[ae]y/ matches "Pray" or "Prey".
The atom table "[^]" is also called the excluded atom table; it matches any character except the atoms in the table.
For example: /p[^u]/ matches "pa" in "part", but cannot match "pu" in "computer" because "u" is excluded from the match.
The atom table "[-]" joins a range of atoms that are consecutive in ASCII order, to simplify writing.
For example: /x[0123456789]/ can be written as /x[0-9]/, which matches a string consisting of the letter "x" followed by a digit.
For example:
/[a-zA-Z]/ matches any uppercase or lowercase letter.
/^[a-z][0-9]$/ matches strings such as "z2", "t6", "g7".
/0[xX][0-9a-fA-F]/ matches a simple hexadecimal number, such as "0x9".
/[^0-9a-zA-Z_]/ matches any character except English letters, digits and underscores, which is equivalent to \W.
/0?[xX][0-9a-fA-F]+/ matches hexadecimal numbers such as "0x9B3C" or "X800".
/<[A-Za-z][A-Za-z0-9]*>/ matches HTML tags such as "<P>", "<h1>" or "<Body>", without strictly controlling case.

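Several of the atom-table patterns from this section can be tried out as follows. This is only a sketch; the variable names are illustrative.

```php
<?php
// [ae]: exactly one of the listed atoms.
$prey = preg_match('/^Pr[ae]y$/', 'Prey'); // 1
$pruy = preg_match('/^Pr[ae]y$/', 'Pruy'); // 0

// [^u]: any character except "u".
preg_match('/p[^u]/', 'part', $part);               // $part[0] is "pa"
$computer = preg_match('/p[^u]/', 'computer');      // 0: the only "p" is followed by "u"

// [a-z] and [0-9]: ranges in ASCII order.
$z2 = preg_match('/^[a-z][0-9]$/', 'z2'); // 1

// Hexadecimal numbers with an optional leading zero.
preg_match('/0?[xX][0-9a-fA-F]+/', '0x9B3C', $hex1); // $hex1[0] is "0x9B3C"
preg_match('/0?[xX][0-9a-fA-F]+/', 'X800', $hex2);   // $hex2[0] is "X800"

var_dump($prey, $pruy, $part[0], $computer, $z2, $hex1[0], $hex2[0]);
```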
Pattern selector

The metacharacter "|" is also called the pattern selector. It matches one of two or more alternatives in a regular expression.
For example:
In the string "There are many apples and pears.", the first match of /apple|pear/ is "apple"; continuing the search after it finds "pear". You can also keep adding alternatives, such as /apple|pear|banana|lemon/.
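
Finding the alternatives one after another can be sketched with preg_match_all(), which collects every match in a single call:

```php
<?php
$s = "There are many apples and pears.";

// $matches[0] receives every substring matched by the whole pattern, in order.
$count = preg_match_all('/apple|pear/', $s, $matches);

var_dump($count);      // int(2)
var_dump($matches[0]); // "apple", then "pear"
```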

Pattern unit

The metacharacter "()" turns the enclosed regular expression into a single atom (also called a pattern unit). As with parentheses in mathematical expressions, the content of "()" is treated as one unit.
For example:
/(Dog)+/ matches "Dog", "DogDog", "DogDogDog", because the atom immediately before "+" is the string "Dog" enclosed by the metacharacter "()".
/You (very )+old/ matches "You very old", "You very very old".
/Hello (world|earth)/ matches "Hello world", "Hello earth".
The expression inside a pattern unit is matched or evaluated first.
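
A sketch of pattern units in use. Note that the group is written as (very ) with its own trailing space, so no extra space is placed before "old". The patterns are anchored with "^" and "$" for the example.

```php
<?php
// The whole group "Dog" is repeated by "+".
$dogs  = preg_match('/^(Dog)+$/', 'DogDogDog'); // 1

// Each repetition of the group consumes "very" plus one space.
$very  = preg_match('/^You (very )+old$/', 'You very very old'); // 1

// Alternation limited to the group; $m[1] holds what the group matched.
$hello = preg_match('/^Hello (world|earth)$/', 'Hello earth', $m);

var_dump($dogs, $very, $hello, $m[1]); // 1, 1, 1, "earth"
```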

Reused pattern unit

The system automatically stores the text matched by each pattern unit "()" in order, and the stored matches can be referenced as "\1", "\2", "\3" and so on when needed. This is very convenient when a regular expression contains the same pattern unit more than once. Note that inside a double-quoted PHP string the backslash itself must be escaped, so you write "\\1" and "\\2".
For example:
/^\d{2}([\W])\d{2}\1\d{4}$/ matches "12-31-2006", "09/27/1996", "86 01 4321" and similar strings, but it does not match the format "12/34-5678". This is because the result "/" matched by "[\W]" has already been stored; when "\1" is referenced at the later position, it must match the same character "/".
Use the non-storing pattern unit "(?:)" when the matched result does not need to be stored.
For example, /(?:a|b|c)(D|E|F)\1g/ matches "aEEg". In some regular expressions non-storing pattern units are necessary; otherwise the numbering of later references has to change. The example above can also be written as /(a|b|c)(D|E|F)\2g/.
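
The back-reference examples can be sketched as follows. The patterns are written in single-quoted PHP strings, where "\1" reaches the regex engine unchanged; the variable names are illustrative.

```php
<?php
// "\1" must repeat whatever separator the first group matched.
$datePattern = '/^\d{2}(\W)\d{2}\1\d{4}$/';

$dash   = preg_match($datePattern, '12-31-2006'); // 1
$slash  = preg_match($datePattern, '09/27/1996'); // 1
$space  = preg_match($datePattern, '86 01 4321'); // 1
$mixed  = preg_match($datePattern, '12/34-5678'); // 0: "/" then "-"

// With a non-storing group, (D|E|F) is group 1.
$nonStoring = preg_match('/^(?:a|b|c)(D|E|F)\1g$/', 'aEEg'); // 1

// With a storing first group, (D|E|F) becomes group 2.
$storing    = preg_match('/^(a|b|c)(D|E|F)\2g$/', 'aEEg');   // 1

var_dump($dash, $slash, $space, $mixed, $nonStoring, $storing);
```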
