Detailed explanation of regular expressions
Jul 01, 2019, 04:25 PM

The regular expression language consists of two basic character types: literal (normal) text characters and metacharacters.

Metacharacters are what give regular expressions their processing power. A metacharacter can be a single character placed in [ ] (for example, [a] matches a single lowercase a) or a sequence of characters (for example, [a-d] matches any one of a, b, c, d, and \w matches any English letter, digit, or underscore). Common metacharacters are as follows:

Common metacharacters

| Characters | Description | Special instructions |
|---|---|---|
| . | Matches any character except the newline character (\n) | |
| [abcde] | Matches any one of the characters a, b, c, d, e | The characters inside the brackets are in an "or" relationship |
| [a-h] | Matches any one character between a and h | |
| [^fgh] | Matches any one character that is not f, g, or h | A ^ before the first character inside [ ] indicates negation: the class matches any character that does not appear inside the brackets |
| \w | Matches any one uppercase or lowercase English letter, digit from 0 to 9, or underscore; equivalent to [a-zA-Z0-9_] | |
| \W | The opposite of \w; equivalent to [^a-zA-Z0-9_] | |
| \s | Matches any whitespace character; equivalent to [ \f\n\r\t\v] (the class begins with a space) | |
| \S | The opposite of \s; equivalent to [^\s] | |
| \d | Matches any single digit between 0 and 9; equivalent to [0-9] | |
| \D | The opposite of \d; equivalent to [^0-9] | |
| [\u4e00-\u9fa5] | Matches any single Chinese character | The Unicode code points of Chinese characters are used here |
| \b | Matches the beginning or end of a word | |
| ^ | Matches the beginning of the string | When placed before the first character inside square brackets, it means negation instead |
| $ | Matches the end of the string | |
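
The character classes in the table can be tried out directly. The following is a minimal sketch in Python, chosen here only for illustration (the sample strings are made up). One assumption worth noting: Python 3 lets \w, \d, and \s match non-ASCII characters by default, so the re.ASCII flag is used to get the [a-zA-Z0-9_] behavior described above.

```python
import re

# Sample text (hypothetical); \u4e2d\u6587 are two Chinese characters.
text = "Tel: 555-0199, id_7 \u4e2d\u6587"

# \d matches one digit at a time.
digits = re.findall(r"\d", "a1b22")

# \w+ with re.ASCII matches runs of [a-zA-Z0-9_] only.
words = re.findall(r"\w+", text, flags=re.ASCII)

# [^fgh] matches any single character that is not f, g, or h.
not_fgh = re.findall(r"[^fgh]", "fig")

# The Unicode range from the table picks out the Chinese characters.
chinese = re.findall(r"[\u4e00-\u9fa5]", text)

print(digits)   # ['1', '2', '2']
print(words)    # ['Tel', '555', '0199', 'id_7']
print(not_fgh)  # ['i']
print(len(chinese))  # 2
```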
Regular expression qualifiers

Function: limit the number of occurrences of the unit preceding the symbol.

Unit:

  1. If the qualifier is preceded by a single character, that character is the unit.
  2. If the qualifier is preceded by an expression enclosed in parentheses, the entire parenthesized group is the unit.

The metacharacters above each match a single character. To match several characters at once, you need qualifiers. The following are some common qualifiers (n and m in the table below both represent integers).

| Characters | Description | Special instructions |
|---|---|---|
| * | Matches the preceding unit 0 or more times; equivalent to {0,} | |
| ? | Matches the preceding unit 0 or 1 time; equivalent to {0,1} | |
| + | Matches the preceding unit at least 1 time; equivalent to {1,} | |
| {n} | Matches the preceding unit exactly n times | |
| {n,} | Matches the preceding unit at least n times | |
| {n,m} | Matches the preceding unit n to m times | |
| \b | Matches a word boundary | |
| ^ | The string must start with the specified character | |
| $ | The string must end with the specified character | |

Explanation - Special case

  1. You can surround multiple metacharacters or literal text characters with parentheses to form a group. For example, ^(13)[4-9]\d{8}$ represents an 11-digit mobile phone number that starts with 13 and has a third digit from 4 to 9.
    1. abcabcabc+ means the last letter c appears 1 or more times;
    2. (abcabcabc)+ means the entire string abcabcabc appears 1 or more times.
  2. You can use | to express an "or" relationship. For example, z|j|q matches any one of the letters z, j, q. In fact, it is equivalent to [zjq].
    1. ab|cd|ef means: ab, cd, or ef.
    2. a(b|cd|e)f means: starts with a, followed by b, cd, or e, and ends with f.
    3. Summary: the only boundary of | (or) is parentheses.
  3. How should the pattern [0-9A-Z.?]+ be understood?
    1. When . and ? appear inside square brackets, they become ordinary characters: a literal dot and a literal question mark. You can think of [ ] as having a higher priority than . and ?.
    2. This regular expression matches the whole string ?AAA.BBB (uppercase, because the class contains only A-Z). Remember that . and ? are treated entirely as ordinary characters here.

Advanced 1 - Multi-selection structure

The multi-selection structure is simply the use of the metacharacter | (or).
The range of | is bounded only by the beginning of the pattern, the end of the pattern, and parentheses.
Regular: Meaning

  • Windows98|Windows2000|WindowsXP: matches Windows98, Windows2000, or WindowsXP.
  • ^Windows98|Windows2000|WindowsXP$: matches text that starts with Windows98, contains Windows2000, or ends with WindowsXP. Note that ^ and $ are each included in the range of a single alternative, because the boundaries of | are only the beginning, the end, and parentheses.
  • Windows(98|2000|XP): matches Windows followed by 98, 2000, or XP.

Summary: an alternative in a multi-selection structure can include many characters, but it cannot extend beyond the enclosing parentheses.
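
The effect of those boundaries is easy to check. The sketch below uses Python's re module purely as an illustration; the helper name matches and the sample sentence are hypothetical.

```python
import re

def matches(pattern, text):
    """Return True if the pattern is found anywhere in text."""
    return re.search(pattern, text) is not None

# ^ belongs only to the first alternative, $ only to the last one.
loose = r"^Windows98|Windows2000|WindowsXP$"
# Parentheses limit the alternation, so ^ and $ apply to the whole pattern.
strict = r"^Windows(98|2000|XP)$"

sample = "I run Windows2000 daily"
print(matches(loose, sample))        # True: the middle alternative is unanchored
print(matches(strict, sample))       # False: the whole string must be one name
print(matches(strict, "WindowsXP"))  # True
```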

Advanced 2 - Grouping and Backreferences

Grouping

  • We already know how to repeat a single character.
  • But what if you want to repeat a whole string? You can use parentheses to specify a subexpression (also called a group).
  • (\d{1,3}\.){3}\d{1,3} is a simple IP address matching expression.
  • But it also matches impossible IP addresses such as 256.300.888.999. Can you write a more accurate regex?
  • ((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?)
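
To compare the two expressions, here is a small sketch in Python (an illustrative choice, not something the article prescribes). It assumes the whole input should be an address, so it uses fullmatch; the names SIMPLE, OCTET, and STRICT are made up for this example.

```python
import re

# The simple expression: three "digits + dot" groups, then a final number.
SIMPLE = re.compile(r"(\d{1,3}\.){3}\d{1,3}")

# One number in the range 0-255, reused to build the stricter expression.
OCTET = r"(2[0-4]\d|25[0-5]|[01]?\d\d?)"
STRICT = re.compile(r"(" + OCTET + r"\.){3}" + OCTET)

for candidate in ["192.168.1.1", "256.300.888.999"]:
    simple_ok = SIMPLE.fullmatch(candidate) is not None
    strict_ok = STRICT.fullmatch(candidate) is not None
    print(candidate, simple_ok, strict_ok)
# 192.168.1.1 True True
# 256.300.888.999 True False
```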

Backreference

  • After you use parentheses to specify a subexpression (a group), the text matched by that subexpression is captured and can be processed further within the expression or by other programs.
  • By default, each group automatically gets a group number. The rule is: groups are numbered by their left parenthesis, from left to right. The first group is number 1, the second is number 2, and so on.

Example:

  • \b(\w+)\b\s+\1\b can be used to match repeated words.
  • It matches the repeated words in text such as:
  • where where go, tom tom happy
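
A quick way to see the backreference at work is the sketch below, written in Python for illustration only. The function name find_repeat is hypothetical; the pattern is the one from the example, with \1 referring back to whatever group 1 captured.

```python
import re

REPEATED = re.compile(r"\b(\w+)\b\s+\1\b")

def find_repeat(text):
    """Return the first repeated-word match in text, or None."""
    m = REPEATED.search(text)
    return m.group(0) if m else None

print(find_repeat("where where go"))   # where where
print(find_repeat("tom tom happy"))    # tom tom
print(find_repeat("no repeats here"))  # None
```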
Straightforward explanation:

In a regular expression, use parentheses earlier in the pattern to form groups, then refer to the content matched by those parentheses later in the pattern with \1, \2, and so on (the first parenthesis is \1, and so forth). If parentheses are nested inside other parentheses, as in (\w+(.?)), remember to count the left parentheses ( from left to right to find each group number.

Advanced 3 - Look Around (Zero-Width Assertion)

Look around does not match any characters; it only matches specific positions in the text, similar to \b, ^, and $. Look around does not consume characters. It comes in two kinds: forward order (lookahead) and reverse order (lookbehind).

Forward order

  • (?=exp): the text after the position can match exp. For example, (?=\d) means the right side of the current position is a digit.
  • (?!exp): the text after the position cannot match exp. For example, (?!\d) means the right side of the current position is not a digit.

Reverse order

  • (?<=exp): the text before the position can match exp. For example, (?<=\d) means the left side of the current position is a digit.
  • (?<!exp): the text before the position cannot match exp. For example, (?<!\d) means the left side of the current position is not a digit.
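
The four forms can be sketched as follows, using Python's re module as an assumed test bed (note that Python requires lookbehind expressions to have a fixed width). All sample strings are invented for the example.

```python
import re

# (?=exp): digits that are followed by "USD"; "USD" itself is not consumed.
followed = re.findall(r"\d+(?=USD)", "100USD 200EUR")

# (?!exp): positions of "foo" where "bar" does not come next.
not_followed = [m.start() for m in re.finditer(r"foo(?!bar)", "foobar foobaz")]

# (?<=exp): digits preceded by a dollar sign.
preceded = re.findall(r"(?<=\$)\d+", "cost $30 or 40")

# (?<!exp): whole numbers that are not preceded by a dollar sign.
not_preceded = re.findall(r"(?<!\$)\b\d+", "cost $30 or 40")

# Zero-width matches mark positions only, so sub() can insert text there.
grouped = re.sub(r"(?<=\d)(?=(\d{3})+$)", ",", "1234567")

print(followed, not_followed, preceded, not_preceded, grouped)
# ['100'] [7] ['30'] ['40'] 1,234,567
```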
Advanced 4 - Greed and Lazy

When a regular expression contains a quantifier that allows repetition (a code that specifies a count, such as +, *, or {3,12}), the usual behavior is to match as many characters as possible. Consider the regular expression a.*b: it matches the longest string that starts with a and ends with b. If you use it to search aabab, it matches the entire string aabab. This is called greedy matching.

Sometimes we need lazy matching instead, that is, matching as few characters as possible. All of the quantifiers given above can be converted into lazy matching patterns: just add a question mark ? after them. In this way, .*? means matching any number of repetitions, but using the fewest repetitions that still allow the overall match to succeed. For example, a.*?b matches the shortest string that starts with a and ends with b. Applied to aabab, it matches aab and ab.
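
The aabab example can be reproduced with a few lines of Python (used here only as a convenient way to run the patterns; the digit string is an extra, made-up sample for the {3,12} quantifier).

```python
import re

# Greedy: .* grabs as much as possible, so one long match is found.
greedy = re.findall(r"a.*b", "aabab")

# Lazy: .*? stops at the first b that lets the match succeed.
lazy = re.findall(r"a.*?b", "aabab")

# The same rule applies to counted quantifiers.
greedy_digits = re.search(r"\d{3,12}", "1234567").group(0)
lazy_digits = re.search(r"\d{3,12}?", "1234567").group(0)

print(greedy)         # ['aabab']
print(lazy)           # ['aab', 'ab']
print(greedy_digits)  # 1234567
print(lazy_digits)    # 123
```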

In short, the difference between greedy and lazy mode is that lazy mode has one more question mark ? after the quantifier.

Advanced 5 - Priority of pattern matching

When using regular expressions, you need to pay attention to the order of matching. Operators with the same priority are evaluated from left to right, and operators with different priorities are evaluated from higher to lower. The matching priority of the various operators, from high to low, is shown in the following table.

Order   Metacharacters             Description
1       \                          Escape characters
2       () (?:) (?=) []            Pattern units and atom tables (character classes)
3       * + ? {n} {n,} {n,m}       Repeated matching
4       ^ $ \b \B \A \Z            Boundary restrictions
5       |                          Pattern selection
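
A few checks make the ordering in the table concrete. This is a sketch in Python, picked only for illustration; the helper names full and found are hypothetical.

```python
import re

def full(pattern, text):
    """True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

def found(pattern, text):
    """True if the pattern matches somewhere in text."""
    return re.search(pattern, text) is not None

# Order 1: the escape is applied first, so \+ is a literal plus sign.
print(full(r"a\+", "a+"), full(r"a\+", "aa"))   # True False

# Order 2 vs 3: + repeats only the unit before it; parentheses make ab one unit.
print(full(r"ab+", "abab"), full(r"(ab)+", "abab"))   # False True

# Order 4 vs 5: ^ and $ bind more tightly than |, so ^a|b$ means (^a)|(b$).
print(found(r"^a|b$", "axx"), found(r"^(a|b)$", "axx"))   # True False
```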
Example

1. Character escape

Question 1: To match the \$ in a string such as 333333\$33333, how should the regular expression be written?

Question 2: If PHP's preg_match function is given the pattern as a single-quoted string or as a double-quoted string to match the \$ above, how should each be written?

Answer:

  • The pattern that the regex engine must receive is \\\$
  • Written as a single-quoted string: '/\\\\\\$/'. (For easier reading, split it into '/\\ \\ \\ $/')
  • Written as a double-quoted string: "/\\\\\\\$/". (For easier reading, split it into "/\\ \\ \\ \$/")

Another way to put the answer:

  1. Single-quoted strings in PHP process almost no escape sequences, but they do treat \\ as an escaped backslash, so 6 backslashes are needed to produce the 3 backslashes in the expression.
  2. Double-quoted strings escape \ in the same way and also need one more \ to escape $, so 7 backslashes are required.
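
The same two layers of escaping (string literal first, regex engine second) can be observed in other languages. The sketch below uses Python as an assumed stand-in, so the quoting rules are Python's rather than PHP's: a raw string hands backslashes to the regex engine unchanged, while an ordinary string literal needs every backslash doubled, much like the PHP strings above. The subject text is the hypothetical sample from Question 1.

```python
import re

# The text 333333\$33333, containing exactly one backslash.
subject = "333333\\$33333"

# Raw string: the regex engine receives \\\$ exactly as written.
raw_pattern = r"\\\$"

# Ordinary string: 6 backslashes collapse to 3, followed by $.
plain_pattern = "\\\\\\$"

same = raw_pattern == plain_pattern
matched = re.search(raw_pattern, subject).group(0)

print(same)          # True
print(len(matched))  # 2 (a backslash and a dollar sign)
```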
