Detailed explanation of what regular expressions are and their usage-PHP Tutorial-php.cn

阿神

Mar 28, 2017 pm 02:54 PM

1. What is a regular expression?

A regular expression describes a pattern for matching strings. It can be used to:

(1) Check whether a string contains a substring that matches a certain rule, and retrieve that substring;

(2) Flexibly perform string replacement based on matching rules.

Regular expressions are actually simple to learn, and the few more abstract concepts are easy to understand too. Many people find them complicated for two reasons. First, most documentation does not progress from simple to advanced or introduce concepts in a sensible order, which makes it hard to follow. Second, the documentation that ships with each engine tends to focus on that engine's unique features, which are not the first thing a learner needs to understand.

2. How to use regular expressions

2.1 Ordinary characters

Letters, digits, Chinese characters, underscores, and punctuation marks that are not given a special meaning in the following sections are all ordinary characters. An ordinary character in an expression matches the same character in the string.

Example 1: When the expression c is matched against the string abcdef, the result is: success; the matched content is: c; the match starts at position 2 and ends at position 3. (Note: whether positions start from 0 or 1 may differ depending on the programming language.)

Example 2: When the expression bcd is matched against the string abcde, the result is: success; the matched content is: bcd; the match starts at position 1 and ends at position 4.
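The two examples above can be checked with a short sketch. Python and its standard re module are an assumption made here for illustration, since the article does not name an engine. Python counts positions from 0 and reports the end position as exclusive, which agrees with the numbers above.

```python
import re

# Ordinary characters match themselves.
m1 = re.search(r"c", "abcdef")
print(m1.group(), m1.start(), m1.end())   # c 2 3

m2 = re.search(r"bcd", "abcde")
print(m2.group(), m2.start(), m2.end())   # bcd 1 4
```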

2.2 Simple escape characters

Some characters are inconvenient to write directly, so they are written with a \ in front. In fact, we are all familiar with these characters.

Other punctuation marks have special uses that are covered in later sections. Adding \ in front of them represents the symbol itself. For example, ^ and $ have special meanings. To match the literal ^ and $ characters in a string, the regular expression must be written as \^ and \$.

These escape sequences match in the same way as ordinary characters: each one matches the same character in the string.

Example: When the expression \$d is matched against the string abc$de, the result is: success; the matched content is: $d; the match starts at position 3 and ends at position 5.
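A minimal sketch of the escaping rule, again assuming Python's re module for illustration. It also shows what happens when the backslash is left out, and uses re.escape, a Python helper that adds the backslashes automatically.

```python
import re

# Escaped: \$ matches a literal dollar sign.
escaped = re.search(r"\$d", "abc$de")
print(escaped.group(), escaped.span())    # $d (3, 5)

# Unescaped: $ means "end of string", so nothing can follow it and match.
unescaped = re.search(r"$d", "abc$de")
print(unescaped)                          # None

# re.escape inserts the backslashes for special characters.
pattern = re.escape("^$")
print(pattern)                            # \^\$
```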

2.3 Expressions that can match 'multiple characters'

Some notations in regular expressions can match any one character out of a set of characters. For example, the expression \d can match any digit. Although it can match any character in the set, it matches only one character, not several. This is like the joker in a card game: it can stand in for any card, but for only one card.

Example 1: When the expression \d\d is matched against abc123, the result is: success; the matched content is: 12; the match starts at position 3 and ends at position 5.

Example 2: When the expression a.\d is matched against aaa100, the result is: success; the matched content is: aa1; the match starts at position 1 and ends at position 4.
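The following sketch reproduces both examples, assuming Python's re module as the engine. The last line illustrates that \d consumes exactly one character per use.

```python
import re

two_digits = re.search(r"\d\d", "abc123")
print(two_digits.group(), two_digits.span())   # 12 (3, 5)

# "." stands for any single character (except a newline by default).
dot = re.search(r"a.\d", "aaa100")
print(dot.group(), dot.span())                 # aa1 (1, 4)

# Each \d matches one digit, so three separate matches are found.
singles = re.findall(r"\d", "abc123")
print(singles)                                 # ['1', '2', '3']
```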

2.4 Custom expressions that can match 'multiple characters'

Enclosing a series of characters in square brackets [ ] produces an expression that matches any one of those characters. Enclosing them in [^ ] produces an expression that matches any one character except those listed. As before, although any one of the characters can be matched, only one character is matched, not several.

Example 1: When the expression [bcd][bcd] is matched against abc123, the result is: success; the matched content is: bc; the match starts at position 1 and ends at position 3.

Example 2: When the expression [^abc] is matched against abc123, the result is: success; the matched content is: 1; the match starts at position 3 and ends at position 4.
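A short sketch of the two bracket forms, assuming Python's re module for illustration:

```python
import re

# [bcd] matches one character that is b, c, or d.
inside = re.search(r"[bcd][bcd]", "abc123")
print(inside.group(), inside.span())     # bc (1, 3)

# [^abc] matches one character that is NOT a, b, or c.
outside = re.search(r"[^abc]", "abc123")
print(outside.group(), outside.span())   # 1 (3, 4)
```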

2.5 Special symbols that modify the number of matches

The expressions described in the previous sections, whether they match only one specific character or any one of several characters, match only once. Adding a special symbol that modifies the number of matches lets an expression match repeatedly without being written out again.

To use one, place the repetition modifier after the expression it modifies. For example, [bcd][bcd] can be written as [bcd]{2}.

Example 1: When the expression \d+\.?\d* is matched against It costs $12.5, the result is: success; the matched content is: 12.5; the match starts at position 10 and ends at position 14.

Example 2: When the expression go{2,8}gle is matched against Ads by goooooogle, the result is: success; the matched content is: goooooogle; the match starts at position 7 and ends at position 17.
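The repetition modifiers can be tried out with the sketch below, which assumes Python's re module. The input strings are written as "It costs $12.5" and "Ads by goooooogle", the spellings for which the positions given above hold.

```python
import re

# [bcd]{2} is shorthand for [bcd][bcd].
short = re.search(r"[bcd]{2}", "abc123")
long_form = re.search(r"[bcd][bcd]", "abc123")
print(short.span() == long_form.span())   # True

# \d+ one or more digits, \.? an optional dot, \d* zero or more digits.
price = re.search(r"\d+\.?\d*", "It costs $12.5")
print(price.group(), price.span())        # 12.5 (10, 14)

# o{2,8} allows between 2 and 8 repetitions of "o".
google = re.search(r"go{2,8}gle", "Ads by goooooogle")
print(google.group(), google.span())      # goooooogle (7, 17)
```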

2.6 Some other symbols representing abstract meanings

Some symbols represent abstract special meanings in expressions, such as ^, $, and \b.

A purely textual explanation remains rather abstract, so the following examples are given to help.

Example 1: When the expression ^aaa matches xxx aaa xxx, the matching result is: failure. Because ^ is required to match the beginning of the string, ^aaa can only match when aaa is at the beginning of the string, such as: aaa xxx xxx.

Example 2: When the expression aaa$ matches xxx aaa xxx, the matching result is: failure. Because $ is required to match the end of the string, aaa$ can only match when aaa is at the end of the string, such as: xxx xxx aaa.

Example 3: When the expression .\b. is matched against @@@abc, the result is: success; the matched content is: @a; the match starts at position 2 and ends at position 4.

Further explanation: \b is similar to ^ and $. It does not match any character itself. Instead, it requires that, at the position where it sits in the match, one side is a \w character and the other side is a non-\w character.

Example 4: When the expression \bend\b is matched against weekend,endfor,end, the result is: success; the matched content is: end; the match starts at position 15 and ends at position 18.
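The anchor and boundary examples can be verified with the following sketch, assuming Python's re module. The Example 4 input is written without spaces after the commas, which is the spelling that gives positions 15 to 18.

```python
import re

# ^ requires the match to begin at the start of the string.
print(re.search(r"^aaa", "xxx aaa xxx"))    # None
start_ok = re.search(r"^aaa", "aaa xxx xxx")

# $ requires the match to finish at the end of the string.
print(re.search(r"aaa$", "xxx aaa xxx"))    # None
end_ok = re.search(r"aaa$", "xxx xxx aaa")

# \b matches the gap between a \w character and a non-\w character.
gap = re.search(r".\b.", "@@@abc")
print(gap.group(), gap.span())              # @a (2, 4)

word = re.search(r"\bend\b", "weekend,endfor,end")
print(word.group(), word.span())            # end (15, 18)
```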

Some symbols affect the relationship between subexpressions within an expression, such as | and ( ).

Example 5: When the expression Tom|Jack is matched against the string I'm Tom, he is Jack, the result is: success; the matched content is: Tom; the match starts at position 4 and ends at position 7. When matching the next one, the result is: success; the matched content is: Jack; the match starts at position 15 and ends at position 19.

Example 6: When the expression (go\s*)+ is matched against Let's go go go!, the result is: success; the matched content is: go go go; the match starts at position 6 and ends at position 14.

Example 7: When the expression ￥(\d+\.?\d) is matched against $10.9,￥20.5, the result is: success; the matched content is: ￥20.5; the match starts at position 6 and ends at position 11. The content matched by the parenthesized part alone is: 20.5.
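Examples 5 to 7 can be reproduced with the sketch below, assuming Python's re module. Positions are counted in characters with an exclusive end, so the full-width ￥ sign counts as one character. The Example 5 input is written as "I'm Tom, he is Jack", the spelling for which the stated positions hold.

```python
import re

# | means "either the left side or the right side".
names = [(m.group(), m.span()) for m in re.finditer(r"Tom|Jack", "I'm Tom, he is Jack")]
print(names)                     # [('Tom', (4, 7)), ('Jack', (15, 19))]

# ( ) groups a subexpression so that + can repeat it as a whole.
gos = re.search(r"(go\s*)+", "Let's go go go!")
print(gos.group(), gos.span())   # go go go (6, 14)

# ( ) also records what it matched; group(1) returns that part alone.
yen = re.search(r"￥(\d+\.?\d)", "$10.9,￥20.5")
print(yen.group(), yen.span(), yen.group(1))
```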

3. Some advanced usage of regular expressions

3.1 Greedy and non-greedy in the number of matches

Greedy mode:

When a special symbol that modifies the number of matches is used, several notations allow the same expression to match a varying number of times, such as {m,n}, {m,}, ?, *, and +. The actual number of matches depends on the string being matched. An expression that repeats an indefinite number of times always matches as many times as possible during matching. For example, for the text dxxxdxxxd:
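As a hedged illustration, the sketch below runs two expressions against dxxxdxxxd. The expressions (d)(\w+) and (d)(\w+)(d) are assumptions chosen here to show the behaviour, and Python's re module is assumed as the engine.

```python
import re

text = "dxxxdxxxd"

# Greedy \w+ takes everything after the first d.
first = re.search(r"(d)(\w+)", text)
print(first.group(2))    # xxxdxxxd

# With a trailing (d), \w+ gives back the last d so the whole expression succeeds.
second = re.search(r"(d)(\w+)(d)", text)
print(second.group(2))   # xxxdxxx
```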


As can be seen, \w+ always matches as many characters that satisfy its rule as possible. In the second example it does not match the last d, but only so that the whole expression can match successfully. In the same way, expressions with * and {m,n} match as many times as possible, and an expression with ?, which may or may not match, matches whenever it can. This matching principle is called greedy mode.

Non-greedy mode:

Adding a ? after the special symbol that modifies the number of matches makes an expression with an indefinite number of matches match as few times as possible, and makes an expression that may or may not match prefer not to match. This matching principle is called non-greedy mode, also called reluctant mode. If matching fewer times would cause the whole regular expression to fail, then, similar to greedy mode, non-greedy mode matches the minimum additional amount needed for the whole expression to succeed. For example, for the text dxxxdxxxd:
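A minimal sketch of non-greedy matching on dxxxdxxxd. The expressions (d)(\w+?) and (d)(\w+?)(d) are assumptions chosen here for illustration, and Python's re module is assumed as the engine.

```python
import re

text = "dxxxdxxxd"

# Non-greedy \w+? stops after the minimum of one character.
lazy = re.search(r"(d)(\w+?)", text)
print(lazy.group(2))         # x

# To let the trailing (d) match, \w+? must extend to xxx, but no further.
lazy_bound = re.search(r"(d)(\w+?)(d)", text)
print(lazy_bound.group(2))   # xxx
```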


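Further examples:

Example 1: When an expression that places the greedy (.*) between an opening tag and a closing tag is matched against a string that contains two such tagged sections, the result is: success; the matched content is: the entire string, because the closing tag in the expression matches the last closing tag in the string.

Example 2: In contrast, when the same expression written with the non-greedy (.*?) is matched against the same string as in Example 1, only the first tagged section is matched. When matching the next one, the second tagged section is obtained.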
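The difference between the two modes can be sketched as follows. The td tags and the sample string are hypothetical, chosen here only for illustration, and Python's re module is assumed as the engine.

```python
import re

# Hypothetical input with two tagged sections.
html = "<td><p>aa</p></td> <td><p>bb</p></td>"

# Greedy: .* runs to the LAST closing tag, so one match covers everything.
greedy = re.search(r"<td>(.*)</td>", html)
print(greedy.group() == html)   # True

# Non-greedy: .*? stops at the FIRST closing tag, so each section matches separately.
lazy_cells = re.findall(r"<td>(.*?)</td>", html)
print(lazy_cells)               # ['<p>aa</p>', '<p>bb</p>']
```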

3.2 Backreferences \1, \2...

When an expression is matched, the engine records the string matched by each subexpression enclosed in parentheses ( ). When the matching result is retrieved, the string matched by a parenthesized subexpression can be retrieved separately. This has been demonstrated several times in the previous examples. In practice, when some boundary is used for searching and the content to be retrieved does not include that boundary, parentheses must be used to specify the desired range, as with the earlier (.*?).

In fact, the string matched by a parenthesized subexpression can be used not only after matching is complete, but also during matching. A later part of the expression can refer to an earlier parenthesized submatch that has already matched a string. The reference is written as \ plus a number: \1 refers to the string matched by the first pair of parentheses, \2 refers to the string matched by the second pair, and so on. If one pair of parentheses contains another pair, the outer pair is numbered first. In other words, pairs are numbered in the order in which their left parenthesis ( appears.

Example 1: When the expression ('|")(.*?)(\1) is matched against 'Hello', "World", the result is: success; the matched content is: 'Hello'. When matching the next one, it matches "World".

Example 2: When the expression (\w)\1{4,} is matched against aa bbbb abcdefg ccccc 111121111 999999999, the result is: success; the matched content is: ccccc. When matching the next one, the result is 999999999. This expression requires a character in the \w range to be repeated at least 5 times. Note the difference from \w{5,}.

Example 3: When an expression that captures the tag name of an opening tag and ends with .*?</\1> is matched against a string in which the opening and closing tags are paired, the result is: success. If the opening and closing tags are not paired, the match fails; if both are changed to another matching pair, the match still succeeds.
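The backreference examples can be sketched as follows, assuming Python's re module. The tag pattern <(\w+)>.*?</\1> and its sample strings are simplified, hypothetical stand-ins used only to show a closing tag that must repeat the captured name.

```python
import re

# \1 must repeat whichever quote character group 1 matched.
quoted = [m.group() for m in re.finditer(r"""('|")(.*?)(\1)""", "'Hello', \"World\"")]
print(quoted)     # ["'Hello'", '"World"']

sample = "aa bbbb abcdefg ccccc 111121111 999999999"

# (\w)\1{4,}: the SAME character at least 5 times in a row.
repeated = [m.group() for m in re.finditer(r"(\w)\1{4,}", sample)]
print(repeated)   # ['ccccc', '999999999']

# \w{5,}: ANY 5 or more word characters, not necessarily the same one.
any_five = re.findall(r"\w{5,}", sample)
print(any_five)   # ['abcdefg', 'ccccc', '111121111', '999999999']

# Hypothetical paired-tag check: the closing tag must reuse the captured name.
paired = re.search(r"<(\w+)>.*?</\1>", "<b>bold</b>")
unpaired = re.search(r"<(\w+)>.*?</\1>", "<b>bold</i>")
print(paired is not None, unpaired is None)   # True True
```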

3.3 Pre-search without matching; reverse pre-search without matching

The previous sections covered several special symbols that represent abstract meanings: ^, $, and \b. What they have in common is that they do not match any character themselves, but only attach a condition to the "two ends of the string" or the "gap between characters". With that concept understood, this section introduces another, more flexible way of attaching conditions to the "two ends" or "gaps".

Forward pre-search: (?=xxxxx), (?!xxxxx)

Format: (?=xxxxx). In the string being matched, it attaches a condition to the "gap" or "end" where it sits: the text to the right of the gap must be able to match the expression xxxxx. Because it acts only as a condition on the gap, it does not prevent the following expressions from actually matching the characters after the gap. This is similar to \b, which does not match any character itself. \b only inspects the characters before and after the gap and makes a judgment. It does not affect what the following expressions actually match.

Example 1: When the expression "Windows (?=NT|XP)" (with a space after Windows) is matched against Windows 98, Windows NT, Windows 2000, it matches only the "Windows " in Windows NT. The other occurrences of Windows are not matched.

Example 2: When the expression (\w)((?=\1\1\1)(\1))+ is matched against the string aaa ffffff 999999999, it matches the first 4 of the 6 f characters and the first 7 of the 9 nines. The expression can be read as: when a letter or digit is repeated 4 or more times, match the part before the last 2 characters. Of course, the expression need not be written this way. It is used here only for demonstration.

Format: (?!xxxxx). The text to the right of the gap must not match the expression xxxxx.

Example 3: When the expression ((?!\bstop\b).)+ is matched against fdjka ljfdl stop fjdsla fdj, it matches from the beginning up to the position before stop. If the string contains no stop, it matches the entire string.

Example 4: When the expression do(?!\w) is matched against the string done, do, dog, it matches only do. In this example, using (?!\w) after do has the same effect as using \b.
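The four forward pre-search examples can be tried with the sketch below, assuming Python's re module, where this feature is called lookahead. The first pattern includes a space after Windows, because the sample text has a space before NT, and the second input uses nine 9s so that 7 of them are matched.

```python
import re

# (?=NT|XP) checks what follows without consuming it.
windows = [m.span() for m in re.finditer(r"Windows (?=NT|XP)", "Windows 98, Windows NT, Windows 2000")]
print(windows)        # [(12, 20)]

# Each repetition needs three more copies of the character ahead of it.
runs = [m.group() for m in re.finditer(r"(\w)((?=\1\1\1)(\1))+", "aaa ffffff 999999999")]
print(runs)           # ['ffff', '9999999']

# (?!\bstop\b). consumes one character only where the word "stop" does not begin.
before_stop = re.match(r"((?!\bstop\b).)+", "fdjka ljfdl stop fjdsla fdj")
print(repr(before_stop.group()))   # 'fdjka ljfdl '

# do(?!\w) and do\b behave the same on this input.
negative = re.findall(r"do(?!\w)", "done, do, dog")
boundary = re.findall(r"do\b", "done, do, dog")
print(negative, boundary)          # ['do'] ['do']
```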

Reverse pre-search: (?<=xxxxx), (?<!xxxxx)

These two formats are conceptually similar to forward pre-search. The difference is that reverse pre-search places its condition on the left side of the gap rather than the right side: the two formats require, respectively, that the text to the left must match and must not match the specified expression. Like forward pre-search, they attach a condition to the gap and do not match any character themselves.
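A minimal sketch of reverse pre-search, assuming Python's re module, where it is called lookbehind and the inner expression must have a fixed width. The sample string and both patterns are hypothetical, chosen here for illustration.

```python
import re

text = "cost $15, qty 3"

# (?<=\$) requires a dollar sign immediately to the left of the digits.
after_dollar = re.findall(r"(?<=\$)\d+", text)
print(after_dollar)   # ['15']

# (?<!\$) requires that there is NO dollar sign to the left.
# \b keeps the match from starting in the middle of "15".
no_dollar = re.findall(r"(?<!\$)\b\d+", text)
print(no_dollar)      # ['3']
```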

4. Other general rules

4.1 Rule 1

In expressions, you can use \xXX and \uXXXX to represent a character (X represents a hexadecimal number)

4.2 Rule 2

The expressions \s, \d, \w, and \b have special meanings, and the corresponding capital letters (\S, \D, \W, \B) have the opposite meanings.

4.3 Rule 3

Characters that have a special meaning in expressions need a \ in front of them to match the character itself.

4.4 Rule 4

If you do not want the content matched inside parentheses ( ) to be recorded for later use, you can use the (?:xxxxx) format.

Example 1: When the expression (?:(\w)\1)+ is matched against a bbccdd efg, the result is bbccdd. The match of the (?: ) group is not recorded, so (\w) is the first recorded group and is referenced with \1.
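The example above can be checked with this sketch, assuming Python's re module. The groups attribute of a compiled pattern reports how many groups are recorded.

```python
import re

pattern = re.compile(r"(?:(\w)\1)+")

# Only (\w) is recorded; the outer (?: ) group is not counted.
print(pattern.groups)     # 1

pairs = pattern.search("a bbccdd efg")
print(pairs.group())      # bbccdd

# Group 1 holds the value from the last repetition.
print(pairs.group(1))     # d
```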

4.5 Rule 5

Introduction to commonly used expression attribute settings: Ignorecase, Singleline, Multiline, Global
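As a rough illustration, the sketch below shows how these settings might map onto Python's re module. The mapping is an assumption made here: Ignorecase is taken to mean re.IGNORECASE, Singleline to mean re.DOTALL (the dot also matches a newline), and Multiline to mean re.MULTILINE. Python has no Global flag, so re.findall stands in for it. Other engines name and spell these options differently.

```python
import re

# Ignorecase: letters match regardless of case.
ignore_case = re.findall(r"abc", "ABC abc", re.IGNORECASE)
print(ignore_case)    # ['ABC', 'abc']

# Singleline: "." also matches a newline character.
dot_default = re.search(r"a.b", "a\nb")
dot_all = re.search(r"a.b", "a\nb", re.DOTALL)
print(dot_default, dot_all is not None)   # None True

# Multiline: ^ and $ also match at the start and end of each line.
first_only = re.findall(r"^\w+", "one\ntwo")
every_line = re.findall(r"^\w+", "one\ntwo", re.MULTILINE)
print(first_only, every_line)             # ['one'] ['one', 'two']
```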
