


A personal understanding of regular expressions: lazy matching
Problem description
Link to this article: http://www.hcoding.com/?p=130
When I first learned regular expressions, I had a question. For example: I need to match the characters between the first pair of "_" in the string "_abc_123_". As a beginner I would write "/_\w*_/", but then the text matched between the underscores is "abc_123" instead of "abc", because \w also matches "_" and the quantifier takes as much as it can. An expert told me to add a question mark, "/_\w*?_/", and then the result is "abc".
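The difference can be checked with a minimal sketch in Python's `re` module. The language is an assumption (the article does not name one), and the capture group is added only to pull out the text between the underscores.

```python
import re

text = "_abc_123_"

# Greedy: \w* takes everything it can (including "_"), then gives back
# just enough for the final "_" to match.
greedy = re.search(r"_(\w*)_", text)

# Lazy: \w*? takes as little as it can, so the match stops at the
# first closing "_".
lazy = re.search(r"_(\w*?)_", text)

print(greedy.group(1))  # abc_123
print(lazy.group(1))    # abc
```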
We know that '?' used alone means "repeat zero times or once". When '?' appears after a repetition quantifier, it makes that quantifier lazy, that is, it matches as few characters as possible. The lazy quantifiers are:
- *?: Repeat any number of times, but repeat as little as possible
- +?: Repeat 1 or more times, but repeat as little as possible
- ??: Repeat 0 or 1 times, but repeat as little as possible
- {n,m}?: Repeat n to m times, but repeat as little as possible
- {n,}?: Repeat n times or more, but repeat as little as possible
Yes, "as few repetitions as possible" is the blunt, straightforward explanation of lazy matching.
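As a rough illustration, each lazy quantifier in the list can be compared with its greedy counterpart on the same input. This is a sketch in Python's `re` module; the input "aaaa" and the bounds {2,3} and {2,} are arbitrary choices for the demonstration, not values from the article.

```python
import re

text = "aaaa"

# (greedy form, lazy form) for each quantifier in the list above.
pairs = [
    (r"a*", r"a*?"),
    (r"a+", r"a+?"),
    (r"a?", r"a??"),
    (r"a{2,3}", r"a{2,3}?"),
    (r"a{2,}", r"a{2,}?"),
]

results = {}
for greedy, lazy in pairs:
    # re.match anchors at the start, so only the quantifier decides the length.
    results[greedy] = re.match(greedy, text).group(0)
    results[lazy] = re.match(lazy, text).group(0)
    print(greedy, repr(results[greedy]), "|", lazy, repr(results[lazy]))
```

With nothing after the quantifier to force it to take more, each lazy form stops at its minimum repetition count, while each greedy form runs to its maximum.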
So how should we understand "as few repetitions as possible"? We can explain it in terms of how lazy quantifiers work.
Lazy quantifiers
The quantifiers "*?", "+?", "??", "{n,m}?", "{n,}?" are all lazy quantifiers (literally "ignore-priority" quantifiers: they give priority to ignoring the quantified item). A lazy quantifier is formed by adding ? after ?, +, * or {}. When matching, a lazy quantifier first tries to skip the item. Only if the rest of the match then fails does the engine backtrack and try matching the item. For example, if `ab??` is applied to "abb", it matches "a" instead of "ab". Once the engine has matched a, the lazy quantifier makes it first choose not to match b and move on through the expression. Since the expression has already ended, the engine immediately reports a successful match. The following example walks through the working principle of lazy quantifiers step by step.
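The `ab??` case can be reproduced with a short sketch in Python's `re` module. The third pattern, `ab??c`, is an extra example not in the article, added to show the backtracking branch being taken when skipping fails.

```python
import re

# Lazy: after "a" matches, b?? prefers to skip "b"; the pattern has
# ended, so the match is reported at once.
lazy = re.match(r"ab??", "abb").group(0)

# Greedy counterpart: b? prefers to take "b".
greedy = re.match(r"ab?", "abb").group(0)

# Skipping "b" makes "c" fail against "b", so the engine backtracks
# and lets b?? match "b" after all.
forced = re.match(r"ab??c", "abc").group(0)

print(lazy)    # a
print(greedy)  # ab
print(forced)  # abc
```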
Example
Taking the same example, use "/_\w*?_/" to match the characters between the first pair of "_" in "_abc_123_".
After the first '_' has matched, '\w*?' first decides to match no characters at all, because it is a lazy quantifier. The engine then tries the second '_' in the expression (the '_' after '\w*?') against the 'a' in the target string '_abc_123_', and that attempt fails. Only then does the engine backtrack and take the branch it skipped: \w is tried against 'a', and that succeeds.
At the next step, should the engine match again or skip? Because '\w*?' is lazy, it chooses to skip, and the previous step repeats: '_' fails against 'b', so the engine backtracks and '\w*?' extends to 'ab'. After this has happened three times in all (until the '_' after '\w*?' matches the second '_' of the target string), 'abc' has been matched.
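The walk-through above can be traced with a small hand-written simulation. This is only a sketch: `trace_lazy_underscores` is a hypothetical helper that hard-codes this one pattern and mimics the order of attempts described here; it is not how a real regex engine is implemented.

```python
import re

def trace_lazy_underscores(text):
    r"""Hand-simulate the lazy pattern _\w*?_ from the first "_" in text."""
    start = text.index("_")      # the opening "_" has matched here
    pos = start + 1              # next character the engine looks at
    steps = []
    while pos < len(text):
        ch = text[pos]
        taken = text[start + 1:pos]
        # Lazy preference: skip \w and try the closing "_" first.
        if ch == "_":
            steps.append(f"'_' vs {ch!r}: success, lazy part kept {taken!r}")
            return taken, steps
        # The closing "_" failed, so backtrack and let \w take one more character.
        if re.fullmatch(r"\w", ch) is None:
            break                # \w cannot extend either: this attempt fails
        steps.append(f"'_' vs {ch!r}: fail, lazy part extends to {taken + ch!r}")
        pos += 1
    return None, steps

result, steps = trace_lazy_underscores("_abc_123_")
for line in steps:
    print(line)
print(result)  # abc
```

The trace shows three failed attempts of '_' (against 'a', 'b' and 'c'), each followed by one extension of the lazy part, and then the successful attempt against the second '_'.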
Process (after the first '_' has matched):
- The second '_' in the expression /_\w*?_/ is tried against 'a' in the target string '_abc_123_' and the match fails; '\w*?' then tries to match 'a' in the target string, and the match succeeds.
- The second '_' in the expression /_\w*?_/ is tried against 'b' in the target string '_abc_123_' and the match fails; '\w*?' then tries to match 'ab' in the target string, and the match succeeds.
- The second '_' in the expression /_\w*?_/ is tried against 'c' in the target string '_abc_123_' and the match fails; '\w*?' then tries to match 'abc' in the target string, and the match succeeds.
- The second '_' in the expression /_\w*?_/ is tried against the second '_' in the target string '_abc_123_', the match succeeds, and matching ends. The result is abc.
These are my thoughts after reading the section on lazy quantifiers in "Mastering Regular Expressions". If I have made a mistake, I will gladly accept corrections. Thank you!
Original article; when reposting, please credit: JC & hcoding.com



