Regular expression experience_PHP tutorial
本文的建议主要着眼于正则表达式的可读性,在开发中养成这些习惯,你将会更加清晰的考虑设计和表达式的结构,这将有助于减少bug和代码的维护,如果你自己就是这个代码的维护者你将倍感轻松。大家可以自己看看,在自己实际使用的过程中注意正则表达式的这些经验。 一、使用空格和注释 对于大部分程序员来说,在一个正则表达式环境里使用空格和缩进排列都不成问题,如果他们没有这么做一定会被同行甚至外行人士看笑话。几乎每个人都知道把代码挤在一行会难于阅读、书写和维护。对于正则表达式又有什么不同呢? m/ 在PHP语言里面,在正则表达式的结尾加上x,这样“"/foo bar/"”变为如下形式: "/ 在Python语言里面,传递模式修饰参数“re.VERBOSE”得到编译函数如下: pattern = r 处理更加复杂的正则表达式时,空格和注释就更能体现出其重要性。假设下面的正则表达式用于匹配美国的电话号码: (?d{3})? ?d{3}[-.]d{4} 这个正则表达式匹配电话号码如“(314)555-4000”的形式,你认为这个正则表达式是否匹配“314-555-4000”或者“555- 4000”呢?答案是两种都不匹配。写上这么一行代码隐蔽了缺点和设计结果本身,电话区号是需要的,但是正则表达式在区号和前缀之间缺少一个分隔符号的说明。 / 改写过的正则表达式现在在电话区号后有一个可选择的分隔符号,这样它应该是匹配“314-555-4000”的,然而电话区号还是必须的。另一个程序员如果需要把电话区号变为可选项则可以迅速看出它现在不是可选的,一个小小的改动就可以解决这个问题。 二、书写测试 一共有三个层次的测试,每一层为你的代码加上一层可靠性。首先,你需要认真想想你需要匹配什么代码以及你是否能够处理错误匹配。其次,你需要利用数据实例来测试正则表达式。最后,你需要正式通过一个测试小组的测试。 #!/usr/bin/perl my @tests = ( "314-555-4000", foreach my $test (@tests) { 在PHP语言里面: <?php "800-555-4400", $regex = "/ foreach ($tests as $test) { 在Python语言里面: import re tests = ["314-555-4000", pattern = r regex = re.compile( pattern, re.VERBOSE ) for test in tests: 运行测试代码将会发现另一个问题:它匹配“1234-123-12345”。 三、为交替操作分组 交替操作符号( )的优先级很低,这意味着它经常交替超过程序员所设计的那样。比如,从文本里面抽取Email地址的正则表达式可能如下: ^CC: To:(.*) 上面的尝试是不正确的,但是这个bug往往不被注意。上面代码的意图是找到“CC:”或者“To:”开始的文本,然后在这一行的后面部分提取Email地址。 (^CC:) (To:(.*)) 如果真正意图是捕获以“CC:”或者“To:”开始的文本行的剩余部分,那么正确的正则表达式如下: ^(CC: To:)(.*) This is a common incomplete matching bug, if you get into the habit of grouping for alternating operations, you will avoid this error. 4. Use loose quantifiers Many programmers avoid using loose quantifiers like "*?", "+?", and "??" even though they make the expression easier to write and understand. 5. Use available delimiters Perl and PHP languages often use a left slash (/) to mark the beginning and end of a regular expression, and Python language uses a set of quotation marks to mark the beginning and end. If you insist on left slashes in Perl and PHP, you'll want to avoid any slashes in expressions; if you use quotes in Python, you'll want to avoid backslashes (). Choosing different delimiters or quotes can allow you to avoid half of the regular expression. This will make expressions easier to read and reduce potential bugs caused by forgetting to avoid symbols.
正则表达式难于书写、难于阅读、难于维护,经常错误匹配意料不到的文本或者错过了有效的文本,这些问题都是由正则表达式的表现和能力引起的。每个元字符(metacharacter)的能力和细微差别组合在一起,使得代码不借助于智力技巧就无法解释。
许多包含一定特性的工具使阅读和编写正则表达式变得容易了,但是它们又很不符合习惯。对于很多程序员来说,书写正则表达式就是一种魔法艺术。他们坚持自己所知道的特征并持有绝对乐观的态度。如果你愿意采用本文所探讨的五个习惯,你将可以让你设计的正则表达式经受的住反复试验。
本文将使用Perl、PHP和Python语言作为代码示例,但是本文的建议几乎适用于任何替换表达式(regex)的执行。
大部分替换表达式工具都具有扩展的空格特性,这允许程序员把他们的正则表达式扩展为多行,并在每一行结尾加上注释。为什么只有少部分程序员利用这个特性呢?Perl 6的正则表达式默认就是扩展空格的模式。不要再让语言替你默认扩展空格了,自己主动利用吧。
记住扩展空格的窍门之一就是让正则表达式引擎忽略扩展空格。这样如果你需要匹配空格,你就不得不明确说明。
在Perl语言里面,在正则表达式的结尾加上x,这样“m/foo bar/”变为如下形式:
foo
bar
/x
foo
bar
/x"
foo
bar
regex = re.compile(pattern, re.VERBOSE)
把这一行代码分成几行并加上注释将把缺点暴露无疑,修改起来显然更容易一些。
在Perl语言里面应该是如下形式:
(? # 可选圆括号
d{3} # 必须的电话区号
)? # 可选圆括号
[-s.]? # 分隔符号可以是破折号、空格或者句点
d{3} # 三位数前缀
[-.] # 另一个分隔符号
d{4} # 四位数电话号码
/x
决定匹配什么其实就是在匹配错误结果和错过正确结果之间寻求一个平衡点。如果你的正则表达式过于严格,它将会错过一些正确匹配;如果它过于宽松,它将会产生一个错误匹配。一旦某个正则表达式发放到实际代码当中,你可能不会两者都注意到。考虑一下上面电话号码的例子,它将会匹配“800-555-4000 = -5355”。错误的匹配其实很难发现,所以提前规划做好测试是很重要的。
还是使用电话号码的例子,如果你在Web表单里面确认一个电话号码,你可能只要满足于任何格式的十位数字。但是,如果你想从大量文本里面分离电话号码,你可能需要很认证的排除不符合要求的错误匹配。
在考虑你想匹配的数据的时候,写下一些案例情况。针对案例情况写下一些代码来测试你的正则表达式。任何复杂的正则表达式都最好写个小程序测试一下,可以采用下面的具体形式。
在Perl语言里面:
"800-555-4400",
"(314)555-4000",
"314.555.4000",
"555-4000",
"aasdklfjklas",
"1234-123-12345"
);
if ( $test =~ m/
(? # 可选圆括号
d{3} # 必须的电话区号
)? # 可选圆括号
[-s.]? # 分隔符号可以是破折号、空格或者句点
d{3} # 三位数前缀
[-s.] # 另一个分隔符号
d{4} # 四位数电话号码
/x ) {
print "Matched on $test
";
}
else {
print "Failed match on $test
";
}
}
$tests = array( "314-555-4000",
"(314)555-4000",
"314.555.4000",
"555-4000",
"aasdklfjklas",
"1234-123-12345" );
(? # 可选圆括号
d{3} # 必须的电话区号
)? # 可选圆括号
[-s.]? # 分隔符号可以是破折号、空格或者句点
d{3} # 三位数前缀
[-s.] # 另一个分隔符号
d{4} # 四位数电话号码
/x";
if (preg_match($regex, $test)) {
echo "Matched on $test
;";
}
else {
echo "Failed match on $test
;";
}
}
?>;
"800-555-4400",
"(314)555-4000",
"314.555.4000",
"555-4000",
"aasdklfjklas",
"1234-123-12345"
]
(? # 可选圆括号
d{3} # 必须的电话区号
)? # 可选圆括号
[-s.]? # 分隔符号可以是破折号、空格或者句点
d{3} # 三位数前缀
[-s.] # 另一个分隔符号
d{4} # 四位数电话号码
if regex.match(test):
print "Matched on", test, "
"
else:
print "Failed match on", test, "
"
理论上,你需要整合整个程序所有的测试到一个测试小组里面。即使你现在还没有测试小组,你的正则表达式测试也会是一个小组的良好基础,现在正是开始创建的好机会。即使现在还不是创建的合适时间,你也应该在每次修改以后运行测试一下正则表达式。这里花费一小段时间将会减少你很多麻烦事。
不幸的是,如果某一行中间出现“To:”,那么这个正则表达式将捕获不到任何以“CC:”开始的一行,而是抽取几个随机的文本。坦白的说,正则表达式匹配 “CC:”开始的一行,但是什么都捕获不到;或者匹配任何包含“To:”的一行,但是把这行的剩余文本都捕获了。通常情况下,这个正则表达式会捕获大量 Email地址,所有没有人会注意这个bug。
如果要符合实际意图,那么你应该加入括号说明清楚,正则表达式如下:
Relaxed quantifiers can match as little text as possible, which contributes to the success of exact matches. If you wrote "foo(.*?)bar", the quantifier would stop matching the first time it encounters "bar", not the last time. This is important if you wish to capture "###" from "foo###bar+++bar". A strict quantifier would capture "###bar++ +". ;), this will cause a lot of trouble. If you use relaxed quantifiers, you can generate new regular expressions by spending very little time assembling character types.
Relaxed quantifiers are of great value when you know the structure of the context in which you want to capture the text.
Perl and PHP languages allow the use of any non-numeric and space characters as delimiters. If you switch to a new delimiter, you can avoid missing the left slash when matching URLs or HTML tags (such as "http://" or "
;").
For example, "/http://(S)*/" can be written as "#http://(S)*#".
Common delimiters are "#", "!" and " ". If you use square brackets, angle brackets, or curly braces, just keep them matched.

What’s still popular is the ease of use, flexibility and a strong ecosystem. 1) Ease of use and simple syntax make it the first choice for beginners. 2) Closely integrated with web development, excellent interaction with HTTP requests and database. 3) The huge ecosystem provides a wealth of tools and libraries. 4) Active community and open source nature adapts them to new needs and technology trends.

PHP and Python are both high-level programming languages that are widely used in web development, data processing and automation tasks. 1.PHP is often used to build dynamic websites and content management systems, while Python is often used to build web frameworks and data science. 2.PHP uses echo to output content, Python uses print. 3. Both support object-oriented programming, but the syntax and keywords are different. 4. PHP supports weak type conversion, while Python is more stringent. 5. PHP performance optimization includes using OPcache and asynchronous programming, while Python uses cProfile and asynchronous programming.

PHP is mainly procedural programming, but also supports object-oriented programming (OOP); Python supports a variety of paradigms, including OOP, functional and procedural programming. PHP is suitable for web development, and Python is suitable for a variety of applications such as data analysis and machine learning.

PHP originated in 1994 and was developed by RasmusLerdorf. It was originally used to track website visitors and gradually evolved into a server-side scripting language and was widely used in web development. Python was developed by Guidovan Rossum in the late 1980s and was first released in 1991. It emphasizes code readability and simplicity, and is suitable for scientific computing, data analysis and other fields.

PHP is suitable for web development and rapid prototyping, and Python is suitable for data science and machine learning. 1.PHP is used for dynamic web development, with simple syntax and suitable for rapid development. 2. Python has concise syntax, is suitable for multiple fields, and has a strong library ecosystem.

PHP remains important in the modernization process because it supports a large number of websites and applications and adapts to development needs through frameworks. 1.PHP7 improves performance and introduces new features. 2. Modern frameworks such as Laravel, Symfony and CodeIgniter simplify development and improve code quality. 3. Performance optimization and best practices further improve application efficiency.

PHPhassignificantlyimpactedwebdevelopmentandextendsbeyondit.1)ItpowersmajorplatformslikeWordPressandexcelsindatabaseinteractions.2)PHP'sadaptabilityallowsittoscaleforlargeapplicationsusingframeworkslikeLaravel.3)Beyondweb,PHPisusedincommand-linescrip

PHP type prompts to improve code quality and readability. 1) Scalar type tips: Since PHP7.0, basic data types are allowed to be specified in function parameters, such as int, float, etc. 2) Return type prompt: Ensure the consistency of the function return value type. 3) Union type prompt: Since PHP8.0, multiple types are allowed to be specified in function parameters or return values. 4) Nullable type prompt: Allows to include null values and handle functions that may return null values.


Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

Atom editor mac version download
The most popular open source editor

SublimeText3 Linux new version
SublimeText3 Linux latest version

SublimeText3 Mac version
God-level code editing software (SublimeText3)

SublimeText3 English version
Recommended: Win version, supports code prompts!

SAP NetWeaver Server Adapter for Eclipse
Integrate Eclipse with SAP NetWeaver application server.