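Pattern Syntax

Introduction
Delimiters
Meta-characters
Escape sequences (backslash)
Unicode character properties
Anchors
Dot
Character classes (square brackets)
Alternation (|)
Internal option setting
Subpatterns
Repetition/quantifiers
Back references
Assertions
Once-only subpatterns
Conditional subpatterns
Comments
Recursive patterns
Performance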

User comments:

[#1] mbrodin [2008-11-24 01:18:55]

Hi!

For even better performance from the strip_selected_tags code below, use:

<?php
$f = array();
foreach ($allTags[1] as $tag) {
    $f[] = "%(<$tag.*?>)(.*?)(<\/$tag.*?>)%is";
}
if (sizeof($f)) {
    $str = preg_replace($f, ($stripContent ? '' : '${2}'), $str);
}
?>

This does not call preg_replace once per tag. Instead it collects the patterns in an array and then runs a single preg_replace call, which should be faster.

It also checks that there is at least one pattern to replace. If there is none, preg_replace is not called at all! :)

Added the "<?php" so it will highlight the code!

[#2] datacompboy at call2ru dot com [2007-10-29 10:24:09]

Suppose you want to cut out a <div> element precisely,
from the opening <div> to its corresponding </div>.
Here is proof-of-concept code to do this:

<?php
$str = "<dqiv1>1+<div2>2+<div3><b><c>3</c></b></div3>2-</div2>1-</div1>";
preg_match(
    "#<div.> ( "
    . " ( (?>[^<]*) ( < ( ([^/d]|d([^i]|i[^v])) | /([^d]|d([^i]|i[^v])) ) )? )* "
    . " | (?R) )* </div.>#xi",
    $str,
    $m
);
var_dump($m[0]);
?>

It matches exactly from <div2> to </div2>. If you change <dqiv1> to <div1>, it matches from <div1> to </div1> instead.

[#3] chris at madblanks dot org [2007-07-04 12:22:49]

When enclosing your regular expression in double quotes, back references require two backslashes.

For example, "\1" in a double-quoted string is an octal escape for the ASCII character with code 1. You need to write "\\1" to get the back reference.
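A minimal sketch of the difference, assuming standard PHP string escaping; the variable names are illustrative only:

```php
<?php
// In a double-quoted string, "\1" is an octal escape: one byte with value 1.
$octal = "\1";
// "\\1" is a literal backslash followed by "1", which PCRE reads as a back reference.
$backref = "\\1";

// Collapse a doubled letter using the back reference.
$doubleQuoted = preg_replace("/(a)\\1/", "x", "aab");
// Single-quoted strings do not interpret \1, so one backslash is enough there.
$singleQuoted = preg_replace('/(a)\1/', 'x', 'aab');

var_dump(strlen($octal), strlen($backref), $doubleQuoted, $singleQuoted);
```

Using single quotes for patterns avoids most of the double escaping.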

[#4] sam marshall [2007-05-24 10:23:49]

For anyone who sees this error:

Warning: preg_match() [function.preg-match]: Compilation failed: PCRE does not support \L, \l, \N, \P, \p, \U, \u, or \X at ...

As this manual page says, you need PHP 5.1.0 and the /u modifier in order to enable these features, but that isn't the only requirement! It is possible to install later versions of PHP (we have 5.1.4) while linking to an older PCRE install. A quick look at the PCRE changelog suggests that you probably need at least PCRE 5; we're running 4.5, while the latest is 7.1. You can find out your PCRE version by checking phpinfo().

I suspect this ancient PCRE version ships with some officially supported Red Hat Enterprise package, which is probably why we are running it, so this might also affect other people.
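Besides phpinfo(), the linked PCRE version can be checked from code. The following is only a sketch, assuming a PHP build that defines the PCRE_VERSION constant; the probe pattern and variable names are illustrative:

```php
<?php
// PCRE_VERSION is a string describing the linked library; its exact value depends on the build.
$pcreVersion = PCRE_VERSION;

// Probe Unicode property support directly. The @ suppresses the compile
// warning on builds without it, where preg_match() returns false.
$hasUnicodeProperties = @preg_match('/^\p{L}+$/u', 'abc') === 1;

echo "PCRE version: $pcreVersion\n";
echo "Unicode properties: " . ($hasUnicodeProperties ? "available" : "not available") . "\n";
```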

[#5] pstradomski at gmail dot com [2007-03-29 07:55:33]

About the strip_selected_tags function in note #6 below:

it does not work if somebody uses tags without the closing ">" character, like this:

<p <b> bold text </b</p

This is even valid HTML (but not valid XHTML)

[#6] theppg_001 at hotmail dot com [2006-11-20 01:22:05]

Hi there
This was originally made by someone else, but it didn't work correctly, so I remade it. As far as I know it now works right.

<?php
function strip_selected_tags($str, $tags = "", $stripContent = false)
{
    preg_match_all("/<([^>]+)>/i", $tags, $allTags, PREG_PATTERN_ORDER);
    foreach ($allTags[1] as $tag) {
        // Build the pattern inside the loop so that $tag is defined.
        $replace = "%(<$tag.*?>)(.*?)(<\/$tag.*?>)%is";
        // Remove the whole element, or keep only its content.
        $str = preg_replace($replace, $stripContent ? '' : '${2}', $str);
    }
    return $str;
}
?>

Before I 'fixed' it, when running
strip_selected_tags("this is <p align=\"center\">a test</p> and <b>this is bold</b>","<p><b>")
You would get back
"this is <p align=\"center\">a test</p> and this is bold"
Why? Because it did not take into account that the HTML tag could contain attributes.
My version works when stripping just the tags, and also when stripping the tag together with its contents!

So now when you run
strip_selected_tags("this is <p align=\"center\">a test</p> and <b>this is bold</b>","<p><b>")
You get back
"this is a test and this is bold"
Or when running
strip_selected_tags("this is <p align=\"center\">a test</p> and <b>this is bold</b>","<p><b>",true)
You get back
"this is  and "

Hope it helps someone :)

[#7] Daniel Vandersluis [2005-11-23 10:50:35]

Concerning note #6 in "Differences From Perl", the \G token *is* supported as the last match position anchor. This has been confirmed to work at least in preg_replace(), though I'd assume it'd work in preg_match_all(), and other functions that can make more than one match, as well.
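A short sketch of \G with preg_match_all(), with an illustrative subject string; it is not taken from the manual:

```php
<?php
// \G anchors each match attempt at the position where the previous match ended,
// so digits are collected only while they are contiguous from the start.
preg_match_all('/\G\d/', '123a45', $anchored);

// Without \G the engine skips over "a" and keeps matching.
preg_match_all('/\d/', '123a45', $unanchored);

var_dump($anchored[0], $unanchored[0]);
```

Here the anchored pattern stops at "a" and yields three digits, while the unanchored one yields all five.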

[#8] roland dot illig at gmx dot de [2005-11-08 01:02:25]

<quote>
9. Another as yet unresolved discrepancy is that in Perl 5.005_02 the pattern /^(a)?(?(1)a|b)+$/ matches the string "a", whereas in PCRE it does not. However, in both Perl and PCRE /^(a)?a/ matched against "a" leaves $1 unset.
</quote>

The last sentence does not indicate a bug. If the string "a" should match against the regular expression /^(a)?a/, the last "a" in the regex must be matched by any literal "a" in the string. The rest of the string is "", which obviously does not match the first /^(a)/.
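The behaviour described above can be observed from PHP. This is a small sketch; the variable names are illustrative:

```php
<?php
// The optional group has to give up the only "a" so that the literal "a" can match.
$result = preg_match('/^(a)?a/', 'a', $m);

// $m contains only the full match; the unset trailing group is not reported.
var_dump($result, $m);
```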

[#9] Ned Baldessin [2005-07-16 04:14:25]

[#10] onerob at gmail dot com [2005-04-01 16:51:21]

If, like me, you tend to use the /U pattern modifier, then you will need to remember that using ? or * to test for optional characters will match zero characters if the rest of the pattern can still match, even if the optional characters exist.

For instance, if we have this string:

a___bcde

and apply this pattern:

'/a(_*).*e/U'

The whole pattern is matched but none of the _ characters are captured by the sub-pattern. The way around this (if you still wish to use /U) is to use the ? greediness inverter, e.g.

'/a(_*?).*e/U'
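The two patterns can be compared side by side. A minimal sketch using the string from this note; the variable names are illustrative:

```php
<?php
$subject = 'a___bcde';

// Under /U every quantifier is lazy by default, so (_*) settles for zero underscores.
preg_match('/a(_*).*e/U', $subject, $lazy);

// A trailing ? flips the quantifier back to greedy under /U.
preg_match('/a(_*?).*e/U', $subject, $greedy);

var_dump($lazy[1], $greedy[1]);
```

Both patterns match the whole string; only the captured sub-pattern differs.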

[#11] W W W [2005-03-07 07:22:05]

Back references are a great way to achieve exact matching when it would have been impossible any other way. Take these three strings.

1) "www.www.com"
2) 'www.www.com'
3) "www.www.com'

The regex /^("|').+?("|')$/ would match all three strings but what if you needed the 3rd string above to be illegal because the quotes are not the same? You could write four different regexes to check for every possible case OR you could use back references.

/^("|').+?\1$/ will match strings 1 and 2 but not string 3. Try this code for further proof:

$str_test="'www.www.com\"";
$int_count=preg_match("/^(\"|').+?\\1$/", $str_test, $matches, PREG_OFFSET_CAPTURE);

The preg_match function will not match against $str_test because the quotes are mismatched. If you change $str_test to

$str_test = "'www.www.com'";

the preg_match will work.
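The three strings listed above can be checked in one pass. This is a sketch; the pattern is written in single quotes so that \1 needs only one backslash, and the array names are illustrative:

```php
<?php
// \1 must repeat whatever quote character the first group captured.
$pattern = '/^("|\').+?\1$/';

$samples = [
    '"www.www.com"',   // 1) double quotes on both sides
    "'www.www.com'",   // 2) single quotes on both sides
    '"www.www.com\'',  // 3) mismatched quotes
];

$results = [];
foreach ($samples as $sample) {
    $results[] = preg_match($pattern, $sample);
}

var_dump($results);
```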

[#12] info at atjeff dot co dot nz [2005-02-07 16:46:45]

I've never used regular expressions until now and had loads of difficulty trying to convert [url]link here[/url] into an href for use when posting messages on a forum. Here's what I managed to come up with:

$patterns = array(
"/\[link\](.*?)\[\/link\]/",
"/\[url\](.*?)\[\/url\]/",
"/\[img\](.*?)\[\/img\]/",
"/\[b\](.*?)\[\/b\]/",
"/\[u\](.*?)\[\/u\]/",
"/\[i\](.*?)\[\/i\]/"
);
$replacements = array(
"<a href=\"\\1\">\\1</a>",
"<a href=\"\\1\">\\1</a>",
"<img src=\"\\1\">",
"<b>\\1</b>",
"<u>\\1</u>",
"<i>\\1</i>"

);
$newText = preg_replace($patterns,$replacements, $text);

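At first it would collect ALL the tags into one link/bold/whatever, until I added the "?", which makes the .* quantifier lazy so that it stops at the first closing tag instead of the last one. I still don't fully understand it... but it works :)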
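The effect of the "?" can be seen by running the same replacement with and without it. A small sketch with an illustrative input string:

```php
<?php
$text = '[b]one[/b] and [b]two[/b]';

// Greedy: .* runs to the LAST [/b], so both tags collapse into one element.
$greedy = preg_replace('/\[b\](.*)\[\/b\]/', '<b>\1</b>', $text);

// Lazy: .*? stops at the FIRST [/b], so each tag pair is converted separately.
$lazy = preg_replace('/\[b\](.*?)\[\/b\]/', '<b>\1</b>', $text);

var_dump($greedy, $lazy);
```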

[#13] J Daugherty [2004-12-09 09:06:57]

In the character class meta-character documentation above, the circumflex (^) is described:

"^ negate the class, but only if the first character"

It should be a little more verbose to fully express the meaning of ^:

^ Negate the character class. If used, this must be the first character of the class (e.g. "[^012]").
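A quick sketch of how the position of ^ changes the meaning of the class; the patterns and variable names are illustrative:

```php
<?php
// ^ as the first character negates the class: any single character except 0, 1 or 2.
$negated = '/^[^012]$/';
// ^ anywhere else is a literal: one of 0, ^, 1 or 2.
$literal = '/^[0^12]$/';

$negatedMatches3 = preg_match($negated, '3');
$negatedMatches1 = preg_match($negated, '1');
$literalMatchesCaret = preg_match($literal, '^');
$literalMatches3 = preg_match($literal, '3');

var_dump($negatedMatches3, $negatedMatches1, $literalMatchesCaret, $literalMatches3);
```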

[#14] napalm at spiderfish dot net [2004-03-17 08:14:17]

Note that some PCRE features, such as once-only subpatterns and recursive patterns, are not implemented in PHP versions prior to 5.0.0.

Napalm
