
preg_match

(PHP 4, PHP 5)

preg_match — Perform a regular expression match

Description

int preg_match ( string $pattern , string $subject [, array &$matches [, int $flags = 0 [, int $offset = 0 ]]] )

Searches subject for a match to the regular expression given in pattern.

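Parameters

pattern

The pattern to search for, as a string.

subject

The input string.

matches

If matches is provided, it is filled with the results of the search. $matches[0] will contain the text that matched the full pattern, $matches[1] will contain the text that matched the first captured subpattern, and so on.

flags

flags can be set to the following flag value:

PREG_OFFSET_CAPTURE: If this flag is passed, the string offset (relative to the subject string) is returned along with every match. Note: this changes the array filled into matches so that every element is itself an array, with the matched string at index 0 and its offset into subject at index 1.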
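
A minimal sketch of how PREG_OFFSET_CAPTURE changes the shape of matches. The subject string and variable names are illustrative only and do not come from the manual text.

```php
<?php
// Illustrative subject and pattern.
$subject = 'foobarbaz';
preg_match('/(bar)(baz)/', $subject, $matches, PREG_OFFSET_CAPTURE);

// Each element is now array(matched text, byte offset in $subject).
list($text, $offset) = $matches[1];
echo $text, ' at byte ', $offset, "\n"; // bar at byte 3
```

Without the flag, $matches[1] would simply be the string "bar".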

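offset

Normally, the search starts from the beginning of the subject string. The optional parameter offset can be used to specify an alternate position from which to start the search (in bytes).

Note:

Using offset is not equivalent to passing substr($subject, $offset) to preg_match() in place of the subject string, because pattern can contain assertions such as ^, $ or (?<=x). Compare: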
<?php $subject = "abcdef" ; $pattern = '/^def/' ; preg_match ( $pattern , $subject , $matches , PREG_OFFSET_CAPTURE , 3 ); print_r ( $matches ); ?>

The above example will output:
Array
(
)
while this example, which passes the truncated string instead,

<?php $subject = "abcdef" ; $pattern = '/^def/' ; preg_match ( $pattern , substr ( $subject , 3 ), $matches , PREG_OFFSET_CAPTURE ); print_r ( $matches ); ?>

will produce a match:
Array
(
    [0] => Array
        (
            [0] => def
            [1] => 0
        )

)

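Return Values

preg_match() returns the number of times pattern matches. That will be either 0 times (no match) or 1 time, because preg_match() stops searching after the first match. preg_match_all(), by contrast, continues until it reaches the end of subject. preg_match() returns FALSE if an error occurred.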
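
Because 0 and FALSE are both falsy, the three outcomes can only be told apart with a strict comparison. The following is a sketch under that assumption; describe_match() is a hypothetical helper name, not part of PHP.

```php
<?php
// Hypothetical helper: maps the three possible results of preg_match() to labels.
function describe_match($pattern, $subject)
{
    // @ suppresses the warning that an invalid pattern raises, so the
    // return value can be inspected instead.
    $result = @preg_match($pattern, $subject);
    if ($result === false) {
        return 'error';
    }
    return $result === 1 ? 'match' : 'no match';
}

echo describe_match('/php/i', 'PHP is great'), "\n";      // match
echo describe_match('/perl/', 'PHP is great'), "\n";      // no match
echo describe_match('/(unclosed/', 'PHP is great'), "\n"; // error
```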

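Changelog

Version	Description
5.3.6	Returns `FALSE` if `offset` is greater than the length of `subject`.
5.2.2	Named subpatterns accept the syntax (?<name>) and (?'name') as well as (?P<name>). Previous versions accepted only (?P<name>).
4.3.3	The `offset` parameter was added.
4.3.0	The `PREG_OFFSET_CAPTURE` flag was added.
4.3.0	The `flags` parameter was added.

Examples

Example #1 Find the string of text "php"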

  <?php
 // The "i" after the pattern delimiter indicates a case-insensitive search
 if ( preg_match ( "/php/i" ,  "PHP is the web scripting language of choice." )) {
    echo  "A match was found." ;
} else {
    echo  "A match was not found." ;
}
 ?>

Example #2 Find the word "web"

  <?php
 
 if ( preg_match ( "/\bweb\b/i" ,  "PHP is the web scripting language of choice." )) {
    echo  "A match was found." ;
} else {
    echo  "A match was not found." ;
}

if ( preg_match ( "/\bweb\b/i" ,  "PHP is the website scripting language of choice." )) {
    echo  "A match was found." ;
} else {
    echo  "A match was not found." ;
}
 ?>

Example #3 Getting the domain name out of a URL

  <?php
 // get the host name from the URL
 preg_match('@^(?:http://)?([^/]+)@i',
     "http://www.php.net/index.html", $matches);
 $host = $matches[1];

 // get the last two segments of the host name
 preg_match('/[^.]+\.[^.]+$/', $host, $matches);
 echo "domain name is: {$matches[0]}\n";
 ?>

The above example will output:

domain name is: php.net

Example #4 Using named subpatterns

  <?php

$str  =  'foobar: 2008' ;

 preg_match ( '/(?P<name>\w+): (?P<digit>\d+)/' ,  $str ,  $matches );

 
// preg_match('/(?<name>\w+): (?<digit>\d+)/', $str, $matches);

 print_r ( $matches );

 ?>

The above example will output:

Array
(
    [0] => foobar: 2008
    [name] => foobar
    [1] => foobar
    [digit] => 2008
    [2] => 2008
)

Notes

Tip

Do not use preg_match() if you only want to check whether one string is contained in another string. Use strpos() or strstr() instead, as they will be faster.
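
A short sketch of the plain-string alternative. The sample sentence is reused from the examples above; the variable names are illustrative.

```php
<?php
$haystack = 'PHP is the web scripting language of choice.';

// strpos() returns the byte position or false. Position 0 is a valid
// hit, so compare with !== false rather than relying on truthiness.
if (strpos($haystack, 'PHP') !== false) {
    echo "Found.\n";
}

// stripos() is the case-insensitive counterpart.
$found = stripos($haystack, 'php') !== false;
var_dump($found); // bool(true)
```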

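See Also

PCRE Patterns
preg_match_all() - Perform a global regular expression match
preg_replace() - Perform a regular expression search and replace
preg_split() - Split a string by a regular expression
preg_last_error() - Returns the error code of the last PCRE regex execution

User Contributed Notes: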

[#1] Supriya Karmakar Kolkata [2014-10-14 17:34:54]

Always escape double quotes to avoid errors, even if you don't need to.

bad practice:
$foo = preg_match('/<h2 class="bengali">.*?<\/h2>/', $bigTextChunk, $myArray);

good practice:
$foo = preg_match("/<h2 class=\"bengali\">.*?<\/h2>/", $bigTextChunk, $myArray);

Bad practice can cause mysterious errors as it happened in my case.

[#2] dkr at dotnull dot de [2014-08-26 15:35:11]

I noted that PCRE_ANCHORED (the pattern modifier A) works fine when using an offset. If you use the escape sequence \A or the caret "^" in the regex, it does not work (even in multiline mode)...

<?php
$text = 'foo bar';
print (int) preg_match('/^bar/', $text, $a, 0, 4);  // prints 0
print (int) preg_match('/\Abar/', $text, $a, 0, 4); // prints 0
print (int) preg_match('/bar/A', $text, $a, 0, 4);  // prints 1
?>

Hope this helps someone out there! :-)

Version: PHP 5.5.12
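
As a complement to the A modifier, the \G assertion can be used in the pattern itself. It is true only at the position where the search started, which is the offset. A minimal sketch with the same sample text:

```php
<?php
$text = 'foo bar';

// \G asserts the position where the search started, i.e. the offset.
var_dump(preg_match('/\Gbar/', $text, $m, 0, 4)); // int(1)

// Starting one byte earlier, "bar" is no longer at the start position.
var_dump(preg_match('/\Gbar/', $text, $m, 0, 3)); // int(0)
```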

[#3] Gilles A [2014-08-22 13:27:21]

Using named subpattern :

Since PCRE 7.0 ( PHP  >= 5.2.2) , named groups can be defined using
(?<name>) or (?'name') instead of (?P<name>)

<?php
$str = 'foobar: 2008';

preg_match('/(?P<name>\w+): (?P<digit>\d+)/', $str, $matches);
print_r($matches);

// Or
preg_match('/(?\'name\'\w+): (?\'digit\'\d+)/', $str, $matches);
print_r($matches);

// Or
preg_match('/(?<name>\w+): (?<digit>\d+)/', $str, $matches);
print_r($matches);
?>

//Result

Array
(
    [0] => foobar: 2008
    [name] => foobar
    [1] => foobar
    [digit] => 2008
    [2] => 2008
)

Array
(
    [0] => foobar: 2008
    [name] => foobar
    [1] => foobar
    [digit] => 2008
    [2] => 2008
)
Array
(
    [0] => foobar: 2008
    [name] => foobar
    [1] => foobar
    [digit] => 2008
    [2] => 2008
)

[#4] Iven Marquardt [2013-12-13 09:57:28]

if you want to match all printable ascii (0..127) except some specific chars, try this:

<?php $excluded = '\$a'; echo preg_replace('~[^' . $excluded . '[:^print:]]~', '', 'abc123ABC!?$%/?'); ?>

result: a$?

[#5] Jeff Weiss [2013-07-25 14:18:27]

Example of validating an email address and breaking it into 3 parts ( local, domain name, domain suffix )

A case insensitive  email is valid if:

1) local matches letters a..z or characters . - _ +
2) domain name matches letters a..z or characters - _
3) domain suffix matches letters a..z and is between 2 and 4 characters in length

<?php preg_match('/(^[a-zA-Z_.+-]+)@([a-zA-Z_-]+)\.([a-zA-Z]{2,4}$)/i', "jeff@nowhere.com", $matches); var_export($matches); ?>

outputs:

Array
(
    [0] => jeff@nowhere.com
    [1] => jeff
    [2] => nowhere
    [3] => com
)

[#6] asdfasdasad34535 at iflow dot at [2013-07-21 21:07:45]

Attention! PREG_OFFSET_CAPTURE is not UTF-8 aware when using the u modifier,
and it's not a bug, it's a feature:
https://bugs.php.net/bug.php?id=37391

Possible workaround: Use mb_strpos to get the correct offset, instead of the flag.

UTF-8 support would be nice.
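
One way to turn the byte offset into a character offset is sketched below. It counts the characters in front of the match with mb_strlen(), which is a different technique from the mb_strpos() workaround mentioned above. It assumes the mbstring extension is loaded; the sample string is illustrative.

```php
<?php
// "äöü test" written with explicit UTF-8 bytes so the file encoding does not matter.
$subject = "\xC3\xA4\xC3\xB6\xC3\xBC test";

preg_match('/test/u', $subject, $m, PREG_OFFSET_CAPTURE);
$byteOffset = $m[0][1]; // 7: three two-byte characters plus one space

// Count the characters in the prefix that precedes the match.
$charOffset = mb_strlen(substr($subject, 0, $byteOffset), 'UTF-8'); // 4

echo $byteOffset, ' ', $charOffset, "\n"; // 7 4
```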

[#7] hessemanj2100 at gmail dot com [2013-07-01 17:32:24]

Just a note about my last post. The regex expression for the function I posted contains a question mark at the end. Technically this doesn't need to be there but it will work with or without it. Just remove it if you don't want it. Enjoy!

[#8] hessemanj2100 at gmail dot com [2013-07-01 17:05:37]

The most accurate IPv4 function. It will not allow leading zeros and supports the full address range of 0.0.0.0 - 255.255.255.255

<?php
function is_ipv4($string)
{
    // The regular expression checks for any number between 0 and 255 followed by a dot (repeated 3 times),
    // followed by another number between 0 and 255 at the end: the equivalent of an IPv4 address.
    return (bool) preg_match('/^(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])'.
        '\.){3}(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]?|[0-9])$/', $string);
}
?>

[#9] andre at koethur dot de [2013-06-17 20:49:29]

Be aware of bug https://bugs.php.net/bug.php?id=50887 when using sub patterns: Un-matched optional sub patterns at the end won't show up in $matches.

Here is a workaround: Assign a name to all subpatterns you are interested in, and merge $match afterwards with a constant array containing some reasonable default values:

<?php if (preg_match('/^(?P<lang>[^;*][^;]*){1}(?:;q=(?P<qval>[0-9.]+))?$/u', 'de', $match)) {   $match = array_merge(array('lang' => '', 'qval' => ''), $match);   print_r($match); } ?>

This outputs:
Array
(
    [lang] => de
    [qval] =>
    [0] => de
    [1] => de
)

Instead of:
Array
(
    [0] => de
    [lang] => de
    [1] => de
)

[#10] yofilter-php at yahoo dot co dot uk [2013-03-05 13:48:11]

There does not seem to be any mention of the PHP version of switches that can be used with regular expressions.

preg_match_all('/regular expr/sim',$text).

The s, i and m after the closing delimiter are the switches (pattern modifiers) I know about:
The i ignores letter case (this is commonly known, I think).
The s makes the dot (.) match newlines too, so a pattern does not stop at \n (line break). This is important with multi-line entries, for example text from an editor that needs searching.
The m tells the engine it is a multi-line entry and, importantly, makes ^ and $ match at the start and end of each line.

I am hoping this will save someone from the 4 hours of torture that I endured trying to work out this issue.
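
The effect of the s and m modifiers can be seen in a small sketch. The two-line sample text is an assumption made for illustration.

```php
<?php
$text = "first line\nsecond line";

// Without s, "." stops at the newline; with s it crosses it.
var_dump(preg_match('/first.*second/', $text));  // int(0)
var_dump(preg_match('/first.*second/s', $text)); // int(1)

// Without m, ^ anchors only at the start of the whole string.
var_dump(preg_match('/^second/', $text));  // int(0)
var_dump(preg_match('/^second/m', $text)); // int(1)
```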

[#11] Yousef Ismaeil Cliprz [2013-02-04 13:31:23]

Sometimes an attacker disguises a PHP file or shell as an image to hack your website. So if you use move_uploaded_file() to let users upload files, you must check whether the file contains malicious code. The function below does that with preg_match.

It is meant to be used together with

unlink() - http://php.net/unlink

After you upload a file, check it with the function below.

<?php function is_clean_file ($file) { if (file_exists($file)) { $contents = file_get_contents($file); } else { exit($file." Not exists."); } if (preg_match('/(base64_|eval|system|shell_|exec|php_)/i',$contents)) { return true; } else if (preg_match("#&\#x([0-9a-f]+);#i", $contents)) { return true; } elseif (preg_match('#&\#([0-9]+);#i', $contents)) { return true; } elseif (preg_match("#([a-z]*)=([\`\'\"]*)script:#iU", $contents)) { return true; } elseif (preg_match("#([a-z]*)=([\`\'\"]*)javascript:#iU", $contents)) { return true; } elseif (preg_match("#([a-z]*)=([\'\"]*)vbscript:#iU", $contents)) { return true; } elseif (preg_match("#(<[^>]+)style=([\`\'\"]*).*expression\([^>]*>#iU", $contents)) { return true; } elseif (preg_match("#(<[^>]+)style=([\`\'\"]*).*behaviour\([^>]*>#iU", $contents)) { return true; } elseif (preg_match("#<s'; $valid=preg_match($pattern, $subject, $match); ?>

[#24] itworkarounds at gmail dot com [2011-08-09 09:08:09]

You can use the following code to detect non-Latin (Cyrillic, Arabic, Greek...) characters:

<?php preg_match("/^[a-zA-Z\p{Cyrillic}0-9\s\-]+$/u", "ABC abc 1234 ?????? ????"); ?>

[#25] mohammad40g at gmail dot com [2011-08-01 21:23:57]

This sample is for checking persian character:

<?php preg_match("/[\x{0600}-\x{06FF}\x]{1,32}/u", '????'); ?>

[#26] sun at drupal dot org [2011-06-23 17:56:40]

Basic test for invalid UTF-8 that can hi-jack IE:

<?php $valid = (preg_match('/^./us', $text) == 1); ?>
See http://api.drupal.org/api/drupal/includes--bootstrap.inc/function/drupal_validate_utf8/7 for details.

---

Test for valid UTF-8 and XML/XHTML character range compatibility:

<?php $invalid = preg_match('@[^\x9\xA\xD\x20-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]@u', $text) ?>
Ref: http://www.w3.org/TR/2000/REC-xml-20001006#charsets

[#27] juanmadss at gmail dot com [2011-05-25 04:00:45]

Testing the speed of preg_match against stripos doing insensitive case search in strings:

<?php
$string = "Hey, how are you? I'm a string.";

// PCRE
$start = microtime(true);
for ($i = 1; $i < 10000000; $i++) {
    $bool = preg_match('/you/i', $string);
}
$end = microtime(true);
$pcre_lasted = $end - $start; // 8.3078360557556

// Stripos, we believe in you
$start = microtime(true);
for ($i = 1; $i < 10000000; $i++) {
    $bool = stripos($string, 'you') !== false;
}
$end = microtime(true);
$stripos_lasted = $end - $start; // 6.0306038856506

echo "Preg_match lasted: {$pcre_lasted}<br />Stripos lasted: {$stripos_lasted}";
?>

So unless you really need to test a string against a regular expression, always use strpos / stripos and other string functions to find characters and strings within other strings.

[#28] mulllhausen [2011-05-16 01:57:51]

i do a fair bit of html scraping in conjunction with curl. i always need to know if i have reached the right page or if the curl request failed. the main problem i have encountered is html tags having unexpected spaces or other characters (especially the &nbsp; character sequence) between them. for example when requesting a page with a certain set of post or get variables the response might be

<a href='blah'><span>data data data</span></a>

but requesting the same page with different post/get variables might give the following result:

<a href='blah'>
 <span>data data data</span>
</a>

to match both of these tag sequences with the same pattern i use the [\S\s]*? wildcard which basically means 'match anything at all...but not if you can help it'

so the pattern for the above sequence would be:

<?php
$page1 = "........<a href='blah'><span>data data data</span></a>.........";
$page2 = "........<a href='blah'>  <span>data data data</span> </a> ........";

$w = "[\s\S]*?"; // ungreedy wildcard
$pattern = "/\<a href='blah'\>$w\<span\>data data data\<\/span\>$w\<\/a\>/";

if (preg_match($pattern, $page1, $matches)) {
    echo "got to page 1. match: [".print_r($matches, true)."]\n";
} else {
    echo "did not get to page 1\n";
}

if (preg_match($pattern, $page2, $matches)) {
    echo "got to page 2. match: [".print_r($matches, true)."]\n";
} else {
    echo "did not get to page 2\n";
}
?>

[#29] MrBull [2011-03-20 07:32:57]

Sometimes it's useful to negate a string. The first method which comes to mind to do this is: [^(string)] but this of course won't work. There is a solution, but it is not very well known. This is the simple piece of code on how a negation of a string is done:

(?:(?!string).)

?: makes a non-capturing subpattern (see http://www.php.net/manual/en/regexp.reference.subpatterns.php) and ?! is a negative lookahead. You put the negative lookahead in front of the dot because you want the regex engine to first check if there is an occurrence of the string you are negating. Only if it is not there do you want to match an arbitrary character.

Hope this helps some ppl.
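
A sketch of the construct in use: repeating it between ^ and $ accepts a string only if the negated word never occurs in it. The word "foo" and the test strings are illustrative.

```php
<?php
// Match a whole string only if it never contains "foo".
$pattern = '/^(?:(?!foo).)*$/';

var_dump(preg_match($pattern, 'bar baz'));     // int(1)
var_dump(preg_match($pattern, 'bar foo baz')); // int(0)
```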

[#30] arash dot hemmat at gmail dot com [2011-02-02 19:15:47]

For those who search for a unicode regular expression example using preg_match here it is:

Check for Persian digits
preg_match( "/[^\x{06F0}-\x{06F9}\x]+/u" , '??????????' );

[#31] Frank [2011-01-26 12:12:44]

If someone is from a country that accepts decimal numbers in format 9.00 and 9,00 (point or comma), number validation would be like that:
<?php $number_check = "9,99"; if (preg_match( '/^[\-+]?[0-9]*\.*\,?[0-9]+$/', $number_check)) { return TRUE; } ?>

However, if the number will be written in the database, most probably this comma needs to be replaced with a dot.
This can be done with use of str_replace, i.e :
<?php $number_database = str_replace("," , "." , $number_check); ?>

[#32] sainnr at gmail dot com [2010-12-30 06:12:31]

This sample regexp may be useful if you are working with DB field types.

(?P<type>\w+)($|\((?P<length>(\d+|(.*)))\))

For example, if you have a type such as "varchar(255)" or "text", the following fragment

<?php
$type = 'varchar(255)'; // type of field
preg_match('/(?P<type>\w+)($|\((?P<length>(\d+|(.*)))\))/', $type, $field);
print_r($field);
?>

will output something like this:
Array ( [0] => varchar(255) [type] => varchar [1] => varchar [2] => (255) [length] => 255 [3] => 255 [4] => 255 )

[#33] ian_channing at hotmail dot com [2010-12-27 01:55:50]

When trying to check a file path that could be windows or unix it took me quite a few tries to get the escape characters right.

The Unix directory separator must be escaped once and the windows directory separator must be escaped twice.

This will match path/to/file and path\to\file.exe

preg_match('/^[a-z0-9_.\/\\\]*$/i', $file_string);

[#34] SoN9ne at gmail dot com [2010-06-08 10:10:58]

I have been working on a email system that will automatically generate a text email from a given HTML email by using strip_tags().
The only issue I ran into, for my needs, were that the anchors would not keep their links.
I search for a little while and could not find anything to strip the links from the tags so I generated my own little snippet.
I am posting it here in hopes that others may find it useful and for later reference.

A note to keep in mind:
I was primarily concerned with valid HTML so if attributes do not use ' or " to contain the values then this will need to be tweaked.
If you can edit this to work better, please let me know.
<?php
function replaceAnchorsWithText($data)
{
    $regex  = '/(<a\s*'; // Start of anchor tag
    $regex .= '(.*?)\s*'; // Any attributes or spaces that may or may not exist
    $regex .= 'href=[\'"]+?\s*(?P<link>\S+)\s*[\'"]+?'; // Grab the link
    $regex .= '\s*(.*?)\s*>\s*'; // Any attributes or spaces that may or may not exist before closing tag
    $regex .= '(?P<name>\S+)'; // Grab the name
    $regex .= '\s*<\/a>)/i'; // Any number of spaces before the closing anchor tag (case insensitive)

    if (is_array($data)) {
        // This is what will replace the link (modify to your liking)
        $data = "{$data['name']}({$data['link']})";
    }
    return preg_replace_callback($regex, 'replaceAnchorsWithText', $data);
}

$input  = 'Test 1: <a href="http: //php.net1">PHP.NET1</a>.<br />';
$input .= 'Test 2: <A name="test" HREF=\'HTTP: //PHP.NET2\' target="_blank">PHP.NET2</A>.<BR />';
$input .= 'Test 3: <a hRef=http: //php.net3>php.net3</a><br />';
$input .= 'This last line had nothing to do with any of this';

echo replaceAnchorsWithText($input).'<hr/>';
?>
Will output:
Test 1: PHP.NET1(http: //php.net1).
Test 2: PHP.NET2(HTTP: //PHP.NET2).
Test 3: php.net3 (is still an anchor)
This last line had nothing to do with any of this

Posting to this site is painful...
Had to break up the regex and had to break the test links since it was being flagged as spam...

[#35] teracci2002 [2010-04-09 09:00:41]

When you use preg_match() for security purposes or for processing huge amounts of data,
you should take backtrack_limit and recursion_limit into consideration.
http://www.php.net/manual/en/pcre.configuration.php

These limits may produce wrong matching results.
You can verify whether you hit these limits by checking preg_last_error().
http://www.php.net/manual/en/function.preg-last-error.php
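
A sketch of how such a check might look. safe_match() is a hypothetical wrapper name; the idea is only to separate "no match" from "the engine gave up" so that a limit hit is never mistaken for a clean negative.

```php
<?php
// Hypothetical wrapper: returns true/false for match/no match and
// throws when preg_match() itself reports a failure.
function safe_match($pattern, $subject)
{
    $result = preg_match($pattern, $subject);
    if ($result === false) {
        // preg_last_error() returns one of the PREG_*_ERROR constants.
        throw new RuntimeException('PCRE error code ' . preg_last_error());
    }
    return $result === 1;
}

var_dump(safe_match('/\d+/', 'order 66')); // bool(true)
var_dump(safe_match('/\d+/', 'no digits')); // bool(false)
```

For security checks, treating an exception as "reject the input" is usually the safer default.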

[#36] Kae Cyphet [2010-03-17 18:29:09]

For those coming over from ereg, preg_match can be quite intimidating. To get started, here is a migration tip.

<?php
// ereg version: true if $test_string contains a character other than 0-9, A-Z or a-z.
if (ereg('[^0-9A-Za-z]', $test_string)) {
    // ...
}

// preg_match version: the / delimiters are now required.
if (preg_match('/[^0-9A-Za-z]/', $test_string)) {
    // ...
}
?>

[#37] plasma [2010-02-21 16:53:34]

To extract scheme, host, path, etc. simply use

<?php   $url  = 'http://name:pass@';   $url .= 'example.com:10000';   $url .= '/path/to/file.php?a=1&b=2#anchor';   $url_data = parse_url ( $url );   print_r ( $url_data ); ?>
prints out something like:

Array
(
    [scheme] => http
    [host] => example.com
    [port] => 10000
    [user] => name
    [pass] => pass
    [path] => /path/to/file.php
    [query] => a=1&b=2
    [fragment] => anchor
)

In my tests parse_url is up to 15x faster than preg_match(_all)!

[#38] Dr@ke [2010-02-18 07:58:23]

Hello,
There is a bug with some new PCRE versions (like 7.9 2009-04-1).
In patterns:
\w+ !== [a-zA-Z0-9]+

But it's ok, if i replace \w+ by [a-z0-9]+ or [a-zA-Z0-9]+

[#39] saberdream at live dot fr [2010-02-10 15:53:19]

I made a function to circumvent the problem of length of a string... This verifies that the link is an image.

<?php
function verifiesimage($lien, $limite)
{
    if (preg_match('#^http:\/\/(.*)\.(gif|png|jpg)$#i', $lien) && strlen($lien) < $limite) {
        $msg = TRUE; // link ok
    } else {
        $msg = FALSE; // the link isn't an image
    }
    return $msg; // return TRUE or FALSE
}
?>

Example :

<?php
if (verifiesimage($votrelien, 50) == TRUE) {
    // we display the content...
}
?>

[#40] Anonymous [2010-02-06 08:00:01]

The regular expression for breaking-down a URI reference into its components:

^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
 12            3  4          5       6  7        8 9

Source: ietf.org/rfc/rfc2396.txt
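
A sketch of the expression applied with preg_match. The sample URI is the one used in RFC 2396 itself; "~" is chosen as the delimiter here because the pattern contains both "/" and "#".

```php
<?php
// RFC 2396 expression; groups 2, 4, 5, 7 and 9 hold the components.
$re = '~^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?~';

preg_match($re, 'http://www.ics.uci.edu/pub/ietf/uri/#Related', $m);

echo $m[2], "\n"; // scheme:    http
echo $m[4], "\n"; // authority: www.ics.uci.edu
echo $m[5], "\n"; // path:      /pub/ietf/uri/
echo $m[9], "\n"; // fragment:  Related
```

Group 7 (the query) is an empty string here because the sample URI has no query part.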

[#41] cebelab at gmail dot com [2010-01-23 22:43:07]

I noticed that in order to deal with UTF-8 texts, without having to recompile php with the PCRE UTF-8 flag enabled, you can just add the following sequence at the start of your pattern: (*UTF8)

for instance : '#(*UTF8)[[:alnum:]]#' will return TRUE for '??' where '#[[:alnum:]]#' will return FALSE

I found this very useful tip after hours of research over the web, directly on the pcre website right here : http://www.pcre.org/pcre.txt
there is much more information about UTF-8 support in the lib

hope that helps!

--
cedric

[#42] Stefan [2009-11-17 14:47:45]

I spent a while replacing all my ereg() calls to preg_match(), since ereg() is now deprecated and will not be supported as of v 6.0.

Just a warning regarding the conversion, the two functions behave very similarly, but not exactly alike. Obviously, you will need to delimit your pattern with '/' or '|' characters.

The difference that stumped me was that preg_match overwrites the $matches array regardless of whether a match was found. If no match was found, $matches is simply empty.

ereg(), however, would leave $matches alone if a match was not found. In my code, I had repeated calls to ereg, and was populating $matches with each match. I was only interested in the last match. However, with preg_match, if the very last call to the function did not result in a match, the $matches array would be overwritten with a blank value.

Here is an example code snippet to illustrate:

<?php $test = array('yes','no','yes','no','yes','no'); foreach ($test as $key=>$value) { ereg("yes",$value,$matches1); preg_match("|yes|",$value,$matches2); } print "ereg result: $matches1[0]<br>"; print "preg_match result: $matches2[0]<br>"; ?>

The output is:
ereg result: yes
preg_match result:

($matches2[0] in this case is empty)

I believe the preg_match behavior is cleaner. I just thought I would report this to hopefully save others some time.
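
To keep the last successful match across repeated calls, the result can be copied out only when preg_match() reports a hit. A minimal sketch, with $last as an illustrative variable name:

```php
<?php
$last = null;
foreach (array('yes', 'no', 'yes', 'no') as $value) {
    // Copy the result out only on a hit, because $matches is reset
    // to an empty array on every call.
    if (preg_match('/yes/', $value, $matches) === 1) {
        $last = $matches[0];
    }
}

echo $last, "\n";   // yes
var_dump($matches); // empty array: the final call did not match
```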

[#43] ruakuu at NOSPAM dot com [2009-11-03 21:32:29]

Was working on a site that needed japanese and alphabetic letters and needed to
validate input using preg_match, I tried using \p{script} but didn't work:

<?php $pattern ='/^([-a-zA-Z0-9_\p{Katakana}\p{Hiragana}\p{Han}]*)$/u'; // Didn't work ?>

So I tried with ranges and it worked:

<?php $pattern ='/^[-a-zA-Z0-9_\x{30A0}-\x{30FF}' .'\x{3040}-\x{309F}\x{4E00}-\x{9FBF}\s]*$/u'; $match_string = '???? ??????E??? ???`???`??'; if (preg_match($pattern, $match_string)) { echo "Found - pattern $pattern"; } else { echo "Not found - pattern $pattern"; } ?>

U+4E00-U+9FBF Kanji
U+3040-U+309F Hiragana
U+30A0-U+30FF Katakana

Hope it's useful, it took me several hours to figure it out.

[#44] Anonymous [2009-10-12 02:24:10]

If your regular expression does not match with long input text when you think it should, you might have hit the PCRE backtrack default limit of 100000. See http://php.net/pcre.backtrack-limit.

[#45] splattermania at freenet dot de [2009-10-01 05:01:00]

As I wasted lots of time looking for a REAL regex for URLs and ended up building it on my own, I now have one that seems to work for all kinds of URLs:

<?php
$regex  = "((https?|ftp)\:\/\/)?"; // SCHEME
$regex .= "([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?@)?"; // User and Pass
$regex .= "([a-z0-9-.]*)\.([a-z]{2,3})"; // Host or IP
$regex .= "(\:[0-9]{2,5})?"; // Port
$regex .= "(\/([a-z0-9+\$_-]\.?)+)*\/?"; // Path
$regex .= "(\?[a-z+&\$_.-][a-z0-9;:@&%=+\/\$_.-]*)?"; // GET Query
$regex .= "(#[a-z_.-][a-z0-9+\$_.-]*)?"; // Anchor
?>

Then, the correct way to check against the regex is as follows:

<?php if(preg_match("/^$regex$/", $url)) { return true; } ?>

[#46] luc _ santeramo at t yahoo dot com [2009-09-03 07:46:49]

If you want to validate an email in one line, use filter_var() function !
http://fr.php.net/manual/en/function.filter-var.php

easy use, as described in the document example :
var_dump(filter_var('bob@example.com', FILTER_VALIDATE_EMAIL));

[#47] marcosc at tekar dot net [2009-08-27 09:31:23]

When using accented characters and "?" (???????????), preg_match does not work. It is a charset problem, use utf8_decode/decode to fix.

[#48] ian_channing at hotmail dot com [2009-08-20 06:13:46]

This is a function that uses regular expressions to match against the various VAT formats required across the EU.

<?php function checkVatNumber( $country, $vat_number ) { switch($country) { case 'Austria': $regex = '/^(AT){0,1}U[0-9]{8}$/i'; break; case 'Belgium': $regex = '/^(BE){0,1}[0]{0,1}[0-9]{9}$/i'; break; case 'Bulgaria': $regex = '/^(BG){0,1}[0-9]{9,10}$/i'; break; case 'Cyprus': $regex = '/^(CY){0,1}[0-9]{8}[A-Z]$/i'; break; case 'Czech Republic': $regex = '/^(CZ){0,1}[0-9]{8,10}$/i'; break; case 'Denmark': $regex = '/^(DK){0,1}([0-9]{2}[\ ]{0,1}){3}[0-9]{2}$/i'; break; case 'Estonia': case 'Germany': case 'Greece': case 'Portugal': $regex = '/^(EE|EL|DE|PT){0,1}[0-9]{9}$/i'; break; case 'France': $regex = '/^(FR){0,1}[0-9A-Z]{2}[\ ]{0,1}[0-9]{9}$/i'; break; case 'Finland': case 'Hungary': case 'Luxembourg': case 'Malta': case 'Slovenia': $regex = '/^(FI|HU|LU|MT|SI){0,1}[0-9]{8}$/i'; break; case 'Ireland': $regex = '/^(IE){0,1}[0-9][0-9A-Z\+\*][0-9]{5}[A-Z]$/i'; break; case 'Italy': case 'Latvia': $regex = '/^(IT|LV){0,1}[0-9]{11}$/i'; break; case 'Lithuania': $regex = '/^(LT){0,1}([0-9]{9}|[0-9]{12})$/i'; break; case 'Netherlands': $regex = '/^(NL){0,1}[0-9]{9}B[0-9]{2}$/i'; break; case 'Poland': case 'Slovakia': $regex = '/^(PL|SK){0,1}[0-9]{10}$/i'; break; case 'Romania': $regex = '/^(RO){0,1}[0-9]{2,10}$/i'; break; case 'Sweden': $regex = '/^(SE){0,1}[0-9]{12}$/i'; break; case 'Spain': $regex = '/^(ES){0,1}([0-9A-Z][0-9]{7}[A-Z])|([A-Z][0-9]{7}[0-9A-Z])$/i'; break; case 'United Kingdom': $regex = '/^(GB){0,1}([1-9][0-9]{2}[\ ]{0,1}[0-9]{4}[\ ]{0,1}[0-9]{2})|([1-9][0-9]{2}[\ ]{0,1}[0-9]{4}[\ ]{0,1}[0-9]{2}[\ ]{0,1}[0-9]{3})|((GD|HA)[0-9]{3})$/i'; break; default: return -1; break; } return preg_match($regex, $vat_number); } ?>

[#49] Rob [2009-08-19 12:03:37]

The following function works well for validating ip addresses

<?php function valid_ip($ip) { return preg_match("/^([1-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])" . "(\.([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3}$/", $ip); } ?>

[#50] matt [2009-05-08 13:07:56]

To support large Unicode ranges (e.g. [\x{E000}-\x{FFFD}] or \x{10FFFF}) you must use the modifier '/u' at the end of your expression.

[#51] daniel dot chcouri at gmail dot com [2009-05-03 06:09:03]

Removing HTML tags using regular expressions

<?php
function removeHtmlTagsWithExceptions($html, $exceptions = null)
{
    if (is_array($exceptions) && !empty($exceptions)) {
        foreach ($exceptions as $exception) {
            $openTagPattern  = '/<(' . $exception . ')(\s.*?)?>/msi';
            $closeTagPattern = '/<\/(' . $exception . ')>/msi';

            $html = preg_replace(
                array($openTagPattern, $closeTagPattern),
                array('||l|\1\2|r||', '||l|/\1|r||'),
                $html
            );
        }
    }

    $html = preg_replace('/<.*?>/msi', '', $html);

    if (is_array($exceptions)) {
        $html = str_replace('||l|', '<', $html);
        $html = str_replace('|r||', '>', $html);
    }

    return $html;
}

// example:
print removeHtmlTagsWithExceptions(<<<EOF
<h1>Whatsup?!</h1>
Enjoy <span style="text-color:blue;">that</span> script<br />
<br />
EOF
, array('br'));
?>

[#52] corey [works at] effim [delete] .com [2009-04-24 20:52:27]

I see a lot of people trying to put together phone regex's and struggling (hey, no worries...they're complicated). Here's one that we use that's pretty nifty. It's not perfect, but it should work for most non-idealists.

*** Note: Only matches U.S. phone numbers. ***

<?php
// all on one line...
$regex = '/^(?:1(?:[. -])?)?(?:\((?=\d{3}\)))?([2-9]\d{2})(?:(?<=\(\d{3})\))? ?(?:(?<=\d{3})[.-])?([2-9]\d{2})[. -]?(\d{4})(?: (?i:ext)\.? ?(\d{1,5}))?$/';

// or broken up
$regex = '/^(?:1(?:[. -])?)?(?:\((?=\d{3}\)))?([2-9]\d{2})'
        .'(?:(?<=\(\d{3})\))? ?(?:(?<=\d{3})[.-])?([2-9]\d{2})'
        .'[. -]?(\d{4})(?: (?i:ext)\.? ?(\d{1,5}))?$/';
?>

If you're wondering why all the non-capturing subpatterns (which look like this "(?:"), it's so that we can do this:

<?php
$formatted = preg_replace($regex, '($1) $2-$3 ext. $4', $phoneNumber);

// or, provided you use the $matches argument in preg_match
$formatted = "($matches[1]) $matches[2]-$matches[3]";
if ($matches[4]) $formatted .= " $matches[4]";
?>

*** Results: ***
520-555-5542 :: MATCH
520.555.5542 :: MATCH
5205555542 :: MATCH
520 555 5542 :: MATCH
520) 555-5542 :: FAIL
(520 555-5542 :: FAIL
(520)555-5542 :: MATCH
(520) 555-5542 :: MATCH
(520) 555 5542 :: MATCH
520-555.5542 :: MATCH
520 555-0555 :: MATCH
(520)5555542 :: MATCH
520.555-4523 :: MATCH
19991114444 :: FAIL
19995554444 :: MATCH
514 555 1231 :: MATCH
1 555 555 5555 :: MATCH
1.555.555.5555 :: MATCH
1-555-555-5555 :: MATCH
520-555-5542 ext.123 :: MATCH
520.555.5542 EXT 123 :: MATCH
5205555542 Ext. 7712 :: MATCH
520 555 5542 ext 5 :: MATCH
520) 555-5542 :: FAIL
(520 555-5542 :: FAIL
(520)555-5542 ext .4 :: FAIL
(512) 555-1234 ext. 123 :: MATCH
1(555)555-5555 :: MATCH

[#53] daevid at daevid dot com [2009-03-06 15:18:44]

I just learned about named groups from a Python friend today and was curious if PHP supported them, guess what -- it does!!!

http://www.regular-expressions.info/named.html

<?php
preg_match("/(?P<foo>abc)(.*)(?P<bar>xyz)/",
           'abcdefghijklmnopqrstuvwxyz',
           $matches);
print_r($matches);
?>

will produce:

Array
(
    [0] => abcdefghijklmnopqrstuvwxyz
    [foo] => abc
    [1] => abc
    [2] => defghijklmnopqrstuvw
    [bar] => xyz
    [3] => xyz
)

Note that you actually get the named group as well as the numerical key
value too, so if you do use them, and you're counting array elements, be
aware that your array might be bigger than you initially expect it to be.
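
If only the named entries are wanted, the numeric duplicates can be filtered out afterwards. This is a sketch of one option, assuming PHP 5.6 or later for ARRAY_FILTER_USE_KEY; $named is an illustrative variable name.

```php
<?php
preg_match('/(?P<foo>abc)(.*)(?P<bar>xyz)/', 'abcdefghijklmnopqrstuvwxyz', $matches);

// Keep only the string keys (the named groups); numeric keys are integers.
$named = array_filter($matches, 'is_string', ARRAY_FILTER_USE_KEY);

print_r($named); // foo => abc, bar => xyz
```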

[#54] wjaspers4 [at] gmail [dot] com [2009-02-27 15:16:08]

I recently encountered a problem trying to capture multiple instances of named subpatterns from filenames.
Therefore, I came up with this function.

The function allows you to pass through flags (in this version it applies to all expressions tested), and generates an array of search results.

Enjoy!

<?php function preg_match_multiple( array $patterns=array(), $subject=null, &$findings=array(), $flags=false, &$errors=array() ) { foreach( $patterns as $name => $pattern ) { if( 1 <= preg_match_all( $pattern, $subject, $found, $flags ) ) { $findings[$name] = $found; } else { if( PREG_NO_ERROR !== ( $code = preg_last_error() )) { $errors[$name] = $code; } else $findings[$name] = array(); } } return (0===sizeof($errors)); } ?>

[#55] skds1433 at hotmail dot com [2009-02-19 06:41:38]

here is a small tool for someone learning to use regular expressions. it's very basic, and allows you to try different patterns and combinations. I made it to help me, because I like to try different things, to get a good understanding of how things work.

<?php $search = isset($_POST['search'])?$_POST['search']:"//"; $match = isset($_POST['match'])?$_POST['match']:"<>"; echo '<form method="post">'; echo 's: <input style="width:400px;" name="search" type="text" value="'.$search.'" /><br />'; echo 'm:<input style="width:400px;" name="match" type="text" value="'.$match.'" /><input type="submit" value="go" /></form><br />'; if (preg_match($search, $match)){echo "matches";}else{echo "no match";} ?>

[#56] akniep at rayo dot info [2009-01-30 03:05:41]

Bugs of preg_match (PHP-version 5.2.5)

In most cases, the following example will show one of two PHP-bugs discovered with preg_match depending on your PHP-version and configuration.

<?php
$text = "test=";
// creates a rather long text
for ($i = 0; $i++ < 100000;) $text .= "%AB";

// a typical URL_query validity-checker (the pattern's function does not matter for this example)
$pattern = '/^(?:[;\/?:@&=+$,]|(?:[^\W_]|[-_.!~*\()\[\] ])|(?:%[\da-fA-F]{2}))*$/';

var_dump( preg_match( $pattern, $text ) );
?>

Possible bug (1):
=============
On one of our Linux-Servers the above example crashes PHP-execution with a C(?) Segmentation Fault(!). This seems to be a known bug (see http://bugs.php.net/bug.php?id=40909), but I don't know if it has been fixed, yet.
If you are looking for a work-around, the following code-snippet is what I found helpful. It wraps the possibly crashing preg_match call by decreasing the PCRE recursion limit in order to result in a Reg-Exp error instead of a PHP-crash.

<?php
[...]

// decrease the PCRE recursion limit for the (possibly dangerous) preg_match call
$former_recursion_limit = ini_set( "pcre.recursion_limit", 10000 );

// the wrapped preg_match call
$result = preg_match( $pattern, $text );

// reset the PCRE recursion limit to its original value
ini_set( "pcre.recursion_limit", $former_recursion_limit );

// if the reg-exp fails due to the decreased recursion limit we may not make any statement, but PHP-execution continues
if ( PREG_RECURSION_LIMIT_ERROR === preg_last_error() )
{
    // react on the failed regular expression here
    $result = [...];

    // do logging or email-sending here
    [...]
} //if
?>

Possible bug (2):
=============
On one of our Windows servers, the above example does not crash PHP but hits the recursion limit directly. Here, the problem is that preg_match does not return boolean(false), as the manual above says it should.
In short, preg_match seems to return int(0) instead of the expected boolean(false) when the regular expression cannot be executed because of the PCRE recursion limit. So when preg_match returns int(0), it seems you have to check preg_last_error() to see whether an error occurred.
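
The check described above can be folded into a small wrapper. This is a minimal sketch, not an established API: the function name preg_match_checked is hypothetical, and it assumes PHP 7.1 or later for the type declarations. It treats either a FALSE return or a non-zero preg_last_error() as a failure, so it covers both the documented behaviour and the int(0) case reported in this note.

```php
<?php
// Hypothetical helper (not part of PHP): returns true on a match, false on
// no match, and throws when PCRE reports an error for this call.
function preg_match_checked(string $pattern, string $subject, ?array &$matches = null): bool
{
    $result = preg_match($pattern, $subject, $matches);

    // preg_last_error() always refers to the most recent preg_* call.
    if ($result === false || preg_last_error() !== PREG_NO_ERROR) {
        throw new RuntimeException('PCRE error code ' . preg_last_error());
    }

    return $result === 1;
}
```

With this wrapper, a caller only has to distinguish "matched", "did not match", and "exception", instead of inspecting preg_last_error() after every call.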

[#57] Alex Zinchenko [2008-12-10 18:15:26]

If you need to check whether a string is a serialized representation of a variable, you can use this:

<?php
$string = "a:0:{}";

if (preg_match("/(a|O|s|b)\x3a[0-9]*?((\x3a((\x7b?(.+)\x7d)|(\x22(.+)\x22\x3b)))|(\x3b))/", $string)) {
    echo "Serialized.";
} else {
    echo "Not serialized.";
}
?>

But don't forget that a string in serialized representation can be VERY big,
so matching can be slow, even with the fast preg_* functions.
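
A different technique, swapped in here for comparison, is to skip the regular expression and simply try to unserialize the string. The sketch below is an assumption-laden illustration: the function name looks_serialized is hypothetical, it needs PHP 7.0 or later for the options argument of unserialize(), and it has to parse the whole string, so the size caveat above applies to it as well.

```php
<?php
// Hypothetical helper (not part of PHP): true if $data unserializes cleanly.
function looks_serialized(string $data): bool
{
    // 'b:0;' is the serialized form of false, which would otherwise be
    // indistinguishable from unserialize() reporting a failure.
    if ($data === 'b:0;') {
        return true;
    }

    // allowed_classes => false prevents objects from being instantiated;
    // @ silences the notice or warning that malformed input produces.
    return @unserialize($data, ['allowed_classes' => false]) !== false;
}
```

Unlike the pattern above, this also accepts integers, floats, and null ("i:", "d:", and "N;"), because it relies on PHP's own parser rather than on a list of type letters.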

[#58] phil dot taylor at gmail dot com [2008-10-22 17:01:20]

If you need to check for .com.br and .com.au and .uk and all the other crazy domain endings, I found that the following expression works well for validating an email address. It's quite generous in what it allows.

<?php
$email_address = "phil.taylor@a_domain.tv";

if (preg_match("/^[^@]*@[^@]*\.[^@]*$/", $email_address)) {
    return "E-mail address";
}
?>

[#59] Steve Todorov [2008-10-02 18:23:45]

While reading the preg_match documentation, I didn't find how to match an IP address.
Let's say you need to write a script that works with an IP or a hostname, and you want to show the hostname, not the IP.

Well, this is the way to go:

<?php
$ip = $_POST['ipOrHost'];

// the dots are escaped and the pattern is anchored, so only dotted quads match
if (preg_match('/^(\d+)\.(\d+)\.(\d+)\.(\d+)$/', $ip)) {
    $host = gethostbyaddr($ip);
} else {
    $host = gethostbyname($ip);
}

echo $host;
?>

This is a really simple script made for beginners!
If you like, you can add a restriction on the numbers.
The code above accepts any numbers: a valid IP address is at most 255.255.255.255, but the example also accepts values such as 999.999.999.999.

Wish you luck!

Best wishes,
Steve
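
The restriction on the numbers mentioned in this note could look roughly like the following. It is a minimal sketch under stated assumptions: the function name is_ipv4 is hypothetical, type declarations assume PHP 7 or later, and leading zeros such as "010" are still accepted.

```php
<?php
// Hypothetical helper (not part of PHP): true only for a dotted quad whose
// four parts are each in the range 0..255.
function is_ipv4(string $ip): bool
{
    // The D modifier stops "$" from matching before a trailing newline.
    if (!preg_match('/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/D', $ip, $m)) {
        return false;
    }

    // $m[1]..$m[4] hold the four captured parts.
    foreach (array_slice($m, 1) as $octet) {
        if ((int) $octet > 255) {
            return false;
        }
    }

    return true;
}
```

PHP also ships a built-in check, filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4), which may be preferable when a regular expression is not specifically required.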

[#60] Ashus [2008-09-12 08:18:33]

If you need to match specific wildcards in an IP address, you can use this regexp:

<?php
$ip  = '10.1.66.22';
$cmp = '10.1.??.*';

// quote the mask (including the "/" delimiter), then turn the quoted wildcards into regex parts
$cnt = preg_match('/^' . str_replace(array('\*', '\?'), array('(.*?)', '[0-9]'), preg_quote($cmp, '/')) . '$/', $ip);

echo $cnt;
?>

where '?' is exactly one digit and '*' is any number of any characters. The $cmp mask can be supplied freely by the user; $cnt equals (int) 1 on a match and 0 otherwise.
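
Because '*' matches any characters, a mask such as '10.1.??.*' also accepts trailing text like '10.1.66.22.33'. A stricter variation, sketched here with a hypothetical function name (ip_matches_mask) and assuming PHP 7 or later, lets '*' stand for one or more digits only:

```php
<?php
// Hypothetical helper (not part of PHP): '?' is exactly one digit,
// '*' is one or more digits (stricter than "any characters").
function ip_matches_mask(string $ip, string $mask): bool
{
    // Quote everything first, then replace the quoted wildcards.
    $quoted = preg_quote($mask, '/');
    $body   = strtr($quoted, array('\*' => '[0-9]+', '\?' => '[0-9]'));

    // The D modifier stops "$" from matching before a trailing newline.
    return preg_match('/^' . $body . '$/D', $ip) === 1;
}
```

Whether the looser or the stricter meaning of '*' is right depends on what the mask is used for; this is a design choice, not a correction of the note.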

[#61] wjaspers4[at]gmail[dot]com [2008-08-28 07:55:28]

I found this rather useful for testing multiple strings when developing a regex pattern.
<?php
function preg_match_batch( $expr, $batch=array() )
{
    // create a placeholder for our results
    $returnMe = array();

    // for every string in our batch ...
    foreach( $batch as $str )
    {
        // test it, and dump our findings into $found
        preg_match($expr, $str, $found);

        // append our findings to the placeholder
        $returnMe[$str] = $found;
    }

    return $returnMe;
}
?>
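
When developing a pattern it can also help to see preg_match()'s return value next to the captured groups, since an empty result alone does not say whether the pattern failed to match or failed to run. The variant below is a sketch with a hypothetical name (preg_match_batch_status), assuming PHP 7 or later; the date pattern and test strings are made up for illustration.

```php
<?php
// Hypothetical variant of the helper above: keeps the return value
// (1, 0, or false) alongside the matches for every tested string.
function preg_match_batch_status(string $expr, array $batch = array()): array
{
    $report = array();

    foreach ($batch as $str) {
        $found  = array();
        $status = preg_match($expr, $str, $found);
        $report[$str] = array('status' => $status, 'matches' => $found);
    }

    return $report;
}

// Example run with an illustrative date pattern.
$report = preg_match_batch_status(
    '/^(\d{4})-(\d{2})-(\d{2})$/',
    array('2008-08-28', 'not a date')
);
```

print_r($report) then shows, per input string, whether it matched and which groups were captured.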

[#62] Dino Korah AT webroot DOT com [2008-07-08 16:11:36]

preg_match and preg_replace_callback don't match up in the structure of the array that they fill for a match.
preg_match, as the example shows, supports named patterns, whereas preg_replace_callback doesn't seem to support them at all. It seems to ignore any named pattern matched.
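
To see how the two functions compare on a given installation, a small check like the following can be run. It is only a sketch: the pattern and input are made up, and the callback half assumes a current PHP release, where the callback is expected to receive named keys as well. The 2008 behaviour described above may have differed, so verify it on your own version.

```php
<?php
$pattern = '/(?<year>\d{4})-(?<month>\d{2})/';

// preg_match fills both the named key and the numeric index for each group.
preg_match($pattern, 'released 2008-07', $m);

// The callback below reads the groups by name; if named keys were missing,
// the replacement text would come out wrong.
$out = preg_replace_callback($pattern, function (array $cb) {
    return $cb['month'] . '/' . $cb['year'];
}, 'released 2008-07');
```

Comparing print_r($m) with a print_r($cb) placed inside the callback shows directly whether the two arrays have the same structure.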

[#63] jonathan dot lydall at gmail dot removethispart dot com [2008-05-26 12:50:00]

Because writing a truly correct email validation function is harder than one might think, consider using the one that ships with PHP through the filter_var function (http://www.php.net/manual/en/function.filter-var.php):

<?php
$email = "someone@domain .local";

if (!filter_var($email, FILTER_VALIDATE_EMAIL)) {
    echo "E-mail is not valid";
} else {
    echo "E-mail is valid";
}
?>