Commonly used functions for PHP regular expressions

Author: 黄舟 (original)
Published: 2016-12-12 15:51:16

PHP has two sets of regular expression functions. One is provided by the PCRE (Perl Compatible Regular Expressions) library, which implements pattern matching with the same syntax rules as Perl; its functions are named with the "preg_" prefix. The other is provided by the POSIX (Portable Operating System Interface) extended regular expression extension. POSIX extended regular expressions are defined by POSIX 1003.2, and the corresponding functions generally use the "ereg" prefix. Note that the POSIX functions were deprecated in PHP 5.3.0 and removed in PHP 7.0.0, so the ereg and split examples below only run on older PHP versions.
The two libraries offer similar functionality, but they differ slightly in execution speed. Generally speaking, the PCRE functions are somewhat faster for the same task. Both sets are described in detail below.
6.3.1 Regular expression matching
1. preg_match()
Function prototype: int preg_match (string $pattern, string $content [, array $matches])
The preg_match() function searches the $content string for text that matches the regular expression given by $pattern. If $matches is provided, the results are stored in it: $matches[0] contains the text that matched the entire pattern, $matches[1] contains the text captured by the first parenthesized subpattern, and so on. The function stops after the first match, so it returns 0 (no match) or 1 (match), or FALSE if an error occurred. Code 6.1 shows an example of the preg_match() function.
Code 6.1 Date and time matching

<?php
// The string to search. date() returns the current time
$content = "Current date and time is ".date("Y-m-d h:i a").", we are learning PHP together.";
// Match the time with a fully spelled-out pattern
if (preg_match ("/\d{4}-\d{2}-\d{2} \d{2}:\d{2} [ap]m/", $content, $m))
{
echo "The matching time is: " .$m[0]. "\n";
}
// Because the format is so regular, a simpler pattern also works
if (preg_match ("/([\d-]{10}) ([\d:]{5} [ap]m)/", $content, $m))
{
echo "The current date is: " .$m[1]. "\n";
echo "The current time is: " .$m[2]. "\n";
}
?>

This is a simple example of matching dynamically generated text. Assuming the current system time is 13:25 on August 17, 2006, the script outputs the following.
The matching time is: 2006-08-17 01:25 pm
The current date is: 2006-08-17
The current time is: 01:25 pm
2. ereg() and eregi()
ereg() is the regular expression matching function of the POSIX extension library, and eregi() is its case-insensitive version. Both work much like preg_match(), and their return value is normally used as a Boolean test of whether the match succeeded. Note that the first parameter of the POSIX functions is the bare regular expression string, i.e. no delimiters are required. For example, Code 6.2 checks the safety of a file name.
Code 6.2 Security check of file name

<?php
$username = $_SERVER['REMOTE_USER'];
$filename = $_GET['file'];
// Filter the file name to keep the system safe
if (!ereg('^[^./][^/]*$', $filename))
{
die('This is not a valid file name!');
}
// Filter the user name
if (!ereg('^[^./][^/]*$', $username))
{
die('This is not a valid user name!');
}
// Both checks passed, so build the file path
$thefile = "/home/$username/$filename";
?>

Normally, the Perl-compatible matching function preg_match() is faster than ereg() or eregi(). If you only need to know whether a string contains a certain substring, use strstr() or strpos() instead.
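As a rough sketch of both recommendations, the snippet below ports the file name rule from Code 6.2 to preg_match() and uses strpos() for a plain substring test. The helper name is_safe_name() and the sample strings are assumptions made for illustration; they do not come from the original listing.

```php
<?php
// Hypothetical helper: the same rule as Code 6.2 (no leading dot, no
// slash anywhere), written with PCRE delimiters instead of a bare
// POSIX pattern.
function is_safe_name($name)
{
    // '#' is the delimiter, so the slash needs no escaping.
    // \z anchors at the very end of the string; unlike $, it does not
    // tolerate a trailing newline.
    return preg_match('#^[^./][^/]*\z#', $name) === 1;
}

$ok  = is_safe_name('report.txt');     // accepted
$bad = is_safe_name('../etc/passwd');  // rejected: starts with a dot

// Plain substring test. Compare with false strictly, because a hit at
// position 0 is a valid result that is loosely equal to false.
$has_php = strpos('php regular expressions', 'php') !== false;

var_dump($ok, $bad, $has_php);
```

The strict comparison with `=== 1` and `!== false` matters here: preg_match() can return FALSE on error, and strpos() can return 0 for a match at the start of the string.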
3. preg_grep()
Function prototype: array preg_grep (string $pattern, array $input)
The preg_grep() function returns an array containing the elements of the $input array that match the given $pattern. Like preg_match(), preg_grep() performs only one match against each element of $input. Code 6.3 gives a simple example of the preg_grep() function.
Code 6.3 Array query matching

<?php
$subjects = array(
"Mechanical Engineering", "Medicine",
"Social Science", "Agriculture",
"Commercial Science", "Politics"
);
// Match all subject names that consist of a single word
$alonewords = preg_grep("/^[a-z]*$/i", $subjects);
?>
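To make the result of Code 6.3 concrete, the following sketch reuses the same array and pattern and prints what preg_grep() returns. The variable names $single and $multi are illustrative; the PREG_GREP_INVERT flag is a standard preg_grep() option that the original text does not mention.

```php
<?php
$subjects = array(
    "Mechanical Engineering", "Medicine",
    "Social Science", "Agriculture",
    "Commercial Science", "Politics"
);

// Single-word names. The original keys (1, 3 and 5) are preserved.
$single = preg_grep("/^[a-z]*$/i", $subjects);

// PREG_GREP_INVERT returns the elements that do NOT match instead.
$multi = preg_grep("/^[a-z]*$/i", $subjects, PREG_GREP_INVERT);

// array_values() reindexes from 0 when the original keys are not needed.
print_r(array_values($single));
print_r(array_values($multi));
```

Because the keys are preserved, looping over the result with a numeric for loop would skip elements; foreach or array_values() avoids that.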

6.3.2 Perform global regular expression matching
1. preg_match_all()
This function is similar to preg_match(). If the third parameter is used, all possible matches are stored in it. The function returns the number of times the full pattern matched (possibly 0), or FALSE if an error occurred. The example below converts URLs in text into HTML links. Code 6.4 is a usage example of the preg_match_all() function.
Code 6.4 Convert the link address in the text to HTML

<?php
// Purpose: convert link addresses in text into HTML
// Input: string
// Output: string
function url2html($text)
{
// Match a URL up to the next whitespace character
preg_match_all("/http:\/\/?[^\s]+/i", $text, $links);
// Maximum length of the URL text displayed on the page
$max_size = 40;
foreach($links[0] as $link_url)
{
// Measure the URL. If it is longer than $max_size, shorten it.
$len = strlen($link_url);
if($len > $max_size)
{
$link_text = substr($link_url, 0, $max_size)."...";
} else {
$link_text = $link_url;
}
// Generate the HTML
$text = str_replace($link_url,"<a href='$link_url'>$link_text</a>",$text);
}
return $text;
}
// Example run
$str = "This is multi-line text containing several URL addresses. Welcome to http://www.jb51.net";
print url2html($str);
/* Output
This is multi-line text containing several URL addresses. Welcome to <a href='http://www.jb51.net'>http://www.jb51.net</a>
*/
?>
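The shape of the third parameter is easiest to see on a small input. The sketch below uses a made-up sentence and pattern (both are assumptions for illustration) and relies on the default PREG_PATTERN_ORDER layout of preg_match_all().

```php
<?php
$text = "Call 555-1234 or 555-9876 before noon.";

// With the default PREG_PATTERN_ORDER, $m[0] lists the full matches and
// $m[1] lists what the first parenthesized group captured in each match.
$count = preg_match_all('/\d{3}-(\d{4})/', $text, $m);

echo $count, "\n";   // number of full matches
print_r($m[0]);      // the full matches
print_r($m[1]);      // the first capture group of each match
```

This is why Code 6.4 iterates over $links[0]: that element holds every complete URL that the pattern matched.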

2. Match multiple lines
With the POSIX regular expression functions alone, complex matching is difficult, for example searching an entire file (especially multi-line text). One way to do this with ereg() is to process the text line by line. Code 6.5 demonstrates how eregi() assigns the parameters of an INI file to an array.
Code 6.5 Multi-line matching of file content

<?php
$rows = file('php.ini'); // Read the php.ini file into an array
$options = array();
// Loop over the lines
foreach($rows as $line)
{
if(trim($line))
{
// Store each successfully matched parameter in the array
if(eregi("^([a-z0-9_.]*) *=(.*)", $line, $matches))
{
$options[$matches[1]] = trim($matches[2]);
}
unset($matches);
}
}
// Output the parameters
print_r($options);
?>

Tips
The code above is only an illustration. The best way to parse an *.ini file is the parse_ini_file() function, which parses the file directly into an array.
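A minimal sketch of parse_ini_file() follows. To stay self-contained it writes a small, made-up INI file to a temporary path instead of reading the real php.ini; the section and key names are assumptions chosen for illustration.

```php
<?php
// Write a small, made-up INI file to a temporary location so the
// example does not depend on the real php.ini.
$path = tempnam(sys_get_temp_dir(), 'ini');
file_put_contents($path, "[database]\nhost = localhost\nport = 3306\n");

// Flat result: section headers are ignored, keys map to their values.
$flat = parse_ini_file($path);

// Passing true as the second argument groups the keys by section.
$sections = parse_ini_file($path, true);

print_r($flat);
print_r($sections);

// Remove the temporary file again.
unlink($path);
```

Compared with Code 6.5, this also handles sections, quoting and comments without any hand-written pattern.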
6.3.3 Regular expression replacement
1. ereg_replace() and eregi_replace()
Function prototype: string ereg_replace (string $pattern, string $replacement, string $string)
string eregi_replace (string $pattern, string $replacement, string $string)
ereg_replace() searches $string for the pattern $pattern and replaces each match with $replacement. When $pattern contains parenthesized subpatterns, references of the form "\1" in $replacement (written "\\1" in a PHP string) are replaced in turn by the text those subpatterns matched.
eregi_replace() works the same as ereg_replace(), except that it ignores case. Code 6.6 is an example of this function; it demonstrates some simple cleanup of program source code.
Code 6.6 Source code cleaning

<?php
$lines = file('source.php'); // Read the file into an array
for($i=0; $i<count($lines); $i++)
{
// Remove trailing comments that start with "//" or "#"
$lines[$i] = eregi_replace("(\/\/|#).*$", "", $lines[$i]);
// Remove trailing whitespace
$lines[$i] = eregi_replace("[ \n\r\t\v\f]*$", "\r\n", $lines[$i]);
}
// Output the cleaned-up source to the page
echo htmlspecialchars(join("",$lines));
?>

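2. preg_replace()
Function prototype: mixed preg_replace (mixed $pattern, mixed $replacement, mixed $subject [, int $limit])
preg_replace() is more powerful than ereg_replace(). Its first three parameters may all be arrays; the fourth parameter, $limit, sets the maximum number of replacements and defaults to replacing every match. Code 6.7 is an example of array replacement.
Code 6.7 Array replacement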

<?php
// The subject string
$string = "Name: {Name}<br>\nEmail: {Email}<br>\nAddress: {Address}<br>\n";
// The patterns
$patterns =array(
"/{Address}/",
"/{Name}/",
"/{Email}/"
);
// The replacement strings
$replacements = array (
"No.5, Wilson St., New York, U.S.A",
"Thomas Ching",
"tom@emailaddress.com",
);
// Output the result of the pattern replacement
print preg_replace($patterns, $replacements, $string);
?>

The output is as follows.
Name: Thomas Ching
Email: tom@emailaddress.com
Address: No.5, Wilson St., New York, U.S.A
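The $limit parameter and back-references in the replacement string can be sketched as follows. The sample string and the date pattern are assumptions made for illustration; "$1", "$2" and "$3" are the standard preg_replace() references to the parenthesized groups.

```php
<?php
$date = "2006-08-17 and 2006-09-01";

// $1, $2 and $3 in the replacement refer to the three parenthesized
// groups. Single quotes keep PHP from treating them as variables.
$all = preg_replace('/(\d{4})-(\d{2})-(\d{2})/', '$3/$2/$1', $date);

// A $limit of 1 stops after the first replacement.
$first = preg_replace('/(\d{4})-(\d{2})-(\d{2})/', '$3/$2/$1', $date, 1);

echo $all, "\n";    // 17/08/2006 and 01/09/2006
echo $first, "\n";  // 17/08/2006 and 2006-09-01
```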
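The pattern modifier "e" can be used in a preg_replace() regular expression. It causes the replacement string to be evaluated as PHP code after the back-references have been substituted, and the result of that evaluation is used as the replacement. Note that this modifier was deprecated in PHP 5.5.0 and removed in PHP 7.0.0. For example: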

<?php
$html_body = '<HTML><Body><H1>TEST</H1>My Picture<Img src="my.gif"></Body></HTML>';
// All HTML tag names in the output will be lowercase
echo preg_replace (
"/(<\/?)(\w+)([^>]*>)/e",
"'\\1'.strtolower('\\2').'\\3'", // the back-reference \\2 is lowercased by strtolower()
$html_body);
?>
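On PHP versions where the "e" modifier is no longer available, the same effect can be achieved with preg_replace_callback(). The following is a sketch of that substitution, not the original author's code; it keeps the same pattern and input and moves the strtolower() call into an anonymous function (PHP 5.3 or later).

```php
<?php
$html_body = '<HTML><Body><H1>TEST</H1>My Picture<Img src="my.gif"></Body></HTML>';

// The callback receives the match array: $m[1] is "<" or "</",
// $m[2] is the tag name, and $m[3] is the rest of the tag up to ">".
$lowered = preg_replace_callback(
    '/(<\/?)(\w+)([^>]*>)/',
    function ($m) {
        return $m[1] . strtolower($m[2]) . $m[3];
    },
    $html_body
);

echo $lowered;
// <html><body><h1>TEST</h1>My Picture<img src="my.gif"></body></html>
```

Because the callback works on the matched strings directly, no code is built from input text and evaluated, which was the main risk of the "e" modifier.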

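Tips
preg_replace() uses Perl-compatible regular expression syntax and is usually a faster alternative to ereg_replace(). For simple string replacement, str_replace() can be used instead.
6.3.4 Splitting with regular expressions
1. split() and spliti()
Function prototype: array split (string $pattern, string $string [, int $limit])
This function returns an array of strings, each element being a substring of $string delimited by matches of the regular expression $pattern. If $limit is set, the returned array contains at most $limit elements, and the last element holds the entire remainder of $string. spliti() is the case-insensitive version of split(). Code 6.8 is a commonly used example involving dates.
Code 6.8 Splitting a date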

<?php
$date = "08/30/2006";
// The delimiter may be a slash, a dot, or a hyphen
list($month, $day, $year) = split ('[/.-]', $date);
// Output in another date format
echo "Month: $month; Day: $day; Year: $year<br />\n";
?>
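Since split() is part of the POSIX family, a PCRE version of Code 6.8 may be useful on newer PHP versions. The sketch below keeps the same character class and only adds pattern delimiters; the choice of "#" as the delimiter is an assumption made so that the slash needs no escaping.

```php
<?php
$date = "08/30/2006";

// '#' is the delimiter here, so the slash inside the character class
// does not have to be escaped. The class itself is unchanged.
list($month, $day, $year) = preg_split('#[/.-]#', $date);

echo "Month: $month; Day: $day; Year: $year\n";
// Month: 08; Day: 30; Year: 2006
```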

2. preg_split()
This function works the same way as split(). Code 6.9 is an example that counts the words in a passage.
Code 6.9 Counting the words in a passage

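<?php
$seek = array();
$text = "I have a dream that one day I can make it. So just do it, nothing is impossible!";
// Split the string on whitespace and punctuation (a punctuation mark may be followed by spaces)
$words = preg_split("/[.,;!\s']\s*/", $text);
foreach($words as $val)
{
// Count case-insensitively, initializing each counter on first use
$key = strtolower($val);
$seek[$key] = isset($seek[$key]) ? $seek[$key] + 1 : 1;
}
echo "There are about " .count($words). " words in total. ";
echo "The word \"I\" appears " .$seek['i']. " times.";
?>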
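The count in Code 6.9 is only approximate because the final "!" leaves an empty string at the end of the result. One way to tighten it, sketched here with the same text and pattern, is the PREG_SPLIT_NO_EMPTY flag together with array_count_values(); using these two is a suggestion, not part of the original listing.

```php
<?php
$text = "I have a dream that one day I can make it. So just do it, nothing is impossible!";

// PREG_SPLIT_NO_EMPTY discards the empty string that would otherwise
// follow the final "!". The limit of -1 means "no limit".
$words = preg_split("/[.,;!\s']\s*/", $text, -1, PREG_SPLIT_NO_EMPTY);

// array_count_values() builds the word => count map in one call.
$seek = array_count_values(array_map('strtolower', $words));

echo count($words), "\n";  // 18
echo $seek['i'], "\n";     // 2
```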

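Tips
preg_split() uses Perl-compatible regular expression syntax and is usually a faster alternative to split(). Splitting with a regular expression allows a wider range of delimiters, as in the date and word examples above. If you only need to split on one specific string, use explode(); it does not invoke the regular expression engine, so it is the fastest option.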
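For comparison, here is a minimal explode() sketch. The comma-separated sample string is an assumption for illustration.

```php
<?php
$csv = "apple,banana,cherry";

// explode() splits on a fixed string; no pattern is compiled.
$parts = explode(",", $csv);

// An optional limit caps the number of pieces; the last piece keeps
// the unsplit remainder.
$two = explode(",", $csv, 2);

print_r($parts);
print_r($two);
```

Here $parts holds the three fruit names, while $two holds "apple" and "banana,cherry".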
