
Regular expressions in PHP

不言 (original)
2018-04-23 13:55:08

This article covers regular expressions in PHP and is shared here as a reference for anyone who needs it.

Overview

A regular expression is a grammar for describing the structure of a string. It is a formatting pattern that can be used to match, replace, and extract substrings. Most commonly used languages, such as JavaScript and Java, support regular expressions, and once you understand them in one language, using them in another is relatively easy. This article focuses on the following questions.

  • What are the commonly used escape characters

  • What are quantifiers and anchors

  • What is a word boundary

  • What are the special characters

  • What is a back reference and how to use one

  • Pattern modifiers

  • How to use regular expressions in PHP

  • Which tasks in PHP call for regular expressions

  • How to match email addresses, URLs, and mobile phone numbers

  • How to use regular expressions to replace certain characters in a string

  • The difference between greedy matching and lazy matching

  • Backtracking and atomic grouping in regular expressions

  • What are the advantages and disadvantages of regular expressions

Summary of basic knowledge of regular expressions

Line anchors (^ and $)

Line anchors describe the boundaries of a string: "^" marks the beginning of a line and "$" marks the end. For example, "^de" matches a string that starts with de, and "de$" matches a string that ends with de.
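
As a minimal sketch of the two anchors, the following uses preg_match with sample strings chosen only for illustration (they are not from the original article):

```php
<?php
// Hypothetical sample strings, picked only to illustrate ^ and $.
$startsWithDe = preg_match('/^de/', 'design');   // "de" is at the start
$endsWithDe   = preg_match('/de$/', 'decode');   // "de" is at the end
$exactlyDe    = preg_match('/^de$/', 'decode');  // the whole string is not just "de"

var_dump($startsWithDe, $endsWithDe, $exactlyDe); // int(1) int(1) int(0)
```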

Word boundary (\b)

Suppose we want to know whether the word an occurs in the string "girl and body". A plain search for an succeeds, because an is found inside and. To match whole words rather than parts of words, use the word boundary \b.
The pattern \ban\b does not match "girl and body", because an does not occur there as a separate word.
There is also an uppercase \B, which means the opposite of \b: the text it matches must not be a complete word but part of a longer word or string, as in \Ban\B.
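
The behaviour described above can be checked with a short sketch. The string 'band' in the last line is a hypothetical extra sample added for the \B case:

```php
<?php
$text = 'girl and body';

$substring = preg_match('/an/', $text);      // "an" occurs inside "and"
$wholeWord = preg_match('/\ban\b/', $text);  // "an" is not a separate word here
$inside    = preg_match('/\Ban\B/', 'band'); // "an" sits strictly inside "band"

var_dump($substring, $wholeWord, $inside); // int(1) int(0) int(1)
```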

Alternation character (|)

The alternation character means "or". For example, Aa|aA matches Aa or aA. The difference between "[]" and "|" is that "[]" matches exactly one character, while "|" chooses between strings of any length. "[]" is often combined with the range character "-", as in [a-d], which matches a, b, c, or d.
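
A small sketch of the difference between "|" and "[]". The patterns are anchored with ^ and $ here (an addition for the example) so that the whole sample string must match:

```php
<?php
$alternation = preg_match('/^(Aa|aA)$/', 'aA'); // one of two multi-character strings
$oneOfClass  = preg_match('/^[a-d]$/', 'c');    // a class matches a single character
$twoChars    = preg_match('/^[a-d]$/', 'cd');   // two characters do not fit one class

var_dump($alternation, $oneOfClass, $twoChars); // int(1) int(1) int(0)
```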

Negated character classes

Placing "^" as the first character inside "[]" excludes the listed characters. For example, [^1-5] matches any single character that is not a digit from 1 to 5.
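
For illustration, a negated class can be used to test whether a string contains anything outside the range. The sample strings are hypothetical:

```php
<?php
// [^1-5] looks for any character that is NOT one of 1, 2, 3, 4, 5.
$onlyInRange = preg_match('/[^1-5]/', '12345'); // nothing outside the range
$hasOutsider = preg_match('/[^1-5]/', '12365'); // the 6 is outside the range

var_dump($onlyInRange, $hasOutsider); // int(0) int(1)
```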

Quantifiers (? * + {n,m})

A quantifier limits how many times the preceding character or group may occur.

Quantifier Meaning
? Zero or one time
* Zero or more times
+ One or more times
{n} Exactly n times
{n,} At least n times
{n,m} From n to m times

For example, (D+) matches one or more D.
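
The quantifiers in the table can be tried out one by one. This is only a sketch with made-up sample strings; each pattern is anchored so that the count applies to the whole input:

```php
<?php
$optional = preg_match('/^colou?r$/', 'color');  // ?     : the "u" may be absent
$zeroMore = preg_match('/^ab*$/', 'a');          // *     : zero "b" is allowed
$oneMore  = preg_match('/^D+$/', 'DDD');         // +     : one or more "D"
$exact    = preg_match('/^\d{4}$/', '2018');     // {n}   : exactly four digits
$atLeast  = preg_match('/^\d{2,}$/', '7');       // {n,}  : one digit is too few
$range    = preg_match('/^\d{2,4}$/', '12345');  // {n,m} : five digits are too many

var_dump($optional, $zeroMore, $oneMore, $exact, $atLeast, $range);
// int(1) int(1) int(1) int(1) int(0) int(0)
```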

Dot operator (.)

The dot matches any single character except a newline.

Backslash (\)

A backslash in an expression has several uses: escaping special characters, specifying a predefined character class, defining assertions, and representing non-printable characters.

Escape characters

Escaping turns a special character into an ordinary one. Commonly escaped special characters include ".", "?", and "\".
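
A short sketch of what escaping changes. The sample strings are hypothetical; preg_quote() is PHP's built-in helper for escaping special characters in text that is about to be placed inside a pattern:

```php
<?php
$unescaped = preg_match('/^1.5$/', '125');  // "." matches any character, so "125" passes
$escaped   = preg_match('/^1\.5$/', '125'); // "\." matches only a literal dot
$quoted    = preg_quote('a.b?c');           // escapes "." and "?" for use in a pattern

var_dump($unescaped, $escaped, $quoted); // int(1) int(0) string(7) "a\.b\?c"
```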

Predefined character classes

Character Meaning
\d Any decimal digit [0-9]
\D Any character that is not a decimal digit
\s Any whitespace character (space, line feed, form feed, carriage return, tab)
\S Any non-whitespace character
\w Any word character (letter, digit, or underscore)
\W Any non-word character
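
To show a few of these classes in use, here is a minimal sketch with invented sample data:

```php
<?php
// \D removes everything that is not a digit.
$digitsOnly = preg_replace('/\D/', '', 'Tel: 010-1234');

// \s+ splits on any run of whitespace (space, tab, newline).
$tokens = preg_split('/\s+/', "a b\tc\nd");

// \w covers letters, digits, and the underscore.
$isWordOnly = preg_match('/^\w+$/', 'user_01');

var_dump($digitsOnly, $tokens, $isWordOnly);
```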

Non-printable characters

Character Meaning
\f Form feed
\n Line feed
\r Carriage return
\t Tab

Parentheses ()

The main functions of parentheses in regular expressions are:

  • Changing the scope of operators such as |, *, and ^
    For example, in (my|your)baby the parentheses limit the alternation to my or your, so the pattern matches mybaby or yourbaby. Without "()", my|yourbaby matches either my or yourbaby.

  • Grouping, so that the group can be referred to by a back reference
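
The effect of the parentheses on "|" can be seen in a small sketch. The anchors ^ and $ are added here only so that the whole sample string has to match:

```php
<?php
// With parentheses, the alternation covers only "my" or "your".
$groupedFull  = preg_match('/^(my|your)baby$/', 'yourbaby');
$groupedShort = preg_match('/^(my|your)baby$/', 'my');

// Without them, the alternatives are "my" and "yourbaby".
$ungroupedShort = preg_match('/^my$|^yourbaby$/', 'my');

var_dump($groupedFull, $groupedShort, $ungroupedShort); // int(1) int(0) int(1)
```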

Back references

A back reference relies on the "memory" of a subexpression: it matches the same text that an earlier group in parentheses matched. For example, (dqs)(pps)\1\2 matches the string dqsppsdqspps. Back references are covered in more detail in the PHP examples later in this article.
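
The (dqs)(pps)\1\2 example can be run as is. The second input below is a hypothetical counter-example in which the repeated parts appear in the wrong order:

```php
<?php
$pattern = '/(dqs)(pps)\1\2/';

$inOrder = preg_match($pattern, 'dqsppsdqspps'); // \1 = "dqs", \2 = "pps"
$swapped = preg_match($pattern, 'dqsppsppsdqs'); // repeats come in the wrong order

var_dump($inOrder, $swapped); // int(1) int(0)
```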

Pattern modifiers

A pattern modifier sets the matching mode, that is, how the regular expression is interpreted. The main modifiers in PHP are as follows:

Modifier Description
i Ignore case
m Multi-line mode
s Single-line mode (the dot also matches newlines)
x Ignore whitespace in the pattern

Two more non-printable characters belong with the escapes listed earlier: \a (alarm) and \b (backspace, only inside a character class).
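
A minimal sketch of the four modifiers, using a two-line sample string invented for this example:

```php
<?php
$text = "Hello\nworld";

$caseless  = preg_match('/hello/i', $text);        // i: case is ignored
$plain     = preg_match('/^world$/', $text);       // no m: ^ only matches at the string start
$multiline = preg_match('/^world$/m', $text);      // m: ^ and $ match at each line
$noDotAll  = preg_match('/Hello.world/', $text);   // no s: "." stops at the newline
$dotAll    = preg_match('/Hello.world/s', $text);  // s: "." also matches the newline
$extended  = preg_match('/ h e l l o /xi', $text); // x: spaces in the pattern are ignored

var_dump($caseless, $plain, $multiline, $noDotAll, $dotAll, $extended);
// int(1) int(0) int(1) int(0) int(1) int(1)
```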

Application of regular expressions in php

String matching in php

String matching means determining whether a string contains, or is equal to, another string. Without regular expressions, PHP already provides many functions for this kind of check.

Matching without regular expressions

  • strstr function
    string strstr ( string $haystack , mixed $needle [, bool $before_needle = false ] )

    • Note 1: $haystack is the string to search in, and $needle is the string to search for. This function is case-sensitive.
    • Note 2: The return value is the part of $haystack from the first occurrence of $needle to the end.
    • Note 3: If $needle is not a string, it is converted to an integer and used as the ordinal value of a character (this applies to PHP versions before 8.0).
    • Note 4: If $before_needle is true, the part of $haystack before the first occurrence of $needle is returned instead.
    • The stristr function is the same as strstr, except that it is not case-sensitive.

  • strpos function
    int strpos ( string $haystack , mixed $needle [, int $offset = 0 ] )

    • Note 1: The optional offset parameter specifies the character in $haystack at which the search starts. The position returned is still relative to the start of $haystack.
    • stripos: finds the position of the first occurrence of a string (not case-sensitive)
    • strrpos: finds the position of the last occurrence of a string in the target string
    • strripos: finds the position of the last occurrence of a string in the target string (not case-sensitive)
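
The functions listed above can be compared side by side. This is a sketch using the URL from the examples later in the article; note that strpos() can legitimately return 0, so its result should be compared with === false rather than used as a boolean:

```php
<?php
$url = 'http://blog.csdn.net/hsd2012';

$tail     = strstr($url, 'csdn');        // from the needle to the end
$head     = strstr($url, 'csdn', true);  // the part before the needle
$missing  = strstr($url, 'CSDN');        // case-sensitive, so not found
$caseless = stristr($url, 'CSDN');       // case-insensitive variant

$firstPos  = strpos($url, 'csdn');       // position of the first occurrence
$lastSlash = strrpos($url, '/');         // position of the last "/"
$found     = strpos($url, 'http') !== false; // position 0 still counts as found

var_dump($tail, $head, $missing, $caseless, $firstPos, $lastSlash, $found);
```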

Matching with regular expressions

In PHP, the preg_match() and preg_match_all() functions perform regular expression matching. Their prototypes are as follows:

    int preg_match|preg_match_all ( string $pattern , string $subject [, array &$matches [, int $flags = 0 [, int $offset = 0 ]]] )

Searches subject for a match to the regular expression given in pattern.
pattern: The pattern to search for, as a string.
subject: The input string.
matches: If provided, it is filled with the search results. matches[0] contains the text that matched the full pattern, matches[1] contains the text that matched the first captured subgroup, and so on.
flags: Can be set to the following value. PREG_OFFSET_CAPTURE: if this flag is passed, the string offset (relative to the subject) is returned along with every match. Note: this changes the array filled into matches, so that each element becomes an array whose element 0 is the matched string and whose element 1 is the offset of that string in subject.
offset: Normally the search starts at the beginning of the subject. The optional offset parameter specifies a different starting point in the subject (in bytes).
Return value: preg_match() returns the number of times pattern matches. This is either 0 (no match) or 1, because preg_match() stops searching after the first match. preg_match_all(), by contrast, keeps searching until it reaches the end of subject. If an error occurs, preg_match() returns FALSE.
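
To make the matches, flags, and return value descriptions concrete, here is a small sketch. The subject string is a made-up sample:

```php
<?php
$subject = 'a1 b22 c333';

// preg_match stops after the first match; $first holds the full match and the subgroups.
$firstCount = preg_match('/([a-z])(\d+)/', $subject, $first);

// preg_match_all keeps going; $all[0] lists every full match.
$allCount = preg_match_all('/[a-z]\d+/', $subject, $all);

// With PREG_OFFSET_CAPTURE each entry becomes [matched text, byte offset].
preg_match('/\d+/', $subject, $withOffset, PREG_OFFSET_CAPTURE);

var_dump($firstCount, $first, $allCount, $all[0], $withOffset[0]);
```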


Examples

Example 1
Determine whether the string "http://blog.csdn.net/hsd2012" contains csdn.

Solution 1 (without regular expressions):

Without regular expressions, either strstr or strpos will do. Here the strstr function is used; the code is as follows:

    $str = 'http://blog.csdn.net/hsd2012';
    function checkStr1($str, $str2)
    {
        // strstr() returns false when $str2 is not found
        return strstr($str, $str2) !== false;
    }
    var_dump(checkStr1($str, 'csdn')); // bool(true)

Solution 2 (with regular expressions):
Because we only need to know whether a match exists, preg_match is enough.

    $str = 'http://blog.csdn.net/hsd2012';
    $pattern = '/csdn/';
    function checkStr2($str, $str2)
    {
        // preg_match() returns 1 on a match and 0 otherwise
        return preg_match($str2, $str) === 1;
    }
    var_dump(checkStr2($str, $pattern)); // bool(true)

Example 2 (word boundaries)
Determine whether the string "I am a good boy" contains the word go.

Analysis: the target is a word, not just a substring. Without regular expressions, call the checkStr1() function above with a space before and after the second argument, that is, ' go ' (this misses a word at the very start or end of the string, or next to punctuation). With regular expressions, use the word boundary \b: set $pattern='/\bgo\b/'; and call the checkStr2() function.


Example 3 (back references)
Determine whether the string "I am a good boy" contains the same letter three times.

Analysis: without regular expressions this is hard to check, because we would have to compare every letter against the rest of the string, which is a lot of work. This is where back references come in. In a PHP regular expression, \n refers to the text matched by the nth captured subgroup; for example, \5 refers to the fifth subgroup. So

    $pattern='/(\w).*\1.*\1/';

The point to note is that a back reference needs parentheses: it matches the same characters or string that the corresponding () matched.
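
The pattern from Example 3 can be wrapped in a small helper to try it out. The function name hasTripleLetter and the second input are hypothetical; note that \w also accepts digits and the underscore, so this is only a sketch of the idea:

```php
<?php
// Hypothetical helper: true when some word character occurs at least three times.
function hasTripleLetter(string $s): bool
{
    return preg_match('/(\w).*\1.*\1/', $s) === 1;
}

var_dump(hasTripleLetter('I am a good boy')); // bool(true): "o" occurs three times
var_dump(hasTripleLetter('I am a bad boy'));  // bool(false): no letter occurs three times
```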

String replacement in PHP

Without regular expressions

When replacing strings in PHP without regular expressions, we usually use substr, mb_substr, str_replace, and substr_replace. The two replacement functions work as follows.

  • str_replace(find, replace, string, count)
    Replaces every occurrence of one string inside another string.
    find: Required. The value to look for.
    replace: Required. The value that replaces find.
    string: Required. The string to be searched.
    count: Optional. A variable that receives the number of replacements.

  • substr_replace(string, replacement, start, length)
    Replaces part of a string with another string. Suitable for replacing text at a chosen position.
    string: Required. The string to check.
    replacement: Required. The string to insert.
    start: Required. Where in the string the replacement begins.
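
A brief sketch of the two functions on an invented sample string. Passing 0 as the length to substr_replace inserts text without removing anything:

```php
<?php
$s = 'hello world, hello php';

// Replace every occurrence and count the replacements.
$replaced = str_replace('hello', 'hi', $s, $count);

// Replace 5 characters starting at position 0.
$partial = substr_replace($s, 'HELLO', 0, 5);

// Length 0: insert at position 6 without removing anything.
$inserted = substr_replace($s, 'big ', 6, 0);

var_dump($replaced, $count, $partial, $inserted);
```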

With regular expressions

For replacement with regular expressions, PHP provides the preg_replace_callback and preg_replace functions. The prototype of preg_replace is as follows:

    mixed preg_replace ( mixed $pattern , mixed $replacement , mixed $subject [, int $limit = -1 [, int &$count ]] )

Description: searches the string subject for pattern and replaces each match with replacement. If limit is given, at most limit replacements are made. preg_replace_callback is similar to preg_replace, except that it uses a callback function in place of replacement.

Example 1
Replace hello in the string "hello,中国" with '你好'.
Without regular expressions:

    $str = 'hello,中国';
    $str = str_replace('hello', '你好', $str);

or

    $str = substr_replace($str, '你好', 0, 5);

With regular expressions:

    $pattern = '/hello/';
    $str = preg_replace($pattern, '你好', $str);

Example 2
Remove consecutive identical letters from the string "gawwenngeeojjgegop".

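    $str = 'gawwenngeeojjgegop';
    $pattern = '/(.)\1/';
    $str = preg_replace($pattern, '', $str);

Analysis: after the first pass removes the repeated characters, new repeated characters may appear, as with the string 'gewwenngeeojjgegop'. In that case, check the result and keep replacing until no repeats remain.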
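
One possible way to keep replacing until nothing changes is to loop on the replacement count that preg_replace reports. The helper name removeAdjacentPairs is hypothetical, and this is only a sketch of that idea:

```php
<?php
// Hypothetical helper: repeats the replacement until a pass removes nothing.
function removeAdjacentPairs(string $s): string
{
    do {
        // $count receives the number of pairs removed in this pass.
        $s = preg_replace('/(.)\1/', '', $s, -1, $count);
    } while ($count > 0);

    return $s;
}

var_dump(removeAdjacentPairs('gawwenngeeojjgegop')); // one pass is enough here
var_dump(removeAdjacentPairs('gewwenngeeojjgegop')); // needs several passes
```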

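Example 3
In the string "age13gegep3iorji65k65k", replace every pair of consecutive digits with the second digit, so that, for example, 13 becomes 3.

    $str = 'age13gegep3iorji65k65k';
    $pattern = '/(\d)(\d)/';
    $str = preg_replace($pattern, '$2', $str);

Analysis: $n is a back reference used outside the regular expression, in the replacement string, where n is the number of the captured subgroup.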
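
The same replacement can also be written with preg_replace_callback, which was mentioned earlier as the callback-based variant. The sketch below runs both forms on the string from Example 3 for comparison:

```php
<?php
$str = 'age13gegep3iorji65k65k';

// Replacement string: $2 refers to the second captured digit.
$viaDollar = preg_replace('/(\d)(\d)/', '$2', $str);

// Callback: $m[0] is the full match, $m[1] and $m[2] are the two digits.
$viaCallback = preg_replace_callback('/(\d)(\d)/', function (array $m): string {
    return $m[2];
}, $str);

var_dump($viaDollar, $viaCallback); // both: "age3gegep3iorji5k5k"
```

A callback is mainly useful when the replacement has to be computed, for example by doing arithmetic on the captured digits.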

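String splitting in PHP

Without regular expressions

PHP provides the explode function for splitting strings; its counterpart is implode. The prototype of explode is as follows:

    array explode ( string $delimiter , string $string [, int $limit ] )

delimiter: The boundary string to split on.
string: The input string.
limit: If limit is set and positive, the returned array contains at most limit elements, and the last element contains the rest of string. If limit is negative, all elements except the last -limit are returned. If limit is 0, it is treated as 1.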
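
The three cases of the limit parameter can be illustrated with a short sketch on an invented comma-separated string:

```php
<?php
$csv = 'a,b,c,d';

$all      = explode(',', $csv);      // no limit: every element
$limitTwo = explode(',', $csv, 2);   // positive: the last element holds the rest
$dropLast = explode(',', $csv, -1);  // negative: the last element is left out
$zero     = explode(',', $csv, 0);   // 0 is treated as 1
$joined   = implode('-', $all);      // implode is the reverse operation

var_dump($all, $limitTwo, $dropLast, $zero, $joined);
```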

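With regular expressions

For splitting strings with regular expressions, PHP provides the split and preg_split functions. preg_split() is usually the faster alternative to split(). (split() was deprecated in PHP 5.3 and removed in PHP 7.0.)

    array preg_split ( string $pattern , string $subject [, int $limit = -1 [, int $flags = 0 ]] )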

Example
Split the string 'http://blog.csdn.net/hsd2012/article/details/51152810' on '/'.
Solution 1:

    $str = 'http://blog.csdn.net/hsd2012/article/details/51152810';
    $str = explode('/', $str);

Solution 2:

    $str = 'http://blog.csdn.net/hsd2012/article/details/51152810';
    $pattern = '/\//'; /* "/" is the pattern delimiter here, so it must be escaped */
    $str = preg_split($pattern, $str);
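
As a sketch of the two solutions side by side: the version below uses '#' as the pattern delimiter (a choice made for this example) so that "/" needs no escaping, and also shows the PREG_SPLIT_NO_EMPTY flag, which drops the empty element produced by the "//" after "http:":

```php
<?php
$url = 'http://blog.csdn.net/hsd2012/article/details/51152810';

$viaExplode = explode('/', $url);
$viaRegex   = preg_split('#/#', $url); // same result as explode here
$noEmpty    = preg_split('#/#', $url, -1, PREG_SPLIT_NO_EMPTY);

var_dump($viaExplode === $viaRegex); // bool(true)
var_dump($noEmpty);
```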

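Greedy matching and lazy matching in PHP

  • Greedy matching: match as many characters as possible.
    For example, the regular expression m.*n matches the longest string that starts with m and ends with n. Searching manmpndegenc with it matches manmpndegen rather than man. One way to picture it: once m has matched, the engine looks for n from the end of the string backwards.

  • Lazy matching: match as few characters as possible.
    Sometimes we want to match as little as possible rather than greedily. To turn a greedy quantifier into a lazy one, add a "?" after it. For example, m.*?n applied to manmpndegenc matches man.

Quantifier Description
*? Zero or more times, but as few as possible
+? One or more times, but as few as possible
?? Zero or one time, but as few as possible
{n,}? At least n times, but as few as possible
{n,m}? From n to m times, but as few as possible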
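
The manmpndegenc example can be run directly to compare the two modes:

```php
<?php
$subject = 'manmpndegenc';

preg_match('/m.*n/', $subject, $greedy);  // as many characters as possible
preg_match('/m.*?n/', $subject, $lazy);   // as few characters as possible

var_dump($greedy[0], $lazy[0]); // "manmpndegen" and "man"
```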

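Backtracking and atomic grouping in PHP regular expressions

Backtracking

First we need to be clear about what backtracking is. Backtracking is like walking through a series of forks in the road: at each fork you leave a marker. If you hit a dead end, you walk back the way you came until you reach a marker that points to a path not yet tried. If that path fails too, you go back further to the next marker, and so on, until you find a way out or have tried every path. Consider this example:

    $str = 'aageacwgewcaw';
    $pattern = '/a\w*c/i';
    $str = preg_match($pattern, $str);

The meaning of this program is clear enough: it checks whether $str contains a substring of the form "a, then zero or more word characters, then c", ignoring case. But how does the engine actually perform the match, and how many times does it backtrack along the way?

Matching step | Next action
The a in 'a\w*c' matches the first character a of 'aageacwgewcaw' | \w* starts matching the following characters
\w* is greedy, so it matches all the way to the last character w of 'aageacwgewcaw' | c tries to match the next character
The c in 'a\w*c' finds nothing left to match | \w* backtracks a first time and gives back the final w
The c in 'a\w*c' cannot match w | \w* backtracks a second time and gives back the a
The c in 'a\w*c' cannot match a | \w* backtracks a third time and gives back the c
The c in 'a\w*c' matches c | The match ends and the result is returned

Now, if we change the pattern to $pattern='/a\w*?c/i';, how many times will it backtrack? The answer is four times.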
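
As far as I know, PHP does not report how many times the engine backtracked, but the outcome of the two patterns can be compared. The sketch below only shows which substring each pattern ends up matching on the string from the example:

```php
<?php
$str = 'aageacwgewcaw';

preg_match('/a\w*c/i', $str, $greedy);  // \w* gives characters back until the last c fits
preg_match('/a\w*?c/i', $str, $lazy);   // \w*? grows one character at a time until a c fits

var_dump($greedy[0], $lazy[0]); // "aageacwgewc" and "aageac"
```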

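Atomic grouping

The purpose of atomic grouping is to reduce backtracking. With (?>...), any saved alternative states created while matching inside the parentheses are discarded by the engine as soon as it leaves the parentheses. A typical example is the expression '\w+:'. The engine first matches as many \w characters as it can. If the string has no ':' after them, the colon fails to match and backtracking kicks in: \w+ is forced to give back characters one at a time, and the engine tries to match ':' against each returned character. The problem is that \w never matches a colon, so none of these attempts can succeed, yet the backtracking mechanism makes the engine try them anyway, which wastes resources. To avoid this, we lock in what has already been matched and keep no saved states for it; with no saved states to fall back on, the engine has to end the match attempt at once, which greatly reduces backtracking.
The following code does not backtrack into \w+:

    $str = 'nihaoaheloo';
    $pattern = '/(?>\w+):/';
    $rs = preg_match($pattern, $str);

Atomic grouping must sometimes be used with care, however. Below, the goal is to check whether $str contains a run of word characters ending in a. It clearly does, but because of the atomic group the pattern does not give the result we want:

    $str = 'nihaoahelaa';
    $pattern1 = '/(?>\w+)a/';
    $pattern2 = '/\w+a/';
    $rs = preg_match($pattern1, $str); // 0
    $rs = preg_match($pattern2, $str); // 1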

Other commonly used string functions in PHP

  • Extracting substrings
    string substr ( string $string , int $start [, int $length ] )
    string mb_substr ( string $str , int $start [, int $length = NULL [, string $encoding = mb_internal_encoding() ]] )

  • Case conversion
    strtoupper
    strtolower
    ucfirst
    ucwords

  • String comparison
    strcmp, strcasecmp, strnatcmp

  • String filtering

  • String reversal
    strrev($str);

  • Random shuffling of a string
    string str_shuffle ( string $str )
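
A quick sketch of several of the functions listed above, on invented sample strings. mb_substr is left out because it depends on the mbstring extension being installed, and str_shuffle because its output is random:

```php
<?php
$s = 'hello php';

$sub     = substr($s, 6, 3);             // 3 characters starting at position 6
$upper   = strtoupper($s);               // all upper case
$first   = ucfirst($s);                  // first character upper case
$words   = ucwords($s);                  // first character of each word upper case
$same    = strcasecmp('PHP', 'php');     // 0 means equal, ignoring case
$natural = strnatcmp('img2', 'img10');   // negative: 2 comes before 10 in natural order
$binary  = strcmp('img2', 'img10');      // positive: "2" sorts after "1" byte by byte
$rev     = strrev($s);                   // reversed string

var_dump($sub, $upper, $first, $words, $same, $natural, $binary, $rev);
```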

Supplementary

How to match email addresses, URLs, and mobile phone numbers

Use the preg_match function for matching. The following patterns are copied from TP.
Email verification
    $pattern='/\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*/';
URL matching
    $pattern='/^http(s?):\/\/(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,4}(:\d+)?(?:[\/\?#][\/=\?%\-&~`@[\]\':+!\.#\w]*)?/';
Mobile phone verification
    $pattern='/1[3458]\d{9}/';
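
As a rough sketch of how such patterns might be used with preg_match: the helper names isEmail and isMobile are hypothetical, the patterns are a commonly seen form of this kind of check rather than a complete validator, and the ^ and $ anchors are an addition so that the whole input has to match. The mobile pattern assumes an 11-digit number starting with 13, 14, 15, or 18.

```php
<?php
// Hypothetical helper: loose email check, anchored to the whole string.
function isEmail(string $s): bool
{
    return preg_match('/^\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$/', $s) === 1;
}

// Hypothetical helper: 11-digit mobile number starting with 13, 14, 15, or 18.
function isMobile(string $s): bool
{
    return preg_match('/^1[3458]\d{9}$/', $s) === 1;
}

var_dump(isEmail('admin@php.cn'));   // bool(true)
var_dump(isEmail('not-an-email'));   // bool(false)
var_dump(isMobile('13812345678'));   // bool(true)
var_dump(isMobile('12812345678'));   // bool(false)
```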
Advantages and disadvantages of regular expressions in PHP

Regular expressions can solve many matching and replacement problems that are awkward to handle with PHP's ordinary string functions. Their cost is efficiency: where a task can be done without a regular expression, it is usually better to avoid one, and where one is necessary, the pattern should be written to keep backtracking to a minimum.
