php - Two regular expressions have different effects

I want to match the string "abc 123" and extract "abc" and "123" from it.

<?php
    $str = "abc 123";
    $preg = "/^(.*?)\s+(.*?)$/";
    $preg1 = "/^(.*?)\s*(.*?)$/";
    preg_match($preg, $str, $tmp);
    preg_match($preg1, $str, $tmp1);
    
    echo '<pre>';
    print_r($tmp);
    print_r($tmp1);
    echo '</pre>';
    
    // Output of print_r($tmp):
    // Array
    // (
    //     [0] => abc 123
    //     [1] => abc
    //     [2] => 123
    // )

    // Output of print_r($tmp1):
    // Array
    // (
    //     [0] => abc 123
    //     [1] =>
    //     [2] => abc 123
    // )

Why are the matching results different? Is there anything I need to pay attention to?

ringa_lee  2810 days ago  542

All replies (3)

  • 伊谢尔伦

    伊谢尔伦 2017-05-16 13:10:16

    Neither of the two earlier answers gets to the point, so let me answer.

    The real key lies in the rules of lazy (also called non-greedy) matching:

    A quantifier such as an asterisk or plus sign followed by a question mark is lazy: it matches as few characters as possible.

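    1. /(.*?)\s+/: the plus sign means the preceding token (the whitespace class \s) must occur one or more times. The pattern therefore reads: match as few characters as possible, followed by at least one whitespace character. The first group cannot stay empty, because whitespace has to come right after it, so it grows until it has captured abc.

    2. /(.*?)\s*/: the asterisk means the preceding token (the whitespace class \s) may occur zero or more times. The pattern therefore reads: match as few characters as possible, optionally followed by whitespace. Because \s* is satisfied by nothing at all, the first group stops immediately and captures an empty string. In the full pattern, the second (.*?) is followed by $, so it is forced to consume the rest of the string and captures "abc 123".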
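
    The two cases above can be checked with a short sketch. It applies only the leading part of each pattern to the question's string; the variable names $plus and $star are made up for this illustration.

    ```php
    <?php
    $str = "abc 123";

    // Lazy group followed by mandatory whitespace:
    // the group has to grow until whitespace can follow it.
    preg_match('/^(.*?)\s+/', $str, $plus);

    // Lazy group followed by optional whitespace:
    // the group can stay empty because \s* accepts zero characters.
    preg_match('/^(.*?)\s*/', $str, $star);

    var_dump($plus[1]); // string(3) "abc"
    var_dump($star[1]); // string(0) ""
    ```

    Only the quantifier on \s differs between the two calls, which is enough to change what the lazy group captures.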

  • 阿神

    阿神 2017-05-16 13:10:16

    Note that \s+ and \s* are bound to give different results.
    "*" matches the preceding subexpression zero or more times. For example, "zo*" matches "z" as well as "zoo". * is equivalent to {0,}.

    "+" matches the preceding subexpression one or more times. For example, "zo+" matches "zo" and "zoo", but not "z". + is equivalent to {1,}.

    Now take the pattern /^(.*?)\s+(.*?)$/ piece by piece:

    The leading "/" and the trailing "/" are delimiters that mark where the pattern starts and ends; they match nothing themselves.
    The "^" matches the beginning of the text.
    () groups a subexpression, and groups are numbered from left to right. "." matches any character (except a newline by default), and "*" means zero or more times.

    In the first group, ".*" matches any run of characters, and the "?" that follows "*" makes the quantifier non-greedy. Note that "?" directly after a character or subexpression means zero or one occurrence, but after a quantifier it means match as little as possible.

    \s is any whitespace character.
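
    As a quick check of the "zo*" and "zo+" examples, here is a minimal sketch. The ^ and $ anchors are added here (they are not part of the examples) so that each pattern has to cover the whole subject, and $results is an illustrative name.

    ```php
    <?php
    $results = [];
    foreach (["z", "zo", "zoo"] as $s) {
        $results[$s] = [
            // "*" behaves like {0,}: the "o" may be absent.
            'star' => preg_match('/^zo*$/', $s),
            // "+" behaves like {1,}: at least one "o" is required.
            'plus' => preg_match('/^zo+$/', $s),
        ];
    }

    // preg_match() returns 1 on a match and 0 otherwise.
    print_r($results);
    ```

    Only "z" is treated differently: it matches with "*" (result 1) and fails with "+" (result 0).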

    Recommended tutorial: 30-minute introduction to regular expressions

  • PHPz

    PHPz 2017-05-16 13:10:16

    To correct my earlier answer: this is caused by lazy matching, which is one of the trickier parts of regular expressions.

    Lazy matching can be put informally like this: unless you tell it there is something ahead that it must reach, it will not move. With the first pattern, you tell it to move forward and to stop only once it has met at least one whitespace character.
    With the second pattern, you tell it to move forward, but the match counts whether or not it meets whitespace. Being lazy, it simply does not move, so the first group matches nothing.
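
    To see that laziness is what empties the first group, one can compare the second pattern with a greedy variant. This is only an illustrative sketch: the greedy pattern is not from the question, and $lazy and $greedy are made-up names.

    ```php
    <?php
    $str = "abc 123";

    // The second pattern from the question: the lazy first group takes nothing,
    // so the second group must cover the whole string to reach "$".
    preg_match('/^(.*?)\s*(.*?)$/', $str, $lazy);

    // Same pattern with a greedy first group, for contrast:
    // it takes everything and leaves nothing for the second group.
    preg_match('/^(.*)\s*(.*?)$/', $str, $greedy);

    var_dump($lazy[1], $lazy[2]);     // "" and "abc 123"
    var_dump($greedy[1], $greedy[2]); // "abc 123" and ""
    ```

    Neither variant splits the string into abc and 123; only making the whitespace mandatory (\s+), as in the first pattern, forces the split.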
