node.js - Newline problem with a regular expression in Node.js

This is my regular expression:

\<body\>([\s\S].*?)\<\/body\>

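str is the string I want to search. If I strip the newlines from the string, the regex matches something, but without the following line of code it matches nothing.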

str = str.replace(/\n/g, "");

Can anyone explain this? How do I fix it?

----------Update-----------

Later I changed it to

\<body\>([\s\S]*?)\<\/body\>

and then it worked.
What is the difference between .*? and *?

伊谢尔伦 2863 days ago 758

  • PHP中文网

    2017-04-17 15:36:52

    As I understand it, you want to capture everything inside the body tag.

    Your regular expression is:

    /\<body\>([\s\S].*?)\<\/body\>/

    It fails to match because the pattern itself is written incorrectly.

    Let's break down the key part of the expression:

    ([\s\S].*?)

    [\s\S] matches any whitespace or non-whitespace character. In other words, it matches any character at all, including newlines, spaces, and tabs, but on its own it matches exactly one character.
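
    To make the difference between [\s\S] and . concrete, here is a minimal sketch that tests both against a few single characters. The variable names are only illustrative.

```javascript
// [\s\S] matches exactly one character of any kind, newline included.
// . matches exactly one character, but never a newline (without the s flag).
const anyChar = /^[\s\S]$/;
const dot = /^.$/;

const samples = ['a', ' ', '\t', '\n'];
const results = samples.map((ch) => ({
  char: JSON.stringify(ch),
  anyChar: anyChar.test(ch),
  dot: dot.test(ch),
}));

// Only the newline row differs: anyChar is true, dot is false.
console.log(results);
```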

    What does .*? mean?

    . matches any single character except a newline.

    .* matches zero or more such characters (never a newline), and by default it takes as many as it can.

    Here the ? modifies the *. Together, *? is a lazy quantifier: it takes as few characters as possible. It starts from zero characters and grows one character at a time, only as far as needed for the rest of the pattern to match. It still cannot step over a newline, because . never matches one.

    So the whole expression

    <body>([\s\S].*?)<\/body>  // note that < and > do not need to be escaped

    matches one character of any kind (possibly a newline) followed by as few non-newline characters as possible, up to </body>. If the content contains a newline anywhere after its first character, .*? cannot get past it and the match fails. That is why the pattern works once the newlines are stripped from the string.
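
    The behaviour of the original pattern can be checked with a short sketch. The sample strings are made up for illustration; only the pattern comes from the question.

```javascript
// The pattern from the question: one character of any kind,
// then as few non-newline characters as possible.
const broken = /<body>([\s\S].*?)<\/body>/;

const oneLine = '<body>hello world</body>';
const multiLine = '<body>\nhello\nworld\n</body>';

// No newline in the content: the lazy .*? expands up to </body>.
const oneLineMatch = oneLine.match(broken);
console.log(oneLineMatch && oneLineMatch[1]); // hello world

// [\s\S] consumes the first newline, but .*? is stopped by the next one.
const multiLineMatch = multiLine.match(broken);
console.log(multiLineMatch); // null

// Stripping newlines first, as in the question, makes the match succeed.
const strippedMatch = multiLine.replace(/\n/g, '').match(broken);
console.log(strippedMatch && strippedMatch[1]); // helloworld
```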

    Why does it work once you remove the .? Because without the ., the lazy quantifier *? applies directly to

    [\s\S]

    so the group matches zero or more characters of any kind, newlines included.

    I think you took

    [\s\S]

    to match only newlines, and added . so that everything else would match as well. Following that idea, the pattern should have been written like this:

    <body>([\s\S.]*?)<\/body>

    This also matches, but the . is redundant: inside a character class it is only a literal dot, and

    [\s\S]

    already matches any character, including everything . matches.
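
    A quick way to see that the extra . changes nothing is to compare the two patterns on the same input. This is only a sketch with a made-up sample string.

```javascript
// Inside a character class, . is a literal dot, not "any character".
const literalDot = /^[.]$/;
console.log(literalDot.test('.')); // true
console.log(literalDot.test('a')); // false

// With or without the dot in the class, the capture is the same.
const html = '<body>\nline one.\nline two.\n</body>';
const withDot = html.match(/<body>([\s\S.]*?)<\/body>/);
const withoutDot = html.match(/<body>([\s\S]*?)<\/body>/);
console.log(withDot[1] === withoutDot[1]); // true
```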

    So the final answer is

    <body>([\s\S]*?)<\/body>

    which lazily matches zero or more characters of any kind between <body> and </body>, stopping at the first </body>, so multi-line content is matched correctly.
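
    As a usage sketch, the corrected pattern can be wrapped in a small function. extractBody and the sample page are hypothetical names and data, not part of the original question.

```javascript
// Hypothetical helper: returns the text between <body> and the first
// </body>, or null when the input has no such pair.
function extractBody(html) {
  const match = html.match(/<body>([\s\S]*?)<\/body>/);
  return match ? match[1] : null;
}

const page = [
  '<html>',
  '<body>',
  '  <p>first</p>',
  '  <p>second</p>',
  '</body>',
  '</html>',
].join('\n');

// Newlines inside the body no longer block the match.
console.log(JSON.stringify(extractBody(page)));

// Input without a body tag yields null.
console.log(extractBody('<p>no body here</p>'));
```

    This only handles a bare <body> tag without attributes, as in the question; a real HTML parser is the safer choice for arbitrary pages.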

    That’s it.

    PS: The layout is a bit messy, because escape characters are difficult to use in the SegmentFault editor
