What is single-line mode? A detailed explanation of single-line mode in JavaScript regular expressions

零下一度 (Original)
2017-04-21 10:20:35

This article introduces single-line mode in JavaScript regular expressions. Readers who need it can use it as a reference.

Regular expressions were first implemented by Ken Thompson in his improved QED editor in 1970. At that time, the simplest metacharacter in a regular expression, ".", matched any character except the newline character:

"." is a regular expression which matches any character except <nl>.

The sentence above comes from QED's official documentation of 1970, which may be the first regular expression documentation in history.

Why this rule? Because QED edits files line by line, and the newline character at the end of a line is part of that line's content. For example, to delete all single-line comments in a piece of code, you can use the following command in QED:

1,$s#//.*##

If "." could match the newline character, the newline would be deleted as well, causing these lines to be merged with the lines that follow them, which is usually not what we want. That is why "." was designed not to match newlines from the start. There is no QED command on today's operating systems for us to test this with, but we still have Vim, where "." cannot match the newline character for the same reason.
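The same effect can be sketched in JavaScript. This is only an analogue of the QED substitution, not QED itself: the sample string and variable names are made up for illustration, and the second call relies on the s flag described later in this article, so it needs an engine that supports that flag.

```javascript
// Hypothetical two-line input; each line carries a single-line comment
const source = "let a = 1; // first\nlet b = 2; // second\n";

// "." stops at the newline, so only the comment text on each line is removed
const stripped = source.replace(/\/\/.*/g, "");

// With the s flag "." also matches newlines, so the first match
// swallows everything from the first comment to the end of the input
const merged = source.replace(/\/\/.*/gs, "");

console.log(JSON.stringify(stripped)); // "let a = 1; \nlet b = 2; \n"
console.log(JSON.stringify(merged));   // "let a = 1; "
```

Because JavaScript works on the whole string rather than line by line, the damage here is larger than in QED: the second statement disappears entirely instead of merely being joined to the first line.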

Unlike Node, where reading a file usually means reading the whole file in one go, Perl inherits the tradition of many Linux commands and reads files line by line, like this:

while (<STDIN>) {print $_}

The value of $_ also ends with a newline, so Perl naturally inherited QED's rule that "." does not match newlines. But Perl is a programming language, not an editor. Its regular expressions need to match not only single lines of text but also multi-line text, so "." sometimes needs to match across lines. Perl therefore invented single-line mode, /s, which lets "." match newline characters as well.

Perl's official description of the /s modifier, which turns on single-line mode, is "Treat the string as single line". This "single line" should be understood as follows: in normal mode, "." can only match characters within a line and cannot cross lines; in single-line mode, Perl pretends that a multi-line string is a single line and treats its newline characters as ordinary in-line characters, so "." can match them. To put it more vividly, the following three lines of text

1
2
3

are treated as the single line of text "1\n2\n3\n". This is what single-line mode means.

Unfortunately, for the same reason (string variables can contain multiple lines of text), Perl also invented the /m modifier, which is multi-line mode. Its official description is "Treat the string as multiple lines", and JavaScript regular expressions have supported this mode for a long time. "Multiple lines" here means the following: by default, the ^ and $ metacharacters do not match the positions just after and just before newline characters in the middle of a string, that is, the string is always treated as a single line; once multi-line mode is turned on, they do match there.
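A minimal sketch of what multi-line mode changes in JavaScript, using the three-line text from above (without the trailing newline, and with illustrative variable names):

```javascript
const text = "1\n2\n3";

// Default: the string counts as one line, so ^ and $ only match
// at the very start and the very end of the string
const defaultMatches = text.match(/^\d$/g);

// Multi-line mode: ^ and $ also match after and before each newline
const multilineMatches = text.match(/^\d$/gm);

console.log(defaultMatches);   // null
console.log(multilineMatches); // [ '1', '2', '3' ]
```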

In other words, single-line mode and multi-line mode apply to different metacharacters. People new to regular expressions are easily confused by "single-line mode" and "multi-line mode", which look like a matched pair of opposites but are in fact unrelated concepts with confusingly similar names.

Later, the author of Ruby may have felt that the term "single-line mode" was poorly chosen, so he called the mode in which "." matches newlines "multi-line mode": it lets expressions such as .* match across multiple lines, which makes perfect sense. The modifier he chose is /m (Ruby behaves by default as if Perl's "multi-line mode" were on, so /m was not taken). This added insult to injury and made things even more chaotic.

Later still, the author of Python may also have felt that the term "single-line mode" should be avoided, so he gave it a new name, "dotall", meaning that the dot can match all characters. It is a good name, and Java later adopted it as well.

So far we have reviewed the history, explained where single-line mode came from, and seen that its name was not well chosen. V8 has recently implemented a stage 3 ES proposal, github.com/mathiasbynens/es-regexp-dotall-flag, which introduces the /s modifier and the dotAll property to JavaScript regular expressions. The dotAll name is borrowed from Python and Java, and the /s modifier is inherited from Perl; inventing a new modifier such as /d would only have made things more complicated. The specific effect of /s in JavaScript is to let "." match the four line terminators that it could not match before: \n (line feed), \r (carriage return), \u2028 (line separator), and \u2029 (paragraph separator):


/foo/s.dotAll // true
/^.{4}$/s.test("\n\r\u2028\u2029") // true

This is actually a very simple matter, but readers who have never used regular expressions outside JavaScript may be confused when they learn this new mode. To clarify once more: multi-line mode controls the behavior of ^ and $, and single-line mode controls the behavior of ".". There is no direct relationship between the two.
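The independence of the two flags can be checked by applying the same pattern with each combination. The following is a small sketch with a made-up two-line string; it assumes an engine that supports the s flag.

```javascript
const text = "ab\ncd";

// No flags: "." stops at \n and $ only matches at the end of the string
const plain = /^.+$/.test(text);

// m only: ^ and $ match at every line boundary, "." still stops at \n
const onlyM = text.match(/^.+$/gm);

// s only: "." crosses \n, ^ and $ still anchor to the whole string
const onlyS = text.match(/^.+$/gs);

// Both: each flag does its own job; the greedy .+ consumes the whole string
const both = text.match(/^.+$/gms);

console.log(plain); // false
console.log(onlyM); // [ 'ab', 'cd' ]
console.log(onlyS); // [ 'ab\ncd' ]
console.log(both);  // [ 'ab\ncd' ]
```

Neither flag switches the other off: with both enabled, m still allows ^ and $ to match at line boundaries, but the greedy .+ gets to the end of the string first.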

However, Perl, the language that originally introduced the confusing concepts of single-line mode and multi-line mode, removed both modes entirely in Perl 6: "." matches the newline character by default, and \N matches any character except a newline; ^ and $ always match the beginning and end of the string, while two new metacharacters, ^^ and $$, match the beginning and end of a line.

The alternatives to single-line mode that we commonly used in the past, [^] and [\s\S], are not entirely obsolete. For example, editors that use JavaScript regular expressions (VS Code, Atom) are unlikely to give you an interface for enabling single-line mode. Speaking of regular expressions in editors, those of editors implemented in JavaScript are still rather weak: for instance, modes cannot be turned on from inside the regular expression itself. In Sublime Text, by contrast, whose regex engine supports inline modifiers, you can write (?s) inside the regular expression to enable dotall mode; for example, (?s)/\*.+?\*/ matches all multi-line comments.
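As a rough illustration, the comment-matching pattern above can be written in JavaScript in three equivalent ways. The sample input and variable names are hypothetical; the first two forms work without the s flag, while the third needs an engine that supports it.

```javascript
// Hypothetical input containing one multi-line and one single-line block comment
const code = "a = 1; /* first\n   comment */ b = 2; /* second */";

// [\s\S] matches any character: whitespace or non-whitespace
const withClass = code.match(/\/\*[\s\S]+?\*\//g);

// [^] is an empty negated class, which in JavaScript matches any character
const withEmptyNegation = code.match(/\/\*[^]+?\*\//g);

// With the s flag, a plain "." does the same job
const withDotAll = code.match(/\/\*.+?\*\//gs);

console.log(withClass); // [ '/* first\n   comment */', '/* second */' ]
```

All three calls return the same two matches. Note that [^] is specific to JavaScript; many other regex flavors reject it or interpret it differently, which is one reason [\s\S] is the more portable workaround.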
