javascript - A question about regular expressions

WBOY · Original · 2016-08-04 09:18:59 · 1051 views

Question: There is a string: "python php ruby javascript jsonp perhapsphpisoutdated"
For this string, use a single regular expression ("pure regex") to get all words that contain "p" but not "ph".

Expected output array: [ 'python', 'javascript', 'jsonp' ]

I have been thinking about this problem for a long time without getting anywhere.
My solution is:

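<code>var str = 'python php ruby javascript jsonp perhapsphpisoutdated';
// Attempt 1: match every word containing "p", then use filter() to drop words containing "ph"
var result = str.match(/\b\w*(?=p)\w*\b/g)
                .filter((value) => !/.*(?=ph)/.test(value));
// Attempt 2: the single-regex answer quoted further down
var result2 = str.match(/\b((?!ph|\s).)*((p[^h\s]((?!ph|\s).)*)|p)\b/g);
console.log(result2); // [ 'python', 'javascript', 'jsonp' ]</code>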

But my own version relies on filter(), so it does not meet the pure-regex requirement.

An expert in the group chat gave this answer:

<code>/\b((?!ph|\s).)*((p[^h\s]((?!ph|\s).)*)|p)\b/g </code>

Works perfectly

But I can't understand it, and I hope someone can help me work through it.

Reply content:

<code>var str = 'python php ruby javascript jsonp perhapsphpisoutdated';
var reg = /\b(\w*(p[^h\s](?!ph))\w*)\b/g;
str.match(reg);
// => ["python", "javascript", "perhapsphpisoutdated"]</code>
  • \b is the word-boundary assertion: it matches the position between a \w character and a \W character (or the start or end of the string).

  • () marks a subexpression (a capturing group).

  • (?!) marks a negative lookahead assertion. Unlike a subexpression, a lookahead is zero-width and what it checks is not captured.

  • [^] marks a negated character class: it matches one character that is not in the listed set.

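So the regex above matches a word (the text between two boundaries) that contains a "p" whose next character is neither "h" nor whitespace, where that two-character pair is not immediately followed by "ph". Note that it does not exclude "ph" elsewhere in the word and cannot match a word ending in "p", which is why the output above includes "perhapsphpisoutdated" and misses "jsonp".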
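The constructs listed above can be tried one at a time. The following is only a sketch: the `demo` object and its sample inputs are made up for illustration and do not come from the original answer; only `reg` is the regex quoted above (without the `g` flag, so that `test` and `exec` start from the beginning each time).

```javascript
// Hypothetical mini-examples; `demo` and the sample strings are assumptions.
const demo = {
  // \b: position between a \w and a \W character, so \b\w+\b picks whole words.
  wordBoundary: 'php ruby'.match(/\b\w+\b/g),   // [ 'php', 'ruby' ]
  // (p) is captured; (?!h) only peeks at the next character and captures nothing.
  capture: /(p)(?!h)/.exec('php python'),       // matches the "p" at index 2
  // [^h\s]: exactly one character that is neither "h" nor whitespace.
  negatedClass: 'ph py p '.match(/p[^h\s]/g),   // [ 'py' ]
};

// The regex from the answer above, applied to the two problematic words.
const reg = /\b(\w*(p[^h\s](?!ph))\w*)\b/;

console.log(demo.wordBoundary, demo.negatedClass);
console.log(demo.capture.index, demo.capture.length);  // 2 2 (full match + one group)
console.log(reg.test('jsonp'));                        // false: nothing follows the "p"
console.log(reg.exec('perhapsphpisoutdated')[2]);      // "pi": one good "p" is enough
```

The last two lines show why this pattern misses "jsonp" and still accepts "perhapsphpisoutdated": it only requires one "p" followed by an allowed character, and never forbids "ph" in the rest of the word.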

/\b((?!ph|\s).)*((p[^h\s]((?!ph|\s).)*)|p)\b/g
\b is the word-boundary assertion.

So the part that matches each word is:
((?!ph|\s).)*((p[^h\s]((?!ph|\s).)*)|p)

Split this expression into three parts:

  1. ((?!ph|\s).)*

  2. (p[^h\s]((?!ph|\s).)*)

  3. p

The sub-expression that appears most often is ((?!ph|\s).)*, so let's analyze it first.

"JavaScript: The Definitive Guide" says: (?!p) is a zero-width negative lookahead assertion, which requires that the characters that follow do not match the pattern p.

Zero-width here means that the assertion does not consume any characters of the match.
This may be hard to grasp, so here is an example:

Evaluate "1234".match(/((?!34).)*/)

  1. At the start the remaining string is "1234". (?!34) checks whether it begins with "34"; it does not, so the assertion succeeds without consuming anything, and . then matches "1". The remaining string is "234".

  2. The remaining string is "234". (?!34) tests "23" against "34"; they differ, so matching continues. "23" is not consumed; the following . matches "2".

  3. The remaining string is "34". It begins with "34", so (?!34) fails and the repetition stops.

The text matched by the whole expression is "12".

Conclusion: an expression of the form /((?!p).)*/ matches the text before the first occurrence of p.
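The walk-through above can be re-enacted by hand. This is a minimal sketch, not how a regex engine is actually implemented: the loop and the names `input`, `steps`, `pos`, `simulated` and `actual` are assumptions introduced for illustration. Each iteration first peeks (the lookahead) and only then consumes one character (the dot).

```javascript
const input = '1234';
const steps = [];
let pos = 0;

// Manual version of /((?!34).)*/ starting at position 0:
// peek for "34" without moving, and consume one character only if the peek fails to find it.
while (pos < input.length && !input.startsWith('34', pos)) {
  steps.push(input[pos]); // the "." consumes exactly one character
  pos += 1;
}

const simulated = steps.join('');
const actual = input.match(/((?!34).)*/);

console.log(simulated);  // "12"
console.log(actual[0]);  // "12", the whole match
console.log(actual[1]);  // "2", the group keeps only its last iteration
```

Note that `match` returns an array here: index 0 holds the full match "12", while the capturing group at index 1 only remembers the last character it consumed.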

Now let's look again at the three expressions from before:

  1. ((?!ph|\s).)*

  2. (p[^h\s]((?!ph|\s).)*)

  3. p

  • The first expression matches as many characters of the word as possible, stopping before any "ph" or whitespace.

  • The second expression matches from a "p" onward. The character right after the "p" must be neither "h" nor whitespace, and the rest of the match must not run into "ph" either.

  • The third expression matches a single "p". The shortest text the second expression can match is p[^h\s], which is at least two characters, so a word that ends in "p" (such as "jsonp") is wanted by the asker but not covered by the second expression. It is therefore handled by this separate alternative.

Putting it together: none of the three expressions can match "ph", yet the second or third must match a "p", and the \b at each end pins the match to a whole word. That is exactly what the question asks for.
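To see which of the three parts does the work for each word, the capture groups can be inspected. This is a sketch under assumptions: the `report` structure and the labels "part 2" / "part 3" are introduced here for illustration, and `matchAll` needs a reasonably recent JavaScript runtime.

```javascript
const str = 'python php ruby javascript jsonp perhapsphpisoutdated';
const reg = /\b((?!ph|\s).)*((p[^h\s]((?!ph|\s).)*)|p)\b/g;

// Group 2 is the "(part 2 | part 3)" alternation; group 3 is part 2 on its own.
const report = [...str.matchAll(reg)].map((m) => ({
  word: m[0],
  tail: m[2], // text matched from the deciding "p" onward
  // m[3] is undefined when the lone "p" alternative (part 3) matched instead.
  branch: m[3] !== undefined ? 'part 2' : 'part 3',
}));

console.log(report);
// [ { word: 'python',     tail: 'python', branch: 'part 2' },
//   { word: 'javascript', tail: 'pt',     branch: 'part 2' },
//   { word: 'jsonp',      tail: 'p',      branch: 'part 3' } ]
```

"php" and "perhapsphpisoutdated" never appear in the report: every attempt either hits "ph" or ends in the middle of the word, where the closing \b cannot succeed.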
