javascript - A question about regular expressions
Question: There is a string: "python php ruby javascript jsonp perhapsphpisoutdated"
Using a pure regular expression, get all the words in this string that contain p but not ph.
Expected output array: [ 'python', 'javascript', 'jsonp' ]
I have thought about this problem for a long time and have no idea how to solve it.
My solution is:
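<code>var str = 'python php ruby javascript jsonp perhapsphpisoutdated';
// Match words containing p, then filter out the ones containing ph
var result = str.match(/\b\w*(?=p)\w*\b/g)
  .filter((value) => !/.*(?=ph)/.test(value));
// Single regular expression, no filter step
var result2 = str.match(/\b((?!ph|\s).)*((p[^h\s]((?!ph|\s).)*)|p)\b/g);
console.log(result2);</code>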
But the match-then-filter version does not meet the pure-regex requirement.
An expert in the group gave this answer:
<code>/\b((?!ph|\s).)*((p[^h\s]((?!ph|\s).)*)|p)\b/g </code>
It works perfectly, but I can't understand it. I hope someone can help explain it.
<code>var str = 'python php ruby javascript jsonp perhapsphpisoutdated';
var reg = /\b(\w*(p[^h\s](?!ph))\w*)\b/g;
str.match(reg); // => ["python", "javascript", "perhapsphpisoutdated"]</code>
\b is a word boundary: the position between a \w character and a \W character.
() marks a subexpression (a capturing group).
(?!) marks a negative lookahead assertion. Unlike a subexpression, a lookahead assertion is not captured.
[^] marks a negated character set: it matches any character not listed in it.
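So the regular expression above means: get a word (the text between two boundaries) that contains a "p" whose next character is neither "h" nor whitespace, where those two characters are not immediately followed by "ph". Note that the (?!ph) check applies only at that one position, so "ph" elsewhere in the word is not ruled out. That is why "perhapsphpisoutdated" still matches, and why "jsonp", whose "p" is followed by a space, is missed.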
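To make the behavior of this first attempt concrete, here is a small sketch that runs it against the sample string and checks the two problem words. The variable names are only illustrative.

```javascript
// Sample string from the question.
const str = 'python php ruby javascript jsonp perhapsphpisoutdated';

// The regex from the snippet above. (?!ph) is tested only once,
// directly after the two characters matched by p[^h\s].
const reg = /\b(\w*(p[^h\s](?!ph))\w*)\b/g;

const matches = str.match(reg);
console.log(matches); // [ 'python', 'javascript', 'perhapsphpisoutdated' ]

// "jsonp" is missed: its only "p" is followed by a space, which [^h\s] rejects.
console.log(matches.includes('jsonp')); // false

// "perhapsphpisoutdated" slips through: its last "p" is followed by "i",
// and nothing checks the "ph" that occurs earlier in the word.
console.log(matches.includes('perhapsphpisoutdated')); // true
```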
/\b((?!ph|\s).)*((p[^h\s]((?!ph|\s).)*)|p)\b/g
\b is a word boundary.
So the part that matches each word is: ((?!ph|\s).)*((p[^h\s]((?!ph|\s).)*)|p)
Split this expression into three parts:
((?!ph|\s).)*
(p[^h\s]((?!ph|\s).)*)
p
The sub-pattern that appears most often is ((?!ph|\s).)*
Let's analyze it.
"JavaScript: The Definitive Guide" says:
(?!p) is a zero-width negative lookahead assertion, which requires that the following characters do not match p.
Zero-width here means that the assertion does not consume any characters of the match.
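A minimal sketch of what "zero-width" means in practice. The test strings 'ab ac' and 'ac' are made up for illustration and are not from the question.

```javascript
// (?!b) inspects the next character but does not consume it.
const m = /a(?!b)/.exec('ab ac');
console.log(m[0]);    // "a"  (only the "a" is part of the match)
console.log(m.index); // 3    (the "a" at index 0 is rejected because "b" follows)

// Because the lookahead consumed nothing, the "c" it looked at
// is still available to the rest of the pattern.
const stillThere = /a(?!b)c/.test('ac');
console.log(stillThere); // true
```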
This may be difficult to understand, so here is an example. Evaluate "1234".match(/((?!34).)*/)
First iteration: the remaining string is "1234". (?!34) tests whether the next two characters, "12", are "34". They are not, so the assertion succeeds without consuming anything, and . matches "1". The remaining string is "234".
Second iteration: (?!34) tests whether "23" is "34". It is not, so we continue. "23" is not consumed, and the following . matches "2". The remaining string is "34".
Third iteration: the remaining "34" is exactly the "34" in (?!34), so the assertion fails and matching stops.
The match for the entire expression is "12".
Conclusion: an expression of the form /((?!p).)*/ matches the part of the string before p.
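The walkthrough can be checked directly. This is only a sketch: the second test string, 'foo-bar--baz', is a made-up example to show the same "everything before the token" behavior with a different token.

```javascript
// The example from the walkthrough.
const result = '1234'.match(/((?!34).)*/);
console.log(result[0]); // "12" (the whole match)
console.log(result[1]); // "2"  (a repeated capture group keeps its last iteration)

// Same idea with another token: take everything before the first "--".
const beforeDashes = 'foo-bar--baz'.match(/((?!--).)*/)[0];
console.log(beforeDashes); // "foo-bar"
```

The single "-" inside "foo-bar" is kept because the lookahead only rejects positions where the full token "--" starts.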
Now let's look again at the three expressions from before:
((?!ph|\s).)*
(p[^h\s]((?!ph|\s).)*)
p
The first expression matches as many characters as possible in the word, stopping before "ph" or whitespace.
The second expression matches a "p" and the characters after it in the word. The first character after the "p" must not be "h" or whitespace, and the rest must not contain "ph".
The third expression matches a single "p" character. The shortest form the second expression can match is p[^h\s], which is at least two characters long, so a lone "p" at the end of a word (as in "jsonp"), which the poster requires but the second expression cannot cover, has to be matched separately.
Putting it together, none of the three expressions can match "ph", yet the match must contain a "p" (from the second or third expression), which fully meets the requirements of the question.
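As a rough check of this breakdown, the sketch below runs the full pattern, then a variant with the lone "|p" alternative removed, and finally the first part on its own. The variant, the ^ anchor on the prefix pattern, and the variable names are assumptions added for illustration only and are not part of the original answer.

```javascript
const str = 'python php ruby javascript jsonp perhapsphpisoutdated';

// Full pattern from the answer.
const full = /\b((?!ph|\s).)*((p[^h\s]((?!ph|\s).)*)|p)\b/g;
const fullMatches = str.match(full);
console.log(fullMatches); // [ 'python', 'javascript', 'jsonp' ]

// Same pattern without the lone "|p" alternative (comparison only):
// a word ending in "p" can no longer match.
const withoutLoneP = /\b((?!ph|\s).)*(p[^h\s]((?!ph|\s).)*)\b/g;
const partialMatches = str.match(withoutLoneP);
console.log(partialMatches); // [ 'python', 'javascript' ]

// First part on its own, anchored to the start of each word:
// it takes everything up to the first "ph".
const prefix = /^((?!ph|\s).)*/;
const prefixes = str.split(' ').map((word) => word.match(prefix)[0]);
console.log(prefixes);
// [ 'python', '', 'ruby', 'javascript', 'jsonp', 'perhaps' ]
```

For "perhapsphpisoutdated" the prefix stops at "perhaps", and since no word boundary follows, the full pattern cannot complete a match for that word.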