How can I make this regex simpler?

Question

P粉663883862 · Answer

Simply iterate over all strings and filter out all strings that do not contain all keywords:

(A more concise version can be found in the code snippet below)

function findMatch(strings, keywords) {
  const result = [];
  
  for (const string of strings) {
    if (keywords.every(keyword => string.includes(keyword))) {
      result.push(string);
    }
  }
  
  return result;
}

Try it:

// console.config({ maximize: true }) only exists in the Stack Snippets console, so it is left out here.

function findMatch(strings, keywords) {
  return strings.filter(
    string => keywords.every(keyword => string.includes(keyword))
  );
}

const testcases = [
  'WORD1WORD2WORD3',
  'WORD1AWORD2BWORD3C',
  'WORD3WORD1WORD2',
  'WORD1WORD2WORD3WORD1',
  'WORD1WORD1WORD2',
  'WORD1AWORD1BWORD2C'
];

const keywords = [
  'WORD1', 'WORD2', 'WORD3'
];

console.log(findMatch(testcases, keywords));

P粉998100648 · Answer

You can use a positive lookahead for each word.

/(?=.*WORD1)(?=.*WORD2)(?=.*WORD3).*/

The more performant version below anchors the pattern at the start of the string and matches only a single character once the lookaheads have succeeded. As the OP requested, this technique is only for testing whether a string matches, not for extracting text.

/^(?=.*WORD1)(?=.*WORD2)(?=.*WORD3)./

A positive lookahead works like a gate: matching continues only if the pattern inside the parentheses can be found from the current position, but the lookahead does not consume or capture what it matches. It is always zero-length. Because each lookahead puts .* in front of its word, the order of the words doesn't matter. If every lookahead succeeds, matching proceeds without any characters having been consumed.
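As a rough illustration of the gate behaviour described above, the sketch below runs the anchored pattern against a few of the sample strings from the first answer. The names allWords, samples and results are made up for this example.

```javascript
// Anchored lookahead pattern from the answer above.
const allWords = /^(?=.*WORD1)(?=.*WORD2)(?=.*WORD3)./;

// Illustrative inputs: two contain all three words, one lacks WORD3.
const samples = [
  'WORD1WORD2WORD3',
  'WORD3WORD1WORD2',
  'WORD1WORD1WORD2'
];

// test() only reports whether the pattern matches.
const results = samples.map(s => allWords.test(s));
console.log(results); // [ true, true, false ]

// The lookaheads consume nothing, so the match itself is just the
// single character taken by the trailing dot.
const matched = 'WORD3WORD1WORD2'.match(allWords)[0];
console.log(matched); // W
```

The second log shows why the pattern suits yes/no checks rather than extraction: the matched text is only the first character of the string.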

If you only care about whether the content matches, the only substantial difference between the two expressions is the time they take. Suppose your content contains only 2 of the 3 required words. Unless the software interpreting the expression can recognize that the attempt is futile, it may look for the three words from the first position and fail, then try from the second position and fail, and so on until it reaches the last position and gives up. By specifying ^, only the first position is checked, which saves the time spent on those unnecessary attempts. Removing the * from the end avoids some unnecessary capturing when you only want a true/false answer to whether all the words are present in the content.
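If the word list is not fixed, the anchored pattern can also be assembled at runtime. The following is a minimal sketch under that assumption, not part of the original answer. The helpers escapeRegExp and buildAllWordsRegex are hypothetical names. Keywords are escaped so that characters such as . or * are treated literally.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build /^(?=.*K1)(?=.*K2)...(?=.*Kn)./ from an array of keywords.
// Hypothetical helper: one lookahead per keyword, anchored at the start.
function buildAllWordsRegex(keywords) {
  const lookaheads = keywords
    .map(keyword => `(?=.*${escapeRegExp(keyword)})`)
    .join('');
  return new RegExp(`^${lookaheads}.`);
}

const pattern = buildAllWordsRegex(['WORD1', 'WORD2', 'WORD3']);
console.log(pattern.source);
console.log(pattern.test('WORD3WORD1WORD2'));    // true
console.log(pattern.test('WORD1AWORD1BWORD2C')); // false
```

As in the hand-written pattern, . does not match line breaks by default, so this sketch assumes single-line input.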
