How to Match Non-ASCII Characters with Word Boundaries in JavaScript Regex?

Barbara Streisand
2024-10-27 04:46:29

Matching Non-ASCII Characters in JavaScript Regex with Word Boundaries

In JavaScript, a regular expression that uses the word boundary assertion (\b) runs into limitations with non-ASCII characters such as the Finnish letters ä, ö, and å. To match words that begin with these characters, we need to adjust our approach.

Consider the following code:

<code class="javascript">var title = "this is simple string with finnish word tämä on ääkköstesti älkää ihmetelkö";
var searchterm = "äl";

if (new RegExp("\\b" + searchterm, "gi").test(title)) {
  // Never reached for "äl": \b does not treat "ä" as a word character
}</code>

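This code attempts to match the term "äl" in the title using the \b boundary. However, it fails because \b is defined in terms of \w, which in JavaScript covers only the ASCII characters A-Z, a-z, 0-9, and _. Since "ä" is not a word character, the position between a space and "ä" is not a word boundary, so the pattern never matches there.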
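The behaviour described above can be checked directly. The following is a minimal sketch; the helper name isWordChar and the sample strings are made up for illustration.

```javascript
// \w in JavaScript covers only [A-Za-z0-9_], so "ä" is not a word character.
const isWordChar = (ch) => /\w/.test(ch);

const results = {
  asciiLetter: isWordChar("a"),
  umlautLetter: isWordChar("ä"),
  // \b needs a word character on exactly one side of the position.
  // Between " " and "a" there is one, so this matches.
  boundaryBeforeAscii: /\bal/.test("sano al"),
  // Between " " and "ä" neither side is a word character, so no boundary.
  boundaryBeforeUmlaut: /\bäl/.test("testi älkää"),
  // Between "t" and "ä" there IS a boundary, so \b can also match mid-word.
  boundaryInsideWord: /\bäm/.test("tämä"),
};

console.log(results);
```

The last entry shows that \b is not just too strict here: it also reports a boundary in the middle of "tämä", which is a second reason to avoid it for such text.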

Solution: Non-Capturing Group Instead of \b

To resolve this issue, we can replace \b with a non-capturing group that explicitly matches either the beginning of the string or a whitespace character:

<code class="javascript">if (new RegExp("(?:^|\\s)" + searchterm, "gi").test(title)) {
  // Now it works for "äl"
}</code>

Breakdown:

  • (?:...): non-capturing group
  • ^: beginning of the string
  • \s: whitespace
  • |: "or" operator

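This modified code matches the term "äl" in the title because the boundary condition no longer depends on \w: any character, ASCII or not, can begin a word as long as it sits at the start of the string or follows whitespace. Note that the matched text includes the leading whitespace character, and a word preceded by punctuation (for example, an opening parenthesis) will not match.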
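The whitespace-based pattern can be wrapped in a small helper that also escapes the search term, so that characters such as "." or "(" in user input are matched literally. The sketch below is one possible arrangement, not the only one. The function names escapeRegExp, startsWordAfterSpace, and startsWordUnicode are hypothetical. The second variant is an alternative not covered above: it assumes an engine that supports lookbehind and Unicode property escapes (ES2018 or later) and treats any letter, digit, or underscore as part of a word.

```javascript
// Escape characters that have special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Approach from the article: the term must sit at the start of the string
// or directly after a whitespace character.
function startsWordAfterSpace(text, term) {
  return new RegExp("(?:^|\\s)" + escapeRegExp(term), "i").test(text);
}

// Assumed alternative (requires ES2018 lookbehind and \p{...} support):
// the term must not be preceded by a letter, digit, or underscore.
function startsWordUnicode(text, term) {
  const pattern = "(?<![\\p{L}\\p{N}_])" + escapeRegExp(term);
  return new RegExp(pattern, "iu").test(text);
}

const title =
  "this is simple string with finnish word tämä on ääkköstesti älkää ihmetelkö";

console.log(startsWordAfterSpace(title, "äl"));     // word start after a space
console.log(startsWordAfterSpace(title, "äm"));     // "äm" only occurs inside "tämä"
console.log(startsWordAfterSpace("(älkää)", "äl")); // preceded by "(", not whitespace
console.log(startsWordUnicode("(älkää)", "äl"));    // "(" is not a letter or digit
```

The lookbehind variant does not consume the preceding character, so it is the better fit when the match position or matched text matters, for example with replace or matchAll.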
