Why Does Go's Regex \b Boundary Fail with Non-ASCII Characters?

DDD, 2024-10-29 00:26:02

Why Does Go's Regex \b Boundary Fail with Non-ASCII Characters?

Golang Regex Boundary Issue with Non-ASCII Characters

In Go, `\b` matches at an ASCII word boundary: a position with an ASCII word character (`\w`, which Go defines as `[0-9A-Za-z_]`) on one side and a non-word character, or the start or end of the text, on the other. Accented letters such as é are therefore not word characters, which can lead to unexpected results when matching text that contains non-ASCII characters. Consider the following code:

```go
package main

import (
	"fmt"
	"regexp"
)

func main() {
	// \b is an ASCII-only word boundary in Go's RE2 syntax.
	r := regexp.MustCompile(`\b(vis)\b`)
	fmt.Println(r.MatchString("re vis e")) // true
	fmt.Println(r.MatchString("revise"))   // false
	fmt.Println(r.MatchString("révisé"))   // true: é is not an ASCII word character
}
```

The pattern `\b(vis)\b` is meant to match "vis" only as a standalone word. It correctly rejects "revise", but it reports a match in "révisé": because é is not an ASCII word character, Go sees a word boundary between é and v, and another between s and é. One workaround is to replace `\b` with explicit alternatives that require whitespace or the start or end of the text:

```go
// Same imports as above; these lines go inside func main.
r := regexp.MustCompile(`(?:\A|\s)(vis)(?:\s|\z)`)
fmt.Println(r.MatchString("vis"))      // true
fmt.Println(r.MatchString("re vis e")) // true
fmt.Println(r.MatchString("revise"))   // false
fmt.Println(r.MatchString("révisé"))   // false
```

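This pattern surrounds the capturing group `(vis)` with two non-capturing groups:

  • `(?:\A|\s)` before the word: the start of the text (`\A`) or a whitespace character (`\s`)
  • `(?:\s|\z)` after the word: a whitespace character or the end of the text (`\z`)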
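
Because the non-capturing groups still consume characters, the full match includes the surrounding whitespace, and the word itself is in capture group 1. The minimal sketch below shows this, and also shows that two occurrences sharing a single space ("vis vis") yield only one match, since the first match has already consumed that space. The names `spaceDelimited` and `words` are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// spaceDelimited is the whitespace-based workaround pattern.
var spaceDelimited = regexp.MustCompile(`(?:\A|\s)(vis)(?:\s|\z)`)

// words (hypothetical helper) returns capture group 1 of every
// non-overlapping match in text.
func words(text string) []string {
	var out []string
	for _, m := range spaceDelimited.FindAllStringSubmatch(text, -1) {
		out = append(out, m[1])
	}
	return out
}

func main() {
	m := spaceDelimited.FindStringSubmatch("re vis e")
	fmt.Printf("%q %q\n", m[0], m[1]) // " vis " "vis"
	// The space between the two words is consumed by the first match.
	fmt.Println(words("vis vis")) // [vis]
}
```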

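The pattern `(?:\A|\s)(vis)(?:\s|\z)` approximates `\b`, but only whitespace and the ends of the text count as boundaries. Accented letters such as é are therefore treated as part of the surrounding word instead of as separators, which is why "révisé" no longer matches. The trade-off is that punctuation no longer counts as a boundary either, so inputs such as "vis," or "(vis)" do not match.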
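
If punctuation should still act as a separator while accented letters stay part of the word, one option is to find the literal word and then inspect the neighbouring runes with the `unicode` package. This is a different technique from the whitespace pattern, not a feature of Go's `regexp` package. The sketch assumes that letters, digits, combining marks and underscore in any script are word characters; `isWordRune` and `wholeWordIndex` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode"
	"unicode/utf8"
)

// isWordRune is a hypothetical, Unicode-aware stand-in for Go's ASCII-only \w.
// Assumption: letters, digits, combining marks (so a decomposed "é" stays part
// of the word) and underscore in any script count as word characters.
func isWordRune(r rune) bool {
	return r == '_' || unicode.IsLetter(r) || unicode.IsDigit(r) || unicode.IsMark(r)
}

// wholeWordIndex (hypothetical helper) returns the byte ranges of every
// occurrence of word in text that is not adjacent to a word rune.
// Assumes word itself consists of word runes, so a rejected overlapping
// occurrence can never hide a valid one.
func wholeWordIndex(text, word string) [][]int {
	re := regexp.MustCompile(regexp.QuoteMeta(word))
	var out [][]int
	for _, loc := range re.FindAllStringIndex(text, -1) {
		// Reject if the rune just before the occurrence is a word rune.
		if loc[0] > 0 {
			if r, _ := utf8.DecodeLastRuneInString(text[:loc[0]]); isWordRune(r) {
				continue
			}
		}
		// Reject if the rune just after the occurrence is a word rune.
		if loc[1] < len(text) {
			if r, _ := utf8.DecodeRuneInString(text[loc[1]:]); isWordRune(r) {
				continue
			}
		}
		out = append(out, loc)
	}
	return out
}

func main() {
	for _, s := range []string{"vis", "re vis e", "revise", "révisé", "vis, vis"} {
		fmt.Printf("%q -> %v\n", s, wholeWordIndex(s, "vis"))
	}
}
```

With these assumptions, "révisé" and "revise" return no ranges, while "vis, vis" yields both occurrences, [0 3] and [5 8], because the comma counts as a separator.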
