search
HomeBackend DevelopmentGolangHow to Handle Non-ASCII Characters in Go\'s Regular Expression Boundaries?

 How to Handle Non-ASCII Characters in Go's Regular Expression Boundaries?

Golang Regular Expression Boundary and Non-ASCII Characters

Go's regular expression boundary (b) is designed to match the boundary between ASCII characters and non-ASCII characters. However, in certain scenarios, it may not behave as expected when Latin characters are involved.

The Problem

In Go, the b boundary only works when it surrounds ASCII characters. For instance, the regex b(vis)b is intended to match the word "vis". However, when the word "vis" contains Latin characters, such as "révisé", b fails to recognize it as a word boundary.

Consider the following Go code:

<code class="go">package main

import (
    "fmt"
    "regexp"
)

func main() {
    r, _ := regexp.Compile(`\b(vis)\b`)
    fmt.Println(r.MatchString("re vis e")) // Expected true
    fmt.Println(r.MatchString("revise"))  // Expected true
    fmt.Println(r.MatchString("révisé")) // Expected false
}</code>

Running this code produces:

true
true
true

Notice that the last line incorrectly matches "révisé".

The Solution

To handle cases with non-ASCII characters, you can define your own custom boundary pattern. One approach is to replace b with the following regex:

(?:\A|\s)(vis)(?:\s|\z)

This pattern means:

  • (?:A|s): Matches the start of the string or a whitespace character.
  • (vis): Captures the word "vis".
  • (?:s|z): Matches a whitespace character or the end of the string.

This custom boundary effectively achieves what b does for ASCII characters, but it also extends to non-ASCII characters like Latin characters.

By incorporating this custom pattern into the regex, you can obtain the desired result:

<code class="go">package main

import (
    "fmt"
    "regexp"
)

func main() {
    r, _ := regexp.Compile(`(?:\A|\s)(vis)(?:\s|\z)`)
    fmt.Println(r.MatchString("vis")) // Added this case
    fmt.Println(r.MatchString("re vis e"))
    fmt.Println(r.MatchString("revise"))
    fmt.Println(r.MatchString("révisé"))
}</code>

Running this code now gives:

true
true
false
false

As you can see, "révisé" is correctly excluded as a match.

The above is the detailed content of How to Handle Non-ASCII Characters in Go\'s Regular Expression Boundaries?. For more information, please follow other related articles on the PHP Chinese website!

Statement
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
Learn Go String Manipulation: Working with the 'strings' PackageLearn Go String Manipulation: Working with the 'strings' PackageMay 09, 2025 am 12:07 AM

Go's "strings" package provides rich features to make string operation efficient and simple. 1) Use strings.Contains() to check substrings. 2) strings.Split() can be used to parse data, but it should be used with caution to avoid performance problems. 3) strings.Join() is suitable for formatting strings, but for small datasets, looping = is more efficient. 4) For large strings, it is more efficient to build strings using strings.Builder.

Go: String Manipulation with the Standard 'strings' PackageGo: String Manipulation with the Standard 'strings' PackageMay 09, 2025 am 12:07 AM

Go uses the "strings" package for string operations. 1) Use strings.Join function to splice strings. 2) Use the strings.Contains function to find substrings. 3) Use the strings.Replace function to replace strings. These functions are efficient and easy to use and are suitable for various string processing tasks.

Mastering Byte Slice Manipulation with Go's 'bytes' Package: A Practical GuideMastering Byte Slice Manipulation with Go's 'bytes' Package: A Practical GuideMay 09, 2025 am 12:02 AM

ThebytespackageinGoisessentialforefficientbyteslicemanipulation,offeringfunctionslikeContains,Index,andReplaceforsearchingandmodifyingbinarydata.Itenhancesperformanceandcodereadability,makingitavitaltoolforhandlingbinarydata,networkprotocols,andfileI

Learn Go Binary Encoding/Decoding: Working with the 'encoding/binary' PackageLearn Go Binary Encoding/Decoding: Working with the 'encoding/binary' PackageMay 08, 2025 am 12:13 AM

Go uses the "encoding/binary" package for binary encoding and decoding. 1) This package provides binary.Write and binary.Read functions for writing and reading data. 2) Pay attention to choosing the correct endian (such as BigEndian or LittleEndian). 3) Data alignment and error handling are also key to ensure the correctness and performance of the data.

Go: Byte Slice Manipulation with the Standard 'bytes' PackageGo: Byte Slice Manipulation with the Standard 'bytes' PackageMay 08, 2025 am 12:09 AM

The"bytes"packageinGooffersefficientfunctionsformanipulatingbyteslices.1)Usebytes.Joinforconcatenatingslices,2)bytes.Bufferforincrementalwriting,3)bytes.Indexorbytes.IndexByteforsearching,4)bytes.Readerforreadinginchunks,and5)bytes.SplitNor

Go encoding/binary package: Optimizing performance for binary operationsGo encoding/binary package: Optimizing performance for binary operationsMay 08, 2025 am 12:06 AM

Theencoding/binarypackageinGoiseffectiveforoptimizingbinaryoperationsduetoitssupportforendiannessandefficientdatahandling.Toenhanceperformance:1)Usebinary.NativeEndianfornativeendiannesstoavoidbyteswapping.2)BatchReadandWriteoperationstoreduceI/Oover

Go bytes package: short reference and tipsGo bytes package: short reference and tipsMay 08, 2025 am 12:05 AM

Go's bytes package is mainly used to efficiently process byte slices. 1) Using bytes.Buffer can efficiently perform string splicing to avoid unnecessary memory allocation. 2) The bytes.Equal function is used to quickly compare byte slices. 3) The bytes.Index, bytes.Split and bytes.ReplaceAll functions can be used to search and manipulate byte slices, but performance issues need to be paid attention to.

Go bytes package: practical examples for byte slice manipulationGo bytes package: practical examples for byte slice manipulationMay 08, 2025 am 12:01 AM

The byte package provides a variety of functions to efficiently process byte slices. 1) Use bytes.Contains to check the byte sequence. 2) Use bytes.Split to split byte slices. 3) Replace the byte sequence bytes.Replace. 4) Use bytes.Join to connect multiple byte slices. 5) Use bytes.Buffer to build data. 6) Combined bytes.Map for error processing and data verification.

See all articles

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Tools

DVWA

DVWA

Damn Vulnerable Web App (DVWA) is a PHP/MySQL web application that is very vulnerable. Its main goals are to be an aid for security professionals to test their skills and tools in a legal environment, to help web developers better understand the process of securing web applications, and to help teachers/students teach/learn in a classroom environment Web application security. The goal of DVWA is to practice some of the most common web vulnerabilities through a simple and straightforward interface, with varying degrees of difficulty. Please note that this software

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

PhpStorm Mac version

PhpStorm Mac version

The latest (2018.2.1) professional PHP integrated development tool

WebStorm Mac version

WebStorm Mac version

Useful JavaScript development tools

Atom editor mac version download

Atom editor mac version download

The most popular open source editor