
How to Correctly Index Characters in Go's UTF-8 Strings?

Patricia Arquette
2024-12-17 18:17:10

Character Indexing in Golang Strings: UTF-8 Decoding

Although Go strings conventionally hold UTF-8 encoded text, a string is a sequence of bytes, so the index expression string[index] returns the byte at that offset, not the character at that position. A single character can occupy up to four bytes in UTF-8, so indexing by byte position can land in the middle of a multi-byte character and return a byte that means nothing on its own.
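
The difference between bytes and characters can be seen in a minimal sketch like the one below. The helper name byteAndRuneCounts is made up for illustration; len, utf8.RuneCountInString, and utf8.ValidString come from the language and the standard unicode/utf8 package.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneCounts (hypothetical helper) returns the length of s
// in bytes and in Unicode code points.
func byteAndRuneCounts(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	s := "世界"
	nBytes, nRunes := byteAndRuneCounts(s)
	fmt.Println(nBytes, nRunes) // 6 2

	// s[0] is the first byte of "世" (0xE4), not the character itself.
	fmt.Printf("%T %#x\n", s[0], s[0]) // uint8 0xe4

	// A lone lead byte is not valid UTF-8 on its own.
	fmt.Println(utf8.ValidString(s[:1])) // false
}
```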

To index and retrieve individual characters accurately, Golang offers several options:

Using Unicode Code Points (Runes)

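  • string([]rune("string")[index]): Convert the string to a slice of runes, where each rune holds one Unicode code point, then index the slice. The index now counts code points rather than bytes, so the result does not depend on how many bytes each character occupies. The conversion copies the whole string and takes time proportional to its length.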

Example:

package main

import "fmt"

func main() {
    fmt.Println(string([]rune("HELLO, 世界")[1])) // "E"
    fmt.Println(string([]rune("HELLO, 世界")[8])) // "界"
}
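
When only one character is needed, the same lookup can be done without building a rune slice, because a for range loop over a string decodes one code point per iteration. The following is a sketch under that assumption; runeAt is a hypothetical helper, not a standard library function.

```go
package main

import "fmt"

// runeAt (hypothetical helper) returns the n-th code point of s, counting
// from zero. ok is false when n is negative or past the end of the string.
func runeAt(s string, n int) (r rune, ok bool) {
	if n < 0 {
		return 0, false
	}
	i := 0
	// Ranging over a string yields decoded runes, not raw bytes.
	for _, c := range s {
		if i == n {
			return c, true
		}
		i++
	}
	return 0, false
}

func main() {
	if r, ok := runeAt("HELLO, 世界", 8); ok {
		fmt.Println(string(r)) // 界
	}
	_, ok := runeAt("HELLO", 10)
	fmt.Println(ok) // false
}
```

This version stops as soon as it reaches the requested position and reports an out-of-range index instead of panicking, at the cost of a linear scan on every call.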

Converting Bytes to Characters

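  • string("string"[index]): Index the string to get a single byte, then convert that byte to a string. This yields the expected character only when the byte is ASCII (the first 128 Unicode code points). A byte taken from a multi-byte character is converted as if it were a code point in its own right, which produces a different character.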

Example:

package main

import "fmt"

func main() {
    fmt.Println(string("HELLO"[1])) // "E"
}
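
The ASCII restriction can be made explicit in code. The sketch below checks the byte against utf8.RuneSelf (the first value that is not a single-byte character) before converting it; asciiAt is a hypothetical helper written for this illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// asciiAt (hypothetical helper) returns the character at byte offset i as a
// string, but only when that byte is ASCII. ok is false for out-of-range
// offsets and for bytes that belong to a multi-byte character.
func asciiAt(s string, i int) (c string, ok bool) {
	if i < 0 || i >= len(s) || s[i] >= utf8.RuneSelf {
		return "", false
	}
	return string(s[i]), true
}

func main() {
	c, ok := asciiAt("HELLO", 1)
	fmt.Println(c, ok) // E true

	// "\u00e9" (é) is encoded as the two bytes 0xC3 0xA9. Converting the
	// first byte on its own yields U+00C3, a different character.
	fmt.Println(string("\u00e9"[0])) // Ã

	_, ok = asciiAt("\u00e9", 0)
	fmt.Println(ok) // false
}
```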
