How to Read Non-UTF-8 Encoded Text Files in Go?

Mary-Kate Olsen
2024-12-01 03:29:13

Reading Non-UTF-8 Text Files in Go

Go strings and most of the standard library's text handling assume UTF-8, while file I/O is byte-oriented and performs no conversion. A file stored in another character set therefore has to be decoded explicitly before its contents can be treated as Go text. This article explains how to do that with the golang.org/x/text/encoding package.

The golang.org/x/text/encoding package defines an Encoding interface for character encodings that can be converted to and from UTF-8. Each Encoding supplies a decoder (to UTF-8) and an encoder (from UTF-8). For example, the golang.org/x/text/encoding/simplifiedchinese sub-package provides the GB18030, GBK, and HZ-GB2312 encodings.

Example: Reading a GBK Encoded File

package main

import (
    "bufio"
    "fmt"
    "log"
    "os"

    "golang.org/x/text/encoding/simplifiedchinese"
    "golang.org/x/text/transform"
)

func main() {
    const filename = "example_GBK_file"

    // Open the file; its raw bytes are GBK, not UTF-8
    f, err := os.Open(filename)
    if err != nil {
        log.Fatal(err)
    }
    r := transform.NewReader(f, simplifiedchinese.GBK.NewDecoder())

    // Read converted UTF-8 from `r` as needed
    sc := bufio.NewScanner(r)
    for sc.Scan() {
        fmt.Printf("Read line: %s\n", sc.Bytes())
    }
    if err := sc.Err(); err != nil {
        log.Fatal(err)
    }
    if err = f.Close(); err != nil {
        log.Fatal(err)
    }
}

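This example uses transform.NewReader to wrap the *os.File in an io.Reader that decodes GBK to UTF-8 on the fly, so the scanner only ever sees UTF-8 and the file is never held in memory in full. Note that bufio.Scanner rejects lines longer than 64 KiB by default; call its Buffer method to raise the limit if your input needs it.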

Additional Notes:

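  • This approach uses only packages maintained by the Go project and is pure Go, so it needs no third-party packages or cgo. Note that golang.org/x/text lives outside the standard library and must be added as a module dependency.
  • You can swap in another encoding to support other character sets, such as traditionalchinese.Big5, charmap.Windows1252, or korean.EUCKR.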
  • Refer to the golang.org/x/text/encoding and golang.org/x/text/encoding/simplifiedchinese packages for more details.
