How to Read and Write Non-UTF-8 Encoded Text Files in Go?

Mary-Kate Olsen
2024-12-04 20:40:12

Reading and Writing Non-UTF-8 Text Files in Go

Background

Go treats text as UTF-8: string literals, range loops over strings, and standard library packages such as bufio and strings do not convert from other encodings. Files in other encodings, such as GBK, therefore have to be decoded explicitly when read and encoded explicitly when written.
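To make the difference concrete, the sketch below checks whether a byte sequence is valid UTF-8 using the standard library's utf8.Valid. The helper name looksLikeUTF8 and the sample bytes are assumptions chosen for illustration. The check is only a heuristic: pure ASCII text is valid UTF-8 and valid GBK at the same time, so a passing check does not prove the file was written as UTF-8.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// looksLikeUTF8 reports whether data is a valid UTF-8 byte sequence.
// (Hypothetical helper name, used only for this illustration.)
func looksLikeUTF8(data []byte) bool {
	return utf8.Valid(data)
}

func main() {
	utf8Bytes := []byte("汉字")                   // UTF-8 encoding of two Chinese characters
	gbkBytes := []byte{0xBA, 0xBA, 0xD7, 0xD6} // the same two characters encoded as GBK

	fmt.Println(looksLikeUTF8(utf8Bytes)) // true
	fmt.Println(looksLikeUTF8(gbkBytes))  // false: 0xBA cannot start a UTF-8 sequence
}
```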

Solution

To read and write non-UTF-8 text files in Go, you can use the following steps:

Reading Non-UTF-8 Files

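  1. Import the necessary packages: "golang.org/x/text/encoding/simplifiedchinese" and "golang.org/x/text/transform"
    The simplifiedchinese package provides GB18030, GBK, and HZ-GB2312 encoding implementations; the transform package provides the reader and writer wrappers.
  2. Wrap the file in an io.Reader using transform.NewReader, which decodes GBK to UTF-8 as you read: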

    f, err := os.Open(filename)
    if err != nil {
        log.Fatal(err)
    }
    r := transform.NewReader(f, simplifiedchinese.GBK.NewDecoder())

Writing Non-UTF-8 Files

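  1. Import the same packages: "golang.org/x/text/encoding/simplifiedchinese" and "golang.org/x/text/transform"
  2. Wrap the file in an io.Writer using transform.NewWriter, which encodes UTF-8 to GBK as you write. Call Close on the returned writer when you are done so that any buffered bytes are flushed; this does not close the underlying file: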

    f, err := os.Create(filename)
    if err != nil {
        log.Fatal(err)
    }
    w := transform.NewWriter(f, simplifiedchinese.GBK.NewEncoder())

Example

The following example shows how to read and write a GBK-encoded text file:

package main

import (
    "bufio"
    "fmt"
    "log"
    "os"

    "golang.org/x/text/encoding/simplifiedchinese"
    "golang.org/x/text/transform"
)

func main() {
    const filename = "example_GBK_file"
    exampleWriteGBK(filename)
    exampleReadGBK(filename)
}

// exampleReadGBK opens a GBK-encoded file and prints each line as UTF-8.
func exampleReadGBK(filename string) {
    f, err := os.Open(filename)
    if err != nil {
        log.Fatal(err)
    }

    r := transform.NewReader(f, simplifiedchinese.GBK.NewDecoder())

    sc := bufio.NewScanner(r)
    for sc.Scan() {
        fmt.Printf("Read line: %s\n", sc.Bytes())
    }
    if err := sc.Err(); err != nil {
        log.Fatal(err)
    }

    if err := f.Close(); err != nil {
        log.Fatal(err)
    }
}

// exampleWriteGBK creates a file and writes sample text to it encoded as GBK.
func exampleWriteGBK(filename string) {
    f, err := os.Create(filename)
    if err != nil {
        log.Fatal(err)
    }

    w := transform.NewWriter(f, simplifiedchinese.GBK.NewEncoder())

    // Write some text from the Wikipedia GBK page that includes Chinese
    _, err = fmt.Fprintln(w,
        `In 1995, China National Information Technology Standardization
Technical Committee set down the Chinese Internal Code Specification
(Chinese: 汉字内码扩展规范(GBK); pinyin: Hànzì Nèimǎ
Kuòzhǎn Guīfàn (GBK)), Version 1.0, known as GBK 1.0, which is a
slight extension of Codepage 936. The newly added 95 characters were not
found in GB 13000.1-1993, and were provisionally assigned Unicode PUA
code points.`)
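    if err != nil {
        log.Fatal(err)
    }

    // Flush anything still buffered in the encoder. This does not close f.
    if err := w.Close(); err != nil {
        log.Fatal(err)
    }

    if err := f.Close(); err != nil {
        log.Fatal(err)
    }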
}

This code first writes sample text to a GBK-encoded file, then reopens that file, decodes it back to UTF-8, and prints it line by line.
