How Can I Correctly Read UTF-16 Text Files in Go, Handling Both BOM and Non-BOM Encodings?

Susan Sarandon
2024-12-27 15:32:11

Reading UTF-16 Text File as a String in Go

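When you read a UTF-16 text file in Go with ordinary byte-oriented readers, the text shows up as ASCII characters interleaved with NUL bytes. This happens because bufio.NewReader only buffers raw bytes and performs no character decoding, while Go strings are expected to hold UTF-8. The UTF-16 bytes have to be decoded explicitly.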
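To make the symptom concrete, the sketch below builds the UTF-16 little-endian bytes for "Hi" by hand, converts them straight to a string, and then decodes them with the standard library's unicode/utf16 package for comparison. The helper name decodeUTF16LE is made up for this illustration, and the input is assumed to be little-endian with no BOM.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unicode/utf16"
)

// decodeUTF16LE is a hypothetical helper for this illustration. It assumes
// little-endian input with no BOM and ignores a trailing odd byte.
func decodeUTF16LE(b []byte) string {
	units := make([]uint16, 0, len(b)/2)
	for i := 0; i+1 < len(b); i += 2 {
		units = append(units, binary.LittleEndian.Uint16(b[i:]))
	}
	return string(utf16.Decode(units))
}

func main() {
	// "Hi" encoded as UTF-16 little-endian: two bytes per character.
	raw := []byte{'H', 0x00, 'i', 0x00}

	// A direct conversion treats the bytes as UTF-8, so the NUL bytes
	// remain in the string.
	fmt.Printf("%q\n", string(raw)) // "H\x00i\x00"

	// Decoding the 16-bit code units first gives the intended text.
	fmt.Printf("%q\n", decodeUTF16LE(raw)) // "Hi"
}
```

The direct conversion yields a four-byte string, while the decoded version yields the two-character text.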

Solution

UTF-16 with BOM

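The "golang.org/x/text/encoding/unicode" package provides unicode.BOMOverride, which wraps a fallback decoder and switches to UTF-8 or UTF-16 (big- or little-endian) decoding when the input starts with the corresponding BOM. Combined with transform.NewReader from "golang.org/x/text/transform", it converts the file contents to UTF-8. Here's an example, ReadFileUTF16():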

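import (
    "bytes"
    "io"
    "os"

    "golang.org/x/text/encoding/unicode"
    "golang.org/x/text/transform"
)

// ReadFileUTF16 reads a UTF-16 file and returns its contents as UTF-8.
func ReadFileUTF16(filename string) ([]byte, error) {
    raw, err := os.ReadFile(filename)
    if err != nil {
        return nil, err
    }
    // Fallback decoder, used only when the file has no BOM.
    win16be := unicode.UTF16(unicode.BigEndian, unicode.IgnoreBOM)
    // BOMOverride switches to the encoding indicated by a leading BOM.
    utf16bom := unicode.BOMOverride(win16be.NewDecoder())
    unicodeReader := transform.NewReader(bytes.NewReader(raw), utf16bom)
    return io.ReadAll(unicodeReader)
}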

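This function decodes UTF-16 files that start with a BOM and returns the text as UTF-8. When no BOM is present, it falls back to decoding the input as big-endian UTF-16.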
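The BOM check itself is simple, and the following sketch shows the idea using only the standard library rather than golang.org/x/text. The helper decodeWithBOM and its bigEndianFallback parameter are hypothetical names for this illustration; unlike unicode.BOMOverride, the sketch does not recognize a UTF-8 BOM and silently drops a trailing odd byte.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unicode/utf16"
)

// decodeWithBOM is a hypothetical helper, not part of golang.org/x/text.
// It takes the byte order from a leading BOM and otherwise uses the
// fallback order selected by bigEndianFallback.
func decodeWithBOM(b []byte, bigEndianFallback bool) string {
	var order binary.ByteOrder = binary.LittleEndian
	if bigEndianFallback {
		order = binary.BigEndian
	}
	if len(b) >= 2 {
		switch {
		case b[0] == 0xFF && b[1] == 0xFE: // little-endian BOM
			order, b = binary.LittleEndian, b[2:]
		case b[0] == 0xFE && b[1] == 0xFF: // big-endian BOM
			order, b = binary.BigEndian, b[2:]
		}
	}
	// Assemble 16-bit code units, then let utf16.Decode handle surrogates.
	units := make([]uint16, 0, len(b)/2)
	for i := 0; i+1 < len(b); i += 2 {
		units = append(units, order.Uint16(b[i:]))
	}
	return string(utf16.Decode(units))
}

func main() {
	le := []byte{0xFF, 0xFE, 'G', 0x00, 'o', 0x00}
	be := []byte{0xFE, 0xFF, 0x00, 'G', 0x00, 'o'}
	noBOM := []byte{0x00, 'G', 0x00, 'o'}

	fmt.Println(decodeWithBOM(le, true))    // BOM overrides the fallback
	fmt.Println(decodeWithBOM(be, true))    // BOM matches the fallback
	fmt.Println(decodeWithBOM(noBOM, true)) // no BOM, fallback is used
}
```

All three calls print the same two-letter text, which mirrors how a BOM, when present, takes precedence over the fallback byte order.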

UTF-16 without BOM

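If your file does not contain a BOM, the byte order cannot be detected from the file itself and has to be assumed. The following code assumes little-endian, skips a BOM if one happens to be present, and decodes the rest:

func ReadFileUTF16WithoutBOM(filename string) ([]byte, error) {
    f, err := os.Open(filename)
    if err != nil {
        return nil, err
    }
    defer f.Close()
    r := bufio.NewReader(f)

    // Without a BOM the byte order must be assumed; little-endian is used here.
    endian := unicode.LittleEndian

    // Read past the BOM, if any, and honor the byte order it indicates.
    // Peek fails on files shorter than two bytes, which cannot hold a BOM.
    if b, err := r.Peek(2); err == nil {
        if b[0] == 0xFF && b[1] == 0xFE {
            r.Discard(2)
        } else if b[0] == 0xFE && b[1] == 0xFF {
            endian = unicode.BigEndian
            r.Discard(2)
        }
    }

    // Decode the rest of the file from UTF-16 to UTF-8.
    decoder := unicode.UTF16(endian, unicode.IgnoreBOM).NewDecoder()
    return io.ReadAll(transform.NewReader(r, decoder))
}

This function skips any BOM and decodes the file as UTF-16, returning UTF-8 bytes. It needs the bufio, io, and os packages plus "golang.org/x/text/encoding/unicode" and "golang.org/x/text/transform". If your BOM-less files are big-endian, change the default to unicode.BigEndian.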
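A file without a BOM carries no marker of its byte order, so the order has to be known in advance or guessed. One possible heuristic, which is not part of the code above and is offered only as a sketch, relies on mostly-ASCII text: in little-endian UTF-16 the zero bytes fall on odd offsets, and in big-endian on even offsets. The function name guessByteOrder is hypothetical, and the guess is unreliable for text that is mostly non-Latin.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// guessByteOrder is a hypothetical heuristic for BOM-less UTF-16 input.
// It counts zero bytes at even and odd offsets and assumes the text is
// mostly ASCII. Ties, including empty input, default to little-endian.
func guessByteOrder(b []byte) binary.ByteOrder {
	var evenZeros, oddZeros int
	for i, c := range b {
		if c != 0 {
			continue
		}
		if i%2 == 0 {
			evenZeros++
		} else {
			oddZeros++
		}
	}
	if evenZeros > oddZeros {
		// High (zero) bytes come first in each pair: big-endian.
		return binary.BigEndian
	}
	return binary.LittleEndian
}

func main() {
	fmt.Println(guessByteOrder([]byte{'G', 0x00, 'o', 0x00})) // LittleEndian
	fmt.Println(guessByteOrder([]byte{0x00, 'G', 0x00, 'o'})) // BigEndian
}
```

When the producer of the files is known, hard-coding its byte order is more dependable than any such guess.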

Conclusion

By using ReadFileUTF16() or ReadFileUTF16WithoutBOM(), you can read UTF-16 text files in Go whether or not they start with a BOM.
