
How to Handle Non-UTF-8 Encoded XML in Go?

Mary-Kate Olsen
2024-12-26 03:28:15


Handling Non-UTF-8 XML Input in Go

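Go's encoding/xml package only decodes UTF-8 on its own. If the XML declaration names another encoding, such as ISO-8859-1, xml.Unmarshal fails with an error saying that an encoding was declared but Decoder.CharsetReader is nil. Unmarshal offers no way to set that field, so the fix is to create an xml.Decoder and give it a CharsetReader: a function that takes the declared charset label and the raw input and returns a reader that yields UTF-8.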
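The failure is easy to reproduce with the standard library alone. The following is a minimal sketch; the note type and the tryUnmarshal helper are made up for illustration and are not part of any library.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// note is a hypothetical target struct used only for this demonstration.
type note struct {
	Body string `xml:"body"`
}

// tryUnmarshal runs xml.Unmarshal on data and returns whatever error it reports.
func tryUnmarshal(data []byte) error {
	var n note
	return xml.Unmarshal(data, &n)
}

func main() {
	// 0xE9 is "é" in ISO-8859-1; the declaration tells the parser so.
	latin1Doc := []byte("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><note><body>caf\xe9</body></note>")
	utf8Doc := []byte("<?xml version=\"1.0\" encoding=\"UTF-8\"?><note><body>café</body></note>")

	// The first call returns an error because no CharsetReader is available;
	// the second succeeds because the input is already UTF-8.
	fmt.Println("ISO-8859-1:", tryUnmarshal(latin1Doc))
	fmt.Println("UTF-8:", tryUnmarshal(utf8Doc))
}
```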

Where to Find a CharsetReader

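Fortunately, the golang.org/x/net/html/charset package provides one in the form of charset.NewReaderLabel. The package is maintained by the Go project but is not part of the standard library, so it has to be added to the module first (for example with go get golang.org/x/net). NewReaderLabel has the same signature as the decoder's CharsetReader field: it looks up the encoding by its label and returns a reader that converts the input to UTF-8.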

Updated Solution for 2015 and Beyond

Older code had to implement its own CharsetReader. With the charset package from golang.org/x/net this is no longer necessary: assign charset.NewReaderLabel to the decoder's CharsetReader field. Here's an updated code snippet:

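import (
    "bytes"
    "encoding/xml"

    "golang.org/x/net/html/charset"
)

// ...
// theXml holds the raw XML bytes; parsed is the value to decode into.
reader := bytes.NewReader(theXml)
decoder := xml.NewDecoder(reader)
// Convert whatever encoding the XML declaration names to UTF-8.
decoder.CharsetReader = charset.NewReaderLabel
err = decoder.Decode(&parsed)
if err != nil {
    // handle the decoding error
}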

With charset.NewReaderLabel set as the CharsetReader, the decoder converts non-UTF-8 input to UTF-8 as it reads, so no manual conversion or custom implementation is needed. Note that this only works through xml.Decoder; xml.Unmarshal cannot be configured this way.
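For comparison, the kind of custom CharsetReader that older code relied on can be sketched with the standard library alone. This is an illustrative sketch under narrow assumptions, not a replacement for the charset package: latin1Reader, note, and decodeLatin1 are hypothetical names, only ISO-8859-1 is supported, and the remaining input is read into memory in one go.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// latin1Reader is a hypothetical hand-written CharsetReader that only
// understands ISO-8859-1 and rejects every other label.
func latin1Reader(label string, input io.Reader) (io.Reader, error) {
	switch strings.ToLower(label) {
	case "iso-8859-1", "latin1":
	default:
		return nil, fmt.Errorf("unsupported charset %q", label)
	}
	raw, err := io.ReadAll(input)
	if err != nil {
		return nil, err
	}
	// Each ISO-8859-1 byte has the same value as its Unicode code point,
	// so widening bytes to runes and re-encoding yields UTF-8.
	runes := make([]rune, len(raw))
	for i, b := range raw {
		runes[i] = rune(b)
	}
	return strings.NewReader(string(runes)), nil
}

// note is a hypothetical target struct used only for this demonstration.
type note struct {
	Body string `xml:"body"`
}

// decodeLatin1 decodes data with latin1Reader installed on the decoder.
func decodeLatin1(data []byte) (note, error) {
	var n note
	dec := xml.NewDecoder(bytes.NewReader(data))
	dec.CharsetReader = latin1Reader
	err := dec.Decode(&n)
	return n, err
}

func main() {
	doc := []byte("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><note><body>caf\xe9</body></note>")
	n, err := decodeLatin1(doc)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(n.Body)
}
```

Writing one such function per encoding quickly becomes tedious, which is the work charset.NewReaderLabel takes over.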
