Handling Non-UTF-8 XML Input in Go
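When you unmarshal XML with the Unmarshal function in Go's encoding/xml package, decoding fails with an error if the document declares an encoding other than UTF-8. To handle such input, create an xml.Decoder and set its CharsetReader field, a function that wraps the input in a reader converting the declared charset to UTF-8.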
Where to Find a CharsetReader
Fortunately, the golang.org/x/net/html/charset package provides a solution in the form of charset.NewReaderLabel. Given a charset label and an input reader, it returns a reader that converts the input to UTF-8.
Updated Solution for 2015 and Beyond
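In earlier versions of Go, a custom CharsetReader had to be implemented. The golang.org/x/net/html/charset package, which is maintained by the Go team but lives outside the standard library, now offers a simpler solution: assign charset.NewReaderLabel to the decoder's CharsetReader field. Here's an updated code snippet: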
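import (
	"bytes"
	"encoding/xml"

	"golang.org/x/net/html/charset"
)

// ...

// theXml is the raw XML as a []byte; parsed is the value to decode into.
reader := bytes.NewReader(theXml)
decoder := xml.NewDecoder(reader)
// Convert whatever encoding the XML declaration names to UTF-8.
decoder.CharsetReader = charset.NewReaderLabel
err = decoder.Decode(&parsed)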
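With charset.NewReaderLabel set as the CharsetReader, the decoder converts non-UTF-8 encoded XML input to UTF-8 as it reads, with no manual conversion or custom implementation required. Note that this works through xml.Decoder; the package-level Unmarshal function offers no way to set a CharsetReader.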