Golang zip: garbled Chinese file names
As Golang has become more widely used in web development, creating Zip archives has become a common requirement. However, some developers run into a recurring problem when compressing files with Golang's Zip package: Chinese file names come out garbled.
This is troublesome, because it not only makes file names unreadable but may also cause a series of other errors. Below we explore the causes of the problem and some solutions.
The Zip format is a binary format that stores, for each entry, the file name, directory path, compression method, and compressed data. Of these, the file name is especially important, because it determines the name and storage path of the file after the user extracts it.
However, different systems and encodings can interpret the same file-name bytes differently. For example, Chinese-language Windows systems traditionally use GBK as the default (ANSI) code page for file names, while UNIX/Linux systems typically use UTF-8. If we do not account for these differences when compressing with the Zip package in Golang, the file names may be displayed as garbled characters.
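To make the mismatch concrete, the sketch below compares the bytes of the same two-character name under both encodings. It uses only the standard library; the GBK byte values are hard-coded for illustration, and the helper name describeName is hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describeName reports whether raw file-name bytes form valid UTF-8.
// Invalid UTF-8 is a strong hint that a legacy encoding such as GBK was used.
func describeName(raw []byte) string {
	if utf8.Valid(raw) {
		return "valid UTF-8"
	}
	return "not UTF-8 (possibly GBK)"
}

func main() {
	utf8Name := []byte("中文")                    // Go source strings are UTF-8: 6 bytes
	gbkName := []byte{0xD6, 0xD0, 0xCE, 0xC4} // the same two characters in GBK: 4 bytes

	fmt.Printf("% X -> %s\n", utf8Name, describeName(utf8Name))
	fmt.Printf("% X -> %s\n", gbkName, describeName(gbkName))
}
```

This check is only a heuristic: some GBK byte sequences also happen to be valid UTF-8, so a name that passes is not guaranteed to be UTF-8.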
There are several ways to deal with garbled Chinese file names. Below we introduce some of the more practical ones.
By default, the Zip format interprets file names as CP437 unless the entry's UTF-8 flag is set, and CP437 cannot represent Chinese characters. Names that arrive in GB18030 (a superset of GBK) should therefore be converted to UTF-8 before they are written to the archive, so that tools which honor the UTF-8 flag can parse them correctly; the reverse conversion is useful when a tool expects GBK names. In Go, you can use the golang.org/x/text/encoding/simplifiedchinese package to convert between GB18030 and UTF-8.
import (
	"bytes"
	"io"

	"golang.org/x/text/encoding/simplifiedchinese"
	"golang.org/x/text/transform"
)

// GbkToUtf8 decodes GB18030 (a superset of GBK) bytes into UTF-8.
func GbkToUtf8(data []byte) ([]byte, error) {
	return io.ReadAll(transform.NewReader(bytes.NewReader(data), simplifiedchinese.GB18030.NewDecoder()))
}

// Utf8ToGbk encodes UTF-8 bytes as GB18030.
func Utf8ToGbk(data []byte) ([]byte, error) {
	return io.ReadAll(transform.NewReader(bytes.NewReader(data), simplifiedchinese.GB18030.NewEncoder()))
}
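On the writing side, it may help to see what archive/zip does with a UTF-8 name on its own. The sketch below uses only the standard library: it writes one entry with a Chinese name into an in-memory archive, reads it back, and checks general-purpose flag bit 11 (0x800), which is the Zip format's UTF-8 marker. The function name roundTripName and the sample file name are illustrative assumptions.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
)

// utf8Flag is general-purpose bit 11, which marks names and comments as UTF-8.
const utf8Flag = 0x800

// roundTripName writes a single empty entry called name into an in-memory
// archive and returns the name and flags recorded for it.
func roundTripName(name string) (string, uint16, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	if _, err := zw.Create(name); err != nil {
		return "", 0, err
	}
	if err := zw.Close(); err != nil {
		return "", 0, err
	}

	zr, err := zip.NewReader(bytes.NewReader(buf.Bytes()), int64(buf.Len()))
	if err != nil {
		return "", 0, err
	}
	if len(zr.File) != 1 {
		return "", 0, fmt.Errorf("expected 1 entry, got %d", len(zr.File))
	}
	f := zr.File[0]
	return f.Name, f.Flags, nil
}

func main() {
	name, flags, err := roundTripName("测试文件.txt")
	if err != nil {
		panic(err)
	}
	fmt.Println(name, flags&utf8Flag != 0)
}
```

Tools that honor bit 11 should display such a name correctly. The conversion helpers matter mainly for names that are not yet UTF-8, or for older tools that ignore the flag and assume a local code page such as GBK.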
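We can also use a StructTag from the reflect package to record which encoding a name should use. Note that archive/zip does not interpret struct tags itself, so a zip tag carrying a chinese-utf8 option is only metadata that your own code must read and act on. Also note that Go string literals are already UTF-8, so they must not be passed through GbkToUtf8. The sample code is as follows:

// The zip tag below is application-defined metadata; archive/zip never reads it.
type File struct {
	Name string `zip:"filename=测试文件,chinese-utf8"`
}

func main() {
	// Go string literals are already UTF-8, so no GBK decoding is needed here.
	zhName := "测试文件"
	f := &File{Name: zhName}
	_ = f // compress the file...
}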
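Because archive/zip does not read struct tags, any tag-driven behavior has to be implemented by the application. The sketch below shows one possible way to read such a tag with reflect; the tag layout (a filename= part followed by an option) and the helper zipTagOptions are hypothetical, not part of any library.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// File carries an application-defined zip tag; the standard library ignores it.
type File struct {
	Name string `zip:"filename=测试文件,chinese-utf8"`
}

// zipTagOptions returns the comma-separated parts of the zip tag on the named
// field of struct value v, or nil if the field or the tag is missing.
func zipTagOptions(v interface{}, field string) []string {
	t := reflect.TypeOf(v)
	if t == nil {
		return nil
	}
	if t.Kind() == reflect.Ptr {
		t = t.Elem()
	}
	if t.Kind() != reflect.Struct {
		return nil
	}
	f, ok := t.FieldByName(field)
	if !ok {
		return nil
	}
	tag, ok := f.Tag.Lookup("zip")
	if !ok {
		return nil
	}
	return strings.Split(tag, ",")
}

func main() {
	opts := zipTagOptions(File{}, "Name")
	fmt.Println(opts)
}
```

The caller would then decide what an option such as chinese-utf8 means, for example whether to convert the name before building the Zip entry.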
With the archive/zip package, we can also build a FileHeader for each file from its FileInfo (using zip.FileInfoHeader) and set the header's Name field manually, so that every entry is stored with the correct encoding.
import (
	"archive/zip"
	"io"
	"os"
	"unicode/utf8"
)

func zipFiles(filePaths []string, dest string) error {
	// Create the destination file.
	newZipFile, err := os.Create(dest)
	if err != nil {
		return err
	}
	defer newZipFile.Close()

	// Create the ZIP writer.
	zipWriter := zip.NewWriter(newZipFile)
	defer zipWriter.Close()

	// Walk filePaths and store each file under a correctly encoded name.
	for _, filePath := range filePaths {
		zipFile, err := os.Open(filePath)
		if err != nil {
			return err
		}
		defer zipFile.Close()

		// Build the header from the file's metadata.
		fileInfo, err := zipFile.Stat()
		if err != nil {
			return err
		}
		zipFileInfo, err := zip.FileInfoHeader(fileInfo)
		if err != nil {
			return err
		}

		// Convert the name only if it is not already valid UTF-8
		// (for example, a GBK-encoded name read from disk).
		if !utf8.ValidString(zipFileInfo.Name) {
			name, err := GbkToUtf8([]byte(zipFileInfo.Name))
			if err != nil {
				return err
			}
			zipFileInfo.Name = string(name)
		}

		// Create the entry writer inside the archive.
		zipWriterNewFile, err := zipWriter.CreateHeader(zipFileInfo)
		if err != nil {
			return err
		}

		// Copy the file contents into the archive.
		if _, err = io.Copy(zipWriterNewFile, zipFile); err != nil {
			return err
		}
	}
	return nil
}

Conclusion
Garbled Chinese file names in Zip archives come down to inconsistent file-name encodings. Golang's Zip package, together with golang.org/x/text, gives us several ways to handle this, and we can avoid the problem by choosing the approach that fits our needs.