golang csv parsing garbled characters
When parsing csv files with Golang, you will sometimes run into garbled characters. The situation is common, but it can also be quite troublesome. So, how do we solve this problem?
First, we must understand that csv is a text file format that uses "," to separate fields. Garbled characters appear when the text in the csv file contains non-ASCII characters and the file's encoding does not match the encoding used during parsing. In other words, the root cause is an encoding mismatch: the csv file is stored in one encoding (for example GBK) while the parser expects another (for example UTF-8).
In golang, the commonly used csv library is the built-in encoding/csv. This library assumes its input is already UTF-8 text and performs no encoding conversion of its own. If you want to process csv files in other encoding formats, additional processing is required.
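For reference, when the csv file is already UTF-8, no extra handling is needed at all. The following is a minimal sketch of plain encoding/csv usage (the file name example.csv is just a placeholder):

package main

import (
    "encoding/csv"
    "fmt"
    "os"
)

func main() {
    // Open a UTF-8 encoded csv file; the name is just an example.
    file, err := os.Open("example.csv")
    if err != nil {
        fmt.Println("Error:", err)
        return
    }
    defer file.Close()

    // encoding/csv assumes the input is already valid UTF-8.
    reader := csv.NewReader(file)
    lines, err := reader.ReadAll()
    if err != nil {
        fmt.Println("Error:", err)
        return
    }
    for i, line := range lines {
        fmt.Printf("Line %d: %v\n", i+1, line)
    }
}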
There are several methods to solve the problem of garbled characters. We will introduce them one by one below:
Method 1. Manual conversion of encoding format
Before parsing the csv, we can manually convert the encoding of the csv file to UTF-8. The easiest way is to open the csv file with Notepad and save it again in UTF-8 format.
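If opening every file in an editor is not practical, the same one-off conversion can also be scripted. The following is only a sketch, assuming the source file is GBK-encoded and using the golang.org/x/text decoder that also appears later in this article (the file names example_gbk.csv and example_utf8.csv are placeholders):

package main

import (
    "fmt"
    "io"
    "os"

    "golang.org/x/text/encoding/simplifiedchinese"
    "golang.org/x/text/transform"
)

func main() {
    // Source file, assumed to be GBK-encoded.
    src, err := os.Open("example_gbk.csv")
    if err != nil {
        fmt.Println("Error:", err)
        return
    }
    defer src.Close()

    // Destination file, written as UTF-8.
    dst, err := os.Create("example_utf8.csv")
    if err != nil {
        fmt.Println("Error:", err)
        return
    }
    defer dst.Close()

    // transform.NewReader decodes the GBK bytes into UTF-8 on the fly.
    decoder := transform.NewReader(src, simplifiedchinese.GBK.NewDecoder())
    if _, err := io.Copy(dst, decoder); err != nil {
        fmt.Println("Error:", err)
        return
    }
}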
Converting every file up front, whether by hand or with a small script, is still an extra step, especially when we have a large number of csv files. Therefore, we can try the second method.
Method 2. Use a third-party library
Golang's built-in csv parsing library is encoding/csv, but it does not convert encodings itself, so csv files in other encodings need help from an additional package. A common choice is golang.org/x/text, whose encoding/simplifiedchinese and transform packages can decode a GBK-encoded stream into UTF-8 before it reaches encoding/csv.
Installation method of golang.org/x/text:
$ go get golang.org/x/text
Next, you can combine it with encoding/csv to parse a GBK-encoded csv file like this:
package main

import (
    "encoding/csv"
    "fmt"
    "os"

    "golang.org/x/text/encoding/simplifiedchinese"
    "golang.org/x/text/transform"
)

func main() {
    file, err := os.Open("example.csv")
    if err != nil {
        fmt.Println("Error:", err)
        return
    }
    defer file.Close()

    // Wrap the file in a reader that decodes GBK into UTF-8 on the fly.
    decoder := transform.NewReader(file, simplifiedchinese.GBK.NewDecoder())

    // encoding/csv then sees a plain UTF-8 stream.
    reader := csv.NewReader(decoder)
    reader.Comma = ','

    lines, err := reader.ReadAll()
    if err != nil {
        fmt.Println("Error:", err)
        return
    }
    for i, line := range lines {
        fmt.Printf("Line %d: %v\n", i+1, line)
    }
}
In the above code, we first wrap the opened file in transform.NewReader with simplifiedchinese.GBK.NewDecoder(), which decodes the GBK bytes into UTF-8 on the fly. The decoded reader is then passed to the encoding/csv library with the delimiter set to ",". Finally, the ReadAll method reads all the lines in the file and we print them out.
Although this method is effective, it also has some drawbacks. It pulls in an extra dependency and hides the details of the conversion behind the wrapped reader. If you want more control over how each line and each field is handled, there is a third method.
Method 3. Manual parsing
The process of manual parsing may be cumbersome, but it is also an effective solution. The key is to understand the format of the csv file.
Usually the first line of a csv file is a header row containing the name of each field; it is part of the file and can be read by parsing the first line like any other. Each of the following rows consists of multiple fields separated by ",". If there is no encoding problem, we can parse the file directly with the encoding/csv library. But if garbled characters occur, we need to parse each field and convert it to UTF-8 ourselves.
The following is the manual parsing code:
package main

import (
    "bufio"
    "encoding/csv"
    "fmt"
    "io"
    "io/ioutil"
    "os"
    "strings"

    "golang.org/x/text/encoding/simplifiedchinese"
    "golang.org/x/text/transform"
)

func main() {
    file, err := os.Open("example.csv")
    if err != nil {
        fmt.Println("Error:", err)
        return
    }
    defer file.Close()

    reader := bufio.NewReader(file)
    var lines [][]string
    for {
        line, err := reader.ReadString('\n')
        if err != nil && err != io.EOF {
            fmt.Println("Error:", err)
            return
        }
        // Strip the trailing line break (handles both \n and \r\n).
        line = strings.TrimRight(line, "\r\n")
        if line != "" {
            r := csv.NewReader(strings.NewReader(line))
            r.Comma = ','
            fields, err := r.Read()
            if err != nil {
                fmt.Println("Error:", err)
                return
            }
            // Convert each field to UTF-8.
            for i, s := range fields {
                fields[i] = toUTF8(s)
            }
            lines = append(lines, fields)
        }
        if err == io.EOF {
            break
        }
    }
    for i, line := range lines {
        fmt.Printf("Line %d: %v\n", i+1, line)
    }
}

// toUTF8 converts a single GBK-encoded field to UTF-8.
func toUTF8(s string) string {
    data, err := ioutil.ReadAll(transform.NewReader(strings.NewReader(s), simplifiedchinese.GBK.NewDecoder()))
    if err != nil {
        return s
    }
    return string(data)
}
In the above code, we first read each line of the csv file through bufio, and then use the encoding/csv library to parse each line's data. To solve the garbled-character problem, we use the helper function toUTF8() to convert each field into UTF-8.
This function receives a string parameter, wraps it in a strings.Reader, uses simplifiedchinese.GBK.NewDecoder() together with transform.NewReader to create a decoding reader, and finally reads the result back with ioutil.ReadAll() as a UTF-8 string.
In this way, we can manually parse the csv file and convert it to UTF-8 encoding format.
Summary:
The above are three methods for solving golang csv parsing garbled characters. If the csv file you are using is UTF-8 encoded, it can be parsed easily with golang's own encoding/csv. Otherwise, you can choose manual parsing or an encoding-conversion wrapper according to your actual needs. In any case, once you master the right approach, garbled characters are no longer an obstacle.