golang csv parsing garbled characters

When using Golang to parse csv files, sometimes you will encounter the problem of garbled characters. This situation is very common, but it is also very troublesome. So, how to solve this problem?

First we must understand that csv is a text file format that uses "," to separate fields. Garbled characters can appear when the text in the csv file contains non-ASCII characters and the file's encoding does not match the encoding assumed during parsing. In other words, the problem is an encoding mismatch, not a problem with the csv format itself.
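As a rough illustration of the mismatch, the sketch below checks whether a byte slice is valid UTF-8 using only the standard library. The helper name needsTranscoding is hypothetical, and the check is only a heuristic: pure ASCII data is valid in both UTF-8 and GBK, so it passes either way.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// needsTranscoding reports whether data is not valid UTF-8 and therefore
// should be converted before its fields are used as Go strings.
// (Hypothetical helper name, not part of any library.)
func needsTranscoding(data []byte) bool {
	return !utf8.Valid(data)
}

func main() {
	// "中文,1" encoded as UTF-8 (Go string literals are UTF-8).
	utf8Bytes := []byte("中文,1\n")
	// The same text with "中文" encoded as GBK: D6 D0 CE C4.
	gbkBytes := []byte{0xD6, 0xD0, 0xCE, 0xC4, ',', '1', '\n'}

	fmt.Println(needsTranscoding(utf8Bytes)) // false
	fmt.Println(needsTranscoding(gbkBytes))  // true
}
```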

In golang, the commonly used csv library is the built-in encoding/csv. This library treats its input as UTF-8 and offers no option to select another encoding. If you want to process csv files in other encodings, additional processing is required.
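For reference, here is a minimal sketch of the case that needs no extra work: the input is already UTF-8, so encoding/csv returns readable fields. The function name parseUTF8 and the sample data are made up for illustration.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// parseUTF8 parses csv text that is already UTF-8 and returns all records.
// (Hypothetical helper for this example.)
func parseUTF8(text string) ([][]string, error) {
	r := csv.NewReader(strings.NewReader(text))
	return r.ReadAll()
}

func main() {
	records, err := parseUTF8("name,city\n张三,北京\n")
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	for i, rec := range records {
		fmt.Printf("Line %d: %v\n", i+1, rec)
	}
}
```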

There are several methods to solve the problem of garbled characters. We will introduce them one by one below:

Method 1. Manual conversion of encoding format

Before parsing csv, we can manually convert the encoding of the csv file to UTF-8. The easiest way is to open the csv file in Notepad and save it in UTF-8 format.

Manual conversion may be troublesome, especially when we have a large number of csv files. Therefore, we can try the second method.
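One caveat with editor-based conversion: some editors, including older versions of Windows Notepad, write a byte order mark (bytes EF BB BF) at the start of files saved as UTF-8. encoding/csv does not strip it, so it ends up inside the first field. The sketch below skips a leading BOM before parsing; skipBOM and firstField are hypothetical helper names.

```go
package main

import (
	"bufio"
	"encoding/csv"
	"fmt"
	"io"
	"strings"
)

// skipBOM returns a reader positioned after a leading UTF-8 byte order
// mark, if one is present. (Hypothetical helper, not part of encoding/csv.)
func skipBOM(r io.Reader) io.Reader {
	br := bufio.NewReader(r)
	head, err := br.Peek(3)
	if err == nil && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF {
		br.Discard(3) // drop the three BOM bytes
	}
	return br
}

// firstField parses the input and returns the first field of the first
// record, so the effect of skipBOM is easy to observe.
func firstField(input string) (string, error) {
	rec, err := csv.NewReader(skipBOM(strings.NewReader(input))).Read()
	if err != nil {
		return "", err
	}
	return rec[0], nil
}

func main() {
	field, err := firstField("\xEF\xBB\xBFname,age\n")
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	fmt.Printf("%q\n", field)
}
```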

Method 2. Use a third-party library

The standard csv parsing library in Golang is encoding/csv. To process csv files in other encodings, we can use a third-party library to assist with parsing. For example, you can use gocsv to parse csv files in gbk encoding.

Gocsv installation method:

$ go get github.com/kuangyh/csv

Next, you can use gocsv to parse the csv file like this:

package main

import (
    "encoding/csv"
    "fmt"
    "os"

    // Aliased as gocsv so the package name does not clash with encoding/csv.
    gocsv "github.com/kuangyh/csv"
)

func main() {
    file, err := os.Open("example.csv")
    if err != nil {
        fmt.Println("Error:", err)
        return
    }

    defer file.Close()

    // gocsv.NewReader is used as given in this article; check the
    // library's own documentation for its exact behavior.
    reader := csv.NewReader(gocsv.NewReader(file))
    reader.Comma = ','

    lines, err := reader.ReadAll()
    if err != nil {
        fmt.Println("Error:", err)
        return
    }

    for i, line := range lines {
        fmt.Printf("Line %d: %v\n", i+1, line)
    }
}

In the above code, we first import the gocsv library, then use gocsv to create a new reader around the file, pass that reader to the encoding/csv library, and set the delimiter to "," (which is also the default). Finally, we use the ReadAll method to get all the records in the file and print them.

Although this method is effective, it also has some problems. For example, we need a third-party csv library to complete the conversion, which increases dependencies and complexity. If we don't want to depend on a third-party csv library, there is a third method.

Method 3. Manual parsing

The process of manual parsing may be cumbersome, but it is also an effective solution. The key is to understand the format of the csv file.

Usually the first line of a csv file is a header that contains the name of each field. The header is part of the csv file and can be obtained by parsing the first line. Each data row is then made up of multiple fields separated by ",". If there is no garbled-character problem, we can parse the csv file directly with the encoding/csv library. But if garbled characters occur, we need to parse each field manually and convert it to UTF-8.
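The header-plus-rows layout described above can be sketched as follows for input that is already UTF-8. The helper names readWithHeader and readWithHeaderString, and the sample columns, are assumptions made for this example.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strings"
)

// readWithHeader treats the first record as the header and returns each
// following record as a map from column name to value.
func readWithHeader(r io.Reader) ([]map[string]string, error) {
	cr := csv.NewReader(r)
	header, err := cr.Read()
	if err != nil {
		return nil, err
	}
	var rows []map[string]string
	for {
		rec, err := cr.Read()
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}
		// By default encoding/csv requires every record to have as many
		// fields as the first one, so indexing rec by header position is safe.
		row := make(map[string]string, len(header))
		for i, name := range header {
			row[name] = rec[i]
		}
		rows = append(rows, row)
	}
	return rows, nil
}

// readWithHeaderString is a convenience wrapper for in-memory text.
func readWithHeaderString(text string) ([]map[string]string, error) {
	return readWithHeader(strings.NewReader(text))
}

func main() {
	rows, err := readWithHeaderString("name,city\n张三,北京\n李四,上海\n")
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	for _, row := range rows {
		fmt.Println(row["name"], row["city"])
	}
}
```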

The following is a manual parsing code:

package main

import (
    "bufio"
    "encoding/csv"
    "fmt"
    "io"
    "os"
    "strings"

    "golang.org/x/text/encoding/simplifiedchinese"
    "golang.org/x/text/transform"
)

func main() {
    file, err := os.Open("example.csv")
    if err != nil {
        fmt.Println("Error:", err)
        return
    }
    defer file.Close()

    reader := bufio.NewReader(file)
    var lines [][]string

    for {
        line, err := reader.ReadString('\n')
        if err != nil && err != io.EOF {
            fmt.Println("Error:", err)
            return
        }

        // An empty string means the end of the file has been reached
        if line == "" {
            break
        }

        // Remove the line ending, whether it is "\n" or "\r\n"
        line = strings.TrimRight(line, "\r\n")
        if line == "" {
            continue // skip blank lines
        }

        // Parse this single line with encoding/csv
        r := csv.NewReader(strings.NewReader(line))
        r.Comma = ','

        fields, err := r.Read()
        if err != nil {
            fmt.Println("Error:", err)
            return
        }

        // Convert each field to UTF-8
        for i, s := range fields {
            fields[i] = toUTF8(s)
        }

        lines = append(lines, fields)
    }

    for i, line := range lines {
        fmt.Printf("Line %d: %v\n", i+1, line)
    }
}

// toUTF8 converts a single GBK-encoded field to UTF-8.
// If decoding fails, the field is returned unchanged.
func toUTF8(s string) string {
    data, err := io.ReadAll(transform.NewReader(strings.NewReader(s), simplifiedchinese.GBK.NewDecoder()))
    if err != nil {
        return s
    }
    return string(data)
}

In the above code, we first read the csv file line by line through bufio, and then use the encoding/csv library to parse each line. To solve the garbled-character problem, we use the function toUTF8() to convert each field to UTF-8. Because the file is split on newlines before parsing, quoted fields that contain line breaks are not supported.

This function receives a string parameter, first wraps it in a Reader, then uses simplifiedchinese.GBK.NewDecoder() to create a decoder, and finally uses io.ReadAll() to read the decoded UTF-8 bytes. Note that the simplifiedchinese and transform packages come from golang.org/x/text, which is a separate module and not part of the standard library.

In this way, we can manually parse the csv file and convert it to UTF-8 encoding format.
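To show the same parse-then-convert-each-field idea with the standard library only, the sketch below swaps GBK for ISO-8859-1 (Latin-1), where every byte maps directly to the Unicode code point of the same value. This is a different encoding from the one discussed above, chosen only because it needs no decoding table; decodeLatin1 and parseLatin1CSV are hypothetical names.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
)

// decodeLatin1 converts a string holding ISO-8859-1 bytes to UTF-8.
// In Latin-1 each byte value equals its Unicode code point.
func decodeLatin1(s string) string {
	runes := make([]rune, 0, len(s))
	for i := 0; i < len(s); i++ { // iterate by byte, not by rune
		runes = append(runes, rune(s[i]))
	}
	return string(runes)
}

// parseLatin1CSV parses the raw bytes first and then converts every field.
// This works because the comma, quote, and newline are plain ASCII bytes
// in Latin-1, so the csv structure is readable before decoding.
func parseLatin1CSV(raw []byte) ([][]string, error) {
	records, err := csv.NewReader(bytes.NewReader(raw)).ReadAll()
	if err != nil {
		return nil, err
	}
	for _, rec := range records {
		for i, field := range rec {
			rec[i] = decodeLatin1(field)
		}
	}
	return records, nil
}

func main() {
	// "café,1" with é stored as the single Latin-1 byte 0xE9.
	raw := []byte{'c', 'a', 'f', 0xE9, ',', '1', '\n'}
	records, err := parseLatin1CSV(raw)
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	fmt.Println(records)
}
```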

Summary:

The above are three methods to solve the problem of golang csv parsing garbled characters. If the csv file you are using is UTF-8 encoded, it can be parsed directly with golang's built-in encoding/csv. Otherwise, you can convert the file manually, use a third-party library, or parse and convert the fields yourself, according to actual needs. In every case, the fix is the same: make sure the data is UTF-8 by the time its fields are used.
