How to fix garbled text when writing a crawler in Go

angryTom
2020-02-15 09:52:40

What to do about garbled text when writing a crawler in Go

When writing a crawler in Go, you may encounter pages encoded in GB2312.

The page's own markup shows that its character encoding is gb2312:

<meta http-equiv="Content-Type" content="text/html; charset=gb2312" />

Go treats strings as UTF-8 by default, so text crawled directly from such a page comes out as garbled characters.
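To see why the raw bytes come out garbled, here is a minimal sketch that uses only the standard library. The sample bytes (the GB2312 encoding of "中文") and the helper name countReplacementRunes are assumptions chosen for illustration; they do not come from the original article.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// countReplacementRunes (hypothetical helper) counts the bytes in b that
// cannot be decoded as UTF-8. Go substitutes U+FFFD for each of them when
// the data is treated as text, which is what shows up as garbled output.
func countReplacementRunes(b []byte) int {
	n := 0
	for len(b) > 0 {
		r, size := utf8.DecodeRune(b)
		if r == utf8.RuneError && size == 1 {
			n++
		}
		b = b[size:]
	}
	return n
}

func main() {
	// Assumed sample input: "中文" encoded as GB2312 is D6 D0 CE C4.
	gb := []byte{0xD6, 0xD0, 0xCE, 0xC4}

	fmt.Println(utf8.Valid(gb))             // false
	fmt.Println(countReplacementRunes(gb))  // 4
	fmt.Println(utf8.Valid([]byte("中文"))) // true: Go source literals are UTF-8
}
```

A check like utf8.Valid can serve as a quick signal that a fetched page needs converting before it is parsed or printed.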

Solution:

The github.com/axgle/mahonia package can perform the encoding conversion.

1. Run go get github.com/axgle/mahonia to download the package. After the command finishes, github.com\axgle\mahonia will be present in the %gopath%/src directory.

2. How to use it in code

1) Import package

import "github.com/axgle/mahonia"

2) Conversion function

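// ConvertToString decodes src from the srcCode charset and returns the
// text as a UTF-8 string. tagCode is expected to be "utf-8".
func ConvertToString(src string, srcCode string, tagCode string) string {
    // Decode the source text (for example GBK) into a UTF-8 Go string.
    srcCoder := mahonia.NewDecoder(srcCode)
    srcResult := srcCoder.ConvertString(src)
    // Pass the result through the target charset's decoder;
    // Translate returns the converted data as UTF-8 bytes.
    tagCoder := mahonia.NewDecoder(tagCode)
    _, cdata, _ := tagCoder.Translate([]byte(srcResult), true)
    result := string(cdata)
    return result
}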

3) Call this function wherever a string's encoding needs to be converted

result = ConvertToString(html, "gbk", "utf-8")
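One way to decide where the conversion is required is to read the charset from the page's meta tag and convert only when it is not already UTF-8. The sketch below is an illustration under stated assumptions: the names charsetRe, declaredCharset and toUTF8 are hypothetical, the regular expression is a simplification rather than a full HTML parser, and the convert parameter stands in for ConvertToString so the example runs without mahonia installed.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// charsetRe (hypothetical) finds charset=... inside a <meta> tag, e.g.
// <meta http-equiv="Content-Type" content="text/html; charset=gb2312" />.
var charsetRe = regexp.MustCompile(`(?i)<meta[^>]*charset\s*=\s*["']?([a-z0-9_-]+)`)

// declaredCharset returns the lowercased charset named in the page's meta
// tag, or "" when no charset is declared.
func declaredCharset(html string) string {
	m := charsetRe.FindStringSubmatch(html)
	if m == nil {
		return ""
	}
	return strings.ToLower(m[1])
}

// toUTF8 calls convert only when the page declares a non-UTF-8 charset.
// convert has the same shape as the article's ConvertToString.
func toUTF8(page string, convert func(src, srcCode, tagCode string) string) string {
	cs := declaredCharset(page)
	if cs == "" || cs == "utf-8" || cs == "utf8" {
		return page
	}
	if cs == "gb2312" {
		// GBK is a superset of GB2312, matching the article's "gbk" call.
		cs = "gbk"
	}
	return convert(page, cs, "utf-8")
}

func main() {
	page := `<meta http-equiv="Content-Type" content="text/html; charset=gb2312" />`

	// fake stands in for ConvertToString; it only records what was requested.
	fake := func(src, srcCode, tagCode string) string {
		return fmt.Sprintf("convert %s -> %s (%d bytes)", srcCode, tagCode, len(src))
	}

	fmt.Println(declaredCharset(page)) // gb2312
	fmt.Println(toUTF8(page, fake))
}
```

In a real crawler, ConvertToString itself could be passed as the convert argument, since its signature matches.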
