
Detailed explanation of how to use Golang to crawl Bing wallpapers

When people think of crawlers they usually reach for Python, where the requests library covers almost everything. But Go's built-in net/http package is also very capable, and writing a small crawler in Go is a good way to learn something new while reviewing how HTTP requests and responses work. Without further ado, let's get started.

We'll use Bing Wallpaper as the practice target. (Purely for learning purposes, of course.)

Overview of crawler process

graph TD
Request data --> Parse data --> Store data

As the flow chart above shows, a crawler is really not complicated: the whole process boils down to three steps. Let's look at what each step involves.

  • Request data: use Go's built-in net/http package to send a request to the target address.

  • Parse data: pick out the specific pieces we need from the response instead of keeping everything; this step is also called data cleaning.

  • Store data: save the parsed data somewhere persistent, such as a database or, as in this article, files on disk. (A minimal sketch of all three steps follows right after this list.)
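
Here is a minimal sketch of those three steps wired together, assuming goquery for the parsing step; the URL, selector, and output file are placeholders, and the real, complete version follows later in the article.

package main

import (
    "log"
    "net/http"
    "os"

    "github.com/PuerkitoBio/goquery"
)

func main() {
    // Step 1 - request data: fetch the page over HTTP.
    resp, err := http.Get("https://example.com") // placeholder URL
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()

    // Step 2 - parse data: keep only the field we care about.
    doc, err := goquery.NewDocumentFromReader(resp.Body)
    if err != nil {
        log.Fatal(err)
    }
    title := doc.Find("title").Text()

    // Step 3 - store data: written to a local file here; a real project
    // might insert it into a database instead.
    if err := os.WriteFile("result.txt", []byte(title), 0666); err != nil {
        log.Fatal(err)
    }
}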

Practical Analysis

First, open the Bing Wallpaper site and look around; writing a crawler means being sensitive to how the data is laid out. The homepage is very clean and simple.

Next, open the browser's developer tools (you should already be familiar with them; if not, the rest will be hard to follow). Normally you would press F12 or right-click and choose Inspect, but on the Bing Wallpaper site right-clicking is disabled, so the tools have to be opened manually: in Chrome, open the menu, choose More tools, then Developer tools.

You will most likely see a page like this:

Don't worry, those are just some anti-crawling errors raised by the Bing Wallpaper site (they weren't there when I crawled it a while ago), and they don't affect what we are doing.

Next, use the element-picker tool to quickly locate the element we want, and we can find the image information we need.

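Since the screenshots cannot be reproduced here, the snippet below shows roughly what one wallpaper card looks like and how goquery selectors pick it apart. The HTML is a simplified, illustrative stand-in inferred from the selectors used in the code below (.container .item, a.ctrl.download, .description>h3); the real markup on bing.ioliu.cn is richer.

package main

import (
    "fmt"
    "strings"

    "github.com/PuerkitoBio/goquery"
)

// A stripped-down stand-in for one wallpaper card, mirroring the selectors
// used later in the article; not the site's actual markup.
const sampleHTML = `
<div class="container">
  <div class="item">
    <div class="description"><h3>Example wallpaper title</h3></div>
    <a class="ctrl download" href="https://example.com/example.jpg">download</a>
  </div>
</div>`

func main() {
    doc, err := goquery.NewDocumentFromReader(strings.NewReader(sampleHTML))
    if err != nil {
        panic(err)
    }
    // Walk every card, then read the download link and the title inside it.
    doc.Find(".container .item").Each(func(i int, s *goquery.Selection) {
        href, _ := s.Find("a.ctrl.download").Attr("href")
        title := s.Find(".description>h3").Text()
        fmt.Println(i, title, href)
    })
}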

Code in practice

The following code crawls a single page of data:

package main

import (
    "fmt"
    "github.com/PuerkitoBio/goquery"
    "io"
    "io/ioutil"
    "log"
    "net/http"
    "os"
    "time"
)

func Run(method, url string, body io.Reader, client *http.Client) {
    req, err := http.NewRequest(method, url, body)
    if err != nil {
        log.Println("failed to build request:", err)
        return
    }
    // Pretend to be a normal browser so the site is less likely to reject us
    req.Header.Set("user-agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36")
    resp, err := client.Do(req)
    if err != nil {
        log.Println("failed to send request:", err)
        return
    }
    defer resp.Body.Close() // close the response body no matter how we return
    if resp.StatusCode != http.StatusOK {
        log.Printf("request failed, status code: %d", resp.StatusCode)
        return
    }
    query, err := goquery.NewDocumentFromReader(resp.Body)
    if err != nil {
        log.Println("failed to build goquery document:", err)
        return
    }
    // Every wallpaper card sits in ".container .item"; pull out its download link and title
    query.Find(".container .item").Each(func(i int, s *goquery.Selection) {
        imgUrl, _ := s.Find("a.ctrl.download").Attr("href")
        imgName := s.Find(".description>h3").Text()
        fmt.Println(imgUrl)
        fmt.Println(imgName)
        DownloadImage(imgUrl, i, client)
        time.Sleep(time.Second) // be polite: pause a second between downloads
        fmt.Println("-------------------------")
    })
}

func DownloadImage(url string, index int, client *http.Client) {
    // Fetching a file is a plain GET, not a POST
    req, err := http.NewRequest("GET", url, nil)
    if err != nil {
        log.Println("failed to build request:", err)
        return
    }
    req.Header.Set("user-agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36")
    resp, err := client.Do(req)
    if err != nil {
        log.Println("failed to send request:", err)
        return
    }
    defer resp.Body.Close()
    data, err := ioutil.ReadAll(resp.Body)
    if err != nil {
        log.Println("failed to read response body:", err)
        return
    }
    // Make sure the output directory exists before writing into it
    if err := os.MkdirAll("./image", 0755); err != nil {
        log.Println("failed to create image directory:", err)
        return
    }
    baseDir := "./image/image-%d.jpg"
    f, err := os.OpenFile(fmt.Sprintf(baseDir, index), os.O_CREATE|os.O_TRUNC|os.O_WRONLY, 0666)
    if err != nil {
        log.Println("failed to open file:", err)
        return
    }
    defer f.Close()
    _, err = f.Write(data)
    if err != nil {
        log.Println("failed to write file:", err)
        return
    }
    fmt.Println("image downloaded successfully")
}

func main() {
    client := &http.Client{}
    url := "https://bing.ioliu.cn/?p=%d"
    method := "GET"
    // Fill the page number into the URL template; here we crawl only page 1
    Run(method, fmt.Sprintf(url, 1), nil, client)
}
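
A side note on the download step: DownloadImage buffers the whole image in memory with ioutil.ReadAll before writing it out. An alternative is to stream the body straight into the file with io.Copy. The sketch below shows that variant using only the packages already imported above; DownloadImageStream is a hypothetical name, not part of the original code.

// DownloadImageStream is an alternative to DownloadImage that streams the
// response body to disk instead of reading it fully into memory first.
func DownloadImageStream(url string, index int, client *http.Client) error {
    req, err := http.NewRequest("GET", url, nil)
    if err != nil {
        return err
    }
    req.Header.Set("user-agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36")
    resp, err := client.Do(req)
    if err != nil {
        return err
    }
    defer resp.Body.Close()
    if err := os.MkdirAll("./image", 0755); err != nil {
        return err
    }
    f, err := os.Create(fmt.Sprintf("./image/image-%d.jpg", index))
    if err != nil {
        return err
    }
    defer f.Close()
    _, err = io.Copy(f, resp.Body) // stream chunk by chunk, no full in-memory copy
    return err
}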

Next, let's crawl multiple pages. The code barely changes; we just need to observe how the site structures its pages first.

Notice anything? The first page is p=1, the second page is p=2, and the tenth page is p=10.

So all we have to do is run a for loop and reuse the single-page code from before.

// main function for crawling multiple pages
func main() {
    client := &http.Client{}
    url := "https://bing.ioliu.cn/?p=%d"
    method := "GET"
    for i := 1; i < 5; i++ { // paginate: crawl pages 1 through 4
        Run(method, fmt.Sprintf(url, i), nil, client)
    }
}
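
One optional hardening tweak, not part of the original code: give the shared http.Client a timeout so a single stalled request cannot hang the whole crawl. The time package is already imported above, so this drops straight into either main function:

client := &http.Client{
    Timeout: 10 * time.Second, // give up on any single request after 10 seconds
}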

Summary

In this example we used a third-party package to parse the web page data, because doing it with regular expressions really is too much trouble. The common options for parsing are:

  • CSS selectors: goquery (what this article uses)
  • XPath selectors: htmlquery (a rough sketch follows after this list)
  • Regular expressions: the built-in regexp package; not recommended, since the patterns are hard to write and maintain
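
For reference, here is roughly what the XPath route looks like with github.com/antchfx/htmlquery. The XPath expressions are illustrative guesses that mirror the CSS selectors used earlier; they have not been verified against the live page.

package main

import (
    "fmt"
    "log"

    "github.com/antchfx/htmlquery"
)

func main() {
    // LoadURL fetches and parses the page in one step.
    doc, err := htmlquery.LoadURL("https://bing.ioliu.cn/?p=1")
    if err != nil {
        log.Fatal(err)
    }
    // XPath equivalents of the earlier CSS selectors (illustrative, untested).
    for _, item := range htmlquery.Find(doc, `//div[contains(@class,"item")]`) {
        link := htmlquery.FindOne(item, `.//a[contains(@class,"download")]`)
        title := htmlquery.FindOne(item, `.//div[contains(@class,"description")]/h3`)
        if link == nil || title == nil {
            continue
        }
        fmt.Println(htmlquery.InnerText(title), htmlquery.SelectAttr(link, "href"))
    }
}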
