
Detailed explanation of how to use Golang to crawl Bing wallpapers

青灯夜游 (reposted), 2023-02-20 19:38:28, 3098 views


It goes without saying that Python is the usual choice for writing a crawler; the requests library alone covers almost everything. However, I heard that the http package built into Golang is very powerful. I happened to have some free time, so I decided to learn something new and review the request and response parts of the HTTP protocol along the way. Without further ado, let's get started.

Let's try it out by crawling Bing Wallpaper first. (The remark about Python is in jest; please don't flame me.)

Overview of crawler process

graph TD
A[Request data] --> B[Parse data] --> C[Store data]

As the flow chart above shows, a crawler is not complicated: the whole process has only three steps. Let's go through what each step needs to do.

  • Request data: use Golang's built-in http package to send a request to the target address and read the response.

  • Parse data: extract the specific key data we need from the response, since we do not need all of it. This step is also called data cleaning.

  • Data storage: store the parsed data, typically in a database. In this article, the downloaded images are simply written to local files.
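
The three steps above can be sketched as three small functions. This is only an illustration under stated assumptions, not the article's crawler: `fetch`, `parse`, and `store` are hypothetical names, the "page" comes from a local stand-in server (`httptest`) instead of the real site, and the line-based filter in `parse` is a placeholder for a real HTML parser.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"os"
	"path/filepath"
	"strings"
)

// fetch is step 1: request the page and return its body as a string.
func fetch(client *http.Client, url string) (string, error) {
	resp, err := client.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

// parse is step 2: keep only the lines that end in ".jpg".
// Real pages need an HTML parser; this filter is just a placeholder.
func parse(body string) []string {
	var urls []string
	for _, line := range strings.Split(body, "\n") {
		line = strings.TrimSpace(line)
		if strings.HasSuffix(line, ".jpg") {
			urls = append(urls, line)
		}
	}
	return urls
}

// store is step 3: persist the cleaned data, here as a plain text file.
func store(dir string, urls []string) (string, error) {
	path := filepath.Join(dir, "urls.txt")
	err := os.WriteFile(path, []byte(strings.Join(urls, "\n")), 0644)
	return path, err
}

func main() {
	// Local stand-in server so the sketch runs without network access.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "header\nhttps://example.com/a.jpg\nnoise\nhttps://example.com/b.jpg\n")
	}))
	defer srv.Close()

	body, err := fetch(srv.Client(), srv.URL)
	if err != nil {
		fmt.Println("fetch failed:", err)
		return
	}
	urls := parse(body)

	dir, err := os.MkdirTemp("", "crawler")
	if err != nil {
		fmt.Println("temp dir failed:", err)
		return
	}
	defer os.RemoveAll(dir)

	path, err := store(dir, urls)
	if err != nil {
		fmt.Println("store failed:", err)
		return
	}
	fmt.Println("stored", len(urls), "URLs in", path)
}
```

Keeping the steps in separate functions makes it easy to swap one of them, for example replacing the text file in `store` with a database insert.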

Practical Analysis

First, visit the Bing Wallpaper website and take a look. To write a crawler, you need to be particularly sensitive to data. This is the home page; the whole page is very concise.

Next, open the browser's developer tools (you should be very familiar with them; if not, the rest will be hard to follow). Normally you just press F12 or right-click and choose Inspect. On the Bing Wallpaper site, however, right-clicking does not bring up the tools, so they have to be opened manually from the browser menu: select More tools, then Developer tools. If your Chrome is in Chinese, the steps are the same.

Unsurprisingly, the page you see will show some errors. Don't worry: these are just anti-crawling errors from the Bing Wallpaper website (I did not get them when I crawled the site a long time ago), and they do not affect what we are doing.

Next, use the element-selection tool in the developer tools to quickly locate the element we want. With it, we can find the picture information we need.


Hands-on code

The following code crawls a single page:

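package main

import (
    "fmt"
    "io"
    "log"
    "net/http"
    "os"
    "time"

    "github.com/PuerkitoBio/goquery"
)

// userAgent makes the requests look like they come from a desktop browser.
const userAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36"

// Run requests one listing page, parses it, and downloads every image found on it.
func Run(method, url string, body io.Reader, client *http.Client) {
    req, err := http.NewRequest(method, url, body)
    if err != nil {
        log.Println("failed to create request:", err)
        return
    }
    req.Header.Set("User-Agent", userAgent)
    resp, err := client.Do(req)
    if err != nil {
        log.Println("failed to send request:", err)
        return
    }
    defer resp.Body.Close() // close the body on every return path, including non-200 responses
    if resp.StatusCode != http.StatusOK {
        log.Printf("request failed, status code: %d", resp.StatusCode)
        return
    }
    query, err := goquery.NewDocumentFromReader(resp.Body)
    if err != nil {
        log.Println("failed to build goquery document:", err)
        return
    }
    query.Find(".container .item").Each(func(i int, s *goquery.Selection) {
        imgUrl, ok := s.Find("a.ctrl.download").Attr("href")
        if !ok {
            log.Println("no download link found in item", i)
            return
        }
        imgName := s.Find(".description>h3").Text()
        fmt.Println(imgUrl)
        fmt.Println(imgName)
        DownloadImage(imgUrl, i, client)
        time.Sleep(time.Second) // pause between downloads to go easy on the site
        fmt.Println("-------------------------")
    })
}

// DownloadImage fetches one image and saves it as ./image/image-<index>.jpg.
func DownloadImage(url string, index int, client *http.Client) {
    // Downloading is a read, so use GET rather than POST.
    req, err := http.NewRequest(http.MethodGet, url, nil)
    if err != nil {
        log.Println("failed to create request:", err)
        return
    }
    req.Header.Set("User-Agent", userAgent)
    resp, err := client.Do(req)
    if err != nil {
        log.Println("failed to send request:", err)
        return
    }
    defer resp.Body.Close()
    if resp.StatusCode != http.StatusOK {
        log.Printf("image request failed, status code: %d", resp.StatusCode)
        return
    }
    data, err := io.ReadAll(resp.Body) // io.ReadAll replaces the deprecated ioutil.ReadAll
    if err != nil {
        log.Println("failed to read response body:", err)
        return
    }
    // Make sure the target directory exists before opening the file.
    if err := os.MkdirAll("./image", 0755); err != nil {
        log.Println("failed to create image directory:", err)
        return
    }
    baseDir := "./image/image-%d.jpg"
    f, err := os.OpenFile(fmt.Sprintf(baseDir, index), os.O_CREATE|os.O_TRUNC|os.O_WRONLY, 0666)
    if err != nil {
        log.Println("failed to open file:", err)
        return
    }
    defer f.Close()
    if _, err = f.Write(data); err != nil {
        log.Println("failed to write data:", err)
        return
    }
    fmt.Println("image downloaded successfully")
}

func main() {
    client := &http.Client{}
    url := "https://bing.ioliu.cn/?p=1" // first page only
    method := "GET"
    Run(method, url, nil, client)
}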
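
DownloadImage above reads the whole response into memory before writing it to disk. For large images, a streaming variant may be preferable. The sketch below is one possible way to do that with `io.Copy`; `saveTo` and `demo` are hypothetical helpers that are not part of the article's code, and a local stand-in server (`httptest`) replaces the real site so the example runs offline.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"os"
	"path/filepath"
)

// saveTo streams the response body of url into the file at path and
// returns the number of bytes written. (Hypothetical helper.)
func saveTo(client *http.Client, url, path string) (int64, error) {
	resp, err := client.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return 0, fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	f, err := os.Create(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()
	// io.Copy moves the data in chunks instead of buffering it all at once.
	return io.Copy(f, resp.Body)
}

// demo runs saveTo against a local stand-in server and a temporary directory.
func demo() (int64, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("fake-image-bytes"))
	}))
	defer srv.Close()

	dir, err := os.MkdirTemp("", "wallpaper")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)

	return saveTo(srv.Client(), srv.URL, filepath.Join(dir, "image-0.jpg"))
}

func main() {
	n, err := demo()
	if err != nil {
		fmt.Println("download failed:", err)
		return
	}
	fmt.Println("bytes written:", n)
}
```

Returning an error instead of logging inside the helper lets the caller decide whether to retry, skip the image, or stop the crawl.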

Next, let's crawl multiple pages. The code barely changes, but we still need to observe how the website behaves first.

Notice the pattern? The first page is p=1, the second page is p=2, and the tenth page is p=10.

So we just write a for loop and reuse the single-page code from before.

// main function for crawling multiple pages
func main() {
    client := &http.Client{}
    url := "https://bing.ioliu.cn/?p=%d"
    method := "GET"
    for i := 1; i < 5; i++ { // pagination: crawl pages 1 through 4
        Run(method, fmt.Sprintf(url, i), nil, client)
    }
}
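
One thing to watch when looping over pages: DownloadImage names its files image-<index>.jpg, and the index restarts at 0 on every page, so each page overwrites the files saved from the previous one. A possible fix is to include the page number in the file name. The helper below is a hypothetical illustration; `imageName` and its naming scheme are assumptions, and wiring it in would mean passing the page number through Run to DownloadImage.

```go
package main

import "fmt"

// imageName builds a file name that stays unique across pages.
// Both the helper and the naming scheme are illustrative assumptions.
func imageName(page, index int) string {
	return fmt.Sprintf("page-%d-image-%d.jpg", page, index)
}

func main() {
	// Two pages with two images each produce four distinct names.
	for page := 1; page < 3; page++ {
		for index := 0; index < 2; index++ {
			fmt.Println(imageName(page, index))
		}
	}
}
```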

Summary

In our example, we used a third-party package to parse the web page data, because regular expressions are really too troublesome. The options are:

  • CSS selectors: goquery
  • XPath selectors: htmlquery
  • Regular expressions: built-in package (regexp); not recommended, because the patterns are hard to write
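
To illustrate why regular expressions are the least comfortable option, here is a minimal sketch using Go's built-in `regexp` package. The HTML snippet is made up for the example (it is not copied from the Bing Wallpaper page), and `extractHrefs` and `hrefPattern` are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
)

// hrefPattern matches the double-quoted href attribute of an <a> tag.
var hrefPattern = regexp.MustCompile(`<a\s[^>]*href="([^"]+)"`)

// extractHrefs returns every href captured by hrefPattern.
func extractHrefs(html string) []string {
	var hrefs []string
	for _, m := range hrefPattern.FindAllStringSubmatch(html, -1) {
		hrefs = append(hrefs, m[1]) // m[1] is the first capture group
	}
	return hrefs
}

func main() {
	// Made-up markup: the second link uses single quotes.
	page := `<a class="ctrl download" href="/a.jpg">download</a>
<a class='ctrl download' href='/b.jpg'>download</a>`
	fmt.Println(extractHrefs(page)) // only the double-quoted link is found
}
```

The pattern silently misses the second link just because it uses single quotes. A selector-based parser such as goquery does not depend on such formatting details, which is why it is the more robust choice.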


Statement:
This article is reproduced from juejin.cn. In case of infringement, please contact admin@php.cn for removal.