Detailed explanation of how to use Golang to crawl Bing wallpapers
It goes without saying that Python is the usual choice for writing a crawler: the requests library alone covers most needs. However, I heard that the net/http package built into Golang is very powerful. I happened to have some free time, and I wanted to learn something new and review the request and response side of the HTTP protocol. Without further ado, let's get started.
As a first try, we will crawl Bing Wallpaper. (Only joking around here, no offense meant.)
graph TD
    A[Request data] --> B[Parse data] --> C[Store data]
As the flow chart above shows, a crawler is not complicated: the whole process has only three steps. Here is what each step involves.
Request data: use the net/http package built into Golang to send a request to the target address.
Parse data: extract what we need from the response. We do not need the entire response, only a few specific pieces of key data. This step is also called data cleaning.
Store data: save the parsed data, typically into a database. (In this article, the images are simply written to local files.)
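The three steps above can be sketched end to end using only the standard library. This is a minimal illustration, not the article's crawler: the local test server, the "title:" line format, and the in-memory memoryStore are all hypothetical stand-ins (for the real website, the HTML parsing, and the database respectively).

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/httptest"
	"strings"
)

// fetch is step 1: request the page and return its body as a string.
func fetch(client *http.Client, url string) (string, error) {
	resp, err := client.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status: %s", resp.Status)
	}
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(data), nil
}

// parse is step 2: keep only the key data.
// Assumption: the data we want sits on lines of the form "title: <name>".
func parse(body string) []string {
	var titles []string
	for _, line := range strings.Split(body, "\n") {
		line = strings.TrimSpace(line)
		if strings.HasPrefix(line, "title:") {
			titles = append(titles, strings.TrimSpace(strings.TrimPrefix(line, "title:")))
		}
	}
	return titles
}

// memoryStore is step 3: a hypothetical in-memory stand-in for a database.
type memoryStore struct{ rows []string }

func (m *memoryStore) save(items []string) { m.rows = append(m.rows, items...) }

func main() {
	// A local test server replaces the real website so the sketch runs offline.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "title: Sunrise\nsome noise\ntitle: Harbor\n")
	}))
	defer srv.Close()

	body, err := fetch(srv.Client(), srv.URL)
	if err != nil {
		log.Fatal(err)
	}
	store := &memoryStore{}
	store.save(parse(body))
	fmt.Println(store.rows) // prints [Sunrise Harbor]
}
```

Keeping each step in its own function means the parsing or storage step can be swapped out later without touching the request code.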
First, visit the Bing Wallpaper website and take a look. Writing a crawler requires paying close attention to how the data on a page is laid out. This is the home page, and the whole page is very simple.
Next, open the browser's developer tools (you should already be familiar with them; if not, the rest will be hard to follow). The usual ways are to press F12 or to right-click and choose Inspect. On the Bing Wallpaper site, however, right-clicking does not open the developer tools, so they have to be opened from the browser menu: select More tools, then Developer tools. The steps are the same if your Chrome is set to Chinese.
You will most likely see a page like this.
Don't worry: these are just errors produced by the site's anti-crawling measures (they were not there when I crawled the site a long time ago), and they do not affect what we are doing.
Next, use the element picker tool to quickly locate the element we want. Then we can find the image information we need.
The following code crawls a single page:
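package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"os"
	"time"

	"github.com/PuerkitoBio/goquery"
)

// userAgent makes the request look like it comes from a regular browser.
const userAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36"

// Run requests one listing page, extracts each wallpaper's download link and
// title, and downloads the images one by one.
func Run(method, url string, body io.Reader, client *http.Client) {
	req, err := http.NewRequest(method, url, body)
	if err != nil {
		log.Println("failed to create request:", err)
		return
	}
	req.Header.Set("user-agent", userAgent)
	resp, err := client.Do(req)
	if err != nil {
		log.Println("failed to send request:", err)
		return
	}
	defer resp.Body.Close() // close the response body on every return path
	if resp.StatusCode != http.StatusOK {
		log.Printf("request failed, status code: %d", resp.StatusCode)
		return
	}
	query, err := goquery.NewDocumentFromReader(resp.Body)
	if err != nil {
		log.Println("failed to build goquery document:", err)
		return
	}
	query.Find(".container .item").Each(func(i int, s *goquery.Selection) {
		imgUrl, ok := s.Find("a.ctrl.download").Attr("href")
		if !ok {
			log.Println("no download link found for item", i)
			return
		}
		imgName := s.Find(".description>h3").Text()
		fmt.Println(imgUrl)
		fmt.Println(imgName)
		DownloadImage(imgUrl, i, client)
		time.Sleep(time.Second) // pause between downloads
		fmt.Println("-------------------------")
	})
}

// DownloadImage requests one image and saves it as ./image/image-<index>.jpg.
func DownloadImage(url string, index int, client *http.Client) {
	req, err := http.NewRequest("POST", url, nil)
	if err != nil {
		log.Println("failed to create request:", err)
		return
	}
	req.Header.Set("user-agent", userAgent)
	resp, err := client.Do(req)
	if err != nil {
		log.Println("failed to send request:", err)
		return
	}
	defer resp.Body.Close() // the original never closed this body
	if resp.StatusCode != http.StatusOK {
		log.Printf("download failed, status code: %d", resp.StatusCode)
		return
	}
	data, err := io.ReadAll(resp.Body) // io.ReadAll replaces the deprecated ioutil.ReadAll (Go 1.16+)
	if err != nil {
		log.Println("failed to read response body:", err)
		return
	}
	// Make sure the target directory exists before opening the file.
	if err := os.MkdirAll("./image", 0755); err != nil {
		log.Println("failed to create image directory:", err)
		return
	}
	baseDir := "./image/image-%d.jpg"
	f, err := os.OpenFile(fmt.Sprintf(baseDir, index), os.O_CREATE|os.O_TRUNC|os.O_WRONLY, 0666)
	if err != nil {
		log.Println("failed to open file:", err)
		return
	}
	defer f.Close()
	if _, err = f.Write(data); err != nil {
		log.Println("failed to write data:", err)
		return
	}
	fmt.Println("image downloaded successfully")
}

func main() {
	client := &http.Client{}
	url := "https://bing.ioliu.cn/?p=%d"
	method := "GET"
	Run(method, fmt.Sprintf(url, 1), nil, client) // fill in the page number: crawl page 1 only
}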
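One detail worth guarding against: the href read from the download link is passed to http.NewRequest as is. If the site returns a relative href (this has not been confirmed here, and the /photo/example path below is a made-up example), the request would fail because the URL has no scheme or host. A minimal sketch of resolving the href against the page URL with the standard net/url package, where resolveHref is a hypothetical helper name:

```go
package main

import (
	"fmt"
	"log"
	"net/url"
)

// resolveHref turns an href found on pageURL into an absolute URL.
// Absolute hrefs are returned unchanged; relative ones inherit the
// scheme and host of the page they were found on.
func resolveHref(pageURL, href string) (string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(href)
	if err != nil {
		return "", err
	}
	return base.ResolveReference(ref).String(), nil
}

func main() {
	// Hypothetical relative href, as it might appear in the page markup.
	abs, err := resolveHref("https://bing.ioliu.cn/?p=1", "/photo/example?force=download")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(abs) // prints https://bing.ioliu.cn/photo/example?force=download
}
```

With a helper like this, the crawler could resolve each href before downloading it, and the same code would keep working whether the site uses relative or absolute links.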
Next, let's crawl multiple pages. The code changes very little, but as before we first need to observe how the website behaves.
What do we find? The first page uses p=1, the second page uses p=2, and the tenth page uses p=10.
So we just add a for loop and reuse the single-page code from before.
// The main function for crawling multiple pages
func main() {
	client := &http.Client{}
	url := "https://bing.ioliu.cn/?p=%d"
	method := "GET"
	for i := 1; i < 5; i++ { // pagination: crawl pages 1 through 4
		Run(method, fmt.Sprintf(url, i), nil, client)
	}
}
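Formatting the page number with fmt.Sprintf works fine for this site. As an alternative, the p parameter can be set through the standard net/url package, which takes care of query encoding. The sketch below assumes only what was observed above (pages are selected with p=N); pageURL is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"log"
	"net/url"
	"strconv"
)

// pageURL returns base with its "p" query parameter set to the given page.
func pageURL(base string, page int) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	q := u.Query()
	q.Set("p", strconv.Itoa(page)) // replaces any existing p value
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	// Same range as the loop above: pages 1 through 4.
	for i := 1; i < 5; i++ {
		u, err := pageURL("https://bing.ioliu.cn/", i)
		if err != nil {
			log.Fatal(err)
		}
		fmt.Println(u) // e.g. https://bing.ioliu.cn/?p=1
	}
}
```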
In our example, we use a third-party package, goquery, to parse the web page data, because doing the same with regular expressions is really too troublesome.
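To illustrate what "troublesome" can mean in practice, here is a small sketch of the regular-expression approach using only the standard regexp package. The HTML snippet is made up for illustration and is not taken from the real site. The pattern assumes the class attribute comes before href, so it silently misses the second link, where the attributes are in the other order; a CSS selector such as a.ctrl.download does not care about attribute order.

```go
package main

import (
	"fmt"
	"regexp"
)

// downloadRe matches <a> tags whose class attribute appears BEFORE href.
// This ordering assumption is exactly what makes the approach fragile.
var downloadRe = regexp.MustCompile(`<a[^>]*class="ctrl download"[^>]*href="([^"]*)"`)

// downloadLinks returns the href of every anchor the pattern matches.
func downloadLinks(html string) []string {
	var links []string
	for _, m := range downloadRe.FindAllStringSubmatch(html, -1) {
		links = append(links, m[1])
	}
	return links
}

func main() {
	// Hypothetical markup: two download links with different attribute order.
	html := `<a class="ctrl download" href="/photo/a?force=download">Download</a>
<a href="/photo/b?force=download" class="ctrl download">Download</a>`
	fmt.Println(downloadLinks(html)) // prints [/photo/a?force=download]
}
```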