Needless to say, Python is the usual choice for writing a crawler; the requests library alone can cover almost everything. However, I heard that the http package built into Golang is very powerful. I have some free time, and I want to learn something new and review how requests and responses work in the HTTP protocol. Without further ado, let's get started.
Let's crawl Bing Wallpaper as a first try. (Dog-head emoji to save my life: this is just for fun.)
Overview of crawler process
Request data --> Parse data --> Store data
As the flow chart above shows, a crawler is not complicated: the whole process has only three steps. Here is what each step involves.
Request data: use Golang's built-in net/http package to send a request to the target address.
Parse data: extract the pieces we need from the response. We do not need the entire response, only specific key fields. This step is also called data cleaning.
Store data: save the parsed data to a database (in this article, the downloaded images are written to disk).
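The three steps above can be sketched with the standard library alone. This is a minimal sketch, not the article's crawler: the function names fetch, parse and store are hypothetical, a local httptest server stands in for the real website, and an in-memory map stands in for the database.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// fetch is step 1: send a GET request and return the response body as a string.
func fetch(client *http.Client, url string) (string, error) {
	resp, err := client.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status: %s", resp.Status)
	}
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(data), nil
}

// parse is step 2: keep only the non-empty, trimmed lines of the payload.
// A real crawler would extract fields from HTML here instead.
func parse(body string) []string {
	var items []string
	for _, line := range strings.Split(body, "\n") {
		if line = strings.TrimSpace(line); line != "" {
			items = append(items, line)
		}
	}
	return items
}

// store is step 3: an in-memory map stands in for the database (assumption).
func store(db map[string]bool, items []string) {
	for _, it := range items {
		db[it] = true
	}
}

func main() {
	// Local test server so the sketch runs without touching a real website.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "wallpaper-a\n\nwallpaper-b\n")
	}))
	defer srv.Close()

	client := &http.Client{Timeout: 5 * time.Second}
	body, err := fetch(client, srv.URL)
	if err != nil {
		fmt.Println("fetch failed:", err)
		return
	}
	db := map[string]bool{}
	store(db, parse(body))
	fmt.Println(len(db), "items stored")
}
```

Keeping the three steps in separate functions means the parsing step can be checked against a fixed string without any network access.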
Practical Analysis
First, visit the Bing Wallpaper site and look around. Writing a crawler requires being sensitive to where the data lives. This is the homepage; the whole page is very simple.
Next, open the browser's developer tools (you should already be familiar with them; if not, the rest will be hard to follow). Normally you press F12, or right-click and choose Inspect. On the Bing Wallpaper site, however, right-clicking does not open the developer tools, so open them from the browser menu instead, as in the first picture: select More tools, then Developer tools. The steps are the same if your Chrome is in Chinese.
You will most likely see a page like this.
Don't worry, these are just errors from the site's anti-crawling measures (they did not appear when I crawled the site a long time ago). They do not affect what we are doing.
Next, select the element picker tool to quickly locate the element we want. We can then find the image information we need.
Hands-on code
The following code crawls a single page
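package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"os"
	"time"

	"github.com/PuerkitoBio/goquery"
)

const userAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36"

// Run requests one listing page, extracts every wallpaper entry and downloads its image.
func Run(method, url string, body io.Reader, client *http.Client) {
	req, err := http.NewRequest(method, url, body)
	if err != nil {
		log.Println("failed to create request:", err)
		return
	}
	req.Header.Set("User-Agent", userAgent)
	resp, err := client.Do(req)
	if err != nil {
		log.Println("request failed:", err)
		return
	}
	defer resp.Body.Close() // close the body on every return path, including non-200 responses
	if resp.StatusCode != http.StatusOK {
		log.Printf("request failed, status code: %d", resp.StatusCode)
		return
	}
	doc, err := goquery.NewDocumentFromReader(resp.Body)
	if err != nil {
		log.Println("failed to build goquery document:", err)
		return
	}
	doc.Find(".container .item").Each(func(i int, s *goquery.Selection) {
		href, ok := s.Find("a.ctrl.download").Attr("href")
		if !ok {
			return // this item has no download link
		}
		// Resolve the link against the page URL in case it is relative.
		imgURL, err := resp.Request.URL.Parse(href)
		if err != nil {
			log.Println("invalid image URL:", err)
			return
		}
		imgName := s.Find(".description>h3").Text()
		fmt.Println(imgURL)
		fmt.Println(imgName)
		DownloadImage(imgURL.String(), i, client)
		time.Sleep(time.Second) // pause between downloads
		fmt.Println("-------------------------")
	})
}

// DownloadImage fetches one image and saves it as ./image/image-<index>.jpg.
func DownloadImage(url string, index int, client *http.Client) {
	// A plain GET is enough to fetch an image.
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		log.Println("failed to create request:", err)
		return
	}
	req.Header.Set("User-Agent", userAgent)
	resp, err := client.Do(req)
	if err != nil {
		log.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		log.Printf("image request failed, status code: %d", resp.StatusCode)
		return
	}
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Println("failed to read response body:", err)
		return
	}
	// Make sure the target directory exists before creating the file.
	if err := os.MkdirAll("./image", 0755); err != nil {
		log.Println("failed to create image directory:", err)
		return
	}
	baseDir := "./image/image-%d.jpg"
	f, err := os.OpenFile(fmt.Sprintf(baseDir, index), os.O_CREATE|os.O_TRUNC|os.O_WRONLY, 0666)
	if err != nil {
		log.Println("failed to open file:", err)
		return
	}
	defer f.Close()
	if _, err = f.Write(data); err != nil {
		log.Println("failed to write data:", err)
		return
	}
	fmt.Println("image downloaded")
}

func main() {
	client := &http.Client{}
	url := "https://bing.ioliu.cn/?p=%d"
	method := "GET"
	Run(method, fmt.Sprintf(url, 1), nil, client) // crawl page 1 only
}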
The following code crawls multiple pages
The code for crawling multiple pages changes very little. We still start by observing how the site works.
Notice the pattern: the first page is p=1, the second page is p=2, and the tenth page is p=10.
So we just add a for loop and reuse the single-page code from before.
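// main function for crawling multiple pages
func main() {
	client := &http.Client{}
	url := "https://bing.ioliu.cn/?p=%d"
	method := "GET"
	// Note: Run restarts the image index at 0 on every page, so image-N.jpg
	// files from earlier pages are overwritten unless the file name also
	// encodes the page number.
	for i := 1; i < 5; i++ { // pagination: crawls pages 1 through 4
		Run(method, fmt.Sprintf(url, i), nil, client)
	}
}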
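The loop above builds each page address with fmt.Sprintf. When a URL carries more than one query parameter, the standard net/url package is a common alternative. The sketch below assumes a hypothetical helper named pageURL; only the p parameter and the site address come from the article.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// pageURL returns base with its "p" query parameter set to page.
// (Hypothetical helper, not part of the original crawler.)
func pageURL(base string, page int) (string, error) {
	if page < 1 {
		return "", fmt.Errorf("page must be >= 1, got %d", page)
	}
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	q := u.Query() // existing query parameters are preserved
	q.Set("p", strconv.Itoa(page))
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	for i := 1; i <= 3; i++ {
		u, err := pageURL("https://bing.ioliu.cn/", i)
		if err != nil {
			fmt.Println("bad URL:", err)
			return
		}
		fmt.Println(u)
	}
}
```

Each resulting string can be passed to Run in place of the fmt.Sprintf call.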
Summary
In this example we use a third-party package to parse the page, because doing it with regular expressions is too much trouble. The options are:
- CSS selectors: goquery
- XPath selectors: htmlquery
- Regular expressions: the built-in regexp package; not recommended, because the patterns are hard to write
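To illustrate why regular expressions are the least convenient option, here is a small sketch using the built-in regexp package. The HTML snippet, the paths in it and the helper name extractHrefs are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// hrefRe matches double-quoted href attributes only;
// single-quoted or unquoted values are not matched.
var hrefRe = regexp.MustCompile(`href="([^"]+)"`)

// extractHrefs returns every double-quoted href value found in html.
// (Hypothetical helper.)
func extractHrefs(html string) []string {
	var out []string
	for _, m := range hrefRe.FindAllStringSubmatch(html, -1) {
		out = append(out, m[1]) // m[1] is the first capture group
	}
	return out
}

func main() {
	// Made-up markup: the second link uses single quotes.
	page := `<a class="ctrl download" href="/photo/a.jpg">A</a> <a href='/photo/b.jpg'>B</a>`
	fmt.Println(extractHrefs(page)) // the single-quoted link is silently skipped
}
```

The pattern also cannot say which element an href belongs to, whereas a selector such as a.ctrl.download expresses that directly.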