


Queue thread count and request delay in Colly, the Go crawler framework
Efficient concurrent request processing is crucial when using the Go crawler framework Colly. This article examines how the queue's thread count and the request delay interact in Colly, and answers a common question about them.
Problem: Interaction between number of threads and request delay
Suppose we set the queue's number of threads to 2:
q, _ := queue.New(2, storage)
We then add 3 requests and use colly.Limit() to set a 5-second delay for each request. The expectation is that two requests are issued almost simultaneously and respond after 5 seconds, and that the third responds 5 seconds after that. The actual result is:
- Two requests are created.
- After 5 seconds, the first request responds and a third request is created.
- After another 5 seconds, the second request responds.
- After a further 5 seconds, the third request responds.
The requests are not processed in parallel. Why does the queue's thread count seem to have no effect? Does colly.Limit() affect the queue's concurrency? Does the OnRequest callback fire when a request is merely created, rather than when it is actually sent?
Analysis: Independence between number of threads and request delay
Colly's queue controls how many requests are being processed at once, while colly.Limit() installs a limit rule that throttles requests to matching domains. The two are independent mechanisms, and every request has to pass through both.
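The queue's thread count caps the number of requests being processed simultaneously. The rule passed to colly.Limit() throttles the requests themselves: besides Delay, a colly.LimitRule has a Parallelism field that caps simultaneous requests to matching domains, and when it is left unset only one such request runs at a time. Judging by the timings above, the delay is spent while a request holds its place in the rule, so its response is delivered only after the delay has elapsed.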
In the above case:
- The queue creates two requests, but the rule set with colly.Limit() lets them through only one at a time. The first request responds after the 5-second delay.
- After that response, the queue releases a thread and creates the third request.
- The second request then gets its turn and responds after another 5 seconds.
- The third request gets its turn last and responds after a further 5 seconds.
Therefore, the limit rule masks the queue's concurrency: the requests are serialized even though two threads are available.
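The timeline above can be reproduced with a small model. The sketch below is not Colly code. It assumes that the limit rule behaves like a pool of slots, that a request holds its slot for the full delay, and that network time is negligible. The names simulate, timeline, and argmin are made up for this illustration.

```go
package main

import "fmt"

// timeline records, per request, when a worker picked it up and when its
// response arrived, in seconds from the start of the run.
type timeline struct {
	Created   []int
	Responded []int
}

// argmin returns the index of the smallest value (the first one on ties).
func argmin(xs []int) int {
	best := 0
	for i, x := range xs {
		if x < xs[best] {
			best = i
		}
	}
	return best
}

// simulate models n requests handled in order by `threads` queue workers
// that share `slots` limiter slots. A request occupies its slot for `delay`
// seconds and its response arrives when the slot is released. Network time
// is treated as zero (an assumption of this model).
func simulate(threads, slots, delay, n int) timeline {
	workerFree := make([]int, threads) // when each worker is next idle
	slotFree := make([]int, slots)     // when each limiter slot is next free
	var tl timeline
	for i := 0; i < n; i++ {
		w := argmin(workerFree) // the earliest idle worker takes the request
		created := workerFree[w]
		s := argmin(slotFree) // it then waits for the earliest free slot
		start := created
		if slotFree[s] > start {
			start = slotFree[s]
		}
		done := start + delay
		workerFree[w], slotFree[s] = done, done
		tl.Created = append(tl.Created, created)
		tl.Responded = append(tl.Responded, done)
	}
	return tl
}

func main() {
	fmt.Println(simulate(2, 1, 5, 3)) // one slot: prints {[0 0 5] [5 10 15]}
	fmt.Println(simulate(2, 2, 5, 3)) // two slots: prints {[0 0 5] [5 5 10]}
}
```

With one slot the model yields the observed timeline (responses at 5, 10, and 15 seconds). With two slots it yields the timeline that was originally expected (5, 5, and 10 seconds).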
OnRequest callback and request issuance time
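The OnRequest callback fires when a queue thread picks up a request and begins processing it, not when the HTTP request is actually sent. It runs before the limit rule is applied, which is why two of the three queued requests show up as created immediately and the third only once a thread is free. It is intended for preprocessing, such as setting headers, before the request goes out.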
Conclusion: Coordinate the number of threads and request delays
The rule set with colly.Limit() constrains the concurrency that the queue's threads can actually achieve. To get true concurrency, coordinate the two settings: set the rule's Parallelism to at least the queue's thread count, and reduce or remove the delay if high throughput is required. If you need to control the crawl speed more precisely, consider a finer-grained control mechanism rather than relying on a single colly.Limit() delay.
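As a rough planning aid, the total run time can be estimated from the two settings. The sketch below is a back-of-the-envelope estimate, not a Colly API. It assumes that the effective concurrency is the smaller of the queue's thread count and the limit rule's parallelism, that each request occupies its slot for the full delay, and that network time is negligible. The function names are hypothetical.

```go
package main

import "fmt"

// minInt returns the smaller of two ints.
func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// estimateSeconds estimates how long n requests take when `threads` queue
// workers share `slots` limiter slots and each request holds a slot for
// `delay` seconds. Requests proceed in waves of min(threads, slots).
func estimateSeconds(n, threads, slots, delay int) int {
	concurrency := minInt(threads, slots)
	waves := (n + concurrency - 1) / concurrency // ceiling division
	return waves * delay
}

func main() {
	fmt.Println(estimateSeconds(3, 2, 1, 5)) // prints 15
	fmt.Println(estimateSeconds(3, 2, 2, 5)) // prints 10
}
```

Under these assumptions, raising only the thread count beyond the rule's parallelism does not shorten the run; both values have to grow together.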



