node.js - Python有了asyncio和aiohttp在爬虫这类型IO任务中多线程/多进程还有存在的必要吗？

Question

最近正在学习Python中的异步编程，看了一些博客后做了一些小测验：对比asyncio+aiohttp的爬虫和asyncio+aiohttp+concurrent.futures(线程池/进程池)在效率中的差异，注释：在爬虫中我几乎没有使用任何计算性任务...

伊谢尔伦 · Answer

I don’t know much about Python crawlers, but generally Scrapy is used to make crawlers. It is based on the twisted asynchronous framework.

Multiple processes can make full use of multiple cores. Currently, the ideal one is multi-process + coroutine.

Because the synchronous method is still used in requests, it will block the thread. In this case, it is meaningless to use asynchronous. You can understand it as using the time.sleep method instead of the asyncio.sleep method in asyncio.

伊谢尔伦 · Answer

Check out this article: http://aosabook.org/en/500L/a...

PHP中文网 · Answer

asyncio adopts the idea of coroutine, which is to process multiple asynchronous tasks in one thread. What are the asynchronous tasks, such as timing, asynchronous IO, etc.

But what if the task does not support asynchronous?

For example, reading and writing a blocking IO, or doing time-consuming a lot of calculations. Coroutines will solve the problem of task blocking, and the advantages of multi-process and multi-thread will be reflected.

The usage scenarios of the two are different. Different scenarios, different plans.

PHP中文网 · Answer

asyncio requires related third-party library support, so basically all the original third-party libraries need to be written separately, such as serial ports, network protocols, including requests and http. In bad cases, after these two As of version time, many of the libraries used are already asynchronous. Includes requests.

PHPz · Answer

asyncio needs an asynchronous API to support it (synchronous non-blocking API is also available, but Python does not have such a thing, you may need to hack it). setInterval

If it is a synchronous blocking API, if one callback is stuck, other callbacks cannot be executed. You can take a look. The IO APIs you have seen so far are basically blocking.

黄舟 · Answer

Python multi-threading is not practical due to the existence of GIL, but multi-process is still very useful

node.js - Python有了asyncio和aiohttp在爬虫这类型IO任务中多线程/多进程还有存在的必要吗？

代码

补充

reply all(6)I'll reply