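I am using Python to crawl two months of data from a website. The program "stalls": it raises no error and never finishes, as shown in the image above.
For example, if I set the date loop to run from 2016.8.1 to 2016.10.1, data is only scraped up to 2016.9.4. The program never exits, yet no new data reaches the database. It just stalls.
I then switched to a more powerful computer, and things improved a lot: it could scrape half a year of data. I originally wanted a full year, but after about six months the same thing happens as in the image, and nothing more is added to the database. In other words, it stalls again.
Is there any way to scrape more data in a single run?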
PHPz, 2017-04-18 09:31:28
To find out what is blocking the crawler, you can analyze it in the following ways:
1. Capture packets to see whether the stall is caused by the network;
2. Which library did you use to write the crawler, urllib2 or the Scrapy framework? Check its logs.
3. Check whether the URL pool has simply been exhausted, i.e. every URL was processed and no new target tasks are being added to the crawl queue.
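Since the original crawler code isn't shown, this is only an assumption, but a very common cause of a silent hang is a request made without a timeout: both `urllib.request.urlopen` and the `requests` library will wait indefinitely on a stalled connection by default. A minimal sketch of two mitigations: `fetch` (a hypothetical helper name) passes an explicit `timeout` and retries a few times before giving up, and `enable_hang_watchdog` (also hypothetical) uses the standard `faulthandler` module to periodically dump every thread's stack, so you can see exactly which line the program is stuck on.

```python
import faulthandler
import logging
import time
import urllib.request

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("crawler")


def enable_hang_watchdog(seconds=300):
    """Every `seconds`, dump the stack of every thread to stderr.

    If the crawler looks frozen, the dump shows the line it is blocked on
    (for example, a socket read inside urlopen).
    """
    faulthandler.dump_traceback_later(seconds, repeat=True)


def fetch(url, timeout=10, retries=3, backoff=2.0):
    """Fetch `url`, giving up after `retries` attempts instead of hanging.

    Returns the response body as bytes, or None if every attempt failed.
    """
    for attempt in range(1, retries + 1):
        try:
            # Without `timeout`, a stalled connection can block forever.
            # Note: the timeout applies to each blocking socket operation
            # (connect, each read), not to the download as a whole.
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.read()
        except OSError as exc:
            # URLError, HTTPError and socket timeouts are all OSError subclasses.
            log.warning("attempt %d/%d failed for %s: %s", attempt, retries, url, exc)
            if attempt < retries:
                time.sleep(backoff * attempt)  # simple linear backoff
    log.error("giving up on %s", url)
    return None
```

Typical use would be calling `enable_hang_watchdog()` once at startup and replacing bare `urlopen(url)` calls with `fetch(url)`; URLs that return `None` can be logged and re-crawled later instead of freezing the whole run.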
PHP中文网, 2017-04-18 09:31:28
You can use multithreading, with each thread handling one month of data. That way, even if something goes wrong with one month, most of the data is still collected intact, and you can then examine the problem month in detail.
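A rough sketch of the one-thread-per-month idea, assuming the crawler loops over days: `month_ranges` splits the date range into month chunks, and `crawl_range` runs each chunk in a `concurrent.futures.ThreadPoolExecutor`, recording which months failed. `crawl_month` is a hypothetical placeholder; its body should be replaced with the real per-day fetch/parse/save logic. Two caveats: a month whose requests hang without a timeout will still block the pool, so per-request timeouts are still needed, and many database drivers expect one connection per thread rather than a shared one.

```python
import datetime as dt
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed

log = logging.getLogger("crawler")


def month_ranges(start, end):
    """Split [start, end) into consecutive (chunk_start, chunk_end) month chunks."""
    ranges = []
    cur = start
    while cur < end:
        # First day of the following month.
        nxt = (cur.replace(day=1) + dt.timedelta(days=32)).replace(day=1)
        ranges.append((cur, min(nxt, end)))
        cur = nxt
    return ranges


def crawl_month(start, end):
    """Hypothetical placeholder: crawl and store every day in [start, end).

    Returns the number of days processed.
    """
    day, count = start, 0
    while day < end:
        # fetch_and_save(day)  # hypothetical: your existing per-day logic
        count += 1
        day += dt.timedelta(days=1)
    return count


def crawl_range(start, end, workers=4):
    """Crawl each month in its own worker thread and collect failed months."""
    results, failed = {}, []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(crawl_month, s, e): (s, e)
                   for s, e in month_ranges(start, end)}
        for fut in as_completed(futures):
            s, e = futures[fut]
            try:
                results[s] = fut.result()
            except Exception:
                # One bad month is logged and skipped; the others still finish.
                log.exception("month %s..%s failed", s, e)
                failed.append((s, e))
    return results, sorted(failed)
```

For the date range in the question, `crawl_range(dt.date(2016, 8, 1), dt.date(2016, 10, 1))` would run August and September in parallel, and any month listed in `failed` can be re-run on its own.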