python - Scrapy can't fetch the content of the start page

Question

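I'm new to Scrapy and built a very small example by following the tutorial. I haven't gotten as far as pipelines or anything like that; I just want to see whether I can scrape anything at all. The code is as follows: spider.py: {code...} items.py: {code...} However, crawling failed on several news sites, including: people...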

迷茫 · Answer

Hi, here is how I tracked down this problem. First I ran:

scrapy shell http://people.com.cn

This opens the Scrapy shell. Then enter:

response.url.split('/')[-2]

The result was an empty string, so I concluded that the URL was being split incorrectly and tried this instead:

response.url.split('/')[-1]

This time the result was not empty.
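The same behaviour can be reproduced in plain Python, without Scrapy. This sketch assumes response.url is exactly http://people.com.cn (no trailing slash); the second URL with a trailing slash is a hypothetical comparison showing why a fixed index like [-2] or [-1] is fragile.

```python
# Assumption: the response URL has no trailing slash.
url = "http://people.com.cn"

parts = url.split("/")
print(parts)            # ['http:', '', 'people.com.cn']
print(repr(parts[-2]))  # '' : the empty piece between the two slashes of "http://"
print(repr(parts[-1]))  # 'people.com.cn'

# Hypothetical comparison: with a trailing slash the pieces shift,
# so [-2] becomes the host and [-1] becomes empty.
url_slash = "http://www.people.com.cn/"
print(url_slash.split("/"))  # ['http:', '', 'www.people.com.cn', '']
```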

So the cause of your problem is that filename comes out empty, and no file can be created with an empty name, so nothing gets saved.
Give it a try.
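A less fragile option is to derive the file name with urllib.parse instead of indexing the split result. This is a minimal sketch, assuming the spider follows the tutorial pattern of writing response.body to a file named after the URL; filename_from_url and its default "index.html" are hypothetical names chosen for illustration, not part of Scrapy.

```python
from urllib.parse import urlparse


def filename_from_url(url: str, default: str = "index.html") -> str:
    """Build a non-empty file name from a URL (hypothetical helper)."""
    parsed = urlparse(url)
    # Keep only non-empty path segments, so "" and "/" both yield no segments.
    segments = [s for s in parsed.path.split("/") if s]
    name = segments[-1] if segments else default
    # Prefix the host so pages from different sites don't overwrite each other;
    # replace ":" (e.g. from a port) since it is not valid in Windows file names.
    host = (parsed.netloc or "unknown-host").replace(":", "_")
    return f"{host}_{name}"


print(filename_from_url("http://people.com.cn"))            # people.com.cn_index.html
print(filename_from_url("http://www.people.com.cn/"))       # www.people.com.cn_index.html
print(filename_from_url("http://example.com/a/news.html"))  # example.com_news.html
```

In a tutorial-style parse callback, the line filename = response.url.split('/')[-2] would become filename = filename_from_url(response.url), with the rest of the file-writing code unchanged.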

PHPz · Answer

Try testing it from the terminal first; see:
http://scrapy-chs.readthedocs...

天蓬老师 · Answer

Have you read the Scrapy documentation carefully?
http://scrapy-chs.readthedocs...