利用python中的scrapy框架的css选择器对具体标签内容进行获取,但是获取不到内容。
当前网页源码(是js渲染之前的代码):
css选择器代码:urllist = response.css('ul.nav li a::attr(href)')[0::3].extract()
运行结果是:
urllist===================[]
urllist长度============ 0
css选择器内的代码应该是没有错误的,为什么获取不到内容?
由于怀疑是css选择器出现了问题,因此替换xpath选择器,
xpath选择器代码:urllist=response.xpath('//ul[@class ="nav"]/li/a/@href').extract()
但是运行结果和css选择器相同。内容仍为空,长度为0
PHP中文网2017-04-18 10:14:41
Maybe your problem is not with the css selector code. Check whether the response content is consistent with what you see on the webpage
ringa_lee2017-04-18 10:14:41
You try to use scrapy shell in the command line to add the target url and then you can get a response object. First check whether the response object is normal. You can first check the response.body to see if it is what you want to crawl. The source code of the web page
and then use this response object to debug the code of your css selector