
python - Scrapy gets the same content for every CNKI results page

I loop over the paginated URLs and request each page:

# inside a spider callback; "http:xx" is the placeholder for the real CNKI address
for i in range(3):
    yield Request("http:xx/page/%s" % i, callback=self.parse_page)

Every request succeeds, but each response has the same content: that of the first page. Requesting the paginated URLs one by one in Postman does not show this problem. Could I have been blocked? It never used to behave like this.
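One way to confirm the symptom is to fingerprint each response body and list the pages that came back identical to an earlier one. The helper below is a hypothetical, stdlib-only sketch (`find_duplicate_pages` is not part of Scrapy); in a spider you would collect `response.url` and `response.body` (bytes) from the callback and pass the pairs in. The sample bodies are made up to reproduce the symptom.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


def find_duplicate_pages(pages: Iterable[Tuple[str, bytes]]) -> List[Tuple[str, str]]:
    """Return (url, first_url) pairs for every page whose body is byte-for-byte
    identical to a page seen earlier.

    `pages` is an iterable of (url, body) pairs, e.g. response.url and
    response.body collected inside a Scrapy callback.
    """
    first_seen: Dict[str, str] = {}  # body digest -> first URL that produced it
    duplicates: List[Tuple[str, str]] = []
    for url, body in pages:
        digest = hashlib.sha1(body).hexdigest()
        if digest in first_seen:
            duplicates.append((url, first_seen[digest]))
        else:
            first_seen[digest] = url
    return duplicates


if __name__ == "__main__":
    # Hypothetical bodies: every page returns the HTML of page 0.
    pages = [
        ("http:xx/page/0", b"<html>results 1-20</html>"),
        ("http:xx/page/1", b"<html>results 1-20</html>"),
        ("http:xx/page/2", b"<html>results 1-20</html>"),
    ]
    for url, original in find_duplicate_pages(pages):
        print(f"{url} has the same content as {original}")
```

Hashing the raw body only catches byte-identical pages; if the HTML embeds anything that changes per request (timestamps, tokens), hash the extracted result list instead.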

黄舟, 2638 days ago, 868 views

All replies (3)

  • 怪我咯, 2017-06-30 09:57:07

    Then you need to compare the headers sent when you request the page with Postman or a browser against the headers Scrapy sends.

  • 三叔, 2017-06-30 09:57:07

    Your crawler has been identified by the site's anti-crawling measures.

  • PHP中文网, 2017-06-30 09:57:07

    Check the log printed to the console to see whether the next page is actually being crawled. Look for lines like
    2017-06-29 09:26:13 [scrapy] DEBUG: Scraped from <200 http:xx/page/x>
    and check whether the final x in http:xx/page/x changes from one line to the next.

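For the header comparison suggested in the first reply, a small stdlib helper can list exactly which headers differ. `diff_headers` is a hypothetical name, and the sample values are placeholders: paste in the headers Postman shows for a working request and the ones Scrapy actually sent (inside a callback, `response.request.headers` holds them, with byte-string keys and values, so convert them to plain strings first).

```python
from typing import Dict, List, Tuple


def diff_headers(
    reference: Dict[str, str], scrapy_sent: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Compare two header dicts case-insensitively (HTTP header names are
    case-insensitive) and return (name, reference_value, scrapy_value) for
    every header that is missing on one side or has a different value.
    A header absent from one side is reported as "<missing>".
    """
    ref = {k.lower(): v for k, v in reference.items()}
    got = {k.lower(): v for k, v in scrapy_sent.items()}
    differences = []
    for name in sorted(ref.keys() | got.keys()):
        ref_value = ref.get(name, "<missing>")
        got_value = got.get(name, "<missing>")
        if ref_value != got_value:
            differences.append((name, ref_value, got_value))
    return differences


if __name__ == "__main__":
    # Hypothetical values: replace them with the real headers from
    # Postman (or the browser) and from the request Scrapy sent.
    postman = {"User-Agent": "PostmanRuntime/x.y", "Accept": "*/*", "Cookie": "session=abc123"}
    scrapy = {"User-Agent": "Scrapy/x.y (+https://scrapy.org)", "Accept": "*/*"}
    for name, ref_value, got_value in diff_headers(postman, scrapy):
        print(f"{name}: postman={ref_value!r} scrapy={got_value!r}")
```

Once the differences are known, Scrapy lets you add headers per request with the `headers=` argument of `Request`, or project-wide with the `DEFAULT_REQUEST_HEADERS` and `USER_AGENT` settings.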
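The log check from the last reply can be automated as well: pull the URL out of each `Scraped from <200 ...>` line and flag any URL that repeats. This is a hypothetical helper that assumes the log format quoted in that reply; the exact wording of Scrapy's log lines may vary between versions. If the URLs all differ but the content is still identical, the request loop is fine and the server is returning the same page.

```python
import re
from typing import List

# Matches lines such as:
#   2017-06-29 09:26:13 [scrapy] DEBUG: Scraped from <200 http:xx/page/x>
SCRAPED_RE = re.compile(r"Scraped from <(\d{3}) ([^>]+)>")


def scraped_urls(log_text: str) -> List[str]:
    """Return the URL from every 'Scraped from <status url>' line, in order."""
    return [m.group(2) for m in SCRAPED_RE.finditer(log_text)]


def repeated_urls(urls: List[str]) -> List[str]:
    """Return each URL that appears more than once, in order of its first repeat."""
    seen, repeats = set(), []
    for url in urls:
        if url in seen and url not in repeats:
            repeats.append(url)
        seen.add(url)
    return repeats


if __name__ == "__main__":
    # Hypothetical log excerpt; paste the real console output here.
    log = (
        "2017-06-29 09:26:13 [scrapy] DEBUG: Scraped from <200 http:xx/page/0>\n"
        "2017-06-29 09:26:14 [scrapy] DEBUG: Scraped from <200 http:xx/page/1>\n"
        "2017-06-29 09:26:15 [scrapy] DEBUG: Scraped from <200 http:xx/page/2>\n"
    )
    urls = scraped_urls(log)
    print("URLs scraped:", urls)
    print("Repeated URLs:", repeated_urls(urls))
```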