
python - Writing a crawler with Scrapy: every request now gets a 202 back from the server. What should I do?

I have been crawling the Chinese Judgment Documents Network, and it used to work fine: I sent a request, the server returned 200, and I processed the data in the response body.

But a week ago, every request suddenly started returning 202 with an empty response body, so I can't get any data at all. In the callback I tried blocking with while (response.status == 202), and even added a sleep, but the status never changes.

What can I do about this?

I use Crawlera's IP proxy service. I got 202 responses for a while once before, but that cleared up after a day. This time it has lasted a week, which is very strange.

My guess is that the target website is under heavy load and is therefore sending its data asynchronously. If so, how do I correctly receive that data in Scrapy?
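A note on the blocking loop described above: a response object is a snapshot of a reply that has already arrived, so waiting on `response.status` cannot change it. The request has to be sent again. The sketch below shows that idea with the standard library only. `fetch_until_ready` and its `fetch` argument are hypothetical names introduced for illustration, and the exponential backoff is an assumption, not something the site documents.

```python
import time
from typing import Callable, Optional, Tuple


def fetch_until_ready(
    fetch: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[bytes]:
    """Call fetch() again on every 202 until a 200 arrives.

    fetch is a hypothetical callable that performs ONE fresh HTTP request
    and returns (status_code, body). Returns the body on 200, or None if
    every attempt came back 202.
    """
    for attempt in range(max_attempts):
        status, body = fetch()  # a new request each time, not the old response
        if status == 200:
            return body
        if status != 202:
            raise RuntimeError(f"unexpected status {status}")
        if attempt < max_attempts - 1:
            # Assumed policy: wait 1x, 2x, 4x ... base_delay between attempts.
            sleep(base_delay * 2 ** attempt)
    return None
```

Inside a Scrapy callback, calling `time.sleep` would stall the whole crawler, so the equivalent there is probably to yield a fresh request, for example `response.request.replace(dont_filter=True)`, and let Scrapy schedule it. If the 202 is a deliberate block rather than a temporary state, no amount of retrying will help.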

黄舟  2669 days ago  1595

All replies (2)

  • 欧阳克 2017-06-28 09:27:09

    This usually happens when the server has detected crawling that it does not permit and has applied anti-crawling restrictions. If you are authorized to crawl the site, contact its content department to check whether you were blocked by mistake. If you are not authorized, I recommend that you stop; in serious cases there is a risk of prosecution.

  • 过去多啦不再A梦 2017-06-28 09:27:09

    If the site has started blocking your crawler, you can try changing your IP address, or look for gaps in its anti-crawling measures.
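Before switching IPs, it may be worth ruling out plain rate limiting, assuming you are permitted to crawl the site at all. The fragment below is a minimal sketch of a Scrapy `settings.py` that treats 202 as retryable and slows the crawl down. The setting names are standard Scrapy settings, but every value is a guess that would need tuning.

```python
# settings.py (sketch; all values are assumptions, not recommendations)

# Let Scrapy's retry middleware re-send requests that come back 202.
# Setting this list replaces Scrapy's default, so the usual transient
# error codes are repeated here as well.
RETRY_ENABLED = True
RETRY_HTTP_CODES = [202, 408, 429, 500, 502, 503, 504]
RETRY_TIMES = 5

# Slow down so the crawler puts less load on the server.
DOWNLOAD_DELAY = 2.0
CONCURRENT_REQUESTS_PER_DOMAIN = 1

# Let AutoThrottle adjust the delay based on observed latency.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 2.0
AUTOTHROTTLE_MAX_DELAY = 60.0
```

If the 202 responses continue at this pace, the cause is more likely a deliberate block than server load, and the first reply's advice applies.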
