
Python - How to quickly check the status codes of 200 million+ URLs?

I wrote a multi-threaded checker with requests, but it feels slow. Are there any faster approaches?

世界只因有你 · 2712 days ago · 581 views

5 replies

  • PHPz · 2017-05-18 10:58:14

    Use Tornado's curl client support and close the connection after reading the response headers. (I haven't tried it yet. If the HTTP client it provides doesn't support closing the connection midway, you can open a TCP connection yourself and parse the response with http-parser, as I did.)

    Actually, you can just add an extension to fetchtitle to get the status code (remember to install pycurl).
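A minimal sketch of the headers-only idea with pycurl directly (the fetchtitle extension itself isn't shown here, and this assumes pycurl is installed): `NOBODY` makes libcurl issue a HEAD request, so no response body is transferred.

```python
# Sketch: get only the status code via pycurl; assumes pycurl is installed.
import pycurl


def status_code(url, timeout=10):
    """Return the HTTP status code for url, transferring headers only."""
    c = pycurl.Curl()
    c.setopt(pycurl.URL, url)
    c.setopt(pycurl.NOBODY, 1)            # HEAD request: no response body
    c.setopt(pycurl.FOLLOWLOCATION, 0)    # report the first status code seen
    c.setopt(pycurl.CONNECTTIMEOUT, timeout)
    c.setopt(pycurl.TIMEOUT, timeout)
    try:
        c.perform()
        return c.getinfo(pycurl.RESPONSE_CODE)
    finally:
        c.close()
```

For 200 million URLs you would drive many such handles concurrently (e.g. with `pycurl.CurlMulti`) rather than calling this one URL at a time.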

  • 巴扎黑 · 2017-05-18 10:58:14

    Python itself is slow. If you want speed, send the TCP request directly and read the reply; once you've read the status line, close the socket.
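A stdlib-only sketch of that raw-socket approach: send a minimal HEAD request, read just the status line, and close. No redirects and no HTTPS handling; those are left out deliberately.

```python
# Sketch: raw TCP request, read only the status line, then close the socket.
import socket
from urllib.parse import urlsplit


def parse_status_line(line):
    """b'HTTP/1.1 200 OK' -> 200"""
    return int(line.split(None, 2)[1])


def head_status(url, timeout=5):
    """Return the status code for a plain-HTTP url via one raw socket."""
    parts = urlsplit(url)
    host, port, path = parts.hostname, parts.port or 80, parts.path or "/"
    req = ("HEAD %s HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n"
           % (path, host)).encode()
    with socket.create_connection((host, port), timeout=timeout) as s:
        s.sendall(req)
        data = s.recv(1024)        # the status line fits in the first read
    return parse_status_line(data.split(b"\r\n", 1)[0])
```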

  • ringa_lee · 2017-05-18 10:58:14

    Use grequests, which wraps requests with gevent for concurrent execution.

    https://github.com/kennethrei...

  • 迷茫 · 2017-05-18 10:58:14

    In this case you could consider gevent, Tornado, scrapy-redis, or asyncio!
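Taking the asyncio option as one example, here is a stdlib-only sketch: a semaphore bounds the number of simultaneous connections, and each coroutine reads only the status line before closing (plain HTTP only, no redirects).

```python
# Sketch: bounded-concurrency status checks with asyncio (stdlib only).
import asyncio
from urllib.parse import urlsplit


async def status(url, sem, timeout=5):
    """Return (url, status_code), or (url, None) on any failure."""
    parts = urlsplit(url)
    host, port, path = parts.hostname, parts.port or 80, parts.path or "/"
    async with sem:                       # limit simultaneous connections
        try:
            reader, writer = await asyncio.wait_for(
                asyncio.open_connection(host, port), timeout)
            writer.write(("HEAD %s HTTP/1.1\r\nHost: %s\r\n"
                          "Connection: close\r\n\r\n" % (path, host)).encode())
            await writer.drain()
            line = await asyncio.wait_for(reader.readline(), timeout)
            writer.close()
            return url, int(line.split(None, 2)[1])   # b'200' -> 200
        except (OSError, asyncio.TimeoutError, ValueError, IndexError):
            return url, None


async def check_all(urls, concurrency=1000):
    sem = asyncio.Semaphore(concurrency)
    pairs = await asyncio.gather(*(status(u, sem) for u in urls))
    return dict(pairs)
```

With 200 million URLs you would feed batches through `check_all` and raise `concurrency` until you hit file-descriptor or bandwidth limits.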

  • 大家讲道理 · 2017-05-18 10:58:14

    Would using HEAD requests be faster?
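It usually is, since HEAD asks the server for headers only and no body is transferred. A minimal version with the requests library the question already uses:

```python
# Sketch: headers-only check with requests.head (assumes requests installed).
import requests


def head_status(url):
    # allow_redirects=False: one round trip, report the first status code
    r = requests.head(url, timeout=10, allow_redirects=False)
    return r.status_code
```

One caveat: some servers mishandle HEAD (returning 405 or wrong codes), so a fallback to GET with `stream=True` can be worth adding.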
