How can I optimize HTTP request dispatch for 100,000 URLs in Python 2.6?

Susan Sarandon
2024-11-17 16:27:02

Optimizing HTTP Request Dispatch in Python

Dispatching requests to a large number of URLs is slow in Python when done one at a time, because each request spends most of its time waiting on the network. This article presents a thread-based approach to dispatching 100,000 HTTP requests in Python 2.6, using a fixed pool of worker threads fed by a queue.

A Solution Without Twisted:

The following script sends HTTP HEAD requests concurrently and prints the status code returned for each URL:

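from urlparse import urlparse
from threading import Thread
import httplib, sys
from Queue import Queue

concurrent = 200  # number of worker threads

def doWork():
    # Each worker pulls URLs from the queue until the program exits.
    while True:
        url = q.get()
        status, url = getStatus(url)
        doSomethingWithResult(status, url)
        q.task_done()

def getStatus(ourl):
    conn = None
    try:
        url = urlparse(ourl)
        # A timeout keeps an unresponsive host from blocking a worker forever
        # (10 seconds is an arbitrary choice).
        conn = httplib.HTTPConnection(url.netloc, timeout=10)
        # urlparse returns an empty path for "http://host"; send "/" instead.
        conn.request("HEAD", url.path or "/")
        res = conn.getresponse()
        return res.status, ourl
    except Exception:
        return "error", ourl
    finally:
        if conn is not None:
            conn.close()

def doSomethingWithResult(status, url):
    print status, url

# The bounded queue makes q.put() block when the workers fall behind.
q = Queue(concurrent * 2)
for i in range(concurrent):
    t = Thread(target=doWork)
    t.daemon = True  # daemon threads do not keep the process alive
    t.start()
try:
    for url in open('urllist.txt'):
        q.put(url.strip())
    q.join()  # wait until every queued URL has been processed
except KeyboardInterrupt:
    sys.exit(1)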

Explanation:

  • A fixed pool of daemon worker threads is started; its size is set by concurrent (200 here).
  • Each worker runs doWork in a loop: it takes a URL from the queue, sends an HTTP HEAD request through getStatus, and gets back the status code (or "error" if the request fails).
  • Each result is passed to doSomethingWithResult, which prints it here and can be replaced with logging or other processing.
  • The queue is bounded at concurrent * 2 entries, so the main thread blocks while reading urllist.txt whenever the workers fall behind, and the full URL list is never held in memory at once. A worker pulls the next URL as soon as it finishes the previous one, and q.join() returns once every URL has been processed.
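
The pattern described above can be isolated from the networking code. The following is a minimal sketch, not the author's script: run_pool and fake_status are hypothetical names introduced for illustration, the work function is a stand-in for a real request, and the imports are written so the block should run on both Python 2.6+ and Python 3. Unlike the script, it collects results in a list guarded by a lock instead of printing them from each thread.

```python
try:
    from queue import Queue          # Python 3
except ImportError:
    from Queue import Queue          # Python 2

from threading import Thread, Lock


def run_pool(items, work, concurrency=4):
    """Apply work(item) to every item using a fixed pool of daemon threads."""
    tasks = Queue(concurrency * 2)   # bounded: put() blocks when workers fall behind
    results = []
    lock = Lock()                    # guards the shared results list

    def worker():
        while True:
            item = tasks.get()
            try:
                outcome = work(item)
            except Exception as exc:
                outcome = "error: %s" % exc
            with lock:
                results.append((item, outcome))
            tasks.task_done()        # lets tasks.join() account for this item

    for _ in range(concurrency):
        t = Thread(target=worker)
        t.daemon = True              # do not keep the process alive on exit
        t.start()

    for item in items:
        tasks.put(item)
    tasks.join()                     # returns once every item has been marked done
    return results


def fake_status(url):
    # Hypothetical stand-in for a network call, so the sketch runs offline.
    return 200 if url.startswith("http://") else 400


if __name__ == "__main__":
    for url, status in sorted(run_pool(["http://a", "b", "http://c"], fake_status)):
        print("%s %s" % (status, url))
```

Results arrive in completion order, not input order, which is why the demo sorts them before printing.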

This approach is reported to be faster than a Twisted-based solution while also using less CPU. Threads suit this workload because it is I/O-bound: each worker spends most of its time waiting on the network, and other threads can run during that wait.
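
Real-world URL lists also affect how reliable the results are. The script always uses httplib.HTTPConnection and sends only url.path, so https:// URLs are contacted over plain HTTP and query strings are dropped. The sketch below shows one way to normalize a URL before making the request. plan_request is a hypothetical helper name, not part of the original script, and the imports are written so the block should run on both Python 2.6+ and Python 3.

```python
try:
    from urllib.parse import urlparse    # Python 3
except ImportError:
    from urlparse import urlparse        # Python 2


def plan_request(raw_url):
    """Return (scheme, host, target) for a HEAD request, or None if unusable."""
    parts = urlparse(raw_url.strip())
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return None                      # e.g. "example.com/x" has no scheme
    target = parts.path or "/"           # an empty path is not a valid request target
    if parts.query:
        target += "?" + parts.query      # keep the query string that url.path omits
    return parts.scheme, parts.netloc, target


if __name__ == "__main__":
    print(plan_request("https://example.com/search?q=python"))
    print(plan_request("example.com/no-scheme"))
```

A caller could then choose httplib.HTTPSConnection when the scheme is "https" and httplib.HTTPConnection otherwise, and report URLs for which the helper returns None as errors without opening a connection.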
