迷茫 2017-04-18 10:00:30
Thank God I just found the solution
The original text is here: A summary of some techniques for using Python crawlers to crawl websites - Python - Bole Online, http://python.jobbole.com/81997/
import httplib
import urlparse

def request(url, cookie='xxx', retries=5):
    # Parse the input URL and open a connection to its host
    ret = urlparse.urlparse(url)
    if ret.scheme == 'http':
        conn = httplib.HTTPConnection(ret.netloc)
    elif ret.scheme == 'https':
        conn = httplib.HTTPSConnection(ret.netloc)
    else:
        return None
    # Rebuild the request path separately so the original url is kept for retries
    path = ret.path
    if ret.query: path += '?' + ret.query
    if ret.fragment: path += '#' + ret.fragment
    if not path: path = '/'
    try:
        conn.request(method='GET', url=path, headers={'Cookie': cookie})
        res = conn.getresponse()
    except Exception as e:
        print e
        if retries > 0:
            # Retry with the full original URL and one attempt fewer
            return request(url=url, cookie=cookie, retries=retries - 1)
        print 'GET Failed'
        return ''
    if res.status != 200:
        return None
    return res.read()
The idea is to use a retries parameter to track the remaining number of retries: every time an exception is caught, the function calls itself recursively with the retry count decremented by 1. Once no retries remain, it stops, prints a failure log, and returns directly.
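For context, a call might look like this (the URL and cookie value are just placeholders):

# Hypothetical usage: fetch a page, sending a session cookie and
# allowing up to 5 retries when an exception is raised
html = request('http://example.com/', cookie='session=abc123', retries=5)
if html:
    print 'fetched %d bytes' % len(html)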
大家讲道理 2017-04-18 10:00:30
Having the function call itself recursively and using a retry count as the limit is the most direct approach.
But there is a problem:
If the remote address is only failing temporarily, for example because the service is being restarted, retrying immediately will still fail. Five immediate retries take very little time, so by the time the remote service is ready again, the request has already used up its 5 retries and been given up on.
The mechanism I use is to retry five times, waiting 30 seconds, 1 minute, 10 minutes, 30 minutes, and 1 hour before each attempt; if it still fails after that, it is treated as a failure.
Of course, this choice depends on the specific business logic; different businesses have different requirements for their requests.
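For what it's worth, here is a minimal sketch of that escalating-wait strategy, assuming the request() function from the code above; the request_with_backoff name and the exact wait schedule are illustrative, not part of the original post.

import time

# Illustrative wait schedule in seconds: 30s, 1min, 10min, 30min, 1h
WAIT_SCHEDULE = [30, 60, 600, 1800, 3600]

def request_with_backoff(url, cookie='xxx'):
    # First attempt, then one retry per entry in the schedule,
    # sleeping before each retry; an empty or None result counts as a failure
    result = request(url, cookie=cookie, retries=0)
    if result:
        return result
    for wait in WAIT_SCHEDULE:
        time.sleep(wait)
        result = request(url, cookie=cookie, retries=0)
        if result:
            return result
    # All retries exhausted: give up and treat the request as failed
    return None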