
Python crawler uses proxy to crawl web pages

大家讲道理
2016-11-07 10:59:51

Proxies come in several types: transparent, anonymous, obfuscated, and high-anonymity. This article covers how Python crawlers use proxies, along with a proxy pool class, to help with the more complex crawling problems that come up at work.

Using a proxy with the urllib module

Using a proxy with urllib/urllib2 is more cumbersome. You first build a ProxyHandler object, then use it to build the opener that opens web pages, and finally install that opener so that later urlopen calls go through it.

The proxy format is "http://127.0.0.1:80". If the proxy requires a username and password, the format is "http://user:password@127.0.0.1:80".
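The two formats above can be produced by a small helper. This is only a sketch: `build_proxy_url` is a hypothetical name, not part of urllib or requests, and percent-encoding the credentials is an assumption made here so that characters such as "@" or ":" in a password do not break the URL.

```python
from urllib.parse import quote


def build_proxy_url(host, port, user=None, password=None, scheme="http"):
    """Return a proxy URL, with optional user:password credentials."""
    if user is None:
        return f"{scheme}://{host}:{port}"
    # Percent-encode the credentials so reserved characters stay unambiguous.
    credentials = f"{quote(user, safe='')}:{quote(password or '', safe='')}"
    return f"{scheme}://{credentials}@{host}:{port}"


print(build_proxy_url("127.0.0.1", 80))
print(build_proxy_url("127.0.0.1", 80, "user", "password"))
```

The result can be used as the proxy string in the examples that follow.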

import urllib.request

proxy = "http://127.0.0.1:80"

# Create a ProxyHandler object
proxy_support = urllib.request.ProxyHandler({'http': proxy})
# Create an opener object
opener = urllib.request.build_opener(proxy_support)
# Install the opener so that urlopen uses it
urllib.request.install_opener(opener)
# Open a URL (timeout is in seconds)
r = urllib.request.urlopen('http://youtube.com', timeout=500)
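Installing the opener changes the behaviour of every later urlopen call in the process. If only some requests should go through the proxy, the opener can be used directly instead. The following is a minimal sketch under that assumption; `make_proxy_opener` is a hypothetical helper name, and sending HTTPS traffic through the same proxy is an illustrative choice, not something the original example does.

```python
import urllib.request


def make_proxy_opener(proxy_url):
    """Build an opener that routes HTTP and HTTPS requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


opener = make_proxy_opener("http://127.0.0.1:80")
# opener.open() uses the proxy; plain urllib.request.urlopen() is unaffected
# because install_opener() is never called here.
# Uncomment to send a request (requires a proxy listening on 127.0.0.1:80):
# with opener.open("http://youtube.com", timeout=10) as response:
#     html = response.read()
```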

Using a proxy with the requests module

Using a proxy with requests is much simpler than with urllib. The example here uses a single proxy. If you need the same proxy for many requests, you can set it up once on a Session object.

If you need to use a proxy, you can configure individual requests by passing the proxies parameter to any request method:

import requests

proxies = {
    "http": "http://127.0.0.1:3128",
    "https": "http://127.0.0.1:2080",
}
r = requests.get("http://youtube.com", proxies=proxies)
print(r.text)
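For repeated requests, the Session approach mentioned earlier might look like the sketch below. The proxy addresses are the same placeholder values as in the example above, and the request itself is left commented out because it needs a proxy actually listening on those ports.

```python
import requests

# Placeholder local proxy addresses, matching the example above.
proxies = {
    "http": "http://127.0.0.1:3128",
    "https": "http://127.0.0.1:2080",
}

session = requests.Session()
# Requests sent through this session use these proxies by default.
session.proxies.update(proxies)

# Uncomment to send a request through the proxy:
# r = session.get("http://youtube.com", timeout=10)
# print(r.text)
```

Proxy settings from environment variables can take precedence over session.proxies, so passing proxies on each request is the more predictable option when both are in play.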

You can also configure proxies through the environment variables HTTP_PROXY and HTTPS_PROXY.

export HTTP_PROXY="http://127.0.0.1:3128"
export HTTPS_PROXY="http://127.0.0.1:2080"
python
>>> import requests
>>> r=requests.get("http://youtube.com")
>>> print(r.text)
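When proxies come from the environment, it can help to check which values a script will actually see. The helper below is a hypothetical sketch (`proxies_from_env` is not a requests or urllib function); it reads the two variables named above and, as an added assumption, also accepts their lower-case spellings.

```python
import os


def proxies_from_env(environ=None):
    """Collect HTTP_PROXY / HTTPS_PROXY settings into a proxies-style dict."""
    if environ is None:
        environ = os.environ
    proxies = {}
    for scheme in ("http", "https"):
        # Check the upper-case name first, then the lower-case variant.
        value = environ.get(scheme.upper() + "_PROXY") or environ.get(scheme + "_proxy")
        if value:
            proxies[scheme] = value
    return proxies


# Show what the current environment would configure.
print(proxies_from_env())
```

The returned dict has the same shape as the proxies parameter, so it can be logged or passed to a request explicitly.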

If your proxy requires HTTP Basic Auth, you can use the http://user:password@host/ syntax:

proxies = {
    "http": "http://user:pass@127.0.0.1:3128/",
}
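The introduction mentions a proxy pool class. One possible shape for such a class is sketched below. It is not the author's implementation: the class name `ProxyPool`, its methods, and the random selection strategy are all assumptions chosen for illustration, and the addresses are placeholders.

```python
import random


class ProxyPool:
    """Hypothetical minimal proxy pool: hands out proxies and drops dead ones."""

    def __init__(self, proxy_urls):
        # Deduplicate while keeping the original order.
        self._proxies = list(dict.fromkeys(proxy_urls))

    def __len__(self):
        return len(self._proxies)

    def get(self):
        """Return a random proxy as a dict usable as a proxies argument."""
        if not self._proxies:
            raise LookupError("proxy pool is empty")
        url = random.choice(self._proxies)
        return {"http": url, "https": url}

    def discard(self, url):
        """Remove a proxy that failed, if it is still in the pool."""
        if url in self._proxies:
            self._proxies.remove(url)


pool = ProxyPool(["http://127.0.0.1:3128", "http://127.0.0.1:2080"])
proxies = pool.get()
```

A crawler could call pool.get() before each request, pass the result as proxies, and call pool.discard() with the proxy URL when a request through it fails.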

Using a proxy in Python is simple. The most important thing is to find a proxy with a stable, reliable network connection. If you have any questions, please leave a message.
