Python crawler uses proxy to crawl web pages
Proxies are commonly classified into four types: transparent, anonymous, obfuscated, and high-anonymity. This article covers how Python crawlers use proxies, along with a proxy pool class, to help with the various crawling problems that come up at work.
Using a proxy with the urllib module
Using a proxy with urllib/urllib2 is more cumbersome. You first create a ProxyHandler object, then use it to build the opener that opens web pages, and finally install that opener so that later urlopen calls go through it.
The proxy format is "http://127.0.0.1:80". If the proxy requires a username and password, the format is "http://user:password@127.0.0.1:80".
import urllib.request

proxy = "http://127.0.0.1:80"

# Create a ProxyHandler object
proxy_support = urllib.request.ProxyHandler({'http': proxy})
# Create an opener object
opener = urllib.request.build_opener(proxy_support)
# Install the opener globally so that urlopen uses it
urllib.request.install_opener(opener)
# Open a URL (timeout is given in seconds)
r = urllib.request.urlopen('http://youtube.com', timeout=500)
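Installing the opener changes the behavior of every later urlopen call in the process. As a minimal sketch of an alternative, the opener can be kept local and called directly. The helper names `build_proxy_opener` and `fetch_via_proxy` are hypothetical and not part of urllib, and the 10-second timeout is an assumed value.

```python
import urllib.request


def build_proxy_opener(proxy_url):
    """Return an opener that routes plain-HTTP requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)


def fetch_via_proxy(opener, url, timeout=10):
    """Fetch url through the given opener and return the raw response body."""
    with opener.open(url, timeout=timeout) as response:
        return response.read()


# Build the opener without touching the global urlopen configuration.
opener = build_proxy_opener("http://127.0.0.1:80")
# body = fetch_via_proxy(opener, "http://youtube.com")  # needs a live proxy
```

Because nothing is installed globally, different parts of a crawler can hold openers for different proxies at the same time.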
Using a proxy with the requests module
Using a proxy with requests is much simpler than with urllib. Here we take a single proxy as an example. If you need the proxy for many requests, you can configure it on a Session object.
If you need to use a proxy, you can pass the proxies parameter to any request method to configure an individual request:
import requests

proxies = {
    "http": "http://127.0.0.1:3128",
    "https": "http://127.0.0.1:2080",
}
r = requests.get("http://youtube.com", proxies=proxies)
print(r.text)
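For repeated requests, the session approach mentioned earlier might look like the following sketch. It reuses the proxy addresses from the example above, and the commented-out request and its timeout are illustrative assumptions.

```python
import requests

proxies = {
    "http": "http://127.0.0.1:3128",
    "https": "http://127.0.0.1:2080",
}

# Store the proxies on the session so each request does not repeat them.
session = requests.Session()
session.proxies.update(proxies)

# Requests made through this session pick up session.proxies.
# r = session.get("http://youtube.com", timeout=10)  # needs a live proxy
```

The requests documentation warns that proxies taken from the environment can override `session.proxies`, so passing `proxies=` on each call is the safer choice when proxy environment variables may also be set.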
You can also configure proxies through the environment variables HTTP_PROXY and HTTPS_PROXY.
export HTTP_PROXY="http://127.0.0.1:3128"
export HTTPS_PROXY="http://127.0.0.1:2080"
python
>>> import requests
>>> r = requests.get("http://youtube.com")
>>> print(r.text)
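To check which proxies a Python process picks up from the environment, the standard library's `urllib.request.getproxies()` can be used. The sketch below sets the variables inside the process instead of using `export`, purely for illustration. The wrapper name `proxies_from_environment` is hypothetical.

```python
import os
import urllib.request


def proxies_from_environment():
    """Return the scheme-to-proxy mapping urllib derives from *_PROXY variables."""
    return urllib.request.getproxies()


# Simulate the two export lines for the current process only.
os.environ["HTTP_PROXY"] = "http://127.0.0.1:3128"
os.environ["HTTPS_PROXY"] = "http://127.0.0.1:2080"

env_proxies = proxies_from_environment()
print(env_proxies.get("http"), env_proxies.get("https"))
```

Lowercase variants such as `http_proxy` are also read and take precedence when both spellings are set, so a stale lowercase value can mask the uppercase one.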
If your proxy requires HTTP Basic Auth, you can use the http://user:password@host/ syntax:
proxies = {
    "http": "http://user:pass@127.0.0.1:3128/",
}
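Usernames and passwords that contain characters such as `@` or `:` would make the URL ambiguous if inserted literally. A minimal sketch of one way to handle this is to percent-encode the credentials before building the URL. The helper `build_proxy_url` is a hypothetical name introduced here, not part of requests or urllib.

```python
from urllib.parse import quote


def build_proxy_url(host, port, user=None, password=None, scheme="http"):
    """Build a proxy URL, percent-encoding any credentials."""
    if user is None:
        return f"{scheme}://{host}:{port}"
    credentials = quote(user, safe="")
    if password is not None:
        credentials += ":" + quote(password, safe="")
    return f"{scheme}://{credentials}@{host}:{port}"


# The password here is made up to show the encoding of '@' and ':'.
proxies = {
    "http": build_proxy_url("127.0.0.1", 3128, "user", "p@ss:word"),
}
print(proxies["http"])
```

The resulting dictionary can be passed as the `proxies` argument in the same way as the hand-written one.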
Using a proxy in Python is very simple. The most important thing is to find a proxy with a stable and reliable network connection.
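Since individual proxies are often unstable, a crawler usually rotates among several and discards the ones that fail. The class below is only a rough sketch of what a proxy pool could look like. It is not the pool class referred to in the introduction, and the names `ProxyPool`, `get`, and `remove` are assumptions made for illustration.

```python
import random


class ProxyPool:
    """Hold a set of proxy URLs and hand out one at random."""

    def __init__(self, proxy_urls):
        # Drop duplicates while keeping the original order.
        self._proxies = list(dict.fromkeys(proxy_urls))

    def __len__(self):
        return len(self._proxies)

    def get(self):
        """Return a proxies dict suitable for requests, chosen at random."""
        if not self._proxies:
            raise LookupError("no proxies left in the pool")
        proxy = random.choice(self._proxies)
        return {"http": proxy, "https": proxy}

    def remove(self, proxy_url):
        """Discard a proxy, for example after a connection error."""
        if proxy_url in self._proxies:
            self._proxies.remove(proxy_url)


pool = ProxyPool(["http://127.0.0.1:3128", "http://127.0.0.1:2080"])
chosen = pool.get()
# Pretend the chosen proxy failed and take it out of rotation.
pool.remove(chosen["http"])
print(len(pool))
```

In a real crawler the call to `remove` would sit in the exception handler around the request, so that a failing proxy is dropped and the request is retried through another one.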