Detailed explanation of how Python crawlers use proxy to crawl web pages
Proxy types: transparent proxy, anonymous proxy, distorting proxy, and high-anonymity (elite) proxy. This article covers the basics of using proxies in Python crawlers and provides a simple proxy pool class, which should help with a variety of crawling problems.
Using a proxy with urllib/urllib2 (urllib.request in Python 3) is somewhat cumbersome: first build a ProxyHandler, then use it to build an opener for fetching pages, and finally install that opener so requests go through it.
The proxy format is "http://127.0.0.1:80"; if a username and password are required, it becomes "http://user:password@127.0.0.1:80".
import urllib.request

proxy = "http://127.0.0.1:80"
# Build a ProxyHandler object
proxy_support = urllib.request.ProxyHandler({'http': proxy})
# Build an opener object
opener = urllib.request.build_opener(proxy_support)
# Install the opener for urllib.request
urllib.request.install_opener(opener)
# Open a URL through the proxy
r = urllib.request.urlopen('http://youtube.com', timeout=500)
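As a side note, installing the opener globally is optional: a sketch of an alternative, assuming the same placeholder proxy address, keeps the opener local and calls its open() method directly, so the rest of the process is unaffected.

```python
import urllib.request

# Placeholder proxy address from the article; substitute a real one.
proxy = "http://127.0.0.1:80"
proxy_support = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
opener = urllib.request.build_opener(proxy_support)

def fetch(url, timeout=10):
    # opener.open routes only this call through the proxy,
    # leaving the process-wide default opener untouched.
    return opener.open(url, timeout=timeout)
```

This avoids surprising other code in the same process that also uses urllib.request.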
Using a proxy with requests is much simpler than with urllib. A single proxy is shown below; if you make many requests, you can use a Session to build a class.
If you need to use a proxy, you can configure a single request by passing the proxies argument to any request method:
import requests

proxies = {
    "http": "http://127.0.0.1:3128",
    "https": "http://127.0.0.1:2080",
}
r = requests.get("http://youtube.com", proxies=proxies)
print(r.text)
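The Session approach mentioned above could be sketched like this (a minimal sketch; the proxy addresses are the article's placeholders): setting proxies on a Session applies them to every request made through it, so you configure the proxy once instead of passing it to each call.

```python
import requests

# Reuse one proxy configuration across many requests via a Session.
session = requests.Session()
session.proxies = {
    "http": "http://127.0.0.1:3128",
    "https": "http://127.0.0.1:2080",
}

# Every request made through this session now uses the proxies above:
# r = session.get("http://youtube.com")
```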
You can also configure the proxy through the environment variables HTTP_PROXY and HTTPS_PROXY.
export HTTP_PROXY="http://127.0.0.1:3128"
export HTTPS_PROXY="http://127.0.0.1:2080"
python
>>> import requests
>>> r = requests.get("http://youtube.com")
>>> print(r.text)
If your proxy needs HTTP Basic Auth, use the http://user:password@host/ syntax:
proxies = {
    "http": "http://user:pass@127.0.0.1:3128/",
}
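Finally, the proxy pool class mentioned at the start could look something like the following minimal sketch. The ProxyPool class and its methods are hypothetical helpers, not code from a library: it keeps a list of proxy URLs, hands one out at random, and lets the caller drop proxies that stop working.

```python
import random

class ProxyPool:
    """Minimal proxy-pool sketch (hypothetical helper): holds proxy
    URLs, picks one at random, and drops dead ones on request."""

    def __init__(self, proxies):
        self.proxies = list(proxies)

    def get(self):
        # Pick a random live proxy; fail loudly if none are left.
        if not self.proxies:
            raise RuntimeError("proxy pool is empty")
        return random.choice(self.proxies)

    def as_requests_dict(self):
        # Format one proxy the way requests expects its proxies argument.
        p = self.get()
        return {"http": p, "https": p}

    def remove(self, proxy):
        # Drop a proxy that failed, so it is not handed out again.
        if proxy in self.proxies:
            self.proxies.remove(proxy)

pool = ProxyPool(["http://127.0.0.1:3128", "http://127.0.0.1:2080"])
```

A crawler would call pool.as_requests_dict() before each request and pool.remove() when a proxy times out, so bad proxies are gradually weeded out of rotation.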
Using a proxy in Python is very simple; the most important thing is to find a stable and reliable proxy. If you have any questions, please leave a message.
The above is the detailed content of Detailed explanation of how Python crawlers use proxy to crawl web pages. For more information, please follow other related articles on the PHP Chinese website!