My job requires scraping data from Amazon, but Amazon's anti-crawler measures are aggressive, and repeated requests from the same IP address get blocked.
Python version: 3.6, IDE: PyCharm 2017.1
I searched online extensively and read the requests library documentation, but every source describes the same method. My code is as follows:
import requests
# Proxy IP address (high anonymity)
proxy = {'HTTPS': '117.85.105.170:808'}
# Request headers
head = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36',
        'Connection': 'keep-alive'}
# http://icanhazip.com returns the current IP address
p = requests.get('http://icanhazip.com', headers=head, proxies=proxy)
print(p.text)
Every tutorial I have read says that if the proxy is set up correctly, the IP printed at the end should be the proxy's IP address, but I still get my real IP address. Does this mean the proxy is not being applied?
阿神 2017-06-12 09:26:11
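requests picks the proxy by the scheme of the URL you are requesting: the 'http' entry is used for http:// URLs and the 'https' entry for https:// URLs. The keys must be lowercase. Your dict only has an 'HTTPS' key, and http://icanhazip.com is an http URL, so no entry matches and the request goes out from your real IP. For the proxy to take effect, the dict needs both an http and an https entry: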
proxy = {
    'http': 'http://117.85.105.170:808',
    'https': 'https://117.85.105.170:808'
}
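To see why the dict from the question has no effect, here is a minimal sketch of scheme-based lookup. Note that pick_proxy is a hypothetical helper written for illustration, not the requests API, and it is a simplified model: it assumes the lookup is a plain dictionary lookup on the lowercase URL scheme and ignores other ways a proxy can be configured (for example environment variables).

```python
from urllib.parse import urlparse


def pick_proxy(url, proxies):
    """Simplified model of scheme-based proxy lookup (hypothetical helper).

    Returns the proxy entry whose key equals the URL's scheme, or None
    when no entry matches and the request would go out directly.
    """
    scheme = urlparse(url).scheme  # urlparse lowercases the scheme, e.g. 'http'
    return proxies.get(scheme)


# Dict from the question: the only key is uppercase 'HTTPS'.
question_proxy = {'HTTPS': '117.85.105.170:808'}

# Dict from the answer: lowercase keys for both schemes.
answer_proxy = {
    'http': 'http://117.85.105.170:808',
    'https': 'https://117.85.105.170:808',
}

url = 'http://icanhazip.com'
print(pick_proxy(url, question_proxy))  # None: no key matches 'http'
print(pick_proxy(url, answer_proxy))    # http://117.85.105.170:808
```

With the corrected dict, the same requests.get('http://icanhazip.com', headers=head, proxies=proxy) call from the question should print the proxy's IP, provided the proxy itself is still reachable. Free proxies go offline often, so a connection error or timeout at that point usually says more about the proxy than about the code.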