Detailed explanation of Python's HTTP proxy
HTTP proxies are familiar to most developers and are used very widely. They come in two kinds: forward proxies and reverse proxies. A reverse proxy is generally used to give users access to services behind a firewall or to balance load; typical examples are Nginx and HAProxy. This article discusses forward proxies.
The most common uses of an HTTP proxy are sharing a network connection, speeding up access, and getting around network restrictions. HTTP proxies are also widely used to debug web applications and to monitor and analyze the web APIs called by Android/iOS apps; well-known tools include Fiddler, Charles, Burp Suite and mitmproxy. A proxy can also modify request and response content, adding features to a web application or changing its behavior without changing the server.
0x01 What is an HTTP proxy
An HTTP proxy is essentially a web application, not fundamentally different from any other web application. When it receives a request, it determines the target host from the request line (GET/POST, etc.) together with the Host header: if the request line carries an absolute URL, the proxy uses the host in that URL; otherwise it uses the Host header. A simple test makes this concrete. Assume the following network environment:
192.168.1.2 Web server
192.168.1.3 HTTP proxy server
Connect to the proxy server with telnet (give the port the proxy listens on after the IP address if needed, since telnet defaults to port 23) and enter the request:
    $ telnet 192.168.1.3
    GET / HTTP/1.0
    HOST: 192.168.1.2
Note that you must press Enter twice at the end: the HTTP protocol requires an empty line to terminate the request headers. The proxy then returns the page content of http://192.168.1.2/. Now adjust the test so that the GET request carries an absolute URL:
    $ telnet 192.168.1.3
    GET http://httpbin.org/ip HTTP/1.0
    HOST: 192.168.1.2
This time the response comes from httpbin.org rather than from 192.168.1.2, because the absolute URL in the request line takes precedence over the Host header. As the test shows, an HTTP proxy is not complicated: the client only has to send its original request to the proxy server. When a program cannot be configured to use an HTTP proxy and only a few target hosts need one, the simplest workaround is to edit the hosts file so that the target domain names resolve to the proxy server's IP; the proxy then routes each request by its Host header.
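To make the routing rule concrete, here is a minimal sketch of a forward proxy built only on Python's standard library (http.server and http.client). It is an illustration under stated assumptions, not production code: it handles plain-HTTP GET only, has no CONNECT/HTTPS support, and buffers whole responses. The names resolve_target, ForwardProxyHandler and HOP_BY_HOP are hypothetical. An absolute URL in the request line wins; otherwise the Host header decides, which is also what makes the hosts-file trick work.

```python
import http.client
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import urlsplit

# Headers that describe a single connection; a proxy should not forward them.
HOP_BY_HOP = {"connection", "keep-alive", "proxy-connection", "proxy-authenticate",
              "proxy-authorization", "te", "trailer", "transfer-encoding", "upgrade"}


def resolve_target(request_target, host_header):
    """Return (host, port, path): absolute URL first, then the Host header."""
    parts = urlsplit(request_target)
    if parts.scheme and parts.netloc:  # e.g. GET http://httpbin.org/ip HTTP/1.0
        path = (parts.path or "/") + ("?" + parts.query if parts.query else "")
        return parts.hostname, parts.port or 80, path
    host, _, port = (host_header or "").partition(":")  # e.g. Host: 192.168.1.2
    return host, int(port) if port else 80, request_target


class ForwardProxyHandler(BaseHTTPRequestHandler):
    """Toy forward proxy: plain-HTTP GET only (no HTTPS/CONNECT, no streaming)."""

    def do_GET(self):
        host, port, path = resolve_target(self.path, self.headers.get("Host"))
        if not host:
            self.send_error(400, explain="no target host in request line or Host header")
            return
        # Drop hop-by-hop headers and Host; http.client rebuilds Host from host/port.
        headers = {k: v for k, v in self.headers.items()
                   if k.lower() not in HOP_BY_HOP and k.lower() != "host"}
        upstream = http.client.HTTPConnection(host, port, timeout=10)
        try:
            upstream.request("GET", path, headers=headers)
            resp = upstream.getresponse()
            body = resp.read()
        except (OSError, http.client.HTTPException) as exc:
            self.send_error(502, explain=str(exc))
            return
        finally:
            upstream.close()
        # Relay the status line, end-to-end headers and body to the client.
        self.send_response_only(resp.status, resp.reason)
        for k, v in resp.getheaders():
            if k.lower() not in HOP_BY_HOP and k.lower() != "content-length":
                self.send_header(k, v)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# To try it (port 8080 is an arbitrary choice):
#   ThreadingHTTPServer(("127.0.0.1", 8080), ForwardProxyHandler).serve_forever()
# then, from a shell: http_proxy=127.0.0.1:8080 curl http://httpbin.org/ip
```

Because the handler resolves the target per request, the same code serves both telnet tests: the first is routed by its Host header, the second by the absolute URL.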
0x02 Setting an HTTP proxy in a Python program
urllib2/urllib proxy settings
    import urllib2  # Python 2
    proxy_handler = urllib2.ProxyHandler({'http': '121.193.143.249:80'})
    opener = urllib2.build_opener(proxy_handler)
    r = opener.open('http://httpbin.org/ip')
    print(r.read())
You can also call install_opener to install the configured opener globally, so that plain urllib2.urlopen calls go through the proxy:
    urllib2.install_opener(opener)
    r = urllib2.urlopen('http://httpbin.org/ip')
    print(r.read())
In Python 3, urllib2 was merged into urllib.request, so the same code becomes:
    import urllib.request
    proxy_handler = urllib.request.ProxyHandler({'http': 'http://121.193.143.249:80/'})
    opener = urllib.request.build_opener(proxy_handler)
    r = opener.open('http://httpbin.org/ip')
    print(r.read())
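For completeness, the Python 3 counterpart of install_opener, with an https entry as well, might look like the sketch below. The proxy address is the example used throughout this section and is assumed, not guaranteed, to be reachable; substitute your own. The actual request is left commented out so the snippet does not depend on the network.

```python
import urllib.request

# Example proxy address from this article; substitute your own.
PROXY = "http://121.193.143.249:80/"

# One entry per URL scheme; for https URLs urllib opens a CONNECT tunnel
# through the proxy instead of sending the request to it in plain text.
proxy_handler = urllib.request.ProxyHandler({"http": PROXY, "https": PROXY})
opener = urllib.request.build_opener(proxy_handler)

# Python 3 counterpart of urllib2.install_opener: after this call,
# plain urllib.request.urlopen() uses the proxy as well.
urllib.request.install_opener(opener)

# r = urllib.request.urlopen("http://httpbin.org/ip")
# print(r.read())
```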
requests proxy settings
requests is one of the best HTTP libraries available and the one I use most when constructing HTTP requests; its API is friendly and easy to use. Setting a proxy is simple: pass a dict of the form {'http': 'x.x.x.x:8080', 'https': 'x.x.x.x:8080'} as the proxies parameter. The http and https entries are independent of each other: each applies only to URLs with that scheme.
    In [5]: requests.get('http://httpbin.org/ip', proxies={'http': '121.193.143.249:80'}).json()
    Out[5]: {'origin': '121.193.143.249'}
You can also set the proxies attribute of a session, which saves passing the proxies parameter with every request:
    import requests
    s = requests.Session()
    s.proxies = {'http': '121.193.143.249:80'}
    print(s.get('http://httpbin.org/ip').json())
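A session can carry both entries at once. One caveat, which the requests documentation itself warns about: values in session.proxies can be overridden by environment proxies (the variables covered in 0x03) while session.trust_env is True, which is the default. Passing proxies= on each request, or setting trust_env to False, avoids that. The sketch below assumes the same example proxy address and leaves the network call commented out.

```python
import requests

# Example proxy from this article; replace it with your own.
PROXY = "http://121.193.143.249:80"

session = requests.Session()
# http and https are looked up independently, so set both if both are needed.
# HTTPS requests are tunnelled through the proxy with CONNECT.
session.proxies.update({"http": PROXY, "https": PROXY})

# Environment proxies (http_proxy/https_proxy) can override session.proxies
# while trust_env is True (the default). Uncommenting the next line ignores
# them, but it also disables .netrc and other environment-based settings.
# session.trust_env = False

# Explicit per-request proxies take precedence over both:
# session.get("http://httpbin.org/ip", proxies={"http": PROXY}).json()
```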
0x03 HTTP_PROXY / HTTPS_PROXY environment variables
The urllib2 and requests libraries both recognize the HTTP_PROXY and HTTPS_PROXY environment variables; once these are set, the proxy is configured and used automatically. This is very handy when debugging through an HTTP proxy, because you can change the proxy server's IP address and port through environment variables without modifying the code. Most *nix command-line tools, such as curl, wget, axel and aria2c, also honor these variables (curl accepts http_proxy only in lowercase, the form used in the examples below).
    $ http_proxy=121.193.143.249:80 python -c 'import requests; print(requests.get("http://httpbin.org/ip").json())'
    {u'origin': u'121.193.143.249'}
    $ http_proxy=121.193.143.249:80 curl httpbin.org/ip
    {
      "origin": "121.193.143.249"
    }
The variable can also be changed from inside a running Python session, and requests picks up the change on the next request:
    In [245]: os.environ['http_proxy'] = '121.193.143.249:80'
    In [246]: requests.get("http://httpbin.org/ip").json()
    Out[246]: {u'origin': u'121.193.143.249'}
    In [249]: os.environ['http_proxy'] = ''
    In [250]: requests.get("http://httpbin.org/ip").json()
    Out[250]: {u'origin': u'x.x.x.x'}
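The same lookup can be inspected with Python's standard library. This is a small sketch assuming a Unix-like system (on macOS and Windows, urllib may fall back to system proxy settings when no variables are set). urllib.request.getproxies() returns the environment proxies as a dict keyed by scheme, and requests consults that same function on every request, which is why changing os.environ in the session above takes effect immediately.

```python
import os
import urllib.request

# Same as "export http_proxy=121.193.143.249:80", but only for this process.
os.environ["http_proxy"] = "121.193.143.249:80"

# getproxies() scans *_proxy variables (lowercase wins) and returns a dict
# keyed by scheme; requests uses it to find environment proxies.
print(urllib.request.getproxies().get("http"))

# A ProxyHandler created without arguments reads the same mapping, but only
# once, at construction time; that is how urllib's default opener gets it.
handler = urllib.request.ProxyHandler()
print(handler.proxies.get("http"))
```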
0x04 MITM-Proxy
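MITM stands for man-in-the-middle attack, in which an attacker intercepts, monitors and tampers with the data exchanged between client and server.
mitmproxy is an open-source man-in-the-middle proxy written in Python. It supports SSL, transparent and reverse proxying, traffic recording and replay, and custom scripts. Functionally it is somewhat similar to Fiddler on Windows, but mitmproxy is a console program without a GUI; even so, it is fairly convenient to use. With mitmproxy you can easily filter, intercept and modify any HTTP request or response passing through the proxy, and you can even use its scripting API to write scripts that intercept and modify HTTP data automatically.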
    # test.py
    def response(flow):
        # Called for every HTTP response that passes through mitmproxy.
        flow.response.headers["BOOM"] = "boom!boom!boom!"
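The script above adds a header named BOOM to every HTTP response that passes through the proxy. Start mitmproxy with the command mitmproxy -s 'test.py', then verify with curl: the response does indeed carry an extra BOOM header.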
    $ http_proxy=localhost:8080 curl -I 'httpbin.org/get'
    HTTP/1.1 200 OK
    Server: nginx
    Date: Thu, 03 Nov 2016 09:02:04 GMT
    Content-Type: application/json
    Content-Length: 186
    Connection: keep-alive
    Access-Control-Allow-Origin: *
    Access-Control-Allow-Credentials: true
    BOOM: boom!boom!boom!
    ...
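Clearly, mitmproxy scripts can do far more than this; combined with the power of Python, they open up many possible applications. Beyond scripting, mitmproxy also provides a powerful API, on top of which you can build your own proxy server with special-purpose features.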
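As one step beyond test.py, a script can keep state and act selectively. The sketch below assumes a recent mitmproxy version, which loads the objects listed in a module-level addons list and calls their event methods such as response(flow). The file name tag_host.py, the class TagHost and the choice of httpbin.org as the only host to tag are hypothetical.

```python
# tag_host.py: hypothetical addon, run with: mitmproxy -s tag_host.py
TARGET_HOST = "httpbin.org"  # assumption: only tag responses from this host


class TagHost:
    def __init__(self):
        self.count = 0  # number of responses tagged so far

    def response(self, flow):
        # pretty_host prefers the Host header over the raw address, so it
        # still names the site when mitmproxy runs as a transparent proxy.
        if flow.request.pretty_host != TARGET_HOST:
            return
        self.count += 1
        flow.response.headers["BOOM"] = "boom #%d" % self.count


# mitmproxy registers every object listed in the module-level addons list.
addons = [TagHost()]
```

Keeping the counter on an addon instance, rather than in a global, keeps state local to the addon.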
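Performance testing showed that mitmproxy is not especially efficient. That is fine for debugging, but in production, with a large number of concurrent requests going through the proxy, its performance falls somewhat short. I implemented a simple proxy with Twisted to add features to our company's internal websites and improve the user experience; I will share it when I get the chance.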
The above is the detailed content of Detailed explanation of Python's HTTP proxy. For more information, please follow other related articles on the PHP Chinese website!