
Detailed explanation of Python's HTTP proxy

高洛峰
2017-03-21 10:04:45

0x00 Preface

HTTP proxies will be familiar to most readers, and they are used very widely. They come in two kinds: forward proxies and reverse proxies. Reverse proxies are generally used to give users access to services behind a firewall or for load balancing; typical examples include Nginx and HAProxy. This article discusses forward proxies.

The most common uses of an HTTP proxy are network sharing, network acceleration, and bypassing network restrictions. HTTP proxies are also commonly used for debugging web applications and for monitoring and analyzing the web APIs called by Android/iOS apps. Well-known tools include Fiddler, Charles, Burp Suite, and mitmproxy. An HTTP proxy can also modify request/response content, adding functionality to a web application or changing its behavior without changing the server.

0x01 What is an HTTP proxy

An HTTP proxy is essentially a web application and is not fundamentally different from any other web application. After receiving a request, the proxy determines the target host from the host name in the Host header field together with the GET/POST request address, establishes a new HTTP request to that host, forwards the request data, and then forwards the response data it receives back to the client.

If the request address is an absolute URL, the HTTP proxy uses the host in that URL; otherwise it uses the Host field in the header. A simple test shows this, assuming the following network environment:

    192.168.1.2 Web server
    192.168.1.3 HTTP proxy server

Test with telnet, assuming the proxy listens on port 80:

    $ telnet 192.168.1.3 80
    GET / HTTP/1.0
    HOST: 192.168.1.2

Note that the request must end with two consecutive line breaks (that is, a blank line), as the HTTP protocol requires. Once the request is sent, you receive the page content of http://192.168.1.2/. Now adjust the test and use an absolute address in the GET request:

    $ telnet 192.168.1.3 80
    GET http://httpbin.org/ip HTTP/1.0
    HOST: 192.168.1.2

Note that HOST is still set to 192.168.1.2, but the result is the content of the http://httpbin.org/ip page, that is, the public IP address information.

As the test above shows, an HTTP proxy is not very complicated: the client simply sends the original request to the proxy server. When a program offers no way to configure an HTTP proxy and only a few hosts need to go through one, the simplest approach is to point the target host's domain name at the proxy server's IP address, which can be done by editing the hosts file.
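To make the description in this section concrete, here is a minimal sketch of a forward proxy built only on the Python 3 standard library. It is not how any production proxy is written: it handles plain-HTTP GET only, forwards no headers, and has no error handling. The class names, the `start` helper, and the `Via` value `sketch-proxy` are made up for illustration, and a small local server stands in for the target web server so that the example runs without network access.

```python
import http.client
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit


class Origin(BaseHTTPRequestHandler):
    """Local stand-in for the target web server."""

    def do_GET(self):
        body = b"hello from origin"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


class ForwardProxy(BaseHTTPRequestHandler):
    """Plain-HTTP GET only: no CONNECT/HTTPS, no header forwarding."""

    def do_GET(self):
        # An absolute URL in the request line wins; otherwise use the Host header.
        if urlsplit(self.path).netloc:
            url = self.path
        else:
            url = "http://%s%s" % (self.headers["Host"], self.path)
        # An empty ProxyHandler makes the upstream request go out directly,
        # ignoring any proxy environment variables.
        direct = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with direct.open(url, timeout=5) as upstream:
            status, body = upstream.status, upstream.read()
        self.send_response(status)
        self.send_header("Via", "1.0 sketch-proxy")  # made-up proxy name
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


def start(handler):
    """Serve `handler` on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


origin, proxy = start(Origin), start(ForwardProxy)

# Same idea as the telnet test: connect to the proxy, request an absolute URL.
conn = http.client.HTTPConnection("127.0.0.1", proxy.server_port, timeout=5)
conn.request("GET", "http://127.0.0.1:%d/" % origin.server_port)
resp = conn.getresponse()
fetched, via = resp.read(), resp.getheader("Via")
conn.close()

for server in (origin, proxy):
    server.shutdown()
    server.server_close()

print(fetched, via)
```

The fallback to the Host header is also what makes the hosts-file trick work: a client that believes it is talking to the real site sends a relative path plus a Host header, and the proxy can still work out where to forward the request.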

0x02 Setting an HTTP proxy in a Python program

urllib2/urllib proxy settings

urllib2 is a powerful Python standard library, though somewhat cumbersome to use. In Python 3, urllib2 no longer exists; its functionality moved into the urllib package (urllib.request). In urllib2, ProxyHandler is used to set the proxy server.

import urllib2  # Python 2 only

# Route plain-HTTP requests through the proxy
proxy_handler = urllib2.ProxyHandler({'http': '121.193.143.249:80'})
opener = urllib2.build_opener(proxy_handler)
r = opener.open('http://httpbin.org/ip')
print(r.read())

You can also use install_opener to install the configured opener globally, so that every urllib2.urlopen call automatically uses the proxy.

urllib2.install_opener(opener)
r = urllib2.urlopen('http://httpbin.org/ip')
print(r.read())

In Python 3, use urllib.request.

import urllib.request

proxy_handler = urllib.request.ProxyHandler({'http': 'http://121.193.143.249:80/'})
opener = urllib.request.build_opener(proxy_handler)
r = opener.open('http://httpbin.org/ip')
print(r.read())
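To see what a ProxyHandler actually changes on the wire, the following sketch points urllib.request at a local stand-in "proxy" that does not forward anything and simply reports the request it received. The `EchoProxy` class, the `fetch_via_proxy` helper, and the target `http://example.invalid/ip` are assumptions made for this illustration; the target host never needs to exist because the request only ever reaches the local server.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoProxy(BaseHTTPRequestHandler):
    """Stand-in for a proxy: replies with what it received instead of forwarding."""

    def do_GET(self):
        seen = {"path": self.path, "host": self.headers.get("Host")}
        body = json.dumps(seen).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


def fetch_via_proxy(url, proxy_url):
    """Fetch `url` through `proxy_url` and decode the JSON reply."""
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    opener = urllib.request.build_opener(handler)
    with opener.open(url, timeout=5) as r:
        return json.loads(r.read().decode())


server = HTTPServer(("127.0.0.1", 0), EchoProxy)
threading.Thread(target=server.serve_forever, daemon=True).start()

proxy_url = "http://127.0.0.1:%d" % server.server_port
seen = fetch_via_proxy("http://example.invalid/ip", proxy_url)

server.shutdown()
server.server_close()

print(seen)
```

The reported path is the absolute URL rather than just `/ip`, which matches the second telnet test earlier in the article: with a proxy configured, the client connects to the proxy and puts the full target address in the request line.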

requests proxy settings

requests is one of the best HTTP libraries available, and it is the one I use most when constructing HTTP requests. Its API is user-friendly and easy to use. Setting a proxy for requests is simple: pass the proxies parameter a dict of the form

{'http': 'x.x.x.x:8080', 'https': 'x.x.x.x:8080'}

The http and https entries are independent of each other.

In [5]: requests.get('http://httpbin.org/ip', proxies={'http': '121.193.143.249:80'}).json()
Out[5]: {'origin': '121.193.143.249'}

You can also set the proxies attribute of a session directly, which saves passing the proxies parameter with every request.

import requests

s = requests.Session()
s.proxies = {'http': '121.193.143.249:80'}
print(s.get('http://httpbin.org/ip').json())
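The independence of the http and https entries can be checked without sending any request. The sketch below uses the `select_proxy` helper from `requests.utils`, which looks up the proxy for a URL in a proxies mapping; the address `127.0.0.1:8080` is a placeholder, not a proxy from this article.

```python
from requests.utils import select_proxy

# Placeholder proxy address; only the 'http' scheme is configured.
proxies = {"http": "http://127.0.0.1:8080"}

http_choice = select_proxy("http://httpbin.org/ip", proxies)
https_choice = select_proxy("https://httpbin.org/ip", proxies)

print(http_choice)   # the proxy configured under the 'http' key
print(https_choice)  # None: this mapping has no 'https' entry
```

In a real request, proxies found in environment variables may still apply to the https URL, so this only shows what the explicit mapping contributes.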

0x03 HTTP_PROXY / HTTPS_PROXY environment variables

Both urllib2 and requests recognize the HTTP_PROXY and HTTPS_PROXY environment variables; once these variables are detected, the libraries set up and use the proxy automatically. This is very useful when debugging with an HTTP proxy, because you can change the proxy server's IP address and port through environment variables without modifying the code. Most *nix software also recognizes the HTTP_PROXY environment variable, for example curl, wget, axel, and aria2c.

$ http_proxy=121.193.143.249:80 python -c 'import requests; print(requests.get("http://httpbin.org/ip").json())'
{u'origin': u'121.193.143.249'}

$ http_proxy=121.193.143.249:80 curl httpbin.org/ip
{
  "origin": "121.193.143.249"
}

In the IPython interactive environment, you often need to debug HTTP requests temporarily. Setting os.environ['http_proxy'] is enough to add or remove the HTTP proxy.

In [245]: os.environ['http_proxy'] = '121.193.143.249:80'
In [246]: requests.get("http://httpbin.org/ip").json()
Out[246]: {u'origin': u'121.193.143.249'}
In [249]: os.environ['http_proxy'] = ''
In [250]: requests.get("http://httpbin.org/ip").json()
Out[250]: {u'origin': u'x.x.x.x'}
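The same switch can be observed from the standard library in Python 3 without sending a request. This sketch uses `urllib.request.getproxies_environment()`, which returns the proxies found in `<scheme>_proxy` environment variables; the `effective_http_proxy` helper and the address `127.0.0.1:8080` are assumptions made for illustration.

```python
import os
import urllib.request


def effective_http_proxy():
    """Return the HTTP proxy urllib would read from the environment, if any."""
    return urllib.request.getproxies_environment().get("http")


os.environ["http_proxy"] = "http://127.0.0.1:8080"  # placeholder proxy address
enabled = effective_http_proxy()

os.environ["http_proxy"] = ""  # an empty value switches the proxy off again
disabled = effective_http_proxy()

print(enabled, disabled)
```

Because the variable is read when the lookup happens, changing it mid-session takes effect for later requests, which is what the IPython session above relies on.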

0x04 MITM-Proxy

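MITM stands for Man-in-the-Middle attack, in which an attacker sits between the client and the server and intercepts, monitors, and tampers with the data passing between them.

mitmproxy is an open-source man-in-the-middle proxy written in Python. It supports SSL, transparent proxying, reverse proxying, traffic recording and replay, custom scripts, and more. Functionally it is somewhat similar to Fiddler on Windows, but mitmproxy is a console program with no GUI, though it is still fairly convenient to use. With mitmproxy you can easily filter, intercept, and modify any HTTP request/response passing through the proxy, and you can even use its scripting API to write scripts that intercept and modify HTTP data automatically.

# test.py
def response(flow):
    flow.response.headers["BOOM"] = "boom!boom!boom!"

The script above adds a header named BOOM to every HTTP response that passes through the proxy. Start mitmproxy with the command mitmproxy -s 'test.py'; checking with curl confirms that a BOOM header has indeed been added.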

$ http_proxy=localhost:8080 curl -I 'httpbin.org/get'
HTTP/1.1 200 OK
Server: nginx
Date: Thu, 03 Nov 2016 09:02:04 GMT
Content-Type: application/json
Content-Length: 186
Connection: keep-alive
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true
BOOM: boom!boom!boom!
...

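Obviously, mitmproxy scripts can do far more than this; combined with the power of Python, they open up many kinds of applications. In addition, mitmproxy provides a powerful API, on top of which you can build your own dedicated proxy server with special functionality.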
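As one small illustration of going beyond the BOOM example, the sketch below hooks both directions: it tags each outgoing request and echoes the requested URL back on each response. The file name and both header names are made up for this example; `request` and `response` are the script hooks mitmproxy calls, and `pretty_url` is the request attribute holding the full URL.

```python
# tag_requests.py (hypothetical file name)
# Load with: mitmproxy -s tag_requests.py


def request(flow):
    """Called by mitmproxy for each client request before it is forwarded."""
    # X-Debug-Proxy is a made-up header name used only for illustration.
    flow.request.headers["X-Debug-Proxy"] = "mitmproxy"


def response(flow):
    """Called for each server response before it is returned to the client."""
    # Echo the requested URL back so the client can see what the proxy fetched.
    flow.response.headers["X-Proxied-Url"] = flow.request.pretty_url
```

With this script loaded, the curl check shown earlier should list an X-Proxied-Url header next to the others.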

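Performance testing showed that mitmproxy is not particularly efficient. That is fine for debugging, but in a production environment, with a large number of concurrent requests going through the proxy, its performance falls a little short. I implemented a simple proxy with twisted to add features to our company's internal websites and improve the user experience; I will share it when I get the chance.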
