Detailed explanation of the http module that comes with Python
I haven't blogged for a long time: this year I started another great internship, and between learning and project work my time has been completely full. I am very grateful for these two internships this year. They introduced me to golang and Python; learning different languages let me break out of the limits of having only studied C/C++, pick up the strengths of golang and Python, and understand which languages suit which scenarios. My earlier study of Linux and C/C++ also helped me get up to speed with golang and Python quickly.
As for my learning habits, besides learning how to use something I also like to read its source code and understand how it works, so that using the language or framework becomes as natural as eating and sleeping. I recently started working with the bottle and flask web frameworks and wanted to read their source code, but both ultimately build on Python's own http module, hence this article.
A simple python http example
Python's http framework consists mainly of a server and a handler. The server builds the network model, for example using I/O multiplexing to watch the listening socket; the handler processes each socket that becomes ready. First, let's look at a simple use of Python's http module:
import sys
from http.server import HTTPServer, SimpleHTTPRequestHandler

ServerClass = HTTPServer
HandlerClass = SimpleHTTPRequestHandler

if __name__ == '__main__':
    # Usage: python3 myhttp.py <host> <port>
    port = int(sys.argv[2])
    server_address = (sys.argv[1], port)
    httpd = ServerClass(server_address, HandlerClass)
    sa = httpd.socket.getsockname()
    print("Serving HTTP on", sa[0], "port", sa[1], "...")
    try:
        httpd.serve_forever()
    except KeyboardInterrupt:
        print("\nKeyboard interrupt received, exiting.")
        httpd.server_close()
        sys.exit(0)

Save the example as myhttp.py and run it like this:
python3 myhttp.py 127.0.0.1 9999
Now, if you create an index.html file in the current directory, you can open that page at http://127.0.0.1:9999/index.html.
The server class in this example is HTTPServer and the handler class is SimpleHTTPRequestHandler, so when HTTPServer detects an incoming request it hands the request to SimpleHTTPRequestHandler for processing. OK, with that understood, let's analyze the server and the handler in turn.
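The same server/handler pairing can be tried without opening a browser. The following is a minimal sketch, not part of the original example: it assumes port 0 (so the OS picks a free port), runs serve_forever in a background thread, and uses a hypothetical helper named fetch_status to request a path with http.client.

```python
import http.client
import threading
from http.server import HTTPServer, SimpleHTTPRequestHandler


def fetch_status(path="/"):
    """Start a throwaway server, request `path`, and return the HTTP status."""
    # Port 0 is an assumption of this sketch: the OS picks any free port.
    httpd = HTTPServer(("127.0.0.1", 0), SimpleHTTPRequestHandler)
    port = httpd.server_address[1]

    # serve_forever() blocks, so it runs in a background thread here.
    threading.Thread(target=httpd.serve_forever, daemon=True).start()

    conn = http.client.HTTPConnection("127.0.0.1", port)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()
        return resp.status
    finally:
        conn.close()
        httpd.shutdown()      # stops the serve_forever loop
        httpd.server_close()  # closes the listening socket


if __name__ == "__main__":
    print(fetch_status("/"))
    print(fetch_status("/no-such-file-for-this-sketch.html"))
```

Requesting "/" serves index.html if it exists in the current directory and otherwise a directory listing; a path that does not exist yields 404.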
http server
The http module's design makes full use of object-oriented inheritance and polymorphism. Because I had read the TFS file system code before, reading Python's http module was not too daunting. First, here is the server inheritance hierarchy:
+------------+          +------------------------------+
| BaseServer +--------->| base class of TCPServer      |
+-----+------+          | runs the event-loop listener |
      |                 | handles client requests      |
      |                 +------------------------------+
      v
+------------+          +------------------------------+
| TCPServer  +--------->| base class of HTTPServer     |
+-----+------+          | sets up the listening socket |
      |                 | starts listening             |
      |                 +------------------------------+
      v
+------------+
| HTTPServer |
+------------+
The inheritance relationship is shown in the diagram above. BaseServer and TCPServer live in socketserver.py, and HTTPServer lives in http/server.py; let's look at BaseServer first.
BaseServer
Because BaseServer is the base class of all servers, it abstracts as much of their common behavior as possible, such as running the event-listening loop. Every server needs this, so it is the main thing BaseServer does; let's look at the core of BaseServer:
def serve_forever(self, poll_interval=0.5):
    self.__is_shut_down.clear()
    try:
        with _ServerSelector() as selector:
            selector.register(self, selectors.EVENT_READ)
            while not self.__shutdown_request:
                ready = selector.select(poll_interval)
                if ready:
                    self._handle_request_noblock()
                self.service_actions()
    finally:
        self.__shutdown_request = False
        self.__is_shut_down.set()
The selector in this code comes from the selectors module, which wraps I/O multiplexing mechanisms such as select, poll, and epoll (socketserver picks poll where available and falls back to select). The server registers its own listening socket with the selector and starts waiting for events; when a client connects, self._handle_request_noblock() is called to process the request. Let's see what this function does:
def _handle_request_noblock(self):
    try:
        request, client_address = self.get_request()
    except OSError:
        return
    if self.verify_request(request, client_address):
        try:
            self.process_request(request, client_address)
        except:
            self.handle_error(request, client_address)
            self.shutdown_request(request)
    else:
        self.shutdown_request(request)
_handle_request_noblock is an internal function. It first accepts the client connection (get_request ultimately wraps the accept system call), then verifies the request, and finally calls process_request to handle it. get_request is implemented by the subclasses, because TCP and UDP receive client requests differently (TCP is connection-oriented, UDP is connectionless).
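The polling loop and the accept step described above can be imitated with the selectors module directly. This is only a sketch under stated assumptions: accept_one and its max_polls parameter are hypothetical names, and selectors.DefaultSelector is used here instead of socketserver's private _ServerSelector.

```python
import selectors
import socket


def accept_one(listener, poll_interval=0.5, max_polls=10):
    """Poll a listening socket the way serve_forever() does; accept one client."""
    with selectors.DefaultSelector() as selector:
        # Readiness for reading on a listening socket means "a client is waiting".
        selector.register(listener, selectors.EVENT_READ)
        for _ in range(max_polls):
            ready = selector.select(poll_interval)
            if ready:
                # Same step as get_request() in TCPServer: a plain accept().
                return listener.accept()
    return None  # nobody connected within max_polls * poll_interval seconds


if __name__ == "__main__":
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(5)

    client = socket.create_connection(listener.getsockname())
    conn, addr = accept_one(listener)
    print("accepted connection from", addr)

    conn.close()
    client.close()
    listener.close()
```

The timeout passed to select plays the role of poll_interval: the loop wakes up regularly even when no client arrives, which is what lets serve_forever notice a shutdown request.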
Let's look at what process_request does:
def process_request(self, request, client_address):
    self.finish_request(request, client_address)
    self.shutdown_request(request)

# -------------------------------------------------

def finish_request(self, request, client_address):
    self.RequestHandlerClass(request, client_address, self)

def shutdown_request(self, request):
    self.close_request(request)
process_request first calls finish_request to handle the connection and, once that is done, calls shutdown_request to close it. finish_request instantiates a handler class, passing in the client's socket and address, which means the handler class completes request processing during its initialization. We will look at this more closely when analyzing the handler later.
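That the handler does its work simply by being instantiated can be observed with a one-shot server. The sketch below makes a few assumptions: UpperHandler and round_trip are hypothetical names, the server binds to port 0, and handle_request (which processes exactly one request) is used instead of serve_forever so that no thread is needed.

```python
import socket
import socketserver


class UpperHandler(socketserver.BaseRequestHandler):
    """Hypothetical handler: replies with the upper-cased request bytes."""

    def handle(self):
        # self.request is the connected client socket that finish_request()
        # passed to the handler's constructor.
        data = self.request.recv(1024)
        self.request.sendall(data.upper())


def round_trip(payload):
    """Send payload to a one-shot server and return the reply."""
    with socketserver.TCPServer(("127.0.0.1", 0), UpperHandler) as server:
        with socket.create_connection(server.server_address) as client:
            client.sendall(payload)
            # Accepts one connection, instantiates UpperHandler (handle() runs
            # inside its __init__), then closes the connection.
            server.handle_request()
            return client.recv(1024)


if __name__ == "__main__":
    print(round_trip(b"hello"))
```

Nothing in the server ever calls handle explicitly; constructing the handler object is enough.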
That is what BaseServer does. BaseServer cannot be used directly because some of its functions are not yet implemented; it is only an abstraction layer over TCP/UDP. To summarize:
First, serve_forever is called to start listening for events;
Then, when a client request arrives, it is handed to the handler for processing.
TCPServer
From the functionality abstracted by BaseServer above, we can tell what TCPServer or UDPServer must do: create the listening socket, bind it and start listening, and finally accept a client when a request arrives; let's look at the code:
# BaseServer
def __init__(self, server_address, RequestHandlerClass):
    """Constructor.  May be extended, do not override."""
    self.server_address = server_address
    self.RequestHandlerClass = RequestHandlerClass
    self.__is_shut_down = threading.Event()
    self.__shutdown_request = False

# --------------------------------------------------------------------------------

# TCPServer
def __init__(self, server_address, RequestHandlerClass, bind_and_activate=True):
    BaseServer.__init__(self, server_address, RequestHandlerClass)
    self.socket = socket.socket(self.address_family, self.socket_type)
    if bind_and_activate:
        try:
            self.server_bind()
            self.server_activate()
        except:
            self.server_close()
            raise
When TCPServer is initialized, it first calls the initializer of the base class BaseServer to set the server address, handler class, and so on; it then creates its own listening socket and finally calls server_bind to bind the socket and server_activate to start listening:
def server_bind(self):
    if self.allow_reuse_address:
        self.socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    self.socket.bind(self.server_address)
    self.server_address = self.socket.getsockname()

def server_activate(self):
    self.socket.listen(self.request_queue_size)
TCPServer also implements one more function, get_request, which accepts client connections:
def get_request(self):
    return self.socket.accept()
If you have done Linux network programming, this code should look very familiar, because the function names match the system calls Linux provides; I won't go into detail here.
TCPServer has in fact already built the main framework of a TCP-based server, so HTTPServer merely inherits from TCPServer, overrides the server_bind function, enables allow_reuse_address, and so on.
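The bind, listen, and accept steps can also be driven by hand. The sketch below is an illustration under assumptions: manual_startup is a hypothetical helper, port 0 is used so the OS chooses the port, and bind_and_activate=False is passed so that the constructor only creates the socket.

```python
import socket
import socketserver


def manual_startup():
    """Perform by hand the steps TCPServer.__init__ normally runs itself."""
    # With bind_and_activate=False the socket is created but not bound.
    server = socketserver.TCPServer(
        ("127.0.0.1", 0),
        socketserver.BaseRequestHandler,
        bind_and_activate=False,
    )
    try:
        server.server_bind()      # bind(); also refreshes server_address
        server.server_activate()  # listen(request_queue_size)
        port = server.server_address[1]

        client = socket.create_connection(("127.0.0.1", port))
        conn, addr = server.get_request()  # thin wrapper around accept()
        conn.close()
        client.close()
        return port, addr
    finally:
        server.server_close()


if __name__ == "__main__":
    port, addr = manual_startup()
    print("listened on port", port, "and accepted", addr)
```

After server_bind, server_address holds the real port rather than 0, because the method re-reads the address with getsockname.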
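OK, let's walk through the startup sequence of the example program above:
httpd = ServerClass(server_address, HandlerClass): when HTTPServer is initialized, this line mainly calls the initializer of the base class TCPServer, which creates the listening socket, binds it, and starts listening;
httpd.serve_forever(): this line calls serve_forever of the base class BaseServer, which starts the listening loop and waits for client connections.
If you have read the source code of redis or other backend components, this concurrency model should look familiar. OK, with the server analyzed, let's see how the handler processes client requests.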
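http handler
The handler analysis covers the TCP-layer handlers and the HTTP application-layer handler. The TCP-layer handlers cannot be used on their own, because the TCP layer only transports bytes and does not know how to parse or process the bytes it receives. An application-layer protocol that runs over TCP must therefore inherit from the TCP handler and implement the handle function. For example, the HTTP-layer handler implements handle to parse the HTTP protocol, process the request, and return the result to the client. Let's look at the TCP-layer handlers first.
tcp-layer handler
The TCP-layer handlers are BaseRequestHandler and StreamRequestHandler (both in socketserver.py); let's look at the BaseRequestHandler code first: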
class BaseRequestHandler:

    def __init__(self, request, client_address, server):
        self.request = request
        self.client_address = client_address
        self.server = server
        self.setup()
        try:
            self.handle()
        finally:
            self.finish()

    def setup(self):
        pass

    def handle(self):
        pass

    def finish(self):
        pass
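When looking at the server earlier, we saw that a client request is processed inside the handler class's initializer. From this base-class initializer we can see that processing a request goes through roughly three steps:
setup applies some settings to the client socket;
handle is the function that actually processes the request;
finish closes the socket's read and write streams.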
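The order of these three steps can be confirmed without any networking, because BaseRequestHandler's initializer only stores its arguments before running the hooks. In this sketch TracingHandler and trace_lifecycle are hypothetical names, and the request and server arguments are placeholder values.

```python
import socketserver


class TracingHandler(socketserver.BaseRequestHandler):
    """Hypothetical handler that records the order of the lifecycle hooks."""

    calls = []

    def setup(self):
        self.calls.append("setup")

    def handle(self):
        self.calls.append("handle")

    def finish(self):
        self.calls.append("finish")


def trace_lifecycle():
    """Instantiate the handler once and return the hook order."""
    TracingHandler.calls.clear()
    # Placeholder arguments: no real socket or server is needed, since the
    # hooks above never touch self.request or self.server.
    TracingHandler(None, ("127.0.0.1", 0), None)
    return list(TracingHandler.calls)


if __name__ == "__main__":
    print(trace_lifecycle())  # ['setup', 'handle', 'finish']
```

Because finish sits in a finally clause, it also runs when handle raises an exception.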
BaseRequestHandler is the top-level handler base class; it only lays out the overall handler skeleton and does no real processing. Let's look at the TCP handler:
class StreamRequestHandler(BaseRequestHandler):
    timeout = None
    disable_nagle_algorithm = False

    def setup(self):
        self.connection = self.request
        if self.timeout is not None:
            self.connection.settimeout(self.timeout)
        if self.disable_nagle_algorithm:
            self.connection.setsockopt(socket.IPPROTO_TCP,
                                       socket.TCP_NODELAY, True)
        self.rfile = self.connection.makefile('rb', self.rbufsize)
        self.wfile = self.connection.makefile('wb', self.wbufsize)

    def finish(self):
        if not self.wfile.closed:
            try:
                self.wfile.flush()
            except socket.error:
                pass
        self.wfile.close()
        self.rfile.close()
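The TCP handler implements setup and finish. setup sets the timeout, optionally disables Nagle's algorithm (by setting TCP_NODELAY when disable_nagle_algorithm is true), and creates the file objects used to read from and write to the socket; finish flushes and closes those file objects.
From these two TCP-layer handlers we can see that an HTTP server handler only needs to inherit from StreamRequestHandler and implement the handle function; this is exactly what the HTTP-layer handler mainly does.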
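Before moving to HTTP, the same recipe (inherit from StreamRequestHandler, implement handle) can be tried with a toy line-based protocol. This is a sketch under assumptions: EchoLineHandler and echo_once are hypothetical names, and the "protocol" is simply one newline-terminated line per connection.

```python
import socket
import socketserver


class EchoLineHandler(socketserver.StreamRequestHandler):
    """Hypothetical protocol: echo back one line per connection."""

    def handle(self):
        # rfile and wfile were created by setup(); they wrap the client socket.
        line = self.rfile.readline()
        self.wfile.write(b"echo: " + line)


def echo_once(message):
    """Send one line to a one-shot server and return the reply."""
    with socketserver.TCPServer(("127.0.0.1", 0), EchoLineHandler) as server:
        with socket.create_connection(server.server_address) as client:
            client.sendall(message)
            server.handle_request()  # setup() -> handle() -> finish()
            return client.recv(1024)


if __name__ == "__main__":
    print(echo_once(b"ping\n"))
```

The handler never deals with the raw socket: it reads and writes through the file objects, which is the same interface the HTTP handler relies on.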
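http-layer handler
From the earlier description of the TCP-layer handlers, we know that the HTTP-layer handler inherits from the TCP-layer handler and mainly implements the handle function to process client requests; let's go straight to the code: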
def handle(self):
    self.close_connection = True

    self.handle_one_request()
    while not self.close_connection:
        self.handle_one_request()
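This is the handle function of BaseHTTPRequestHandler; it calls handle_one_request to process a single request. By default the connection is short-lived, so after one request the while loop is not entered and no further request is processed on the same connection. However, while the request is being processed (in parse_request, which handle_one_request calls) a check is made: if the request's Connection header is keep-alive or the request's HTTP version is 1.1 or later, and the handler's own protocol_version is HTTP/1.1, the connection is kept open. Next, let's see how handle_one_request works: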
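Before reading that function, the keep-alive behavior can be checked with a small experiment. The sketch below rests on several assumptions: KeepAliveHandler and two_requests_one_connection are hypothetical names, the handler sets protocol_version to "HTTP/1.1" (the default is "HTTP/1.0"), and connection reuse is detected by recording the client address seen by each request.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class KeepAliveHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that opts in to persistent connections."""

    protocol_version = "HTTP/1.1"
    connections = set()  # client (host, port) pairs seen so far

    def do_GET(self):
        self.connections.add(self.client_address)
        body = b"ok"
        self.send_response(200)
        # Content-Length tells the client where the body ends, so the
        # connection does not have to be closed to mark the end.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def two_requests_one_connection():
    """Return the two status codes and the number of TCP connections used."""
    KeepAliveHandler.connections.clear()
    httpd = HTTPServer(("127.0.0.1", 0), KeepAliveHandler)
    threading.Thread(target=httpd.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", httpd.server_address[1])
    try:
        statuses = []
        for _ in range(2):
            conn.request("GET", "/")
            resp = conn.getresponse()
            resp.read()  # drain the body before reusing the socket
            statuses.append(resp.status)
        return statuses, len(KeepAliveHandler.connections)
    finally:
        conn.close()
        httpd.shutdown()
        httpd.server_close()


if __name__ == "__main__":
    print(two_requests_one_connection())
```

Both requests arrive from the same client address, so the while loop in handle served the second one on the existing connection. The source of handle_one_request follows.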
def handle_one_request(self):
    try:
        self.raw_requestline = self.rfile.readline(65537)
        if len(self.raw_requestline) > 65536:
            self.requestline = ''
            self.request_version = ''
            self.command = ''
            self.send_error(HTTPStatus.REQUEST_URI_TOO_LONG)
            return
        if not self.raw_requestline:
            self.close_connection = True
            return
        if not self.parse_request():
            return
        mname = 'do_' + self.command
        if not hasattr(self, mname):
            self.send_error(
                HTTPStatus.NOT_IMPLEMENTED,
                "Unsupported method (%r)" % self.command)
            return
        method = getattr(self, mname)
        method()
        self.wfile.flush()
    except socket.timeout as e:
        self.log_error("Request timed out: %r", e)
        self.close_connection = True
        return
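handle_one_request proceeds as follows:
First, it calls parse_request to parse the client's HTTP request;
It builds the name of the function that corresponds to the request from "do_" + command;
It calls that method, which handles the request and returns the response to the client.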
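The "do_" + command lookup can be observed from the client side. In the sketch below, GetOnlyHandler and status_for are hypothetical names; the handler defines do_GET and nothing else, so any other HTTP method should fall into the send_error(NOT_IMPLEMENTED) branch.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class GetOnlyHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that defines do_GET and nothing else."""

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def status_for(method):
    """Send one request with the given HTTP method and return the status."""
    httpd = HTTPServer(("127.0.0.1", 0), GetOnlyHandler)
    threading.Thread(target=httpd.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", httpd.server_address[1])
    try:
        conn.request(method, "/")
        resp = conn.getresponse()
        resp.read()
        return resp.status
    finally:
        conn.close()
        httpd.shutdown()
        httpd.server_close()


if __name__ == "__main__":
    print(status_for("GET"))     # 200: dispatched to do_GET
    print(status_for("DELETE"))  # 501: there is no do_DELETE attribute
```

Supporting another method is therefore just a matter of adding a do_ method with the matching name to the handler class.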
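BaseHTTPRequestHandler is the base class for HTTP handlers, so it cannot be used directly either: it does not define any request-handling functions, that is, the do_ methods. Fortunately, Python provides a simple SimpleHTTPRequestHandler, which inherits from BaseHTTPRequestHandler and implements the request functions; let's look at its GET function: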
# SimpleHTTPRequestHandler
# ---------------------------------------------
def do_GET(self):
    """Serve a GET request."""
    f = self.send_head()
    if f:
        try:
            self.copyfile(f, self.wfile)
        finally:
            f.close()
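do_GET first calls send_head, which sends the response headers to the client and returns the requested file object; it then calls copyfile to send the file's contents to the client over the connection.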
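That covers the most basic parts of the http module. Finally, to summarize the handler side of the example program:
The server passes the request to the SimpleHTTPRequestHandler initializer;
During initialization, SimpleHTTPRequestHandler applies some settings to the client connection;
It then calls handle to process the request;
handle in turn calls handle_one_request;
Inside handle_one_request, the request is parsed and the matching request-handling function is found;
My earlier access was a GET, so do_GET is called directly and returns index.html to the client.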
That concludes the analysis of Python's http module. You may have noticed that the http module that ships with Python is not very convenient to use: it dispatches to a request function by HTTP method, so when one method, such as GET or POST, is used for many different requests, its request function becomes huge, and the code is hard to write and full of case-by-case checks. Of course, SimpleHTTPRequestHandler is only a simple example provided by Python.
Python also officially provides a more practical approach for HTTP, namely the WSGI server and WSGI application; the next article will first analyze the wsgiref module that ships with Python and bottle, and then flask.
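As a small preview of the WSGI split between server and application, the sketch below shows what a WSGI application looks like: a callable that takes environ and start_response. The names hello_app and call_app are hypothetical; call_app invokes the application directly, using wsgiref.util.setup_testing_defaults to build a minimal environ, so no server is started.

```python
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    """A WSGI application: a callable taking environ and start_response."""
    text = environ["REQUEST_METHOD"] + " " + environ["PATH_INFO"]
    body = text.encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(app, path="/"):
    """Invoke a WSGI app directly, without a server, and collect its output."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a minimal valid environ (GET)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body


if __name__ == "__main__":
    print(call_app(hello_app, "/index.html"))
    # To serve the app over HTTP instead, wsgiref.simple_server.make_server
    # can wrap it (host and port here are arbitrary choices):
    # make_server("127.0.0.1", 8000, hello_app).serve_forever()
```

The application never touches a socket or a request line; routing on environ["PATH_INFO"] replaces the per-method do_ functions discussed above.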