Detailed explanation of HTTP protocol (really classic)

高洛峰 (Original)
2016-12-12 11:08:07

Introduction

HTTP is an object-oriented protocol at the application layer. Because it is simple and fast, it is well suited to distributed hypermedia information systems. It was proposed in 1990 and has been continuously refined and extended through years of use and development. The sixth version of HTTP/1.0 was in use on the WWW when this description was first written; HTTP/1.1 has since been standardized (RFC 2616), and an HTTP-NG (Next Generation of HTTP) proposal was also put forward.
The main features of the HTTP protocol can be summarized as follows:
1. Support client/server mode.
2. Simple and fast: When a client requests a service from the server, it only needs to transmit the request method and path. Commonly used request methods are GET, HEAD, and POST. Each method specifies a different type of contact between the client and the server. Due to the simplicity of the HTTP protocol, the program size of the HTTP server is small and the communication speed is very fast.
3. Flexible: HTTP allows the transmission of any type of data object. The type being transferred is marked by Content-Type.
4. Connectionless: Connectionless here means that each connection handles only one request. After the server has processed the client's request and received the client's acknowledgment, it closes the connection. This saves transmission time.
5. Stateless: The HTTP protocol is a stateless protocol. Stateless means that the protocol has no memory ability for transaction processing. The lack of status means that if subsequent processing requires the previous information, it must be retransmitted, which may result in an increase in the amount of data transferred per connection. On the other hand, the server responds faster when it does not need previous information.

1. Detailed Explanation of HTTP Protocol - URL

HTTP (Hypertext Transfer Protocol) is a stateless application-layer protocol based on a request/response model. It usually runs over TCP connections, and HTTP/1.1 provides a persistent connection mechanism. Most web development consists of web applications built on top of HTTP.

HTTP URL (URL is a special type of URI that contains enough information to find a resource) has the following format:
http://host[":"port][abs_path]
http means that network resources are located through the HTTP protocol; host is a valid Internet host domain name or IP address; port specifies a port number, and if it is empty the default port 80 is used; abs_path specifies the URI of the requested resource. If abs_path is not given in the URL, it must be given as "/" when the URL is used as a request URI. The browser usually does this for us automatically.
eg:
1. Enter: www.guet.edu.cn
The browser automatically converts to: http://www.guet.edu.cn/
2. http://192.168.0.116:8080/index.jsp
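
The URL rules above (default port 80, default path "/") can be sketched with Python's standard urllib.parse module. The function name describe_http_url is made up for this illustration, and the sketch assumes plain http URLs only.

```python
from urllib.parse import urlsplit


def describe_http_url(url: str) -> dict:
    """Split an http URL into host, port and abs_path, applying the defaults."""
    # A browser adds the scheme when the user types a bare host name.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    return {
        "host": parts.hostname,
        "port": parts.port or 80,        # empty port -> default port 80
        "abs_path": parts.path or "/",   # empty abs_path -> "/"
    }


print(describe_http_url("www.guet.edu.cn"))
print(describe_http_url("http://192.168.0.116:8080/index.jsp"))
```

The first call mirrors what the browser does with a bare host name; the second shows an explicit port and path being kept as given.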

2. Detailed Explanation of HTTP Protocol - Request

HTTP request consists of three parts, namely: request line, message header, request body

1. The request line starts with a method symbol, separated by spaces, followed by the request URI and protocol version, the format is as follows: Method Request-URI HTTP-Version CRLF
Method represents the request method; Request-URI is a uniform resource identifier; HTTP-Version represents the requested HTTP protocol version; CRLF represents carriage return and line feed (A single CR or LF character is not allowed except for the trailing CRLF).

There are many request methods (all method names are uppercase). Each method is explained below:
GET Request the resource identified by Request-URI
POST Append new data to the resource identified by Request-URI
HEAD Request the response message headers of the resource identified by Request-URI
PUT Request that the server store a resource and use Request-URI as its identifier
DELETE Request that the server delete the resource identified by Request-URI
TRACE Request that the server send back the request information it received; mainly used for testing or diagnosis
CONNECT Reserved for use with proxies that can switch to acting as a tunnel
OPTIONS Request information about the server's capabilities, or about the options and requirements associated with a resource
Application examples:
GET method: when we visit a web page by entering its URL in the browser's address bar, the browser uses the GET method to obtain the resource from the server, eg: GET /form.html HTTP/1.1 (CRLF)

The POST method requires the requested server to accept the data attached to the request, and is often used to submit forms.
eg:POST /reg.jsp HTTP/ (CRLF)
Accept:image/gif,image/x-xbit,... (CRLF)
...
HOST:www.guet.edu.cn (CRLF)
Content-Length:21 (CRLF)
Connection:Keep-Alive (CRLF)
Cache-Control:no-cache (CRLF)
(CRLF) //This CRLF marks the end of the message headers; everything before it is header content
user=jeffrey&pwd=1234 //The submitted data starts on this line
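
As a rough sketch of how such a POST message is assembled, the Python below builds the raw bytes of a form submission. The helper name build_post_request is hypothetical, and HTTP/1.1 and the Content-Type value are assumptions made for the sketch. The point it illustrates is that Content-Length is computed from the encoded body and that a blank line separates headers from data.

```python
from urllib.parse import urlencode


def build_post_request(host: str, path: str, fields: dict) -> bytes:
    """Assemble a raw HTTP POST request for a URL-encoded form."""
    body = urlencode(fields).encode("ascii")
    head = (
        f"POST {path} HTTP/1.1\r\n"                       # request line
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"                # length of the body in bytes
        "Connection: Keep-Alive\r\n"
        "\r\n"                                            # blank line ends the headers
    )
    return head.encode("ascii") + body


request = build_post_request(
    "www.guet.edu.cn", "/reg.jsp", {"user": "jeffrey", "pwd": "1234"}
)
print(request.decode("ascii"))
```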

The HEAD method is almost the same as the GET method: the HTTP header information in the response to a HEAD request is the same as that returned for a GET request. With this method, information about the resource identified by the Request-URI can be obtained without transferring the entire resource content. It is often used to test whether a hyperlink is valid, whether it is accessible, and whether it has been updated recently.
2. Request headers are described later
3. Request body (omitted)

3. Detailed Explanation of HTTP Protocol - Response

After receiving and interpreting the request message, the server returns an HTTP response message.

HTTP response is also composed of three parts, namely: status line, message header, response body
1. The status line format is as follows:
HTTP-Version Status-Code Reason-Phrase CRLF
Here, HTTP-Version is the HTTP protocol version of the server; Status-Code is the response status code sent back by the server; Reason-Phrase is the text description of the status code.
The status code consists of three digits. The first digit defines the category of the response and has five possible values:
1xx: Informational--the request has been received and processing continues
2xx: Success--the request has been successfully received, understood, and accepted
3xx: Redirection--further action must be taken to complete the request
4xx: Client error--the request has a syntax error or cannot be fulfilled
5xx: Server error--the server failed to fulfill a valid request
Common status codes and their descriptions:
200 OK //The client request succeeded
400 Bad Request //The client request has a syntax error and cannot be understood by the server
401 Unauthorized //The request is not authorized; this status code must be used together with the WWW-Authenticate header field
403 Forbidden //The server received the request but refuses to provide the service
404 Not Found //The requested resource does not exist, eg: the wrong URL was entered
500 Internal Server Error //An unexpected error occurred on the server
503 Service Unavailable //The server currently cannot process the client's request and may return to normal after a period of time
eg: HTTP/1.1 200 OK (CRLF)
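
To illustrate the status line format, here is a minimal Python sketch that splits a status line into its three fields and derives the response category from the first digit. The names parse_status_line and STATUS_CLASSES are invented for this example, and the sketch assumes the line contains all three fields.

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def parse_status_line(line: str):
    """Split 'HTTP-Version Status-Code Reason-Phrase' and classify the code."""
    # The reason phrase may itself contain spaces, so split at most twice.
    version, code, reason = line.rstrip("\r\n").split(" ", 2)
    status = int(code)
    category = STATUS_CLASSES.get(status // 100, "Unknown")
    return version, status, reason, category


print(parse_status_line("HTTP/1.1 200 OK\r\n"))
print(parse_status_line("HTTP/1.0 404 Not Found\r\n"))
```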

2. The response header is described later

3. The response body is the content of the resource returned by the server

4. Detailed Explanation of HTTP Protocol - Message Header

An HTTP message is either a request from the client to the server or a response from the server to the client. Both request and response messages consist of a start line (the request line for a request message, the status line for a response message), message headers (optional), a blank line (a line containing only CRLF), and a message body (optional).

HTTP message headers include general headers, request headers, response headers, and entity headers.
Each header field consists of name + ":" + space + value. Header field names are case-insensitive.
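
The header layout described above can be sketched in Python as follows. The function parse_headers is a made-up helper; it lower-cases field names to reflect case-insensitivity, stops at the blank line, and, as a simplification, keeps only the last value when a field is repeated.

```python
def parse_headers(raw: str) -> dict:
    """Parse 'name: value' lines up to the blank line that ends the headers."""
    headers = {}
    for line in raw.split("\r\n"):
        if not line:
            break  # a line containing only CRLF ends the header section
        name, _, value = line.partition(":")
        # Field names are case-insensitive, so normalize them to lower case.
        headers[name.strip().lower()] = value.strip()
    return headers


raw = (
    "Host:www.guet.edu.cn\r\n"
    "Content-Length: 22\r\n"
    "CONNECTION: Keep-Alive\r\n"
    "\r\n"
    "body"
)
headers = parse_headers(raw)
print(headers["connection"])
```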

1. General headers
Among the general headers, a few header fields are used in all request and response messages; they apply only to the message being transmitted, not to the entity it carries.
eg:
Cache-Control specifies caching directives. Caching directives are one-way (a directive that appears in a response does not necessarily appear in the request) and independent (the directives of one message do not affect how the caching mechanism handles another message). The similar header field used by HTTP 1.0 is Pragma.
Caching directives in requests include: no-cache (a cached copy must not be used without first revalidating it with the origin server), no-store, max-age, max-stale, min-fresh, only-if-cached;
Caching directives in responses include: public, private, no-cache, no-store, no-transform, must-revalidate, proxy-revalidate, max-age, s-maxage.
eg: To instruct the IE browser (client) not to cache a page, the server-side JSP program can be written as follows: response.setHeader("Cache-Control","no-cache");
//response.setHeader("Pragma","no-cache"); has the same effect as the code above, and the two are usually used together
This code sets the general header field in the response message that is sent: Cache-Control: no-cache
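
On the receiving side, a Cache-Control value is a comma-separated list of directives, some of which carry an argument. The Python below is a simplified sketch of reading such a value; parse_cache_control is a hypothetical helper and does not handle quoted arguments that themselves contain commas.

```python
def parse_cache_control(value: str) -> dict:
    """Turn 'private, max-age=3600' into {'private': None, 'max-age': '3600'}."""
    directives = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        name, sep, arg = item.partition("=")
        # Directives without "=" (such as no-cache) are stored with value None.
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


print(parse_cache_control("no-cache"))
print(parse_cache_control("private, max-age=3600"))
```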


The Date general header field indicates the date and time at which the message was generated

The Connection general header field allows options to be sent for a particular connection, for example to specify that the connection is persistent, or to specify the "close" option to tell the server to close the connection once the response is complete

2. Request header
The request header allows the client to pass additional information of the request and the client's own information to the server.
Commonly used request headers
Accept
The Accept request header field is used to specify what types of information the client accepts. eg: Accept: image/gif, indicating that the client wishes to accept resources in GIF image format; Accept: text/html, indicating that the client wishes to accept html text.
Accept-Charset
The Accept-Charset request header field is used to specify the character set accepted by the client. eg: Accept-Charset:iso-8859-1, gb2312. If this field is not set in the request message, the default is that any character set is acceptable.
Accept-Encoding
The Accept-Encoding request header field is similar to Accept, but it is used to specify acceptable content encodings. eg: Accept-Encoding:gzip,deflate. If this field is not set in the request message, the server assumes that the client can accept any content encoding.
Accept-Language
The Accept-Language request header field is similar to Accept, but it is used to specify a natural language. eg: Accept-Language:zh-cn. If this header field is not set in the request message, the server assumes that the client can accept various languages.
Authorization
The Authorization request header field is mainly used to prove that the client has the right to view a certain resource. When the browser accesses a page and receives a response code of 401 (Unauthorized) from the server, it can send a request containing the Authorization request header field to ask the server to verify it.
Host (this header field is required when sending a request)
The Host request header field specifies the Internet host and port number of the requested resource. It is usually extracted from the HTTP URL, eg:
We enter in the browser: http://www.guet.edu.cn/index.html
The request message sent by the browser will contain the Host request header field, as follows:
Host: www.guet.edu.cn
Here the default port number 80 is used. If a port number is specified, it becomes: Host: www.guet.edu.cn:port number
User-Agent
When we log in to a forum, we often see a welcome message that lists the name and version of our operating system and of the browser we are using, which many people find surprising. In fact, the server application obtains this information from the User-Agent request header field, which allows the client to tell the server its operating system, browser, and other attributes. This header field is not required, however: if we write our own browser and do not send the User-Agent request header field, the server cannot learn this information.
Request header example:
GET /form.html HTTP/1.1 (CRLF)
Accept:image/gif,image/x-xbitmap,image/jpeg,application/x-shockwave-flash,application/vnd.ms-excel, application/vnd.ms-powerpoint,application/msword,*/* (CRLF)
Accept-Language:zh-cn (CRLF)
Accept-Encoding:gzip,deflate (CRLF)
If-Modified-Since:Wed,05 Jan 2007 11:21:25 GMT (CRLF)
If-None-Match:W/"80b1a4c018f3c41:8317" (CRLF)
User-Agent:Mozilla/4.0(compatible;MSIE6.0;Windows NT 5.0) (CRLF)
Host:www.guet.edu.cn (CRLF)
Connection:Keep-Alive (CRLF)
(CRLF)
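
The Host rule described earlier in this section (the port is left out when it is the default 80) and the optional User-Agent field can be sketched as a small request builder. build_get_request is an illustrative name, not part of any library, and the sketch assumes an http URL.

```python
from typing import Optional
from urllib.parse import urlsplit


def build_get_request(url: str, user_agent: Optional[str] = None) -> str:
    """Build the text of a GET request, deriving Host from the URL."""
    parts = urlsplit(url)
    # The port is omitted from Host when it is the default port 80.
    if parts.port in (None, 80):
        host = parts.hostname
    else:
        host = f"{parts.hostname}:{parts.port}"
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    lines = [f"GET {target} HTTP/1.1", f"Host: {host}"]
    if user_agent:  # User-Agent is optional
        lines.append(f"User-Agent: {user_agent}")
    lines.append("Connection: Keep-Alive")
    return "\r\n".join(lines) + "\r\n\r\n"


print(build_get_request("http://www.guet.edu.cn/form.html"))
print(build_get_request("http://192.168.0.116:8080/index.jsp"))
```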

3. Response header
The response header allows the server to pass additional response information that cannot be placed in the status line, as well as information about the server and about further access to the resource identified by the Request-URI.
Commonly used response headers
Location
The Location response header field is used to redirect the recipient to a new location. The Location response header field is often used when changing domain names.
Server
The Server response header field contains information about the software the server uses to process the request. It corresponds to the User-Agent request header field. The following is an example of the Server response header field:
Server: Apache-Coyote/1.1
WWW-Authenticate
The WWW-Authenticate response header field must be included in a 401 (Unauthorized) response message. When the client receives a 401 response and then sends a request with the Authorization header field asking the server to verify it, the server's response header contains this field.
eg: WWW-Authenticate:Basic realm="Basic Auth Test!" //This shows that the server uses the Basic authentication scheme for the requested resource.
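
For the Basic scheme shown in the example, the client answers a 401 challenge by sending the user name and password joined with a colon and Base64-encoded. The sketch below uses hypothetical helper names and made-up credentials; note that Base64 is an encoding, not encryption. The challenge parser only handles the simple scheme-plus-realm form shown above.

```python
import base64


def parse_challenge(value: str):
    """Split a simple challenge such as 'Basic realm="..."' into (scheme, realm)."""
    scheme, _, params = value.partition(" ")
    params = params.strip()
    realm = None
    if params.lower().startswith("realm="):
        realm = params[len("realm="):].strip('"')
    return scheme, realm


def basic_authorization(username: str, password: str) -> str:
    """Build the Authorization header line for the Basic scheme."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Authorization: Basic {token}"


print(parse_challenge('Basic realm="Basic Auth Test!"'))
print(basic_authorization("jeffrey", "1234"))
```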


4. Entity header
Both request and response messages can transmit an entity. An entity consists of an entity header field and an entity body. However, this does not mean that the entity header field and the entity body must be sent together. Only the entity header field can be sent. The entity header defines meta-information about the entity body (eg: presence or absence of an entity body) and the resource identified by the request.
Commonly used entity headers
Content-Encoding
The Content-Encoding entity header field is used as a modifier of the media type. Its value indicates which additional content encodings have been applied to the entity body, and therefore which decoding mechanisms must be applied to obtain the media type referenced by the Content-Type header field. Content-Encoding is used to record the compression method of the document, eg: Content-Encoding: gzip
Content-Language
Content-Language entity header field describes the natural language used by the resource. If this field is not set, it is assumed that the entity content will be available to readers in all languages. eg: Content-Language:da
Content-Length
The Content-Length entity header field indicates the length of the entity body, expressed as a decimal number of bytes.
Content-Type
The Content-Type entity header field indicates the media type of the entity body sent to the recipient. eg:
Content-Type: text/html; charset=ISO-8859-1
Content-Type: text/html; charset=GB2312
Last-Modified
The Last-Modified entity header field indicates the date and time at which the resource was last modified.
Expires
The Expires entity header field gives the date and time at which the response expires. So that a proxy server or browser refreshes a cached page after a period of time (when a previously visited page is accessed again it is loaded directly from the cache, which shortens the response time and reduces server load), we can use the Expires entity header field to specify when the page expires. eg: Expires: Fri, 15 Sep 2006 16:23:12 GMT
HTTP 1.1 clients and caches must treat invalid date formats (including 0) as already expired. eg: To prevent the browser from caching a page, we can also set the Expires entity header field to 0. In JSP the code is as follows: response.setDateHeader("Expires",0);
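
The Expires behaviour described above can be sketched with Python's standard email.utils date helpers. The function names expires_header and is_expired are invented for this illustration. The sketch treats any value that does not parse as a date, such as "0", as already expired.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def expires_header(lifetime_seconds: int, now=None) -> str:
    """Format an Expires header that lies lifetime_seconds after now."""
    now = now or datetime.now(timezone.utc)
    expiry = now + timedelta(seconds=lifetime_seconds)
    return "Expires: " + format_datetime(expiry, usegmt=True)


def is_expired(expires_value: str, now=None) -> bool:
    """Return True if the Expires value is in the past or is not a valid date."""
    now = now or datetime.now(timezone.utc)
    try:
        when = parsedate_to_datetime(expires_value)
    except (TypeError, ValueError):
        return True  # invalid formats, including "0", count as expired
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    return when <= now


print(expires_header(3600))
print(is_expired("0"))
```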

5. Using telnet to observe the communication process of the HTTP protocol

Experiment purpose and principle:

Use Microsoft's telnet tool to send a request to the server by typing the HTTP request message manually. After the server receives, interprets, and accepts the request, it returns a response, which is displayed in the telnet window, giving a hands-on understanding of the HTTP communication process.

Experimental steps:

1. Open telnet

1.1 Open telnet
Run -->cmd-->telnet

1.2 Turn on telnet echo function

set localecho

2. Connect to the server and send a request

2.1 open www.guet.edu.cn 80 //Note that the port number cannot be omitted

HEAD /index.asp HTTP/1.0

Host:www.guet.edu.cn

/*We can change the request method to request the content of the Guilin Electronics homepage; enter the message as follows*/
open www.guet.edu.cn 80

GET /index.asp HTTP/1.0 //The content of the requested resource
Host:www.guet.edu.cn

2.2 open www.sina.com.cn 80 //Or enter telnet www.sina.com.cn 80 directly at the command prompt

HEAD /index.asp HTTP/1.0
Host:www.sina.com.cn
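
Where telnet is not available, the same experiment can be approximated with a raw TCP socket. This is only a sketch: build_head_request and fetch_raw_response are hypothetical helpers, the timeout is an arbitrary choice, and the hosts used in this article may no longer answer on port 80.

```python
import socket


def build_head_request(host: str, path: str = "/") -> bytes:
    """Build the same HEAD request that is typed into telnet above."""
    return f"HEAD {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")


def fetch_raw_response(host: str, port: int, request: bytes, timeout: float = 5.0) -> bytes:
    """Send raw request bytes and read until the server closes the connection."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # an empty read means the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


print(build_head_request("www.guet.edu.cn", "/index.asp").decode("ascii"))
```

Calling fetch_raw_response("www.guet.edu.cn", 80, build_head_request("www.guet.edu.cn", "/index.asp")) would then return the status line and headers as bytes, much like the telnet window shows them.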

3 Experimental results:

3.1 The response obtained by requesting information 2.1 is:

HTTP/1.1 200 OK //Request succeeded

Date: Thu, 08 Mar 2007 07:17:51 GMT
Connection: Keep-Alive
... JKLKKAJEOIMMH; path=/
Cache-control: private

//Resource content omitted

3.2 The response obtained by requesting information 2.2 is:

HTTP/1.0 404 Not Found //Request failed
Date: Thu, 08 Mar 2007 07:50:50 GMT
Server: Apache/2.0.54
Last-Modified: Thu, 30 Nov 2006 11:35:41 GMT
ETag: "6277a-415-e7c76980"
Accept-Ranges: bytes
X-Powered-By: mod_xlayout_jh/0.0.1vhs.markII.remix
Vary: Accept-Encoding
Content-Type: text/html
X-Cache: MISS from zjm152-78.sina.com.cn
Via: 1.0 zjm152-78.sina.com.cn:80
X-Cache: MISS from th-143.sina.com.cn
Connection: close


Lost the connection with the host

Press any key to continue...

4. Notes:
1. If there is an input error, the request will not succeed.
2. Header field names are not case-sensitive.
3. To learn more about the HTTP protocol, see RFC2616, which can be found at http://www.ietf.org/rfc.
4. To develop back-end programs, you must master the HTTP protocol.

6. Technical supplements related to HTTP protocol

1. Basics:
High-level protocols include the File Transfer Protocol (FTP), the Simple Mail Transfer Protocol (SMTP), the Domain Name System (DNS), the Network News Transfer Protocol (NNTP), and HTTP.
There are three types of intermediaries: proxy, gateway, and tunnel. A proxy accepts requests in absolute-URI form, rewrites all or part of the message, and forwards the reformatted request to the server identified by the URI. A gateway is a receiving agent that acts as a layer above some other server and, if necessary, translates requests into the underlying server's protocol. A tunnel acts as a relay point between two connections without changing the messages. Tunnels are used when communication needs to pass through an intermediary (such as a firewall) or when the intermediary cannot understand the content of the messages.
Proxy: An intermediary program that acts as both a server and a client in order to make requests on behalf of other clients. Requests are serviced internally or passed on, possibly with translation, to other servers. A proxy must interpret a request message and, if necessary, rewrite it before forwarding it. A proxy often acts as a client-side portal through a firewall. A proxy can also serve as a helper application that handles requests over protocols the user agent does not implement.
Gateway: A server that acts as an intermediary for other servers. Unlike a proxy, a gateway accepts requests as if it were the origin server for the requested resource; the requesting client is unaware that it is dealing with the gateway.
A gateway often acts as a server-side portal through a firewall. A gateway can also act as a protocol translator to access resources stored in non-HTTP systems.
Tunnel: An intermediary program that acts as a blind relay between two connections. Once active, a tunnel is not considered a party to the HTTP communication, although the tunnel may have been initiated by an HTTP request. The tunnel ceases to exist when both ends of the relayed connection are closed. Tunnels are used when a portal is necessary and the intermediary cannot interpret the relayed traffic.

2. Advantages of protocol analysis - HTTP analyzer detects network attacks
Analyzing and processing high-level protocols in a modular manner will be the direction of future intrusion detection.
Commonly used ports 80, 3128 and 8080 of HTTP and its proxy are specified with the port tag in the network section

3. HTTP protocol Content-Length restriction vulnerability leads to denial of service attack
When using the POST method, the client can set Content-Length to declare the length of the data to be transmitted, for example Content-Length: 999999999. The server does not release the memory until the transmission is complete. An attacker can exploit this flaw by continuously sending garbage data to the web server until its memory is exhausted. This attack method leaves almost no trace.
http://www.cnpaf.net/Class/HTTP/0532918532667330.html
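
One common mitigation for the problem described above is to check the declared Content-Length before reading any of the body. The sketch below is an assumption-laden illustration rather than a complete defense: MAX_BODY_BYTES is an arbitrary example limit, check_content_length is a made-up helper, and it expects header names that have already been lower-cased.

```python
MAX_BODY_BYTES = 1_000_000  # hypothetical limit chosen for this example


def check_content_length(headers: dict, limit: int = MAX_BODY_BYTES):
    """Return (status, reason) if the request should be rejected, else None."""
    raw = headers.get("content-length")
    if raw is None:
        return (411, "Length Required")
    if not (raw.isascii() and raw.isdigit()):
        return (400, "Bad Request")
    if int(raw) > limit:
        # Refuse before allocating memory for the announced body.
        return (413, "Request Entity Too Large")
    return None


print(check_content_length({"content-length": "999999999"}))
print(check_content_length({"content-length": "21"}))
```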

4. Some ideas of using the characteristics of HTTP protocol to carry out denial of service attacks
The server is busy processing TCP connection requests forged by the attacker and has no time for the requests of normal clients (after all, normal client requests make up only a small share). From a normal client's point of view, the server stops responding. This situation is called a SYN flood attack on the server.
Smurf, TearDrop, and similar attacks use ICMP messages for flooding and IP fragmentation attacks. This article uses the "normal connection" method to produce a denial of service attack.
Port 19 was used for Chargen attacks in the early days (Chargen_Denial_of_Service). The method was to create a UDP connection between two Chargen servers so that the servers had to process too much information and went down. Two conditions are therefore needed to take down a web server this way: 1. a Chargen service exists 2. an HTTP service exists
Method: The attacker forges the source IP and sends connection requests (Connect) to N Chargen servers. After a Chargen server accepts the connection, it returns a character stream of 72 bytes per second (in practice faster, depending on network conditions) to the server.

5. Http fingerprinting technology
The principle of HTTP fingerprinting is basically the same: record the slight differences in how different servers implement the HTTP protocol and use them for identification. HTTP fingerprinting is much more complicated than TCP/IP stack fingerprinting, because customizing HTTP server configuration files and adding plug-ins or components make it easy to change the HTTP response information, which makes identification difficult; customizing the behavior of the TCP/IP stack, however, requires modifying the kernel, so the stack is easy to identify.
It is very simple to configure a server to return different banner information. For open-source HTTP servers such as Apache, users can modify the banner information in the source code and restart the HTTP service for it to take effect; for HTTP servers whose source code is not available, such as Microsoft's IIS or Netscape, the banner can be modified in the DLL file that stores it. Related articles have discussed this, so I won't go into detail here. Such modifications are quite effective. Another way to obscure banner information is to use plug-ins.
Commonly used test requests:
1: HEAD / HTTP/1.0 sends a basic HTTP request
2: DELETE / HTTP/1.0 sends a request that is not allowed, such as a DELETE request
3: GET / HTTP/3.0 sends a request with an invalid HTTP protocol version
4: GET / JUNK/1.0 sends a request with a malformed protocol specification
Httprint is an HTTP fingerprinting tool that combines statistical principles with fuzzy logic techniques to determine the type of an HTTP server effectively. It can be used to collect and analyze the signatures generated by different HTTP servers.

6. Others: To improve performance, modern browsers also support concurrent access. When browsing a web page, they establish multiple connections at the same time to fetch the multiple icons on the page quickly, so that the transfer of the whole page completes sooner.
HTTP1.1 provides the persistent connection method, and the next generation of the HTTP protocol, HTTP-NG, adds support for session control, rich content negotiation, and other features to provide more efficient connections.

