
Detailed explanation of HTTP protocol

藏色散人 (reposted), 2019-12-04 11:07:07

Introduction

HTTP is an object-oriented protocol belonging to the application layer. Because it is simple and fast, it is well suited to distributed hypermedia information systems. It was proposed in 1990 and has been continuously improved and extended over years of use and development. When this article was originally written, the WWW used the sixth revision of HTTP/1.0, standardization of HTTP/1.1 was in progress, and the HTTP-NG (Next Generation of HTTP) proposal had been put forward.

The main features of the HTTP protocol can be summarized as follows:

1. It supports the client/server model.

2. Simple and fast: when a client requests a service from the server, it only needs to send the request method and the path. Commonly used request methods are GET, HEAD, and POST; each method specifies a different type of interaction between the client and the server. Because HTTP is simple, HTTP server programs are small and communication is fast.

3. Flexible: HTTP allows data objects of any type to be transmitted. The type being transferred is indicated by the Content-Type header.

4. Connectionless: each connection handles only one request. After the server has processed the client's request and received the client's acknowledgment, it closes the connection. This saves transmission time. (HTTP/1.1 relaxes this with persistent connections.)

5. Stateless: HTTP is a stateless protocol, meaning that it keeps no memory of previous transactions. If later processing needs earlier information, that information must be retransmitted, which can increase the amount of data sent per connection. On the other hand, the server responds faster when it does not need previous information.

1. HTTP protocol in detail: URLs

HTTP (Hypertext Transfer Protocol) is a stateless, application-layer protocol based on a request/response model and usually carried over TCP connections. HTTP/1.1 adds a persistent connection mechanism. The vast majority of web development consists of web applications built on HTTP.

HTTP URL (URL is a special type of URI that contains enough information to find a resource) has the following format:

http://host[":"port][abs_path]

http means that the network resource is located via the HTTP protocol; host is a valid Internet host domain name or IP address; port specifies a port number, and if it is omitted the default port 80 is used; abs_path specifies the URI of the requested resource. If the URL contains no abs_path, it must be given as "/" when used as the Request-URI. The browser usually does this for us automatically.

Examples:

1. Enter: www.guet.edu.cn

The browser automatically converts to: http://www.guet.edu.cn/

2. http://192.168.0.116:8080/index.jsp
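The URL rules above (default port 80, an empty abs_path sent as "/") can be sketched in Java with the standard java.net.URI class. This is a minimal illustration, not how any particular browser works; the class name HttpUrlParts is hypothetical, and keeping an optional "?query" follows the RFC 2616 http_URL grammar, which the format above leaves out.

```java
import java.net.URI;

// Hypothetical helper: splits an http URL into host, port and abs_path.
class HttpUrlParts {
    final String host;
    final int port;
    final String absPath;

    private HttpUrlParts(String host, int port, String absPath) {
        this.host = host;
        this.port = port;
        this.absPath = absPath;
    }

    static HttpUrlParts parse(String url) {
        URI uri = URI.create(url);
        if (!"http".equalsIgnoreCase(uri.getScheme()) || uri.getHost() == null) {
            throw new IllegalArgumentException("not an http URL: " + url);
        }
        // getPort() returns -1 when the URL has no port: use the default 80.
        int port = uri.getPort() == -1 ? 80 : uri.getPort();
        // An empty abs_path must be sent as "/" in the Request-URI.
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        // RFC 2616 allows an optional "?query" after abs_path; keep it if present.
        if (uri.getRawQuery() != null) {
            path += "?" + uri.getRawQuery();
        }
        return new HttpUrlParts(uri.getHost(), port, path);
    }

    public static void main(String[] args) {
        for (String url : new String[] {"http://www.guet.edu.cn", "http://192.168.0.116:8080/index.jsp"}) {
            HttpUrlParts p = parse(url);
            System.out.println(p.host + " | " + p.port + " | " + p.absPath);
        }
    }
}
```

For the two example URLs this yields port 80 with path "/" for the first, and port 8080 with path "/index.jsp" for the second.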

2. HTTP protocol in detail: requests

An HTTP request consists of three parts: the request line, the message headers, and the request body.

1. The request line begins with a method token, followed by the Request-URI and the protocol version, all separated by spaces. Its format is: Method Request-URI HTTP-Version CRLF

Method is the request method; Request-URI is a Uniform Resource Identifier; HTTP-Version is the HTTP protocol version used by the request; CRLF is a carriage return followed by a line feed (apart from the terminating CRLF, stand-alone CR or LF characters are not allowed).
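To make the request-line format concrete, here is a minimal Java sketch that splits a line such as "GET /form.html HTTP/1.1" into its three parts. It assumes the trailing CRLF has already been removed and that the parts are separated by single spaces; the class name RequestLine and the strict uppercase check are illustrative choices, not a standard API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for "Method Request-URI HTTP-Version" (CRLF already removed).
class RequestLine {
    // Method: uppercase letters; Request-URI: no spaces; version: HTTP/<digits>.<digits>
    private static final Pattern FORMAT =
            Pattern.compile("([A-Z]+) (\\S+) (HTTP/\\d+\\.\\d+)");

    final String method;
    final String requestUri;
    final String version;

    private RequestLine(String method, String requestUri, String version) {
        this.method = method;
        this.requestUri = requestUri;
        this.version = version;
    }

    static RequestLine parse(String line) {
        Matcher m = FORMAT.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("malformed request line: " + line);
        }
        return new RequestLine(m.group(1), m.group(2), m.group(3));
    }

    public static void main(String[] args) {
        RequestLine r = parse("GET /form.html HTTP/1.1");
        System.out.println(r.method + " | " + r.requestUri + " | " + r.version);
    }
}
```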

There are several request methods (all written in uppercase letters). Each is explained below:

GET Requests the resource identified by the Request-URI
POST Appends new data to the resource identified by the Request-URI
HEAD Requests the response message headers for the resource identified by the Request-URI
PUT Requests that the server store a resource, using the Request-URI as its identifier
DELETE Requests that the server delete the resource identified by the Request-URI
TRACE Requests that the server echo back the request it received; mainly used for testing or diagnosis
CONNECT Reserved for use with a proxy that can switch to being a tunnel (e.g., for SSL tunneling)
OPTIONS Queries the server's capabilities, or the options and requirements associated with a resource

Application examples:

GET method: when you open a web page by typing its URL into the browser's address bar, the browser uses the GET method to fetch the resource from the server, e.g.: GET /form.html HTTP/1.1 (CRLF)

POST method: asks the server to accept the data attached to the request; it is often used to submit forms.

e.g.: POST /reg.jsp HTTP/ (CRLF)
Accept:image/gif,image/x-xbit,... (CRLF)
...
HOST:www.guet.edu.cn (CRLF)
Content-Length:21 (CRLF)
Connection:Keep-Alive (CRLF)
Cache-Control:no-cache (CRLF)
(CRLF) //This empty line marks the end of the message headers
user=jeffrey&pwd=1234 //This line is the submitted data (the request body)
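The same kind of POST message can be assembled programmatically. The Java sketch below builds the raw request text and computes Content-Length from the body's byte count. Several details are assumptions not present in the example above: the class name PostRequest is hypothetical, HTTP/1.1 is used as the protocol version, and a Content-Type of application/x-www-form-urlencoded (the usual media type for submitted HTML forms) is added.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical builder for the raw text of a form POST request.
class PostRequest {
    private static final String CRLF = "\r\n";

    static String build(String path, String host, String formBody) {
        // Content-Length counts the bytes of the body, not its characters.
        int length = formBody.getBytes(StandardCharsets.UTF_8).length;

        Map<String, String> headers = new LinkedHashMap<>(); // keeps insertion order
        headers.put("Host", host);
        // Assumption: the usual media type for HTML form submissions.
        headers.put("Content-Type", "application/x-www-form-urlencoded");
        headers.put("Content-Length", Integer.toString(length));
        headers.put("Connection", "Keep-Alive");
        headers.put("Cache-Control", "no-cache");

        StringBuilder sb = new StringBuilder();
        // Assumption: HTTP/1.1 as the protocol version.
        sb.append("POST ").append(path).append(" HTTP/1.1").append(CRLF);
        for (Map.Entry<String, String> h : headers.entrySet()) {
            sb.append(h.getKey()).append(": ").append(h.getValue()).append(CRLF);
        }
        sb.append(CRLF);     // empty line: end of the message headers
        sb.append(formBody); // request body
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(build("/reg.jsp", "www.guet.edu.cn", "user=jeffrey&pwd=1234"));
    }
}
```

For the body user=jeffrey&pwd=1234 the computed Content-Length is 21. When the string is written to a socket it must be encoded with the same charset used for counting (UTF-8 here).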

The HEAD method is almost the same as the GET method. For the response part of the HEAD request, the information contained in its HTTP header is the same as the information obtained through the GET request. Using this method, information about the resource identified by the Request-URI can be obtained without transmitting the entire resource content. This method is often used to test the validity of a hyperlink, whether it is accessible, and whether it has been updated recently.
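A link checker of the kind just described could send HEAD requests with the java.net.http.HttpClient API (Java 11 or later). This is a minimal sketch: the default URL in main is only a placeholder, and printing Last-Modified assumes the server actually sends that header.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class HeadCheck {
    // Builds a HEAD request: the server returns the headers but no body.
    static HttpRequest headRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .method("HEAD", HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) throws Exception {
        String url = args.length > 0 ? args[0] : "http://example.com/"; // placeholder
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<Void> resp =
                client.send(headRequest(url), HttpResponse.BodyHandlers.discarding());
        // A 2xx status suggests the link works; Last-Modified hints at recent updates.
        System.out.println(url + " -> " + resp.statusCode());
        System.out.println("Last-Modified: "
                + resp.headers().firstValue("Last-Modified").orElse("(not sent)"));
    }
}
```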

2. Request headers (described in section 4, on message headers)

3. Request body (omitted)

3. HTTP protocol in detail: responses

After receiving and interpreting the request message, the server returns an HTTP response message.

An HTTP response also consists of three parts: the status line, the message headers, and the response body.

1. The format of the status line is as follows:

HTTP-Version Status-Code Reason-Phrase CRLF

Here HTTP-Version is the version of the HTTP protocol used by the server; Status-Code is the response status code returned by the server; Reason-Phrase is a short text description of the status code.

The status code consists of three digits. The first digit defines the type of response and has five possible values:

1xx: Informational--the request has been received and processing continues

2xx: Success--the request was successfully received, understood, and accepted

3xx: Redirection--further action must be taken to complete the request

4xx: Client error--the request contains a syntax error or cannot be fulfilled

5xx: Server error--the server failed to fulfill an apparently valid request
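The status-line format and the five classes above can be sketched as a small Java parser that reads the code and classifies it by its first digit. The class name StatusLine and its category labels are hypothetical; the only point is that code / 100 selects the class.

```java
// Hypothetical parser for "HTTP-Version Status-Code Reason-Phrase" (CRLF removed).
class StatusLine {
    final String version;
    final int code;
    final String reason;

    private StatusLine(String version, int code, String reason) {
        this.version = version;
        this.code = code;
        this.reason = reason;
    }

    static StatusLine parse(String line) {
        // The Reason-Phrase may contain spaces, so split into at most three parts.
        String[] parts = line.split(" ", 3);
        if (parts.length < 2 || !parts[0].startsWith("HTTP/") || !parts[1].matches("\\d{3}")) {
            throw new IllegalArgumentException("malformed status line: " + line);
        }
        String reason = parts.length == 3 ? parts[2] : "";
        return new StatusLine(parts[0], Integer.parseInt(parts[1]), reason);
    }

    // The first digit of the status code selects one of the five classes.
    String category() {
        switch (code / 100) {
            case 1: return "Informational";
            case 2: return "Success";
            case 3: return "Redirection";
            case 4: return "Client error";
            case 5: return "Server error";
            default: return "Unknown";
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"HTTP/1.1 200 OK", "HTTP/1.0 404 Not Found"}) {
            StatusLine st = parse(s);
            System.out.println(st.code + " " + st.reason + " -> " + st.category());
        }
    }
}
```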

Common status codes, their reason phrases, and explanations:

200 OK //The client request is successful

400 Bad Request //The client request has a syntax error and cannot be understood by the server

401 Unauthorized //The request is unauthorized; this status code must be used together with the WWW-Authenticate header field

403 Forbidden //The server received the request, but refused to provide the service

404 Not Found //The requested resource does not exist, eg: the wrong URL was entered

500 Internal Server Error //An unexpected error occurred in the server

503 Service Unavailable //The server cannot handle the client's request at the moment; it may recover after some time

2. Response headers (described in section 4, on message headers)

3. Response body: the content of the resource returned by the server

4. HTTP protocol in detail: message headers

HTTP messages consist of requests from the client to the server and responses from the server to the client. Both request and response messages consist of a start line (the request line for a request message, the status line for a response message), optional header fields, an empty line (a line containing only CRLF), and an optional message body.

HTTP message headers include general headers, request headers, response headers, and entity headers.

Each header field consists of a name, a colon (":"), optional whitespace, and the field value. Header field names are case-insensitive.
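The message layout just described (start line, header fields, empty line, optional body) can be sketched as a small Java parser. Storing the headers in a TreeMap with String.CASE_INSENSITIVE_ORDER makes look-ups ignore the case of field names. This is a simplified illustration under stated assumptions: the class name HttpMessage is hypothetical, the body is treated as text, and a repeated header field simply overwrites the earlier value.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical container for a parsed HTTP message.
class HttpMessage {
    final String startLine;            // request line or status line
    final Map<String, String> headers; // field names compared case-insensitively
    final String body;                 // may be empty

    private HttpMessage(String startLine, Map<String, String> headers, String body) {
        this.startLine = startLine;
        this.headers = headers;
        this.body = body;
    }

    static HttpMessage parse(String raw) {
        // The first empty line (CRLF CRLF) separates the headers from the body.
        int end = raw.indexOf("\r\n\r\n");
        if (end < 0) {
            throw new IllegalArgumentException("missing empty line after headers");
        }
        String[] lines = raw.substring(0, end).split("\r\n");
        Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (int i = 1; i < lines.length; i++) {
            int colon = lines[i].indexOf(':');
            if (colon <= 0) {
                throw new IllegalArgumentException("malformed header field: " + lines[i]);
            }
            // name ":" value, with surrounding whitespace trimmed from both parts
            headers.put(lines[i].substring(0, colon).trim(), lines[i].substring(colon + 1).trim());
        }
        return new HttpMessage(lines[0], headers, raw.substring(end + 4));
    }

    public static void main(String[] args) {
        HttpMessage m = parse("HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n"
                + "Content-Length: 5\r\n\r\nhello");
        System.out.println(m.startLine + " / " + m.headers.get("content-type") + " / " + m.body);
    }
}
```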

1. General headers

General headers are a small set of header fields that apply to both request and response messages; they apply to the message being transmitted rather than to the entity it carries.

For example:

Cache-Control specifies caching directives. The directives are one-way (a directive that appears in a response need not appear in the request) and independent (the caching directives of one message do not affect how caching is handled for another message). HTTP/1.0 uses a similar header field, Pragma.

Request directives include: no-cache (indicating that the request or response message must not be cached), no-store, max-age, max-stale, min-fresh, and only-if-cached.

Response directives include: public, private, no-cache, no-store, no-transform, must-revalidate, proxy-revalidate, max-age, and s-maxage.

For example, to tell the IE browser (the client) not to cache a page, a server-side JSP program can use: response.setHeader("Cache-Control", "no-cache");

//response.setHeader("Pragma", "no-cache"); has the same effect as the code above; the two are usually used together

This code sets the following general header field in the response message: Cache-Control: no-cache
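On the receiving side, a Cache-Control value is a comma-separated list of directives, some of which carry an argument (for example max-age=3600). The following Java sketch splits such a value into a map and reads max-age. The class and method names are hypothetical, and it does not handle quoted arguments that themselves contain commas.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class CacheControl {
    // Parses e.g. "no-cache, max-age=3600" into {no-cache=null, max-age=3600}.
    static Map<String, String> parse(String value) {
        Map<String, String> directives = new LinkedHashMap<>();
        for (String part : value.split(",")) {
            String d = part.trim();
            if (d.isEmpty()) {
                continue;
            }
            int eq = d.indexOf('=');
            if (eq < 0) {
                directives.put(d.toLowerCase(Locale.ROOT), null); // no argument
            } else {
                directives.put(d.substring(0, eq).trim().toLowerCase(Locale.ROOT),
                               d.substring(eq + 1).trim());
            }
        }
        return directives;
    }

    // Returns max-age in seconds, or the fallback if it is absent or not a number.
    static long maxAge(Map<String, String> directives, long fallback) {
        String v = directives.get("max-age");
        if (v == null) {
            return fallback;
        }
        try {
            return Long.parseLong(v);
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        Map<String, String> d = parse("no-cache, max-age=3600");
        System.out.println(d + " max-age=" + maxAge(d, -1));
    }
}
```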

The Date general header field gives the date and time at which the message was generated.

The Connection general header field specifies options for a particular connection: for example, that the connection should be persistent, or the "close" option, which tells the other side to close the connection once the response is complete.

2. Request header

Request headers let the client pass additional information about the request, and about the client itself, to the server.

Commonly used request headers

The Accept request header field specifies which types of content the client accepts. E.g., Accept: image/gif means the client wants resources in GIF image format; Accept: text/html means the client wants HTML text.

Accept-Charset

The Accept-Charset request header field specifies the character sets the client accepts, e.g.: Accept-Charset:iso-8859-1, gb2312. If this field is absent from the request message, any character set is assumed to be acceptable.

Accept-Encoding

The Accept-Encoding request header field is similar to Accept, but specifies the acceptable content encodings, e.g.: Accept-Encoding:gzip, deflate. If this field is absent from the request message, the server assumes that the client accepts any content encoding.

Accept-Language

The Accept-Language request header field is similar to Accept, but specifies natural languages, e.g.: Accept-Language:zh-cn. If this field is absent from the request message, the server assumes that the client accepts any language.
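As a rough illustration of how a server might use the Accept field, this Java sketch checks whether a media type is covered by an Accept value, honouring wildcards such as */* and image/*. It is a simplification under stated assumptions: the class name is hypothetical, parameters such as quality values (;q=...) are ignored, and a missing Accept header is taken to mean that any type is acceptable.

```java
import java.util.Locale;

class AcceptMatcher {
    // True if mediaType (e.g. "image/gif") matches one of the ranges in acceptHeader.
    static boolean accepts(String acceptHeader, String mediaType) {
        String[] wanted = mediaType.trim().toLowerCase(Locale.ROOT).split("/", 2);
        if (wanted.length != 2) {
            throw new IllegalArgumentException("not a media type: " + mediaType);
        }
        if (acceptHeader == null || acceptHeader.isBlank()) {
            return true; // no Accept header: any type is acceptable
        }
        for (String range : acceptHeader.split(",")) {
            // Drop parameters such as ";q=0.5" (ignored in this sketch).
            String r = range.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
            String[] parts = r.split("/", 2);
            if (parts.length != 2) {
                continue; // skip malformed entries
            }
            boolean typeOk = parts[0].equals("*") || parts[0].equals(wanted[0]);
            boolean subtypeOk = parts[1].equals("*") || parts[1].equals(wanted[1]);
            if (typeOk && subtypeOk) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(accepts("image/gif, text/html", "text/html"));
        System.out.println(accepts("image/gif", "application/msword"));
    }
}
```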

Authorization

The Authorization request header field is mainly used to prove that the client has the right to view a resource. When the browser accesses a page and receives a response code of 401 (Unauthorized) from the server, it can send a request containing the Authorization request header field to ask the server to verify it.

Host (this header field is required when sending a request)

The Host request header field specifies the Internet host and port number of the requested resource; it is usually taken from the HTTP URL. For example:

We enter in the browser: http://www.guet.edu.cn/index.html

The request message sent by the browser will contain the following Host request header field:

Host: www.guet.edu.cn

Here the default port 80 is used. If a port number is specified, the field becomes Host: www.guet.edu.cn:<port>, with the specified port number in place of <port>.
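Deriving the Host value from a URL can be sketched in a few lines of Java with java.net.URI. The class name HostHeader is hypothetical; the sketch appends the port only when it differs from the default 80 (sending ":80" explicitly would also be valid).

```java
import java.net.URI;

class HostHeader {
    // Host field value for an http URL: "host", or "host:port" for non-default ports.
    static String fromUrl(String url) {
        URI uri = URI.create(url);
        String host = uri.getHost();
        if (host == null) {
            throw new IllegalArgumentException("URL has no host: " + url);
        }
        int port = uri.getPort(); // -1 when the URL has no explicit port
        return (port == -1 || port == 80) ? host : host + ":" + port;
    }

    public static void main(String[] args) {
        System.out.println("Host: " + fromUrl("http://www.guet.edu.cn/index.html"));
        System.out.println("Host: " + fromUrl("http://192.168.0.116:8080/index.jsp"));
    }
}
```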

User-Agent

When we log in to an online forum, we often see a welcome message listing the name and version of our operating system and of the browser we are using, which many people find surprising. In fact, the server application gets this information from the User-Agent request header field, which lets the client tell the server its operating system, browser, and other attributes. This header field is optional, however: if we wrote our own browser and did not send User-Agent, the server could not learn this information.

Request header example:

GET /form.html HTTP/1.1 (CRLF)
Accept:image/gif,image/x-xbitmap,image/jpeg,application/x-shockwave-flash,application/vnd.ms-excel,application/vnd.ms-powerpoint,application/msword,*/* (CRLF)
Accept-Language:zh-cn (CRLF)
Accept-Encoding:gzip,deflate (CRLF)
If-Modified-Since:Wed,05 Jan 2007 11:21:25 GMT (CRLF)
If-None-Match:W/"80b1a4c018f3c41:8317" (CRLF)
User-Agent:Mozilla/4.0(compatible;MSIE6.0;Windows NT 5.0) (CRLF)
Host:www.guet.edu.cn (CRLF)
Connection:Keep-Alive (CRLF)
(CRLF)

3. Response header

Response headers let the server pass additional response information that cannot be placed in the status line, as well as information about the server and about further access to the resource identified by the Request-URI.

Commonly used response headers

Location

The Location response header field is used to redirect the recipient to a new location. The Location response header field is often used when changing domain names.

Server

The Server response header field contains information about the software the server uses to handle the request; it corresponds to the User-Agent request header field. Here is an example of the Server response header field:

Server:Apache-Coyote/1.1

WWW-Authenticate

The WWW-Authenticate response header field must be included in 401 (Unauthorized) response messages. It tells the client which authentication scheme the server requires; the client then resends the request with an Authorization header field for the server to verify.

e.g.: WWW-Authenticate:Basic realm="Basic Auth Test!" //This shows that the server uses the Basic authentication scheme for the requested resource.
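For the Basic scheme shown here, the client's Authorization value is the word Basic followed by the Base64 encoding of user-id:password (RFC 7617, earlier RFC 2617). A minimal Java sketch using java.util.Base64 follows; the class and method names are hypothetical, and UTF-8 is assumed for the credentials. Base64 is only an encoding, not encryption.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class BasicAuth {
    // Builds the Authorization value: "Basic " + base64(user ":" password).
    static String authorization(String user, String password) {
        byte[] credentials = (user + ":" + password).getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(credentials);
    }

    // Extracts the realm from a challenge such as: Basic realm="Basic Auth Test!"
    static String realm(String wwwAuthenticate) {
        String marker = "realm=\"";
        int start = wwwAuthenticate.indexOf(marker);
        if (start < 0) {
            return null;
        }
        start += marker.length();
        int end = wwwAuthenticate.indexOf('"', start);
        return end < 0 ? null : wwwAuthenticate.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println("realm: " + realm("Basic realm=\"Basic Auth Test!\""));
        System.out.println("Authorization: " + authorization("Aladdin", "open sesame"));
    }
}
```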

4. Entity header

Both request and response messages can carry an entity. An entity consists of entity header fields and an entity body, but the two need not be sent together: a message may carry only the entity header fields. Entity headers define meta-information about the entity body (e.g., whether an entity body is present) and about the resource identified by the request.

Commonly used entity headers

Content-Encoding

The Content-Encoding entity header field is a modifier of the media type. Its value indicates which additional content codings have been applied to the entity body, and therefore which decoding mechanisms must be applied to obtain the media type referenced by the Content-Type header field. Content-Encoding is typically used to record how a document was compressed, e.g.: Content-Encoding: gzip

Content-Language

The Content-Language entity header field describes the natural language of the resource. If this field is absent, the content is assumed to be intended for readers of all languages. E.g.: Content-Language:da

Content-Length

The Content-Length entity header field indicates the length of the entity body in bytes, written as a decimal number.

Content-Type

The Content-Type entity header field specifies the media type of the entity body sent to the recipient. For example:

Content-Type:text/html;charset=ISO-8859-1

Content-Type:text/html;charset=GB2312
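A client needs both parts of such a Content-Type value: the media type, and the optional charset parameter used to decode the body. A minimal Java sketch follows. The class and method names are hypothetical, and the caller supplies the fallback charset (RFC 2616 specified ISO-8859-1 as the default for text types). Charset.forName throws an exception for names the JVM does not support.

```java
import java.nio.charset.Charset;
import java.util.Locale;

class ContentType {
    // "text/html;charset=GB2312" -> "text/html"
    static String mediaType(String contentType) {
        return contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
    }

    // Returns the charset parameter, or the fallback when none is given.
    static Charset charset(String contentType, Charset fallback) {
        String[] params = contentType.split(";");
        for (int i = 1; i < params.length; i++) {
            String p = params[i].trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).replace("\"", "").trim();
                return Charset.forName(name); // may throw UnsupportedCharsetException
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        String ct = "text/html;charset=ISO-8859-1";
        System.out.println(mediaType(ct) + " " + charset(ct, java.nio.charset.StandardCharsets.UTF_8));
    }
}
```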

Last-Modified

The Last-Modified entity header field is used to indicate the last modified date and time of the resource.

Expires

The Expires entity header field gives the date and time after which the response is considered expired. A browser or proxy server can serve a cached copy directly when a page is visited again, which shortens response time and reduces server load; to make it refresh the cached page after a certain period, we can use the Expires entity header field to set the page's expiration time, e.g.: Expires: Thu, 15 Sep 2006 16:23:12 GMT

HTTP/1.1 clients and caches must treat invalid date formats, including the value 0, as already expired. So, to prevent the browser from caching a page, we can also use the Expires entity header field and set it to 0. In JSP: response.setDateHeader("Expires", 0); (this sends the 1970 epoch date, which has already passed)
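The rule that an invalid Expires value (such as 0) counts as already expired can be sketched in Java with java.time. This assumes the RFC 1123 date format used in the examples above and parses it with DateTimeFormatter.RFC_1123_DATE_TIME; the older RFC 850 and asctime formats that HTTP/1.1 also permits are not handled, so this sketch would treat them as expired too. The class name ExpiresCheck is hypothetical.

```java
import java.time.DateTimeException;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

class ExpiresCheck {
    // True if the response is expired at 'now'. Unparseable values such as "0"
    // count as already expired, as HTTP/1.1 requires.
    static boolean isExpired(String expiresValue, ZonedDateTime now) {
        try {
            ZonedDateTime expires = ZonedDateTime.parse(
                    expiresValue.trim(), DateTimeFormatter.RFC_1123_DATE_TIME);
            return !expires.isAfter(now);
        } catch (DateTimeException e) { // includes DateTimeParseException
            return true;
        }
    }

    public static void main(String[] args) {
        ZonedDateTime now = ZonedDateTime.now();
        System.out.println(isExpired("0", now));
        System.out.println(isExpired("Tue, 3 Jun 2008 11:05:30 GMT", now));
    }
}
```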

5. Observing HTTP communication with telnet

The purpose and principle of the experiment:

Use Microsoft's telnet tool to type HTTP request messages by hand and send them to a server. Once the server receives, interprets, and accepts a request, it returns a response that is displayed in the telnet window. This gives a hands-on understanding of how HTTP communication works.

Experimental steps:

1. Open telnet

1.1 Open telnet

Run-->cmd-->telnet

1.2 Turn on telnet's local echo

set localecho

2. Connect to the server and send a request

2.1 open www.guet.edu.cn 80 //Note that the port number cannot be omitted

HEAD /index.asp HTTP/1.0

Host:www.guet.edu.cn

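/*After the Host line, press Enter twice: the empty line ends the request. We can also change the request method: to request the content of the Guilin University of Electronic Technology homepage, enter the following message */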

open www.guet.edu.cn 80

GET /index.asp HTTP/1.0 //Request the resource content

Host:www.guet.edu.cn

2.2 open www.sina.com.cn 80 //Or enter telnet www.sina.com.cn 80 directly at the command prompt

HEAD /index.asp HTTP/1.0

Host:www.sina.com.cn

3. Experimental results:

3.1 The response to request 2.1 is:

HTTP/1.1 200 OK //web server
Date: Thu, 08 Mar 2007 07:17:51 GMT
Connection: Keep-Alive
Content-Length: 23330
Content-Type: text/html
Expires: Thu, 08 Mar 2007 07:16:51 GMT
Set-Cookie: ASPSESSIONIDQAQBQQQB=BEJCDGKADEDJKLKKAJEOIMMH; path=/
Cache-control: private
//Resource content omitted

3.2 The response to request 2.2 is:

HTTP/1.0 404 Not Found //The request failed
Date: Thu, 08 Mar 2007 07:50:50 GMT
Server: Apache/2.0.54
Last-Modified: Thu, 30 Nov 2006 11:35:41 GMT
ETag: "6277a-415-e7c76980"
Accept-Ranges: bytes
X-Powered-By: mod_xlayout_jh/0.0.1vhs.markII.remix
Vary: Accept-Encoding
Content-Type: text/html
X-Cache: MISS from zjm152-78.sina.com.cn
Via: 1.0 zjm152-78.sina.com.cn:80
X-Cache: MISS from th-143.sina.com.cn
Connection: close

Connection to host lost.

Press any key to continue...

4. Notes: 1. If there is an input error, the request will not be successful.

2. Header field names are not case-sensitive.

3. To learn more about the HTTP protocol, read RFC 2616, available at http://www.ietf.org/rfc. (RFC 2616 has since been obsoleted, first by RFCs 7230-7235 and then by RFCs 9110-9112.)

4. Developing server-side programs requires a solid command of the HTTP protocol.
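The telnet experiment can also be reproduced in a few lines of Java with a plain java.net.Socket: open a TCP connection to port 80, write the request text including the empty line that ends the headers, and read until the server closes the connection. This is a minimal sketch; the default host example.com in main is only a placeholder (the sites used in the experiment may no longer respond the same way), and the 10-second read timeout is an arbitrary choice.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class RawHttpClient {
    // The same text typed into telnet; the final CRLF is the empty line.
    static String buildRequest(String method, String path, String host) {
        return method + " " + path + " HTTP/1.0\r\n"
                + "Host: " + host + "\r\n"
                + "\r\n";
    }

    // Like "open host 80": connect, send the request, read until the server closes.
    static String send(String host, int port, String request) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            socket.setSoTimeout(10_000); // give up if the server is silent for 10 s
            OutputStream out = socket.getOutputStream();
            out.write(request.getBytes(StandardCharsets.US_ASCII));
            out.flush();
            InputStream in = socket.getInputStream();
            ByteArrayOutputStream response = new ByteArrayOutputStream();
            in.transferTo(response); // for HTTP/1.0 the server normally closes when done
            return response.toString(StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) throws IOException {
        String host = args.length > 0 ? args[0] : "example.com"; // placeholder host
        System.out.print(send(host, 80, buildRequest("HEAD", "/", host)));
    }
}
```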

6. Supplementary technical notes on HTTP

1. Basics:

High-level (application-layer) protocols include FTP (File Transfer Protocol), SMTP (Simple Mail Transfer Protocol), DNS (Domain Name System), NNTP (Network News Transfer Protocol), and HTTP.

There are three kinds of intermediaries: proxies, gateways, and tunnels. A proxy accepts requests whose URI is in absolute form, rewrites all or part of the message, and forwards the reformatted request to the server identified by the URI. A gateway is a receiving agent that acts as a layer above some other server and, if necessary, translates requests into the protocol of the underlying server. A tunnel acts as a relay point between two connections without changing the messages; tunnels are used when communication must pass through an intermediary (such as a firewall) or when the intermediary cannot recognize the content of the messages.

Proxy: an intermediary program that acts as both a server and a client, making requests on behalf of other clients. Requests are serviced internally or passed on, possibly with translation, to other servers. A proxy must interpret and, if necessary, rewrite a request message before forwarding it. Proxies often act as client-side portals through firewalls, and can also serve as helper applications for handling requests via protocols the user agent does not implement.

Gateway: A server that acts as an intermediary for other servers. Unlike a proxy, a gateway accepts requests as if it were the origin server for the requested resource; the requesting client is unaware that it is dealing with the gateway.

Gateways often serve as server-side portals through firewalls. Gateways can also serve as a protocol translator to access resources stored in non-HTTP systems.

Tunnel: an intermediary program that acts as a relay between two connections. Once active, a tunnel is not considered part of the HTTP communication, although it may have been initiated by an HTTP request. The tunnel ceases to exist when both ends of the relayed connection are closed. Tunnels are used when a portal is needed or when the intermediary cannot interpret the relayed traffic.

2. Advantages of protocol analysis: detecting network attacks with an HTTP analyzer

Analyzing and processing high-level protocols in a modular manner will be the direction of future intrusion detection.

In the analyzer's configuration, the ports commonly used by HTTP and its proxies (80, 3128, and 8080) are specified in the network section with the port tag.

3. Denial-of-service attacks through the HTTP Content-Length limit vulnerability

When using the POST method, a client can set Content-Length to declare the length of the data to be sent, for example Content-Length: 999999999. The server does not release the memory held for the request until the transfer completes, so an attacker can exploit this flaw by continuously sending junk data to the web server until its memory is exhausted. This attack leaves almost no trace.

http://www.cnpaf.net/Class/HTTP/0532918532667330.html
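One common mitigation for the problem described above is to validate Content-Length before reserving any memory and to refuse oversized bodies. A minimal Java sketch, with a hypothetical 1 MiB limit that a real application would tune; a server would typically answer such requests with 413 (Request Entity Too Large). A read timeout is also needed so that a client cannot hold a connection by sending data very slowly; that part is not shown.

```java
class ContentLengthGuard {
    // Hypothetical upper bound for request bodies: 1 MiB.
    static final long MAX_BODY_BYTES = 1L << 20;

    // Validates a Content-Length value before any buffer is allocated.
    static long checkedLength(String headerValue) {
        // At most 18 digits, so Long.parseLong cannot overflow.
        if (headerValue == null || !headerValue.trim().matches("\\d{1,18}")) {
            throw new IllegalArgumentException("invalid Content-Length: " + headerValue);
        }
        long length = Long.parseLong(headerValue.trim());
        if (length > MAX_BODY_BYTES) {
            // Reject instead of reserving memory for the declared size.
            throw new IllegalArgumentException("body too large (" + length + " bytes)");
        }
        return length;
    }

    public static void main(String[] args) {
        System.out.println(checkedLength("21"));
        try {
            checkedLength("999999999");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```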

4. Some ideas on using characteristics of HTTP to carry out denial-of-service attacks

When a server is busy handling TCP connection requests forged by an attacker, it has no time to attend to normal client requests (after all, normal requests make up only a small fraction of the total). From a normal client's point of view, the server stops responding. This situation is called a SYN flood attack on the server.

Attacks such as Smurf and TearDrop rely on ICMP flooding and IP fragmentation respectively. The method discussed here instead produces a denial of service through "normal" connections.

Port 19 was used early on for Chargen attacks (Chargen_Denial_of_Service). That attack, however, sets up a UDP connection between two Chargen servers so that they exchange so much data that the servers go down. Taking down a web server with the method below requires two conditions: 1. a Chargen service is available; 2. an HTTP service is available.

Method: the attacker forges the source IP address and sends connection requests (Connect) to N Chargen hosts. After accepting a connection, each Chargen service sends a 72-byte character stream per second to the server (in practice faster, depending on network conditions).

5. HTTP fingerprinting

All HTTP fingerprinting works on essentially the same principle: record the slight differences in how different servers implement the HTTP protocol and use them for identification. HTTP fingerprinting is much more complicated than TCP/IP stack fingerprinting, because customizing an HTTP server's configuration files and adding plug-ins or components can easily change its HTTP responses, which makes identification difficult. Customizing the behavior of a TCP/IP stack, by contrast, requires modifying the kernel, so stacks are easy to identify.

Making a server return different banner information is very simple. For an open-source HTTP server such as Apache, users can modify the banner in the source code and restart the HTTP service for the change to take effect; for closed-source HTTP servers such as Microsoft IIS or Netscape, the banner can be modified in the DLL file that stores it. Other articles discuss this, so I won't go into detail here; modifying the banner in this way is quite effective. Another way to obscure banner information is to use a plug-in.

Commonly used test requests:

1: HEAD / HTTP/1.0 sends a basic HTTP request

2: DELETE / HTTP/1.0 sends a request that is not allowed, such as a DELETE request

3: GET / HTTP/3.0 sends a request with an invalid HTTP protocol version

4: GET / JUNK/1.0 sends a request with an incorrect protocol specification

The HTTP fingerprinting tool Httprint combines statistical methods with fuzzy logic to determine the type of an HTTP server effectively. It can also be used to collect and analyze the signatures produced by different HTTP servers.
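As a simplified illustration of the idea (not Httprint's actual algorithm), the four probe requests above can be sent to a server and a short signature recorded for each response, for example the status line plus the Server banner. This Java sketch only builds the probes and extracts a signature from a raw response; the class and method names are hypothetical, and sending the probes would need a raw-socket client.

```java
import java.util.List;

class FingerprintProbes {
    // The four probe request lines listed above, each followed by an empty line.
    static List<String> probes() {
        return List.of(
                "HEAD / HTTP/1.0\r\n\r\n",
                "DELETE / HTTP/1.0\r\n\r\n",
                "GET / HTTP/3.0\r\n\r\n",
                "GET / JUNK/1.0\r\n\r\n");
    }

    // Signature of one raw response: the status line plus the Server banner (if any).
    static String signature(String rawResponse) {
        String[] lines = rawResponse.split("\r\n");
        String server = "-";
        // Header fields run from line 1 up to the first empty line.
        for (int i = 1; i < lines.length && !lines[i].isEmpty(); i++) {
            int colon = lines[i].indexOf(':');
            if (colon > 0 && lines[i].substring(0, colon).trim().equalsIgnoreCase("Server")) {
                server = lines[i].substring(colon + 1).trim();
            }
        }
        return lines[0] + " | " + server;
    }

    public static void main(String[] args) {
        String sample = "HTTP/1.0 404 Not Found\r\nServer: Apache/2.0.54\r\n\r\n";
        System.out.println(signature(sample));
    }
}
```

Comparing the signatures collected for all four probes is what distinguishes servers; the banner alone is easy to fake, as noted above.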

6. Other notes: to improve browsing performance, modern browsers also support concurrent access. When loading a web page, the browser opens several connections at once to fetch the page's many icons and other resources quickly, so the whole page is transferred faster.

HTTP/1.1 provides persistent connections, and the next-generation HTTP protocol, HTTP-NG, adds support for session control, rich content negotiation, and other mechanisms to provide more efficient connections.


Statement:

This article is reproduced from csdn.net. If there is any infringement, please contact admin@php.cn to have it removed.
