
Introduction and in-depth understanding of HTTP protocol

巴扎黑 (original article), 2017-08-23 15:56:16

This post summarizes my understanding of the parts of the HTTP protocol that I have run into in real work scenarios.


Request & Response

Request format

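Request line, for example: GET /api/index.json HTTP/1.1

Request headers, for example: Accept: */*; User-Agent: Mozilla/4.0;……

[Request body] (optional, separated from the headers by a blank line), for example: id=1&timestamp=xxxxxx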

Response format

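Status line, for example: HTTP/1.1 200 OK

Response headers, for example: Content-Type: application/json;……

[Response body] (optional, separated from the headers by a blank line), for example: {"id": 1,"username":"testuser"}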
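To make the two formats concrete, here is a minimal Python sketch (standard library only) that assembles a raw HTTP/1.1 request and splits a raw response into status line, headers and body. It assumes CRLF line endings as the protocol requires; the Host value example.com is made up, and the parser ignores details such as chunked bodies and repeated headers, so treat it as an illustration rather than a real HTTP parser.

```python
# Sketch: build a raw HTTP/1.1 request and split a raw response into parts.
CRLF = "\r\n"


def build_request(method, path, headers, body=""):
    """Request line, header lines, blank line, then the optional body."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return CRLF.join(lines) + CRLF + CRLF + body


def parse_response(raw):
    """Return (version, status code, reason phrase, headers, body)."""
    head, _, body = raw.partition(CRLF + CRLF)  # the blank line ends the headers
    status_line, *header_lines = head.split(CRLF)
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return version, int(code), reason, headers, body


request = build_request("GET", "/api/index.json",
                        {"Host": "example.com", "Accept": "*/*"})
raw_response = ("HTTP/1.1 200 OK\r\n"
                "Content-Type: application/json\r\n"
                "\r\n"
                '{"id": 1,"username":"testuser"}')
print(repr(request))
print(parse_response(raw_response))
```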

Status Code

There are around 60 HTTP status codes. Below I record some of the common ones, mostly those that show up when something unusual happens. You will run into them more or less in everyday applications, and knowing them helps you understand and track down problems.

206 - Partial Content. Used for resumable downloads: the client requested part of a resource, and the server successfully returned that part.

301 - Moved Permanently. The original address no longer exists and the URL now points to another address. This mainly matters for search engines, because it affects how crawlers index the page.

302 - Found (temporary redirect). The server returns a new URL to the client, which can then request that URL to obtain the content.

304 - Not Modified. The resource has not changed, so the client can use its locally cached copy. This is common when accessing static content.

413 - Request Entity Too Large. A common case is uploading a large file that exceeds the server's limit (e.g. nginx). It can also happen when the request headers or body exceed the limits configured on the back-end server (e.g. Tomcat), for example when the current domain has so many cookies that the request header limit is exceeded. (Depending on the server, an oversized header may instead be reported as 400 or as 431 Request Header Fields Too Large.)

416 - Range Not Satisfiable. Related to resumable downloads: the range requested by the client lies beyond the size of the file on the server.

500 - Internal Server Error. The server hit an error and cannot return a normal result; the most common example is an application throwing an unhandled null pointer exception.

502 - Bad Gateway. A common case is that the back-end server behind the reverse proxy (such as Resin or Tomcat) has not been started.

503 - Service Unavailable. For example, the server is overloaded or has stopped serving requests.

504 - Gateway Timeout. The gateway or reverse proxy did not get a response from the back-end server within its time limit, for example because the request took too long to process.

Headers

HTTP headers fall into two categories: request headers and response headers. Below are some of the headers we use most often.

1. Cache control

In Internet applications, caches are almost everywhere. In HTTP-based services we can likewise have the client cache content that rarely changes, so that the cached copy is reused across visits, which speeds up access and improves the user experience. HTTP defines several headers for cache control:

Cache-Control (HTTP/1.1) / Pragma (HTTP/1.0): tells the client whether to cache the response and for how long. The value private, which some servers send by default, means the content may be cached only in the user's own (browser) cache. For example, Cache-Control: max-age=86400, must-revalidate tells the client to cache the resource for one day (max-age is a relative time in seconds) and to revalidate it with the server once it expires.

Expires: an absolute expiry time (in GMT). Until then, unless a forced refresh is requested, the client can read the resource directly from its local cache without sending a request to the server.

Note:

Priority: Cache-Control > Expires (if both max-age and Expires are present, max-age wins);

Detailed parameter description: http://condor.depaul.edu/dmumaugh/readings/handouts/SE435/HTTP/node24.html

Browsers may implement behaviors such as refresh, back, and pressing Enter in the address bar differently;
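The freshness rules above can be sketched roughly as follows. This is a simplified Python illustration, not how any particular browser behaves: it assumes response headers have already been lower-cased into a dict, lets max-age take priority over Expires, treats no-cache/no-store as "must ask the server", and ignores Age, s-maxage and other directives.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def parse_cache_control(value):
    """'max-age=86400, must-revalidate' -> {'max-age': '86400', 'must-revalidate': ''}"""
    directives = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg.strip('"')
    return directives


def is_fresh(headers, response_time, now):
    """Can the cached response be used without contacting the server?

    headers: response headers with lower-cased names.
    response_time, now: timezone-aware datetimes.
    """
    cc = parse_cache_control(headers.get("cache-control", ""))
    if "no-store" in cc or "no-cache" in cc:
        return False  # must go to (or revalidate with) the server
    if "max-age" in cc:
        # relative lifetime in seconds; takes priority over Expires
        return now < response_time + timedelta(seconds=int(cc["max-age"]))
    if "expires" in headers:
        try:
            return now < parsedate_to_datetime(headers["expires"])  # absolute GMT date
        except (TypeError, ValueError):
            return False  # an invalid Expires date counts as already expired
    return False  # no explicit lifetime: treat as stale in this sketch


fetched = datetime(2013, 10, 19, 9, 20, 15, tzinfo=timezone.utc)
headers = {"cache-control": "max-age=86400, must-revalidate"}
print(is_fresh(headers, fetched, fetched + timedelta(hours=23)))  # True
print(is_fresh(headers, fetched, fetched + timedelta(days=2)))    # False
```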

Last-Modified/If-Modified-Since: Last-Modified is the timestamp of the resource's last modification, returned by the server to the client. On the next request (for example a forced refresh), the client sends it back in an If-Modified-Since header so the server can check whether the resource has been updated. If it has not, the server returns a 304 status code and the client uses its locally cached copy. In that case there is only the cost of the request itself and no cost for transferring the content. Note: the timestamp must be in Greenwich Mean Time (GMT), for example: Last-Modified: Sat, 19 Oct 2013 09:20:15 GMT
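On the server side, the Last-Modified check boils down to comparing two GMT timestamps. A minimal sketch, assuming the server knows the resource's modification time as a UTC datetime; the function name conditional_get is hypothetical:

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_get(last_modified, if_modified_since=None):
    """Return (status, headers) for a GET on a resource modified at last_modified.

    last_modified: datetime with tzinfo=timezone.utc.
    if_modified_since: raw If-Modified-Since header from the client, or None.
    """
    last_modified = last_modified.replace(microsecond=0)  # HTTP dates have 1 s resolution
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparsable date: ignore the header
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            if last_modified <= since:
                return 304, headers  # not modified: send no body
    return 200, headers  # send the full resource together with Last-Modified


mtime = datetime(2013, 10, 19, 9, 20, 15, tzinfo=timezone.utc)
print(conditional_get(mtime))
print(conditional_get(mtime, "Sat, 19 Oct 2013 09:20:15 GMT"))
```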

ETag/If-None-Match: an ETag is a resource identifier computed by some algorithm from the file's attributes, and it is also used to determine whether the resource the client requests has been updated. If the server returned an ETag, the client sends it back in an If-None-Match header on the next request; if the resource has not changed, the server returns a 304 status code. (The effect is basically the same as Last-Modified.)

Note:

Computing an ETag costs CPU, which matters on servers with tight computing resources, so some websites do not use ETags at all;

If the servers sit behind a load balancer, requests for the same resource may be distributed to different back-end machines. Because the ETag is computed from file attributes, identical files on different machines may produce different ETags, so a file whose content has not changed can still fail ETag validation. There are two solutions: compute the ETag in a way that does not depend on the local machine, for example directly from the MD5 of the file content; or have the load balancer always send requests for the same URL to the same back-end machine.
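Following the first suggestion above (an ETag derived only from the file content), a minimal sketch might look like this. MD5 is used because the text mentions it; the function names are hypothetical, and the sketch only covers the If-None-Match comparison, not the rest of request handling.

```python
import hashlib


def make_etag(content: bytes) -> str:
    """Strong ETag derived only from the bytes, so every backend agrees on it."""
    return '"' + hashlib.md5(content).hexdigest() + '"'


def check_if_none_match(content: bytes, if_none_match=None):
    """Return (status, etag): 304 if the client's cached copy is still current."""
    etag = make_etag(content)
    if if_none_match:
        tags = []
        for tag in if_none_match.split(","):
            tag = tag.strip()
            if tag.startswith("W/"):  # If-None-Match uses weak comparison
                tag = tag[2:]
            tags.append(tag)
        if "*" in tags or etag in tags:
            return 304, etag
    return 200, etag


logo = b"fake logo bytes"
status, etag = check_if_none_match(logo)   # first visit: 200 plus the ETag
print(status, etag)
print(check_if_none_match(logo, etag)[0])  # revalidation with the same content: 304
```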

HTTP caching is very useful in real business scenarios. Some examples:

Make full use of client-side resources. Static files the client needs frequently, such as logos and advertising images, can be cached on the client. This reduces network requests, speeds up rendering on the client, and lightens the request load on the server.

When static content such as news articles or blog posts is crawled by search engine crawlers, cache parameters can be used to lower the crawl frequency and cut unnecessary resource use.

If static resources are served through a CDN, HTTP cache headers let CDN nodes keep a copy of each file, which reduces back-to-origin requests, network latency, and load on the origin server.

2. Range requests (resumable downloads)

Accept-Ranges: response header. When the server supports range requests (resumable downloads), it returns this header, e.g. Accept-Ranges: bytes. Once the client sees it, it can send range requests.

Content-Length: the length of the response body, telling the client how much data the current response contains. Note that a request made with the HEAD method returns no body, but the Content-Length header still reports the size of the complete data.

Range/Content-Range: the client sends a Range header to tell the server which part of the data it wants. For example, Range: bytes=0-1023 requests bytes 0 through 1023. The server returns those 1024 bytes with a 206 status and includes a Content-Range header in the response, e.g. Content-Range: bytes 0-1023/4096, where 4096 is the total file size. The client's next request can then start from byte 1024: Range: bytes=1024-xxxx
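A server-side sketch of single-range handling, using the 4096-byte example above. The function serve_range is hypothetical; it supports bytes=start-end, bytes=start- and the suffix form bytes=-N, ignores multi-range requests (answering with the full content), and returns 416 when the start lies beyond the end of the file, as described for that status code earlier.

```python
import re

RANGE_RE = re.compile(r"bytes=(\d*)-(\d*)")


def serve_range(data: bytes, range_header=None):
    """Return (status, headers, body) for an optional single byte-range request."""
    total = len(data)
    full = (200, {"Accept-Ranges": "bytes", "Content-Length": str(total)}, data)
    m = RANGE_RE.fullmatch((range_header or "").strip())
    if not m or m.groups() == ("", ""):
        return full  # no Range, or a form this sketch does not handle
    start_s, end_s = m.groups()
    if start_s == "":  # suffix form "bytes=-N": the last N bytes
        start, end = max(total - int(end_s), 0), total - 1
    else:
        start = int(start_s)
        if end_s and int(end_s) < start:
            return full  # syntactically invalid range: ignore it
        end = min(int(end_s), total - 1) if end_s else total - 1
    if start >= total:
        # e.g. resuming past the end of the file
        return 416, {"Content-Range": f"bytes */{total}"}, b""
    body = data[start:end + 1]
    headers = {"Accept-Ranges": "bytes",
               "Content-Range": f"bytes {start}-{end}/{total}",
               "Content-Length": str(len(body))}
    return 206, headers, body


data = bytes(4096)  # stand-in for a 4096-byte file
print(serve_range(data, "bytes=0-1023")[:2])
print(serve_range(data, "bytes=1024-")[1]["Content-Range"])
```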

3. Encoding

Accept-Encoding/Content-Encoding: the former lists the content encodings the client can accept. The default is identity (no encoding); other values include gzip, compress, and so on. The latter gives the encoding of the server's response, which is usually compression. The benefit is obvious: compression greatly reduces network transfer costs, a saving well worth the CPU the server spends compressing. Common values: gzip, deflate, compress (e.g. Content-Encoding: gzip). Responses such as HTML, JS, CSS, XML, and JSON are usually compressed before being transmitted.
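Content negotiation with gzip can be sketched with Python's gzip module. The helper names are hypothetical, and the q-value handling is deliberately minimal: gzip counts as accepted if it is listed with a non-zero q-value.

```python
import gzip


def accepts_gzip(accept_encoding: str) -> bool:
    """True if Accept-Encoding lists gzip with a non-zero q-value."""
    for item in accept_encoding.split(","):
        name, *params = [p.strip() for p in item.split(";")]
        if name.lower() != "gzip":
            continue
        q = 1.0
        for param in params:
            if param.lower().startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        return q > 0
    return False


def encode_body(body: bytes, accept_encoding: str = ""):
    """Compress with gzip when the client accepts it; otherwise send as-is (identity)."""
    if accepts_gzip(accept_encoding):
        return gzip.compress(body), {"Content-Encoding": "gzip"}
    return body, {}


page = b"<html>" + b"hello " * 1000 + b"</html>"
compressed, headers = encode_body(page, "gzip, deflate")
print(headers, len(page), "->", len(compressed))
```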

Transfer-Encoding: response header. It specifies how the response message is encoded for transmission over the network, usually Transfer-Encoding: chunked. When the server generates dynamic content and does not know the total length in advance, it can send the response in chunks, returning each piece as soon as it is produced instead of waiting until all the data is ready. Combined with a content encoding such as gzip, the content can also be compressed and sent chunk by chunk. Note that a response sent this way has no Content-Length header, because the content had not been fully generated when the headers were sent.
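The chunked format itself is simple: each chunk is its size in hexadecimal, CRLF, the data, CRLF, and a zero-size chunk marks the end. A minimal encoder/decoder pair as a sketch (no trailers, no chunk extensions; the function names are hypothetical):

```python
def chunked(pieces):
    """Yield a chunked body: hex size CRLF data CRLF ..., ending with a zero-size chunk."""
    for piece in pieces:
        if piece:  # an empty chunk would look like the terminating chunk
            yield b"%x\r\n" % len(piece) + piece + b"\r\n"
    yield b"0\r\n\r\n"  # last chunk (no trailers) plus the final CRLF


def dechunk(raw: bytes) -> bytes:
    """Reassemble a chunked body produced as above (no extensions or trailers)."""
    out, pos = bytearray(), 0
    while True:
        line_end = raw.index(b"\r\n", pos)
        size = int(raw[pos:line_end], 16)
        if size == 0:
            return bytes(out)
        start = line_end + 2
        out += raw[start:start + size]
        pos = start + size + 2  # skip the data and its trailing CRLF


# The server can emit each piece as soon as it is generated.
stream = b"".join(chunked([b"<html>", b"x" * 20, b"</html>"]))
print(stream)
print(dechunk(stream))
```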

4. Others

X-Forwarded-For: request header. Used to identify the user's real IP, especially when the server is accessed through a (forward or reverse) proxy or sits behind a load balancer. Format: X-Forwarded-For: client, proxy1, proxy2, ... The leftmost entry is the original client's IP, and each proxy appends the address it received the request from.
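Because the client can put anything into X-Forwarded-For, blindly taking the leftmost entry lets users spoof their IP. A common approach, shown here as a sketch with a hypothetical trusted_proxies set holding the addresses of our own proxies, is to walk the list from the right and take the first address that is not one of our proxies:

```python
def client_ip(x_forwarded_for, remote_addr, trusted_proxies):
    """Best-effort client IP from X-Forwarded-For.

    x_forwarded_for: header value ("client, proxy1, proxy2") or None.
    remote_addr: address of the TCP peer that connected to us.
    trusted_proxies: addresses of our own proxies / load balancers (hypothetical).
    """
    hops = [h.strip() for h in (x_forwarded_for or "").split(",") if h.strip()]
    hops.append(remote_addr)  # the direct peer is the last hop
    for hop in reversed(hops):
        if hop not in trusted_proxies:
            return hop  # first address that our own proxies did not add
    return hops[0]  # every hop is trusted: fall back to the leftmost


print(client_ip("203.0.113.7, 10.0.0.2", "10.0.0.3", {"10.0.0.2", "10.0.0.3"}))
```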

User-Agent: request header. It tells the server basic information about the client. It is often useful for recognizing search engine crawlers, and in some scenarios it can also be used for client statistics.

Referer: request header. It indicates the source of the request, such as the page that linked to the resource. We often use it for statistics. Another important use is anti-hotlinking: filtering out requests from unauthorized sources (keep in mind, though, that the client can forge the Referer).
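A minimal anti-hotlinking check on the Referer host, as a sketch. ALLOWED_HOSTS and the example domains are hypothetical, and since the Referer can be forged or omitted, this only stops casual hotlinking, not a determined client.

```python
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"example.com", "www.example.com"}  # hypothetical own domains


def allow_image_request(referer, allow_empty=True):
    """Serve the resource only when the Referer points at one of our own pages."""
    if not referer:
        # Direct visits, some browsers and privacy settings send no Referer.
        return allow_empty
    host = urlsplit(referer).hostname  # lower-cased, without the port
    return host in ALLOWED_HOSTS


print(allow_image_request("https://www.example.com/news/1.html"))
print(allow_image_request("https://other-site.test/page"))
```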

Location: response header. Responses with a 301/302 status code include a Location header that tells the client the new address to use for the requested resource.

Connection: request/response header. In HTTP/1.1, the client and server keep the connection open by default, i.e. Connection: keep-alive; if either side does not want to keep it, it sets the value to close. A long-lived connection lets the client send multiple HTTP requests over the same connection, avoiding the overhead of repeatedly creating connections. Making good use of it may require extra server-side tuning, such as the keep-alive timeout and the kernel's network (TCP) parameters.
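The default-persistence rule can be written down as a small decision function. This is a simplified sketch (the name keeps_alive is hypothetical): HTTP/1.1 connections persist unless either side sends Connection: close, while HTTP/1.0 connections close unless keep-alive is requested explicitly.

```python
def keeps_alive(version, connection=None):
    """Will the connection stay open after this message?"""
    tokens = {t.strip().lower() for t in (connection or "").split(",")}
    if "close" in tokens:
        return False  # either side can opt out
    if version == "HTTP/1.1":
        return True  # persistent by default in HTTP/1.1
    return "keep-alive" in tokens  # HTTP/1.0 must ask for it explicitly


for version, header in [("HTTP/1.1", None), ("HTTP/1.1", "close"),
                        ("HTTP/1.0", None), ("HTTP/1.0", "Keep-Alive")]:
    print(version, header, keeps_alive(version, header))
```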

Session and Cookie

HTTP is stateless, but Internet applications often need to track user state to complete interactions: authentication must remember whether a user is logged in, a shopping cart must remember the products a user picked, advertising applications need the user's browsing history, and so on. This is where sessions and cookies come in.

Session: the interaction state between the client and the server across HTTP request-response exchanges. This state is stored on the server side, for example in memory or a database. Each session has a unique identifier generated by the server. The client must also keep this identifier and send it with subsequent requests so that the server can look up the client's state.

Ways the client can carry the session id:

Store the session id in a cookie, which is sent to the server with each request.

Pass the session id as a URL parameter.

Pass the session id in a hidden form field.
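A minimal in-memory session store, sketched in Python. The class name is hypothetical; the session id comes from the secrets module so it cannot be guessed, and only that id is sent to the client (for example in a cookie), while the state itself stays on the server. Keeping the dict in one machine's local memory is exactly what causes the sharing problem discussed next.

```python
import secrets


class SessionStore:
    """Server-side session state; the client only ever sees the random id."""

    def __init__(self):
        self._sessions = {}  # local memory: not shared between machines

    def create(self):
        sid = secrets.token_urlsafe(32)  # unguessable identifier
        self._sessions[sid] = {}
        return sid

    def get(self, sid):
        return self._sessions.get(sid)  # None for unknown or forged ids

    def destroy(self, sid):
        self._sessions.pop(sid, None)


store = SessionStore()
sid = store.create()
store.get(sid)["user"] = "testuser"
# Sent to the browser, e.g. as: Set-Cookie: SESSIONID=<sid>; HttpOnly; Path=/
print(store.get(sid))
```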

Session sharing problem:

In distributed applications, HTTP servers usually sit behind a reverse proxy or load balancer, which raises the problem of sharing sessions: multiple requests from the same user may be routed to different machines, so if sessions are kept in each machine's local memory, they cannot be shared across machines. There are generally two ways to solve this:

Store sessions in distributed memory (e.g. memcached) or in centralized storage (e.g. a database).

Have the reverse proxy or load balancer route all requests from the same user to the same machine (this requires handling how requests are redistributed when a machine goes down).
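The second approach (sticky routing) can be sketched as hashing the session id onto the list of back ends. The backend addresses are hypothetical. A stable hash such as SHA-1 is used instead of Python's built-in hash(), which is randomized per process. With plain modulo hashing, removing a failed machine remaps most users, which is why consistent hashing is often used in practice.

```python
import hashlib

BACKENDS = ["10.0.0.11:8080", "10.0.0.12:8080", "10.0.0.13:8080"]  # hypothetical


def pick_backend(session_id, backends=BACKENDS):
    """Map a session id to a fixed backend so its requests always land there."""
    # sha1 gives the same result in every proxy process; hash() would not.
    digest = hashlib.sha1(session_id.encode("utf-8")).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


print(pick_backend("abc123"), pick_backend("abc123"))  # same backend twice
```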

Cookie: stores state information on the client. Each cookie belongs to a specific domain and path, and for security reasons cookies cannot be shared across different domains or paths.

Session cookie: has no expiration time, is kept in memory, and is discarded when the browser is closed.

Persistent cookie: has an expiration time and is saved locally by the browser.
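The difference between the two kinds of cookie shows up in the Set-Cookie header: a persistent cookie carries Max-Age or Expires, a session cookie does not. A sketch using Python's http.cookies module, with hypothetical cookie names and values; the HttpOnly flag (which keeps page scripts from reading the cookie) is one common mitigation for the security issues mentioned below.

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
# Session cookie: no Max-Age/Expires, so the browser drops it on exit.
cookie["SESSIONID"] = "abc123"
cookie["SESSIONID"]["path"] = "/"
cookie["SESSIONID"]["httponly"] = True  # not readable from JavaScript
# Persistent cookie: kept for 30 days even after the browser is closed.
cookie["theme"] = "dark"
cookie["theme"]["path"] = "/"
cookie["theme"]["max-age"] = 30 * 24 * 3600

for morsel in cookie.values():
    print(morsel.output())  # one Set-Cookie header line per cookie

# Parsing the Cookie header the browser sends back on later requests:
incoming = SimpleCookie("SESSIONID=abc123; theme=dark")
print(incoming["theme"].value)
```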

For details, please refer to: http://en.wikipedia.org/wiki/HTTP_cookie

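Note that cookies also bring security risks: they live on the client, where they can be stolen or tampered with, so sensitive data should not be stored in them directly.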

Here I have only summarized the parts of the HTTP protocol I encountered at work. There is much more to explore, and continuing to study and understand HTTP will make developing applications much easier.

Finally, I recommend two very handy HTTP debugging tools with HTTP proxy functionality: Fiddler (Windows) and Charles (Mac). For HTTP applications that are not browser-based (such as mobile apps), you can use either of them to monitor HTTP requests.

