
Detailed introduction to HTTP caching

不言
2018-10-29 14:45:08

This article brings you a detailed introduction to HTTP caching. It has certain reference value. Friends in need can refer to it. I hope it will be helpful to you.

Fetching content over the network is slow and expensive. Larger responses require multiple round trips between the client and the server, which delays when the browser can receive and process the content and increases data costs for your visitors. The ability to cache and reuse previously fetched resources is therefore a key aspect of performance optimization.

Fortunately, every browser ships with an implementation of an HTTP cache. You just need to make sure that each server response provides the correct HTTP header directives to tell the browser when the response can be cached and for how long.

Note: If you use a WebView in your application to fetch and display web content, you may need to provide additional configuration flags to ensure that the HTTP cache is enabled, that its size suits your use case, and that the cache is persisted. Be sure to check the platform documentation and confirm your settings!


When the server returns a response, it also emits a set of HTTP headers that describe the response's content type, length, caching directives, validation token, and more. For example, suppose the server returns a 1024-byte response, instructs the client to cache it for up to 120 seconds, and provides a validation token ("x234dff") that can be used to check whether the resource has been modified after the response expires.
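As a rough illustration of that exchange, the sketch below builds the caching-related headers for a 1024-byte body with a 120-second lifetime and the token "x234dff". The function name build_cache_headers and its parameters are hypothetical and exist only for this example.

```python
from typing import Dict


def build_cache_headers(body: bytes, max_age: int, etag: str) -> Dict[str, str]:
    """Return the caching-related headers for a response (illustrative only)."""
    return {
        "Content-Length": str(len(body)),
        "Cache-Control": f"max-age={max_age}",
        # ETag values are sent as quoted strings.
        "ETag": f'"{etag}"',
    }


headers = build_cache_headers(b"x" * 1024, max_age=120, etag="x234dff")
for name, value in headers.items():
    print(f"{name}: {value}")
```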

Validating cached responses with ETags

TL;DR

The server uses the ETag HTTP header to communicate a validation token.

Validation tokens enable efficient resource update checks: no data is transferred if the resource has not changed.

Assume that 120 seconds have passed since the initial fetch and the browser initiates a new request for the same resource. First, the browser checks the local cache and finds the previous response. Unfortunately, the browser cannot use it because the response has now expired. At this point, the browser could simply issue a new request and fetch the complete response again. However, that is inefficient: if the resource has not changed, there is no reason to download the same information that is already in the cache!

This is the problem that validation tokens, as specified in the ETag header, are designed to solve. The server generates and returns an arbitrary token, which is typically a hash or some other fingerprint of the file's contents. The client does not need to know how the fingerprint is generated; it only needs to send it to the server on the next request. If the fingerprint is still the same, the resource has not changed and the download can be skipped.


In this scenario, the client automatically provides the ETag token in the "If-None-Match" HTTP request header. The server checks the token against the current resource. If the resource has not changed, the server returns a "304 Not Modified" response, which tells the browser that the response in its cache has not changed and can be renewed for another 120 seconds. Note that the response does not have to be downloaded again, which saves time and bandwidth.
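The server side of this check can be sketched as follows. This is a simplified model, not a real server: the names make_etag and respond are hypothetical, the fingerprint is assumed to be a truncated SHA-256 digest, and If-None-Match is assumed to carry a single strong token (real requests may list several tokens or use weak validators).

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(content: bytes) -> str:
    """Fingerprint the content; a truncated SHA-256 digest is one possible choice."""
    return '"' + hashlib.sha256(content).hexdigest()[:16] + '"'


def respond(content: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET with an optional If-None-Match."""
    etag = make_etag(content)
    headers = {"ETag": etag, "Cache-Control": "max-age=120"}
    if if_none_match == etag:
        # The token matches: the cached copy is still valid, so send no body.
        return 304, headers, b""
    return 200, headers, content


# First request: no token yet, so the full response is returned.
status, headers, body = respond(b"hello", None)
# Later request: the client echoes the token and receives an empty 304.
status_again, _, body_again = respond(b"hello", headers["ETag"])
```

If the content changes on the server, the fingerprint changes too, so the same token from the client no longer matches and a full 200 response is sent.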

As a web developer, how do you take advantage of efficient revalidation? The browser does all the work on your behalf: it automatically detects whether a validation token was previously specified, appends the token to the outgoing request, and updates the cache timestamps as necessary based on the response it receives from the server. The only thing left to do is to ensure that the server provides the necessary ETag tokens. Check your server documentation for the necessary configuration flags.

Tip: The HTML5 Boilerplate project contains sample configuration files for all the most popular servers, with detailed comments for each configuration flag and setting. Find your favorite server in the list, look for the appropriate settings, and copy them or confirm that your server is configured with the recommended settings.

Cache-Control

TL;DR

Each resource can define its caching policy via the Cache-Control HTTP header.

Cache-Control directives control who can cache the response, under which conditions, and for how long.

From a performance optimization perspective, the best request is one that never needs to communicate with the server: a local copy of the response eliminates all network latency and avoids data charges for the transfer. To achieve this, the HTTP specification allows the server to return Cache-Control directives that control how, and for how long, the browser and other intermediate caches can cache the individual response.

Note: The Cache-Control header was defined as part of the HTTP/1.1 specification and supersedes earlier headers (such as Expires) that were used to define response caching policies. All modern browsers support Cache-Control, so it is all you need.


"no-cache" and "no-store"

"no-cache" means that you must confirm with the server whether the returned response has changed before you can use the response to satisfy subsequent requests for the same URL. Therefore, if a suitable validation token (ETag) is present, no-cache initiates a round-trip communication to validate the cached response, but avoids downloading if the resource has not changed.

In contrast, "no-store" is much simpler. It disallows the browser and all intermediate caches from storing any version of the returned response, for example one that contains private personal or banking data. Every time the user requests this asset, a request is sent to the server and the full response is downloaded.
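The difference between the two directives can be sketched as a small classifier. The helper names parse_cache_control and reuse_policy are hypothetical, and the parser is deliberately naive: it splits on commas and does not handle quoted arguments that themselves contain commas.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control value into {directive: argument or None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def reuse_policy(cache_control: str) -> str:
    """Classify how a cache may treat a response (simplified)."""
    directives = parse_cache_control(cache_control)
    if "no-store" in directives:
        return "never store; always download the full response"
    if "no-cache" in directives:
        return "store, but revalidate with the server before every reuse"
    return "store and reuse while fresh"


print(reuse_policy("no-cache"))
print(reuse_policy("no-store"))
```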

"public" vs. "private"

If the response is marked as "public", it can be cached even if it has HTTP authentication associated with it, and even when the response status code is not normally cacheable. Most of the time, "public" is not necessary, because explicit caching information (such as "max-age") already indicates that the response is cacheable.

In contrast, the browser can cache "private" responses. However, these responses are typically intended for a single user, so an intermediate cache is not allowed to cache them. For example, a user's browser can cache an HTML page that contains the user's private information, but a CDN cannot.
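One way to picture the "public" and "private" rules is a function that reports which kinds of cache may store a response. The name caches_allowed_to_store and the labels "browser" and "intermediate" are hypothetical. The sketch also assumes that a response with neither directive may be stored everywhere, which glosses over the finer rules for shared caches.

```python
from typing import Set


def caches_allowed_to_store(cache_control: str) -> Set[str]:
    """Return which cache types may keep a copy of the response (simplified)."""
    directives = {part.split("=")[0].strip().lower() for part in cache_control.split(",")}
    if "no-store" in directives:
        return set()
    if "private" in directives:
        # Only the end user's own browser cache may keep it.
        return {"browser"}
    return {"browser", "intermediate"}


print(caches_allowed_to_store("private, max-age=600"))
print(caches_allowed_to_store("public, max-age=600"))
```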

"max-age"

This directive specifies the maximum time, in seconds, that the fetched response may be reused, counted from the time of the request. For example, "max-age=60" means that the response can be cached and reused for the next 60 seconds.
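A freshness check based on max-age might look like the sketch below. The function names max_age_seconds and is_fresh are hypothetical, and the age of the cached copy is passed in directly rather than derived from response headers.

```python
import re
from typing import Optional


def max_age_seconds(cache_control: str) -> Optional[int]:
    """Extract the max-age value from a Cache-Control string, if present."""
    match = re.search(r"(?:^|,)\s*max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    return int(match.group(1)) if match else None


def is_fresh(cache_control: str, age_seconds: float) -> bool:
    """A response is fresh while its age is below max-age."""
    max_age = max_age_seconds(cache_control)
    return max_age is not None and age_seconds < max_age


print(is_fresh("max-age=60", 30))
print(is_fresh("max-age=60", 61))
```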

Define the optimal Cache-Control strategy


Follow the decision tree above to determine the optimal caching policy for a particular resource, or set of resources, that your application uses. Ideally, you should aim to cache as many responses as possible on the client, for the longest possible period, and provide a validation token with each response to enable efficient revalidation.
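The decision tree figure is not reproduced in this text, so the sketch below is only one plausible ordering of the questions, assembled from the directives described earlier. The function choose_cache_control and its parameters are hypothetical.

```python
def choose_cache_control(
    reusable: bool,
    revalidate_every_time: bool,
    shared_cacheable: bool,
    max_age: int,
) -> str:
    """Pick a Cache-Control value by answering a few yes/no questions."""
    if not reusable:
        # The response must never be stored.
        return "no-store"
    if revalidate_every_time:
        # The response may be stored but must be checked before each reuse.
        return "no-cache"
    visibility = "public" if shared_cacheable else "private"
    return f"{visibility}, max-age={max_age}"


print(choose_cache_control(True, False, True, 86400))
```

Whatever the outcome, the text above still recommends sending a validation token (ETag) with each cacheable response.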


According to the HTTP Archive, among the top 300,000 sites (as ranked by Alexa), the browser can cache nearly half of all downloaded responses, which is a large saving for repeat page views and visits. Of course, that does not mean that your particular application can cache 50% of its resources. Some sites can cache more than 90% of their resources, while others have a lot of private or time-sensitive data that cannot be cached at all.

Please audit your pages to determine which resources are cacheable and make sure they return the correct Cache-Control and ETag headers.
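A very small audit helper could flag responses that lack either header. This is a sketch only: audit_headers is a hypothetical name, and it checks for the presence of the headers, not whether their values are appropriate for the resource.

```python
from typing import Dict, List


def audit_headers(headers: Dict[str, str]) -> List[str]:
    """Report which caching headers are missing from a response."""
    # Header names are case-insensitive, so normalize before checking.
    present = {name.lower() for name in headers}
    problems = []
    if "cache-control" not in present:
        problems.append("missing Cache-Control")
    if "etag" not in present:
        problems.append("missing ETag")
    return problems


print(audit_headers({"Content-Type": "text/css", "cache-control": "max-age=86400"}))
```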

Discarding and updating cached responses

TL;DR

The locally cached response will be used until the resource "expires".

You can force the client to update to a new version of the response by embedding a file content fingerprint in the URL.

For best performance, each application needs to define its own cache hierarchy.

All HTTP requests that the browser makes are first routed to the browser cache to check whether a valid cached response can be used to fulfill the request. If there is a match, the response is read from the cache, which eliminates both the network latency and the data costs that the transfer incurs.
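The lookup step can be modeled with a tiny in-memory cache. TinyCache and its methods are hypothetical, and time is passed in explicitly to keep the example deterministic. A real browser cache also handles validation, eviction, and request headers that affect matching.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CachedResponse:
    body: bytes
    max_age: int      # lifetime in seconds
    stored_at: float  # time at which the response was cached


class TinyCache:
    def __init__(self) -> None:
        self._entries: Dict[str, CachedResponse] = {}

    def store(self, url: str, body: bytes, max_age: int, now: float) -> None:
        self._entries[url] = CachedResponse(body, max_age, now)

    def lookup(self, url: str, now: float) -> Optional[bytes]:
        """Return the cached body if it is still fresh, otherwise None."""
        entry = self._entries.get(url)
        if entry is None:
            return None  # never cached: go to the network
        if now - entry.stored_at >= entry.max_age:
            return None  # expired: revalidate or refetch
        return entry.body


cache = TinyCache()
cache.store("/style.css", b"body{}", max_age=120, now=0)
```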

But what if you want to update or invalidate a cached response? For example, suppose you have told your visitors to cache a CSS stylesheet for up to 24 hours (max-age=86400), but your designer has just committed an update that you want to make available to all users. How do you notify all the visitors who hold what is now a "stale" cached copy of the CSS to update their caches? You can't, at least not without changing the URL of the resource.

After the browser caches the response, the cached version is used until it is no longer fresh, as determined by max-age or Expires, or until it is evicted from the cache for some other reason, for example because the user cleared the browser cache. As a result, different users might end up using different versions of the file when the page is constructed: users who just fetched the resource use the new version, while users who cached an earlier (but still valid) copy use an older version of its response.

So, how can you have the best of both worlds: client-side caching and fast updates? You can change the URL of a resource when its content changes, forcing users to download a new response. Typically this is accomplished by embedding the file's fingerprint or version number in the file name - for example style.x234dff.css.
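A build step that embeds a content fingerprint in the file name could be sketched like this. The function fingerprinted_name is hypothetical, and the choice of an 8-character SHA-256 prefix is an assumption; the token in style.x234dff.css may have been produced differently.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_name(filename: str, content: bytes, length: int = 8) -> str:
    """Insert a short content hash before the file extension."""
    digest = hashlib.sha256(content).hexdigest()[:length]
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


old_name = fingerprinted_name("css/style.css", b"body { color: black; }")
new_name = fingerprinted_name("css/style.css", b"body { color: navy; }")
print(old_name)
print(new_name)
```

Because the name depends only on the content, an unchanged file keeps its URL and stays cached, while any edit produces a new URL that no cache has seen before.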


Because you can define a caching policy per resource, you can define "cache hierarchies" that let you control not only how long each response is cached, but also how quickly visitors see new versions. To illustrate, consider the following example:

The HTML is marked with "no-cache", which means that the browser always revalidates the document on each request and fetches the latest version if the contents change. Also, within the HTML markup, you embed fingerprints in the URLs for CSS and JavaScript assets: if the contents of those files change, the HTML of the page changes as well and a new copy of the HTML response is downloaded.

The CSS is allowed to be cached by browsers and intermediate caches (for example, a CDN), and is set to expire in 1 year. Note that you can use the "far-future expiration" of 1 year safely because you embed the file fingerprint in its filename: if the CSS is updated, the URL changes as well.

JavaScript is also set to expire after 1 year, but is marked private, perhaps because it contains some private user data that CDNs should not cache.

Images are cached without versions or unique fingerprints and are set to expire after 1 day.

You can use a combination of ETags, Cache-Control, and unique URLs to achieve the best of both worlds: longer expiration times, control over where responses can be cached, and on-demand updates.
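The hierarchy described above can be written down as a lookup from file type to Cache-Control value. The table name CACHE_POLICIES, the function cache_control_for, the specific image extensions, and the "no-cache" fallback for unknown types are all assumptions made for this sketch; one year is taken as 31536000 seconds and one day as 86400 seconds.

```python
import os

# One policy per resource type, following the example hierarchy.
CACHE_POLICIES = {
    ".html": "no-cache",
    ".css": "public, max-age=31536000",
    ".js": "private, max-age=31536000",
    ".jpg": "max-age=86400",
    ".png": "max-age=86400",
}


def cache_control_for(path: str) -> str:
    """Return the Cache-Control value for a file, based on its extension."""
    extension = os.path.splitext(path)[1].lower()
    # Falling back to revalidation for unknown types is an assumption.
    return CACHE_POLICIES.get(extension, "no-cache")


for name in ("index.html", "style.x234dff.css", "app.js", "photo.jpg"):
    print(name, "->", cache_control_for(name))
```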

Caching Checklist

There is no single best cache policy. Depending on your traffic patterns, the type of data served, and application-specific requirements for data freshness, you must define and configure the appropriate per-resource settings, as well as the overall "caching hierarchy".

When developing a caching strategy, you need to keep these tips and methods in mind:

Use consistent URLs: If you serve the same content on different URLs, that content will be fetched and stored multiple times. Tip: Note that URLs are case-sensitive.

Ensure that the server provides a validation token (ETag): With validation tokens, there is no need to transfer the same bytes when a resource has not changed on the server.

Determine which resources can be cached by intermediate caches: Resources that respond exactly the same to all users are ideally suited to be cached by CDNs and other intermediate caches.

Determine the optimal cache period for each resource: different resources may have different update requirements. Review and determine the appropriate max-age for each resource.

Determine the cache hierarchy that works best for your site: You can control how quickly clients get updates by using a combination of resource URLs that contain content fingerprints and short or no-cache periods for HTML documents.

Minimize churn: Some resources are updated more frequently than others. If a particular part of a resource (for example, a JavaScript function or a set of CSS styles) is updated often, consider delivering that code as a separate file. Doing so allows the remainder of the content (for example, library code that changes less often) to be fetched from the cache, which minimizes the amount of downloaded content whenever an update is fetched.


Statement:
This article is reproduced from developers.google. If there is any infringement, please contact admin@php.cn to have it removed.