Talk about CDN caching
1. What is CDN?
To explain what a CDN does, eight years of experience buying train tickets make a vivid analogy:
Eight years ago there were no train ticket agencies, let alone 12306.cn. Train tickets could only be bought in the ticket hall of a train station, and the small county where I lived had no railway. Tickets had to be bought at the station in the city, and the round trip from the county to the city took 4 hours by car, a real waste of life. Later things improved: ticket agencies appeared in the small counties, and you could buy tickets directly at one of them, which was far more convenient. People across the whole city no longer had to queue at a single place to buy tickets.
A CDN can be understood as the ticket agencies distributed across the counties. When a user browses a website, the CDN selects the edge node closest to the user to answer the request, so a request from a Hainan Mobile user does not have to travel all the way to a server in a Beijing Telecom data center (assuming the origin site is deployed there).
The advantages of a CDN are obvious: (1) CDN nodes solve the problem of cross-operator and cross-region access, so access latency drops considerably; (2) most requests are completed at the CDN edge nodes, so the CDN offloads traffic and reduces the load on the origin site.
2. What is cache?
This article does not delve into the high-level architecture behind a CDN, nor does it discuss how a CDN implements global traffic scheduling. It focuses on how data is cached once a CDN is in place. Caching is a ubiquitous example of trading space for time: by spending some extra storage, we get faster access.
First, let's look at how the user's browser interacts with the server when the website is not connected to a CDN:
When the user browses the website, the browser can save copies of images and other files from the site locally, so that when the user visits the site again, the browser does not have to download every file a second time. Downloading less means the page loads faster.
If a layer of CDN is added in the middle, the interaction between the user's browser and the server is as follows:
The client browser first checks whether its local cache has expired. If it has, the browser sends a request to a CDN edge node, and the edge node checks whether its own cached copy of the requested data has expired. If the copy has not expired, the edge node answers the request directly, and the HTTP request is complete. If the copy has expired, the CDN has to send a request to the origin site (a back-to-origin request) to pull the latest data. The typical topology diagram of CDN is as follows:
As you can see, when a CDN is present the data passes through two caching stages: client (browser) caching and CDN edge node caching. Each of these two stages is analyzed in detail below.
3. Client (browser) caching
Disadvantages of client caching
Client-side caching reduces requests to the server, avoids loading the same files repeatedly, and noticeably improves the user experience. However, when the website is updated (for example, CSS, JS, or image files are replaced), the browser still holds the old versions of those files locally, which leads to unpredictable results.
You may have seen it before: a page loads, the elements on it end up in the wrong places, and button clicks stop working. The front-end guy habitually asks, "Have you cleared the cache?", then presses Ctrl+F5 and everything is fine. But sometimes, simply pressing Enter in the browser address bar, or just pressing F5 to refresh, does not solve the problem. Did you know that these three different operations determine which cache refresh strategy the browser uses?
How does the browser determine whether to use a local file or a new file on the server? Here are several methods of judgment.
Browser Cache Policy
Expires
Expires:Sat, 24 Jan 2015 20:30:54 GMT
If Expires is set in the HTTP response, the browser avoids contacting the server until the Expires time has passed. During that period the browser does not need to send a request to the server at all; it only has to check whether the copy it already holds has expired, so no extra load is placed on the server.
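The freshness check described above can be sketched in a few lines of Python. This is only an illustration, not how any particular browser implements it: the function name `is_fresh_by_expires` and the two sample timestamps are assumptions made for this example, while the header value is the one shown above.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_fresh_by_expires(expires_header: str, now: datetime) -> bool:
    """Return True while the cached copy can be used without a request."""
    expires_at = parsedate_to_datetime(expires_header)
    if expires_at.tzinfo is None:
        # Assumption: a date without zone information is treated as GMT.
        expires_at = expires_at.replace(tzinfo=timezone.utc)
    return now < expires_at


EXPIRES = "Sat, 24 Jan 2015 20:30:54 GMT"
before = datetime(2015, 1, 24, 20, 0, 0, tzinfo=timezone.utc)
after = datetime(2015, 1, 24, 21, 0, 0, tzinfo=timezone.utc)

print(is_fresh_by_expires(EXPIRES, before))  # True: use the local copy
print(is_fresh_by_expires(EXPIRES, after))   # False: ask the server again
```

Note that the comparison relies on the client clock, which is one reason a relative lifetime can be easier to work with than an absolute date.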
Cache-control: max-age
The Expires method works well, but we have to compute an exact point in time every time. The max-age directive makes expiry easier to handle: it is enough to say, "this material is only good for one week."
Max-age is measured in seconds, such as:
Cache-Control:max-age=645672
The specified page will expire in 645672 seconds (7.47 days).
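As a small worked example, the sketch below extracts max-age from a Cache-Control value and converts it to days, reproducing the 7.47 figure. The helper names `parse_max_age` and `is_fresh_by_max_age` are hypothetical, and the parsing is deliberately simplified (it ignores quoted values and other edge cases).

```python
import re
from typing import Optional

SECONDS_PER_DAY = 24 * 60 * 60


def parse_max_age(cache_control: str) -> Optional[int]:
    """Return the max-age value in seconds, or None if it is absent."""
    match = re.search(r"(?:^|,)\s*max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    return int(match.group(1)) if match else None


def is_fresh_by_max_age(cache_control: str, age_seconds: int) -> bool:
    """A copy is fresh while its age is below max-age."""
    max_age = parse_max_age(cache_control)
    return max_age is not None and age_seconds < max_age


max_age = parse_max_age("max-age=645672")
print(max_age)                               # 645672
print(round(max_age / SECONDS_PER_DAY, 2))   # 7.47
```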
Last-Modified
To tell the browser which version of the file it has, the server sends a header carrying the file's last modification time, for example:
Last-Modified: Tue, 06 Jan 2015 08:26:32 GMT
This way, the browser knows when the file it received was last modified. In subsequent requests, the browser sends this time back in the If-Modified-Since request header, and validation proceeds as follows:
1. Browser: Hey, I need the jquery.min.js file. If it has been modified since Tue, 06 Jan 2015 08:26:32 GMT, please send it to me.
2. Server: (Check the modification time of the file)
3. Server: Hey, this file has not been modified after that time, and you already have the latest version.
4. Browser: Great, then I will display it to the user.
In this case, the server only returns a 304 response header, which reduces the amount of response data and improves the response speed. Regarding the 304 response, please refer to:
http://www.cnblogs.com/ziyunfei/archive/2012/11/17/2772729.html
The picture below shows the 304 response returned after pressing F5 to refresh the page.
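The server side of the dialogue above might look roughly like the following sketch. The function `respond_if_modified_since` is a hypothetical helper that returns only a status code and headers; a real server would also send the body on a 200 and handle malformed dates.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Optional, Tuple


def respond_if_modified_since(
    file_mtime: datetime, if_modified_since: Optional[str]
) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers): 304 if the client's copy is still current."""
    headers = {"Last-Modified": format_datetime(file_mtime, usegmt=True)}
    if if_modified_since is not None:
        since = parsedate_to_datetime(if_modified_since)
        # HTTP dates have one-second resolution, so drop microseconds.
        if file_mtime.replace(microsecond=0) <= since:
            return 304, headers
    return 200, headers


mtime = datetime(2015, 1, 6, 8, 26, 32, tzinfo=timezone.utc)

status, headers = respond_if_modified_since(mtime, None)
print(status, headers["Last-Modified"])  # first visit: full response

status, _ = respond_if_modified_since(mtime, headers["Last-Modified"])
print(status)  # revalidation of an unchanged file
```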
ETag
Normally, comparing files by modification time works. In some special circumstances, however, such as a wrong server clock, a server clock that has been changed, or a server that did not adjust its time when daylight saving time (DST) began, comparing file versions by modification time gives wrong results.
ETag can be used to solve this problem. ETag is a unique identifier of a file. Like a hash or fingerprint, each file has an individual signature that changes whenever the file changes.
The server returns the ETag header:
ETag: "39001d-1762a-50bf790757e00"
The next request then goes as follows:
- Browser: Hey, I need the file jquery.min.js. Send it to me only if your version does not match "39001d-1762a-50bf790757e00".
- Server: (checks the ETag...)
- Server: Hey, the version I have here is also "39001d-1762a-50bf790757e00", so you already have the latest version.
- Browser: OK, then I will use my local cache.
Like Last-Modified, ETag solves the problem of comparing file versions. The difference is that ETag takes priority over Last-Modified: when both validators are present, the ETag comparison decides.
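The ETag exchange can be sketched as follows. Deriving the tag from a SHA-256 hash of the content is an assumption made for this example; servers are free to build ETags differently (the value shown above is not a content hash), and `make_etag` and `respond_if_none_match` are hypothetical names.

```python
import hashlib
from typing import Optional, Tuple


def make_etag(content: bytes) -> str:
    """Derive a quoted fingerprint that changes whenever the content changes."""
    return '"' + hashlib.sha256(content).hexdigest()[:16] + '"'


def respond_if_none_match(
    content: bytes, if_none_match: Optional[str]
) -> Tuple[int, str, bytes]:
    """Return (status, etag, body); the body is empty on a 304."""
    etag = make_etag(content)
    if if_none_match == etag:
        return 304, etag, b""
    return 200, etag, content


v1 = b"console.log('v1');"
v2 = b"console.log('v2');"

status, etag, body = respond_if_none_match(v1, None)
print(status, etag)                             # first visit: full response
print(respond_if_none_match(v1, etag)[0])       # unchanged file
print(respond_if_none_match(v2, etag)[0])       # file was replaced
```

Because the comparison is on the fingerprint rather than on a timestamp, a wrong server clock does not affect the result.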
Extra Tags
The caching headers above work automatically, but sometimes we need finer control over what may be cached and by whom.
- Cache-control: public means the response may be cached by proxy servers and other intermediate caches.
- Cache-control: private means the file differs from user to user. Only the user's own browser may cache it; shared proxy servers must not.
- Cache-control: no-cache means a cached copy must not be used without first checking with the server (to forbid storing a copy at all, the directive is no-store). This is very useful for search results or paginated results, because the content behind the same URL keeps changing.
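To make the directives concrete, here is a minimal sketch of how a cache might interpret a Cache-Control value. The function names are hypothetical and the parser is simplified. The sketch follows the HTTP standard's reading of no-cache, under which a copy may be stored but must be revalidated with the server before each use, while no-store forbids storing it.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control value into lowercase directive -> argument."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip() or None
    return directives


def shared_cache_may_store(value: str) -> bool:
    """A proxy or CDN must not store private or no-store responses."""
    directives = parse_cache_control(value)
    return "private" not in directives and "no-store" not in directives


def must_revalidate_before_use(value: str) -> bool:
    """no-cache: a stored copy has to be checked with the server first."""
    return "no-cache" in parse_cache_control(value)


for header in ("public, max-age=600", "private, max-age=600", "no-cache"):
    print(header, shared_cache_may_store(header), must_revalidate_before_use(header))
```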
Browser cache refresh
Enter the URL in the address bar and press Enter or click the Go button
The browser fetches the page with the minimum number of requests: it uses the local cache directly for all content that has not expired, which reduces requests to the server. Expires and max-age therefore take effect only with this method.
Press F5 or the browser refresh button
The browser adds the necessary cache validation headers to the request but is not allowed to use the local cache directly. Last-Modified and ETag therefore take effect, but Expires has no effect.
Press Ctrl+F5 or press Ctrl and click the refresh button
This is a forced refresh: the browser always issues a new request and uses no cache at all.
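The three behaviours can be summarised in a simplified model. This is an illustrative sketch only: real browsers differ in detail and have changed their reload behaviour over time, and the function `request_headers` and the action names are assumptions made for this example.

```python
from typing import Dict, Optional


def request_headers(
    action: str, cached: Optional[Dict[str, str]], fresh: bool
) -> Optional[Dict[str, str]]:
    """Return the headers to send, or None when no request is sent at all.

    action: "enter" (address bar), "f5" (refresh) or "ctrl_f5" (forced refresh).
    cached: response headers of the stored copy, or None if nothing is stored.
    fresh:  whether Expires / max-age say the stored copy is still valid.
    """
    if action == "ctrl_f5":
        # Ignore the local copy and ask intermediaries not to answer from cache.
        return {"Cache-Control": "no-cache", "Pragma": "no-cache"}
    if action not in ("enter", "f5"):
        raise ValueError("unknown action: " + action)
    if cached is None:
        return {}  # nothing stored: plain request
    if action == "enter" and fresh:
        return None  # served straight from the local cache
    # Otherwise revalidate with whatever validators the stored copy has.
    headers: Dict[str, str] = {}
    if "ETag" in cached:
        headers["If-None-Match"] = cached["ETag"]
    if "Last-Modified" in cached:
        headers["If-Modified-Since"] = cached["Last-Modified"]
    return headers


stored = {
    "ETag": '"39001d-1762a-50bf790757e00"',
    "Last-Modified": "Tue, 06 Jan 2015 08:26:32 GMT",
}
for action in ("enter", "f5", "ctrl_f5"):
    print(action, request_headers(action, stored, fresh=True))
```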
CDN cache
After the browser’s local cache expires, the browser will initiate a request to the CDN edge node. Similar to browser caching, CDN edge nodes also have a caching mechanism.
Disadvantages of CDN caching
The offloading effect of a CDN not only reduces user access latency but also reduces the load on the origin site. But its shortcoming is also obvious: when the website is updated, if the data on the CDN nodes is not updated in time, then even if the user presses Ctrl+F5 to bypass the browser cache, the CDN edge node still serves the old data because it has not synchronized the latest version, and the user sees a broken page.
CDN Cache Strategy
CDN edge node caching strategies vary from service provider to service provider, but they generally follow the HTTP standard and use the Cache-control: max-age field in the HTTP response header to set how long data is cached on the edge node.
When a client requests data from a CDN node, the node checks whether its cached copy has expired. If the copy has not expired, it is returned directly to the client; otherwise, the CDN node sends a back-to-origin request to the origin site, pulls the latest data, updates its local cache, and returns the latest data to the client.
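The lookup just described can be sketched as a small in-memory edge node. This is a toy model under stated assumptions, not how any CDN vendor implements it: `EdgeNode`, the `fetch_from_origin` callback (which returns a body plus a max-age in seconds) and the injectable clock are all hypothetical.

```python
from typing import Callable, Dict, Tuple


class EdgeNode:
    """Toy CDN edge node: serve from cache until max-age elapses."""

    def __init__(
        self,
        fetch_from_origin: Callable[[str], Tuple[bytes, int]],
        clock: Callable[[], float],
    ) -> None:
        self._fetch = fetch_from_origin  # url -> (body, max_age_seconds)
        self._clock = clock
        self._store: Dict[str, Tuple[bytes, float]] = {}  # url -> (body, expires_at)
        self.origin_fetches = 0

    def get(self, url: str) -> bytes:
        now = self._clock()
        entry = self._store.get(url)
        if entry is not None and now < entry[1]:
            return entry[0]  # cache hit: answer directly
        # Missing or expired: go back to the origin and refresh the local copy.
        body, max_age = self._fetch(url)
        self.origin_fetches += 1
        self._store[url] = (body, now + max_age)
        return body


now = [0.0]  # fake clock so the example does not need to sleep


def origin(url: str) -> Tuple[bytes, int]:
    return b"content of " + url.encode(), 60


edge = EdgeNode(origin, lambda: now[0])
edge.get("/a.js")          # miss: back to origin
edge.get("/a.js")          # hit
now[0] = 61.0
edge.get("/a.js")          # expired: back to origin again
print(edge.origin_fetches)  # 2
```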
CDN service providers generally let you set the CDN cache time along several dimensions, such as file suffix and directory, so that users can manage caching at a finer granularity.
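Such rules can be pictured as an ordered table in which the first match wins. The rule format, the paths, and every TTL value below are made up for illustration; actual CDN consoles define their own rule syntax and precedence.

```python
from typing import List, Tuple

# (kind, pattern, ttl_seconds); hypothetical rules, checked top to bottom.
RULES: List[Tuple[str, str, int]] = [
    ("directory", "/api/", 0),                    # dynamic data: do not cache
    ("directory", "/static/", 30 * 24 * 3600),    # versioned assets: 30 days
    ("suffix", ".html", 300),                     # pages: 5 minutes
    ("suffix", ".jpg", 7 * 24 * 3600),            # images: 7 days
]
DEFAULT_TTL = 3600


def cache_ttl_for(path: str) -> int:
    """Return the cache time in seconds for a URL path."""
    for kind, pattern, ttl in RULES:
        if kind == "directory" and path.startswith(pattern):
            return ttl
        if kind == "suffix" and path.lower().endswith(pattern):
            return ttl
    return DEFAULT_TTL


for path in ("/api/user.html", "/static/app.js", "/index.html", "/download/file.zip"):
    print(path, cache_ttl_for(path))
```

Putting directory rules before suffix rules here is a design choice for the example: it lets `/api/` stay uncached even when a path under it ends in `.html`.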
The CDN cache time directly affects the "back-to-origin rate". If the cache time is too short, data on the CDN edge nodes expires often, which causes frequent back-to-origin requests, increases the load on the origin site, and also increases access latency. If the cache time is too long, data updates are slow to reach users. Developers need to choose cache times case by case according to the specific business.
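A rough way to see this trade-off is to replay a stream of requests for a single URL against different cache times. The traffic pattern (one request every 10 seconds for an hour) and the function `back_to_origin_rate` are assumptions for this sketch; real traffic is burstier and spread over many URLs and nodes.

```python
from typing import Optional, Sequence


def back_to_origin_rate(request_times: Sequence[float], ttl: float) -> float:
    """Fraction of requests for one URL that have to go back to the origin."""
    if not request_times:
        return 0.0
    expires_at: Optional[float] = None
    origin_requests = 0
    for t in request_times:
        if expires_at is None or t >= expires_at:
            origin_requests += 1      # missing or expired: fetch from origin
            expires_at = t + ttl
    return origin_requests / len(request_times)


times = list(range(0, 3600, 10))  # 360 requests in one hour

for ttl in (10, 60, 600):
    print(ttl, round(back_to_origin_rate(times, ttl), 4))
```

With this pattern, a 10-second cache time sends every request to the origin, while a 600-second cache time sends only 6 of the 360, at the cost of content that can be up to 10 minutes stale.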
CDN cache refresh
CDN edge nodes are transparent to developers. Whereas Ctrl+F5 forces a refresh that invalidates the browser's local cache, developers clear the cache on CDN edge nodes through the "refresh cache" interface provided by the CDN service provider. After updating data, developers can use this function to force the cached data on the CDN nodes to expire, ensuring that clients pull the latest data on their next access.