nginx cache system design principle

WBOY

Aug 08, 2016 am 09:20 AM


Here we use the nginx cache system as a thread for discussing the design of a cache server and its related details. I will try to analyze it from the perspective of design and architecture. Due to space limitations I will not go deep into the code here. Everyone is welcome to join the discussion of the details.

After a cache server obtains a file from the backend, it either sends it straight to the client (this is called pass-through) or also caches a copy locally. When a later identical request reaches the cache server, the local copy can be served directly, provided it is still usable. A request that is served from a locally cached file is called a cache hit. If there is no local copy, the cache server has to go to the backend, located through configuration or by resolving the domain name, to fetch the file. This is called a cache miss. We will cover more background on cache servers as we analyze the nginx cache system.
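The hit and miss paths described above can be sketched in a few lines of C. This is only an illustration under simple assumptions, not how nginx does it: the fixed-size table, the names cache_get and demo_backend, and the size limits are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define SLOTS   8     /* hypothetical table size */
#define KEYLEN  128   /* hypothetical limit on key length */
#define BODYLEN 256   /* hypothetical limit on body length */

struct cache_entry {
    int  used;
    char key[KEYLEN];
    char body[BODYLEN];
};

struct cache {
    struct cache_entry slots[SLOTS];
    int hits;
    int misses;
};

/* Callback that fetches a response from the backend. */
typedef const char *(*fetch_fn)(const char *key);

/* Hypothetical backend used for demonstration only. */
const char *demo_backend(const char *key)
{
    (void) key;
    return "response from backend";
}

/* Serve from the local copy on a hit; on a miss, fetch and try to store. */
const char *cache_get(struct cache *c, const char *key, fetch_fn fetch)
{
    const char *body;
    int i;

    for (i = 0; i < SLOTS; i++) {
        if (c->slots[i].used && strcmp(c->slots[i].key, key) == 0) {
            c->hits++;                       /* cache hit */
            return c->slots[i].body;
        }
    }

    c->misses++;                             /* cache miss: go to the backend */
    body = fetch(key);
    if (body == NULL) {
        return NULL;
    }

    if (strlen(key) < KEYLEN && strlen(body) < BODYLEN) {
        for (i = 0; i < SLOTS; i++) {
            if (!c->slots[i].used) {
                strcpy(c->slots[i].key, key);
                strcpy(c->slots[i].body, body);
                c->slots[i].used = 1;
                return c->slots[i].body;
            }
        }
    }

    return body;                             /* no room: pass through unstored */
}
```

A zero-initialized struct cache is an empty cache. The first cache_get for a key counts as a miss and stores the response, and a second call with the same key is served from the table and counts as a hit.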

nginx has two kinds of storage. The first is enabled with proxy_store and stores files locally under the file path in the URL. For example, for /file/2013/0001/en/test.html, nginx creates each directory and then the file, in order, under the configured storage directory. The second is enabled with proxy_cache. Files stored this way are not laid out by URL path but are managed in a special way (called the custom method here), and the custom method is what we will focus on. So what are the pros and cons of the two methods?

Storing files by URL path is simpler for the program, but performance is poor. Some URLs are very long, and if we have to create equally deep directories on the local file system, opening and looking up files becomes slow (recall how the kernel resolves a path name to an inode, one component at a time). With a custom method, files and paths are still involved, but complexity and lookup cost no longer grow with URL length. In a sense this is a user-space file system, the most typical example being CFS in Squid. The method nginx uses is comparatively simple: it relies mainly on the md5 value of the URL, which we will analyze later.

Caching is inseparable from fetching content from the backend and sending it to the client. The obvious way to do this is to receive and send at the same time. Other approaches, such as reading the whole response first and then sending it, are too inefficient. nginx does receive and send at the same time. The structure it uses is ngx_event_pipe_t, which is the channel between the backend and the client. Because this structure is a general-purpose component, it needs a flag to handle the storage-related work, and the member cacheable takes on that job.

p->cacheable = u->cacheable || u->store;

That is, if cacheable is 1 the response needs to be stored, otherwise it does not. So what do u->cacheable and u->store stand for? They correspond to the two methods mentioned earlier, proxy_cache and proxy_store respectively.
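To make the role of the flag concrete, here is a minimal sketch of a receive-and-send loop in C. It is not the real ngx_event_pipe_t: the pipe_sketch and sink types, pipe_relay, and the 4-byte chunk size are hypothetical names and values chosen for illustration. Each chunk is forwarded to the client as it arrives and is copied into the store only when cacheable is set.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK 4   /* hypothetical chunk size, kept tiny for illustration */

/* A fixed-size in-memory destination standing in for a socket or a file. */
struct sink {
    char   data[64];
    size_t len;
};

struct pipe_sketch {
    unsigned    cacheable;  /* plays the role of u->cacheable || u->store */
    struct sink client;     /* copy sent to the client */
    struct sink store;      /* copy kept by the cache */
};

/* Append n bytes to a sink; returns -1 if the sink is full. */
static int sink_write(struct sink *s, const char *buf, size_t n)
{
    if (n > sizeof s->data - s->len) {
        return -1;
    }
    memcpy(s->data + s->len, buf, n);
    s->len += n;
    return 0;
}

/* Relay the backend response chunk by chunk; returns 0 on success. */
int pipe_relay(struct pipe_sketch *p, const char *resp, size_t resplen)
{
    size_t off = 0;

    while (off < resplen) {
        size_t n = resplen - off < CHUNK ? resplen - off : CHUNK;

        /* Always send the chunk on to the client. */
        if (sink_write(&p->client, resp + off, n) != 0) {
            return -1;
        }
        /* Store the same chunk only when the flag is set. */
        if (p->cacheable && sink_write(&p->store, resp + off, n) != 0) {
            return -1;
        }
        off += n;
    }
    return 0;
}
```

With cacheable set to 0 the store sink stays empty and the loop behaves as plain pass-through.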

(Some background: when nginx fetches data from the backend, its behavior is controlled by proxy_buffering, which enables buffering of responses from the backend server. With buffering enabled, nginx reads the response from the backend as quickly as it can and places it in buffers whose size and number are set with proxy_buffer_size and proxy_buffers. If the response does not fit in memory, part of it is written to disk. With buffering disabled, the response is passed to the client as soon as it is received from the backend.)

These are all side issues. We have not yet touched the core of the nginx cache feature. From an implementation point of view, the nginx upstream structure has a member called cache, of type ngx_shm_zone_t. If caching is enabled, this member is used to manage shared memory (why shared memory?); with the other storage method it is NULL.

Another point to explain is that in a cache system a file is usually called a store object, that is, a cache object, so a store object must be created before anything is cached. An important question is when to create it. What do you think? First we need to decide whether a file should be cached at all. Files requested with GET generally should be, so we could create the object early in request processing, as soon as we see a GET. But in many cases even a file requested with GET cannot be cached. If the object is created too early, we waste time and space and have to destroy it in the end. So what determines whether the response to a GET can be stored? The Cache-Control field in the response header. This field tells a proxy or browser whether the file may be cached. In general, cache servers cache a response by default when its header has no Cache-Control field.
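The cacheability check discussed above could look roughly like the following C sketch. It assumes a simplified rule: a missing Cache-Control header means the response may be cached, while no-store, no-cache, or private means it may not. The function names are hypothetical, and the substring match is a simplification of real directive parsing.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Case-insensitive substring search for a directive in a header value.
 * `needle` must be given in lowercase.
 */
static int has_directive(const char *hay, const char *needle)
{
    size_t n = strlen(needle);

    for (; *hay != '\0'; hay++) {
        size_t i = 0;

        while (i < n && hay[i] != '\0'
               && tolower((unsigned char) hay[i]) == needle[i]) {
            i++;
        }
        if (i == n) {
            return 1;
        }
    }
    return 0;
}

/*
 * Decide whether a response may be stored, given the value of its
 * Cache-Control header (NULL when the header is absent).
 * Returns 1 if cacheable, 0 otherwise.
 */
int response_is_cacheable(const char *cache_control)
{
    if (cache_control == NULL) {
        return 1;   /* no header: cache by default */
    }
    if (has_directive(cache_control, "no-store")
        || has_directive(cache_control, "no-cache")
        || has_directive(cache_control, "private")) {
        return 0;
    }
    return 1;
}
```

A server that defers object creation would call a check like this only after the response header has been parsed, and create the cache object only when it returns 1.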

Based on this consideration, the cache server we developed creates a cache object only after the response header has been parsed and there is sufficient evidence that the response is cacheable. Unfortunately, nginx does not do this.

nginx creates the cache object in the ngx_http_upstream_init_request function. At what stage of HTTP processing does this function run? Before the connection to the backend is established. Personally I do not think this is the right place. What do you think?

For the creation process, read the function ngx_http_upstream_cache. Here I will analyze it by comparing our cache with nginx. Our request uses a member named store to link to the cache object. nginx does the same: its request structure has a cache member for this purpose. The difference is that the space behind our store member lives in shared memory, while nginx allocates it from r->pool (why do we do it this way?).

Next, nginx generates the key of the cache object according to the configuration, generally by computing an md5. This key is the unique identifier of a cache object in the system. Many people may worry about md5 collisions. I think that unless the requirements are particularly strict, the risk is entirely acceptable here, and the handling stays simple.

The next thing to deal with is, how should the files be stored on the disk?

Let's take the example we used before: /file/2013/0001/en/test.html. Its md5 value is 8ef9229f02c5672c747dc7a324d658d0, and nginx uses this as the file name. Is that all? What happens if we pick one directory and put a huge number of such files in it? Most file systems limit the number of files in a single directory, so such a crude approach will not do. What then? nginx lets you configure multi-level directories to solve this. Put simply, the levels parameter specifies the number of directory levels (separated by colons) and the number of characters in each level's directory name. In our example, levels=1:2 means two levels of directories: the first-level name is one character and the second-level name is two characters. nginx supports at most 3 levels, that is, levels=xxx:xxx:xxx.

So where do the characters that make up the directory name come from? Assume that our storage directory is /cache, levels=1:2, then the above file is stored like this:

/cache/0/8d/8ef9229f02c5672c747dc7a324d658d0

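The directory names 0 and 8d are cut from the end of the md5 value: the first-level name is its last character, 0, and the second-level name is the two characters before that, 8d.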
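The mapping from key to path can be sketched as a small C function. This is an illustration that reproduces the example above, not nginx's own code; cache_path and its parameters are hypothetical names.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the on-disk path for a cache entry from its hex key.
 * levels[] holds the width of each directory level (for levels=1:2
 * that is {1, 2}). Directory names are cut from the END of the key,
 * moving leftwards, and the full key is used as the file name.
 * Returns 0 on success, -1 if the output buffer is too small or the
 * key is shorter than the requested levels.
 */
int cache_path(const char *root, const char *key,
               const int *levels, int nlevels,
               char *out, size_t outlen)
{
    size_t pos = strlen(key);   /* walks backwards through the key */
    size_t used;
    int    n, i;

    n = snprintf(out, outlen, "%s", root);
    if (n < 0 || (size_t) n >= outlen) {
        return -1;
    }
    used = (size_t) n;

    for (i = 0; i < nlevels; i++) {
        size_t w = (size_t) levels[i];

        if (w > pos) {
            return -1;
        }
        pos -= w;

        /* Append "/" plus the next w characters taken from the tail. */
        n = snprintf(out + used, outlen - used, "/%.*s", levels[i], key + pos);
        if (n < 0 || (size_t) n >= outlen - used) {
            return -1;
        }
        used += (size_t) n;
    }

    /* The file name is the whole key. */
    n = snprintf(out + used, outlen - used, "/%s", key);
    if (n < 0 || (size_t) n >= outlen - used) {
        return -1;
    }
    return 0;
}
```

Called with the root /cache, the key 8ef9229f02c5672c747dc7a324d658d0, and levels {1, 2}, it produces /cache/0/8d/8ef9229f02c5672c747dc7a324d658d0, the same path as in the example.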

After the object is created, its management structure needs to be cached as well, which is handled by ngx_http_file_cache_exists.

What if the directory and file already exist when this file is created? You can go through the code and see how nginx handles it.

That is all for this discussion. So far it has all been preparatory work. Next time we will discuss how arriving backend content is processed.

Extended reading:

http://www.pagefault.info/?p=123

http://www.pagefault.info/?p=375
