Establishment and optimization of caching mechanism for website system

Having covered the external network environment of the Web system, we now turn to the performance of the Web system itself.

As traffic to our website grows, we will run into many challenges. Solving them is not simply a matter of adding machines; establishing and using appropriate caching mechanisms is fundamental.

At the beginning, our Web system architecture may be very simple, with only one machine at each layer.

1. Using the internal caches of the MySQL database

Let's start with the caching mechanisms inside MySQL itself. The following discussion assumes the most commonly used storage engine, InnoDB.

1. Create an appropriate index

The simplest measure is to create indexes. When a table holds a relatively large amount of data, an index makes retrieval fast, but it also has costs. First, it takes up disk space. Composite (multi-column) indexes are the most notable case and should be used with caution, since the index can even end up larger than the source data. Second, once an index exists, insert/update/delete operations take longer because the index has to be updated as well. In practice, though, our system as a whole is dominated by select queries, so indexes still improve system performance significantly.
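As a minimal sketch of what an index changes, the following uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so that the example runs on its own; the table `users`, the column `email` and the index name `idx_users_email` are hypothetical). It asks the engine for its query plan before and after the index is created. In MySQL, the comparable check would be done with EXPLAIN.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the planner's description of how it would execute `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row is a human-readable plan step.
    return " | ".join(row[-1] for row in rows)


# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(10_000)],
)

lookup = "SELECT * FROM users WHERE email = ?"
plan_before = query_plan(conn, lookup, ("user42@example.com",))

# After this statement, every insert/update/delete must also maintain the index.
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = query_plan(conn, lookup, ("user42@example.com",))

print("before:", plan_before)
print("after: ", plan_after)
```

Before the index exists the plan is a scan of the whole table; afterwards the plan names `idx_users_email`, which is the fast retrieval the paragraph above describes.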

2. Database connection thread pool cache

If every database request had to create and then destroy a connection, the overhead for the database would be huge. To reduce it, MySQL's thread_cache_size can be configured to specify how many threads are kept for reuse. When there are not enough cached threads, new ones are created; when there are too many idle threads, the excess ones are destroyed.

There is also a more radical approach: pconnect (persistent database connections), where a connection, once created, is kept open for a long time. However, when traffic is relatively heavy and there are many machines, this is likely to exhaust the database connections, because connections are never recycled and the database's max_connections (maximum number of connections) is eventually reached. Persistent connections therefore usually require a "connection pool" service between CGI and MySQL to limit the number of connections that the CGI machines would otherwise create "blindly".
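The idea of a connection pool can be sketched as follows. This is an illustration under stated assumptions, not a production pool: `ConnectionPool` and `fake_connect` are hypothetical names, and `fake_connect` stands in for a real database driver's connect call. The pool reuses idle connections and never lets the total exceed `max_size`, so the web tier cannot exhaust max_connections.

```python
import queue
import threading
from contextlib import contextmanager


class ConnectionPool:
    """Bounded pool: never holds more than `max_size` connections in total."""

    def __init__(self, connect, max_size):
        self._connect = connect      # hypothetical factory, e.g. a driver's connect()
        self._max_size = max_size
        self._idle = queue.Queue()   # connections waiting to be reused
        self._created = 0
        self._lock = threading.Lock()

    def acquire(self, timeout=None):
        try:
            return self._idle.get_nowait()        # reuse an idle connection
        except queue.Empty:
            pass
        with self._lock:
            may_create = self._created < self._max_size
            if may_create:
                self._created += 1
        if may_create:
            try:
                return self._connect()
            except Exception:
                with self._lock:
                    self._created -= 1            # give the slot back on failure
                raise
        # Cap reached: block until another caller releases a connection.
        return self._idle.get(timeout=timeout)

    def release(self, conn):
        self._idle.put(conn)

    @contextmanager
    def connection(self, timeout=None):
        conn = self.acquire(timeout)
        try:
            yield conn
        finally:
            self.release(conn)


opened = []


def fake_connect():
    """Stand-in for a real database connect call (assumption)."""
    opened.append(object())
    return opened[-1]


pool = ConnectionPool(fake_connect, max_size=2)
for _ in range(5):
    with pool.connection():
        pass                      # run queries here

print(len(opened))
```

The five sequential requests above share a single connection, because each one returns its connection to the pool before the next request starts. Under concurrency, at most `max_size` connections are opened and further callers wait.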

3. InnoDB cache settings (innodb_buffer_pool_size)

innodb_buffer_pool_size sets the size of the memory area that InnoDB uses to cache indexes and data. If the machine is dedicated to MySQL, the usual recommendation is about 80% of the machine's physical memory. When table data is read, this cache reduces disk I/O. Generally speaking, the larger the value, the higher the cache hit rate.
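To make the sizing rule and the hit rate concrete, here is a small sketch. The hit rate is derived from two counters that MySQL reports in SHOW GLOBAL STATUS: Innodb_buffer_pool_read_requests (logical reads) and Innodb_buffer_pool_reads (reads that had to go to disk). The function names and the sample numbers are hypothetical and only illustrate the arithmetic.

```python
def suggested_buffer_pool_bytes(physical_memory_bytes, percent=80):
    """Apply the '80% of physical memory' rule of thumb for a dedicated MySQL host."""
    return physical_memory_bytes * percent // 100


def buffer_pool_hit_rate(read_requests, disk_reads):
    """Fraction of logical reads served from memory.

    read_requests: value of Innodb_buffer_pool_read_requests
    disk_reads:    value of Innodb_buffer_pool_reads
    """
    if read_requests == 0:
        return None               # no reads yet, so the rate is undefined
    return 1.0 - disk_reads / read_requests


GIB = 1024 ** 3
pool_size = suggested_buffer_pool_bytes(64 * GIB)         # made-up 64 GiB host
rate = buffer_pool_hit_rate(read_requests=1_000_000, disk_reads=5_000)  # made-up counters

print(pool_size // GIB, "GiB")
print(f"hit rate: {rate:.3f}")
```

If the hit rate stays low under normal load, that is a hint that the buffer pool is too small for the working set.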

4. Splitting databases and tables (sharding) and partitioning

As a rule of thumb, a MySQL table copes well with data volumes in the millions of rows; beyond that, performance drops significantly. So when we foresee the data volume exceeding this level, it is advisable to split the data across databases or tables, or to partition it. The best approach is to design the service around a sharded database-and-table storage model from the start, which fundamentally removes this risk for the middle and later stages. Some conveniences, such as list-style queries, are sacrificed, and maintenance becomes more complex. But once the data reaches tens of millions of rows or more, we will find it was all worth it.
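One simple way to split data across databases and tables is to route each row by its key. The sketch below assumes a numeric user id and a hypothetical naming scheme (`user_db_N`, `user_NN`); the modulo rule is just one possible routing strategy, not the only one.

```python
def shard_for(user_id, db_count=4, tables_per_db=8):
    """Map a numeric user id to a (database name, table name) pair."""
    slot = user_id % (db_count * tables_per_db)
    db_index = slot // tables_per_db
    table_index = slot % tables_per_db
    return f"user_db_{db_index}", f"user_{table_index:02d}"


def all_shards(db_count=4, tables_per_db=8):
    """Every (database, table) pair; a list-style query has to visit all of them."""
    return [
        (f"user_db_{d}", f"user_{t:02d}")
        for d in range(db_count)
        for t in range(tables_per_db)
    ]


print(shard_for(12345))     # → ('user_db_3', 'user_01')
print(len(all_shards()))    # → 32
```

A lookup by user id touches exactly one table, while a query such as "list the newest users" must fan out to all 32 tables and merge the results. This is the convenience that is sacrificed.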

2. Set up multiple MySQL database services

A single MySQL machine is a high-risk single point of failure: if it goes down, our Web service is no longer available. Moreover, as traffic to the Web system keeps growing, one day we will find that one MySQL server can no longer cope, and we will need more MySQL machines. Introducing multiple MySQL machines brings many new problems.

1. Establish MySQL master-slave replication, with the slave database as a backup.

This approach exists purely to solve the "single point of failure" problem: when the master database fails, we switch to the slave. It is somewhat wasteful, however, because the slave sits idle the rest of the time.

2. Separate reads and writes in MySQL: write to the master database and read from the slave database.

The two databases split reads and writes: the master handles write operations and the slave handles read operations. If the master fails, reads are not affected, and all reads and writes can temporarily be switched to the slave (watch the traffic, though, because the combined load may be too much and bring the slave down).
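The routing rule can be sketched in a few lines. `ReadWriteRouter` is a hypothetical helper, the server names are plain strings standing in for real connections, and classifying a statement by its first keyword is a simplification for illustration.

```python
WRITE_VERBS = ("INSERT", "UPDATE", "DELETE", "REPLACE")


class ReadWriteRouter:
    """Send writes to the master and reads to the slave, with a simple failover."""

    def __init__(self, master, slave):
        self.master = master
        self.slave = slave
        self.master_healthy = True   # would be set by a health check in practice

    def target_for(self, sql):
        is_write = sql.lstrip().upper().startswith(WRITE_VERBS)
        if is_write and self.master_healthy:
            return self.master
        # Reads always go to the slave; writes go there only while the master is down.
        return self.slave


router = ReadWriteRouter(master="mysql-master", slave="mysql-slave")
print(router.target_for("SELECT * FROM users WHERE id = 1"))      # → mysql-slave
print(router.target_for("UPDATE users SET name = 'a' WHERE id = 1"))  # → mysql-master

router.master_healthy = False          # simulate a master failure
print(router.target_for("UPDATE users SET name = 'b' WHERE id = 1"))  # → mysql-slave
```

During a failover the slave receives both kinds of traffic, which is why its load needs to be watched.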

3. Master-master mutual backup.

The two MySQL servers are each other's master and slave at the same time. This solution both spreads the traffic load and solves the "single point of failure" problem: if either machine fails, the other is still available.

However, this solution only works with two machines. If the business keeps expanding rapidly, we can split it by business line and set up several master-master mutual-backup pairs.

3. Establish a cache between the Web server and the database

To handle heavy traffic, we cannot focus on the database layer alone. According to the "80/20 rule", 80% of requests target only 20% of the data, the hot data. We should therefore establish a caching mechanism between the web server and the database. It can use disk or memory as the cache, and it stops most hot-data queries before they reach the database.

1. Static page generation (page staticization)

When a user visits a page on the website, most of its content may not change for a long time. A news article, for example, is almost never modified once it is published. In such cases, the static HTML page generated by CGI can be cached on the local disk of the web server. Only the first request goes through dynamic CGI and queries the database; after that, the local disk file is returned to the user directly.
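A minimal sketch of this disk cache follows. `StaticPageCache` and `render_article` are hypothetical; `render_article` stands in for the dynamic CGI code that queries the database, and the page id is assumed to be a trusted slug that is safe to use as a file name.

```python
import tempfile
from pathlib import Path


class StaticPageCache:
    """Render a page once, then serve the saved HTML file from local disk."""

    def __init__(self, cache_dir, render):
        self.cache_dir = Path(cache_dir)
        self.render = render      # hypothetical function: page_id -> HTML string
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def get(self, page_id):
        # page_id is assumed to be a trusted slug (no path separators).
        path = self.cache_dir / f"{page_id}.html"
        if path.exists():
            return path.read_text(encoding="utf-8")   # served from disk
        html = self.render(page_id)                    # first request only
        path.write_text(html, encoding="utf-8")
        return html


render_calls = []


def render_article(page_id):
    """Stand-in for the dynamic page that would query the database."""
    render_calls.append(page_id)
    return f"<html><body>Article {page_id}</body></html>"


cache = StaticPageCache(tempfile.mkdtemp(), render_article)
first = cache.get("news-1001")
second = cache.get("news-1001")

print(first == second, len(render_calls))
```

Both requests return the same HTML, but the dynamic renderer runs only once. A real deployment would also need a way to delete or regenerate the file when the article changes.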

While the Web system is relatively small, this approach seems perfect. Once it grows, say to 100 Web servers, there will be 100 copies of these disk files, which wastes resources and is hard to maintain. At this point some people may think of storing the files centrally on one server. That is exactly how the caching approach described next works.

2. Single memory cache

The page staticization example shows that a "cache" kept on the Web machine itself is hard to maintain and brings further problems (although PHP's APC extension does allow the web server's local memory to be used as a key/value store). The memory cache service we build should therefore also be an independent service.

The main choices for a memory cache are Redis and Memcached. In terms of performance there is little difference between the two; in terms of features, Redis is richer.
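The usual way an application uses such a cache is to check it first and fall back to the database on a miss. The sketch below uses a small in-process class as a stand-in for a Redis or Memcached client (an assumption made so the example runs without a server); `InMemoryCache`, `get_user` and `load_from_db` are hypothetical names.

```python
import time


class InMemoryCache:
    """Stand-in for a Redis/Memcached client: get/set with a time-to-live."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]          # expired entries count as misses
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)


def get_user(user_id, cache, load_from_db, ttl_seconds=300):
    """Look in the cache first; on a miss, query the database and remember the result."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    user = load_from_db(user_id)
    if user is not None:
        cache.set(key, user, ttl_seconds)
    return user


db_queries = []


def load_from_db(user_id):
    """Stand-in for a SELECT against MySQL."""
    db_queries.append(user_id)
    return {"id": user_id, "name": f"user-{user_id}"}


cache = InMemoryCache()
for _ in range(100):
    user = get_user(7, cache, load_from_db)

print(user["name"], len(db_queries))
```

One hundred requests for the same hot record reach the database only once; the other ninety-nine are answered from memory.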

3. Memory cache cluster

A single memory cache server is again a single point of failure, so we must turn it into a cluster. The simple way is to add a slave as a backup machine. But what if there really are a lot of requests, the cache hit rate is low, and more machine memory is needed? For that reason we recommend configuring a proper cluster, for example Redis Cluster.

A Redis cluster consists of multiple master-slave groups, and every node can accept requests, which makes expanding the cluster more convenient. A client can send a request to any node. If that node is "responsible" for the requested content, it returns the content directly. Otherwise, it finds the Redis node that is actually responsible and tells the client its address, and the client sends the request again to that node.

All of this is transparent to the application code that uses the cache service.
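The redirect behaviour described above can be simulated in a few lines. This is a simplified model, not the Redis wire protocol: `Node`, `make_cluster` and `ClusterClient` are hypothetical classes, and a redirect returns the owning node object rather than an address. The slot calculation follows the Redis Cluster rule of CRC16 of the key modulo 16384, with hash tags ignored for brevity.

```python
import binascii

SLOT_COUNT = 16384  # Redis Cluster splits the key space into 16384 hash slots


def key_slot(key):
    """CRC16 (XMODEM) of the key modulo the slot count; hash tags are ignored here."""
    return binascii.crc_hqx(key.encode("utf-8"), 0) % SLOT_COUNT


class Node:
    def __init__(self, name, slots):
        self.name = name
        self.slots = slots      # range of slots this node is responsible for
        self.peers = []         # every node in the cluster, filled in by make_cluster
        self.data = {}

    def execute(self, command, key, value=None):
        slot = key_slot(key)
        if slot not in self.slots:
            owner = next(n for n in self.peers if slot in n.slots)
            return "MOVED", owner               # tell the client where to go
        if command == "SET":
            self.data[key] = value
        return "OK", self.data.get(key)


def make_cluster(node_count):
    """Create nodes that split the slot space into contiguous, equal-sized ranges."""
    step = SLOT_COUNT // node_count
    nodes = []
    for i in range(node_count):
        end = SLOT_COUNT if i == node_count - 1 else (i + 1) * step
        nodes.append(Node(f"node-{i}", range(i * step, end)))
    for node in nodes:
        node.peers = nodes
    return nodes


class ClusterClient:
    """Follows redirects so that callers only ever see get() and set()."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.redirects = 0

    def _call(self, command, key, value=None):
        status, payload = self.nodes[0].execute(command, key, value)  # ask any node
        if status == "MOVED":
            self.redirects += 1
            status, payload = payload.execute(command, key, value)    # retry at the owner
        return payload

    def set(self, key, value):
        return self._call("SET", key, value)

    def get(self, key):
        return self._call("GET", key)


client = ClusterClient(make_cluster(3))
for i in range(50):
    client.set(f"user:{i}", i)

print(client.get("user:42"), client.redirects > 0)
```

The calling code never deals with redirects itself; the client class absorbs them, which is the sense in which the cluster is transparent to the application.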

Switching from one memory cache service to another carries certain risks. When switching from cluster A to cluster B, cluster B must be "warmed up" in advance, so that the hot data in cluster B's memory matches cluster A's as closely as possible. Otherwise, at the moment of switching, a large number of requests will miss in cluster B's memory cache and the traffic will hit the back-end database directly, which is likely to bring the database down.
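A warm-up step can be sketched as copying the hot keys into the new cluster and checking the expected hit rate before switching. Plain dictionaries stand in for the two clusters here, and the function names, the sample of recent requests and the 90% threshold are all assumptions made for illustration.

```python
def warm_up(old_cache, new_cache, hot_keys):
    """Copy the listed hot keys from the old cache into the new one."""
    copied = 0
    for key in hot_keys:
        if key in old_cache and key not in new_cache:
            new_cache[key] = old_cache[key]
            copied += 1
    return copied


def expected_hit_rate(cache, sample_keys):
    """Fraction of a sample of recently requested keys that the cache can serve."""
    if not sample_keys:
        return 0.0
    hits = sum(1 for key in sample_keys if key in cache)
    return hits / len(sample_keys)


def ready_to_switch(new_cache, sample_keys, min_hit_rate=0.9):
    """Only switch traffic once the new cache would serve most recent requests."""
    return expected_hit_rate(new_cache, sample_keys) >= min_hit_rate


cluster_a = {f"article:{i}": f"body {i}" for i in range(100)}   # old cache
cluster_b = {}                                                  # new, empty cache
recent_requests = [f"article:{i % 10}" for i in range(1000)]    # 10 hot keys

before = ready_to_switch(cluster_b, recent_requests)
copied = warm_up(cluster_a, cluster_b, set(recent_requests))
after = ready_to_switch(cluster_b, recent_requests)

print(before, copied, after)   # → False 10 True
```

Switching while `ready_to_switch` is still false is the situation the paragraph above warns about: nearly every request would miss and fall through to the database.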

4. Reduce database “writes”

The mechanisms above all reduce database "read" operations, but writes are also a major source of load. Although the number of logical writes cannot be reduced, the pressure can be lowered by merging requests. To do this, we need a mechanism that synchronizes modifications between the memory cache cluster and the database cluster.

First, apply the modification in the cache so that external queries see the new data, then put the SQL modification into a queue. When the queue is full, or at regular intervals, the queued modifications are merged into a single request that is sent to update the database.
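The merging step can be sketched as a small write-behind buffer. `WriteBehindBuffer` and `flush_to_db` are hypothetical names, a dictionary stands in for the cache, and the age check runs only when a new write arrives; a real service would use a timer and would have to consider that changes still sitting in the queue are lost if the process dies before a flush.

```python
import time


class WriteBehindBuffer:
    """Apply writes to the cache at once and send them to the database in batches."""

    def __init__(self, cache, flush_to_db, max_pending=100,
                 max_age_seconds=1.0, clock=time.monotonic):
        self.cache = cache
        self.flush_to_db = flush_to_db   # hypothetical: writes one {key: value} batch
        self.max_pending = max_pending
        self.max_age_seconds = max_age_seconds
        self._clock = clock
        self._pending = {}
        self._oldest = None

    def write(self, key, value):
        self.cache[key] = value          # readers see the change immediately
        if not self._pending:
            self._oldest = self._clock()
        self._pending[key] = value       # repeated writes to one key are merged
        if self._should_flush():
            self.flush()

    def _should_flush(self):
        if len(self._pending) >= self.max_pending:
            return True                  # queue is full
        return self._clock() - self._oldest >= self.max_age_seconds

    def flush(self):
        if not self._pending:
            return
        batch, self._pending = self._pending, {}
        self.flush_to_db(batch)          # one request instead of many


cache = {}
batches = []
buffer = WriteBehindBuffer(cache, batches.append, max_pending=3,
                           max_age_seconds=3600, clock=lambda: 0.0)

buffer.write("likes:1", 10)
buffer.write("likes:1", 11)     # overwrites the pending value for the same key
buffer.write("likes:2", 5)
buffer.write("likes:3", 7)      # third distinct key, so the queue is full

print(batches)   # → [{'likes:1': 11, 'likes:2': 5, 'likes:3': 7}]
```

Four modifications reach the database as a single batch of three rows, while the cache reflected each change as soon as it was made.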

Besides improving write performance through the architectural change described above, MySQL's own disk-write strategy can be adjusted through the innodb_flush_log_at_trx_commit parameter, for example by flushing the log to disk about once per second instead of at every transaction commit, at the risk of losing roughly the last second of transactions in a crash. If the budget allows, the problem can also be tackled at the hardware level, with the older RAID (Redundant Array of Independent Disks) or the newer SSD (solid-state drive).

5. NoSQL storage

Whether for reads or for writes, as traffic grows further the database will eventually reach its limits. Adding more machines is relatively expensive and may not really solve the problem. At that point, consider moving some core data to a NoSQL database. Most NoSQL stores use a key-value model, and Redis, introduced above, is a recommended choice. Redis is a memory cache, but it can also be used as storage and can persist its data to disk.

We then take some of the data that is read and written most frequently out of the database and put it into a newly built Redis storage cluster, which further reduces the pressure on the original MySQL database. And because Redis operates at memory speed, read and write performance improves greatly.

First-tier Internet companies in China use many architectures similar to the ones above. The cache service they use is not necessarily Redis, however; they have a richer set of options, and some even develop their own NoSQL services based on their business characteristics.

6. Empty node query problem

Once all the services above are in place, we may think the Web system is already very robust. But, as always, new problems will appear. An empty node query is a request for data that does not exist in the database at all. For example, if I query the information of a person who does not exist, the system searches each cache level in turn, finally reaches the database itself, concludes that nothing can be found, and returns that result to the front end. Because none of the caches can serve such a request, it consumes a lot of system resources, and a large number of empty node queries can overwhelm the system's services.
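The effect can be reproduced with a small sketch. With `remember_missing=False` the lookup behaves as described above: every request for a non-existent id falls through to the database. The `remember_missing=True` branch shows one common mitigation, caching the "not found" result itself. That mitigation is an assumption added here for illustration and is not described in the text above; the dictionaries and names are hypothetical.

```python
MISSING = object()   # sentinel meaning "we looked, and the record does not exist"


def lookup(user_id, cache, db, stats, remember_missing):
    """Check the cache, then the database; optionally cache 'not found' results."""
    if user_id in cache:
        value = cache[user_id]
        return None if value is MISSING else value
    stats["db_queries"] += 1                 # every cache miss costs a database query
    value = db.get(user_id)
    if value is not None:
        cache[user_id] = value
    elif remember_missing:
        # A real cache would give this entry a short expiry time.
        cache[user_id] = MISSING
    return value


def run(remember_missing, requests=1000):
    db = {1: "alice"}                        # user 999 does not exist
    cache, stats = {}, {"db_queries": 0}
    for _ in range(requests):
        lookup(999, cache, db, stats, remember_missing)
    return stats["db_queries"]


print(run(remember_missing=False))   # → 1000
print(run(remember_missing=True))    # → 1
```

Without the sentinel, all 1000 requests for the missing user reach the database; with it, only the first one does.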

Statement
This article is reproduced from hcoder. If there is any infringement, please contact admin@php.cn to have it removed.