This article describes how a website's MySQL architecture evolves as its level of concurrent access grows.
Scalability
The scalability of an architecture is closely tied to concurrency: if concurrency does not grow, there is no need for high scalability. Starting from a traditional architecture, here is a brief introduction to scalability. There are two common ways to scale:
Scale-up: vertical scaling, which improves service capacity by replacing machines with more powerful ones and adding resources
Scale-out: horizontal scaling, which improves service capacity by adding nodes (machines)
For high-concurrency Internet applications, scale-out is undoubtedly the way forward. Buying ever higher-end machines is something we have always avoided, because it is not a long-term solution. Under the scale-out approach, what is the ideal state of scalability?
The ideal state of scalability
When a service faces higher concurrency, the concurrency it supports can be raised simply by adding machines, and adding machines has no impact on the online service (no downtime). This is the ideal state of scalability.
Evolution of architecture
V1.0 Simple website architecture
The architecture behind a small website or application can be very simple: a single MySQL instance is enough to handle all data reads and writes (backup instances are ignored here). Websites at this stage generally store all of their data in one database instance.
Under this architecture, what are the bottlenecks of data storage?
1. The total size of the data no longer fits on one machine
2. The data indexes (B+ tree) no longer fit in the memory of one machine
3. The access volume (mixed reads and writes) is more than one instance can handle
Only when one or more of these three conditions is met do we need to consider evolving to the next level. In fact, for many small companies and small applications, this architecture is enough to meet their needs. Accurately estimating the data volume early on is an important step in preventing over-design. After all, no one wants to waste effort worrying about things that will never happen.
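The three conditions above can be written down as a simple check. This is only a minimal sketch: the class names, field names and thresholds are hypothetical, and real limits have to come from measuring your own machines.

```python
from dataclasses import dataclass


@dataclass
class InstanceLoad:
    """Measured load on the single MySQL instance (hypothetical fields)."""
    data_bytes: int        # total size of the data on disk
    index_bytes: int       # total size of the B+ tree indexes
    requests_per_sec: int  # mixed read and write requests per second


@dataclass
class MachineCapacity:
    """What one machine can hold and serve (hypothetical fields)."""
    disk_bytes: int
    memory_bytes: int
    max_requests_per_sec: int


def v1_bottlenecks(load: InstanceLoad, capacity: MachineCapacity) -> list[str]:
    """Return the V1.0 bottlenecks that are currently hit, if any."""
    hit = []
    if load.data_bytes > capacity.disk_bytes:
        hit.append("data does not fit on one machine")
    if load.index_bytes > capacity.memory_bytes:
        hit.append("indexes do not fit in memory")
    if load.requests_per_sec > capacity.max_requests_per_sec:
        hit.append("access volume exceeds one instance")
    return hit
```

An empty result means the single-instance architecture is still sufficient and there is no reason to split yet.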
Here is a simple example of my own. For a table such as user information (3 indexes), 16 GB of memory can hold the indexes for about 20 million rows of data, and a simple mixed read and write load of about 3,000 requests per second is no problem. How does your application scenario compare?
V2.0 Vertical splitting
When V1.0 hits a bottleneck, the first and easiest way to split is usually vertical splitting. What does vertical mean? From a business perspective, data that is not strongly related is split into different instances in order to remove the bottleneck. In the example in the figure, user information data and business data are split into three different instances. For scenarios with many repeated reads, a cache layer can also be added to reduce the pressure on the DB.
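As a rough illustration of vertical splitting with a cache layer, the sketch below routes each business domain to its own instance and caches repeated reads in memory. The domain names, connection strings and the `ReadCache` class are all hypothetical. A production setup would normally use an external cache and an expiry policy, which are left out here.

```python
# Hypothetical mapping: one MySQL instance per business domain.
INSTANCE_BY_DOMAIN = {
    "userinfo": "mysql://userinfo-db:3306/userinfo",
    "business_a": "mysql://business-a-db:3306/business_a",
    "business_b": "mysql://business-b-db:3306/business_b",
}


def instance_for(domain: str) -> str:
    """Return the instance that owns all tables of the given domain."""
    try:
        return INSTANCE_BY_DOMAIN[domain]
    except KeyError:
        raise ValueError(f"no instance configured for domain {domain!r}")


class ReadCache:
    """Tiny in-memory cache placed in front of the DB for repeated reads."""

    def __init__(self, load_from_db):
        self._load_from_db = load_from_db  # called only on a cache miss
        self._store = {}

    def get(self, key):
        if key not in self._store:
            self._store[key] = self._load_from_db(key)
        return self._store[key]

    def invalidate(self, key):
        """Drop a key after the underlying row has been written."""
        self._store.pop(key, None)
```

The cache only helps when the same keys are read repeatedly. Writes must invalidate the affected keys, otherwise readers keep seeing stale data.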
Under this architecture, what are the bottlenecks of data storage?
1. A single instance serving a single business can still hit the bottlenecks described in V1.0
When you hit a bottleneck, consider moving to a later version described in this article. If read requests are the performance bottleneck, consider V3.0. For other bottlenecks, consider V4.0.
V3.0 Master-slave architecture
This type of architecture mainly solves the read problem of the V2.0 architecture. It offloads read pressure by attaching real-time replicas of the data to the instance. In MySQL this is done with a master-slave structure: the master takes the write pressure, and the slaves share the read pressure. For applications that write little and read a lot, the V3.0 master-slave architecture is fully sufficient.
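The read/write split described above might be sketched as follows. `MasterSlaveRouter` is a hypothetical helper, not part of MySQL or of any driver, and its statement classification is deliberately naive: anything that starts with SELECT goes to a slave and everything else goes to the master.

```python
import itertools


class MasterSlaveRouter:
    """Send writes to the master and spread reads across the slaves."""

    def __init__(self, master: str, slaves: list[str]):
        if not slaves:
            raise ValueError("at least one slave is required")
        self.master = master
        self._slaves = itertools.cycle(slaves)  # simple round-robin

    def route(self, sql: str) -> str:
        """Return the name of the node that should execute the statement."""
        # Naive assumption: only plain SELECT statements count as reads.
        # Locking reads such as SELECT ... FOR UPDATE would need the master.
        if sql.lstrip().upper().startswith("SELECT"):
            return next(self._slaves)
        return self.master
```

Because the slaves replicate asynchronously by default, a read that must see its own preceding write should still be sent to the master.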
Under this architecture, what are the bottlenecks of data storage?
1. The master can no longer handle the write volume
V4.0 Horizontal Split
When the V2.0 and V3.0 solutions hit bottlenecks, horizontal splitting can solve them. Horizontal splitting is very different from vertical splitting. After vertical splitting, each instance still holds the full data of its business, but after horizontal splitting any instance holds only 1/n of the full data. The following figure uses Userinfo as an example: userinfo is split into 3 clusters, each cluster holds 1/3 of the total data, and the data of the 3 clusters together make up the complete data set (note: we now speak of a cluster rather than a single instance, meaning a small MySQL cluster made up of a master and its slaves).
How is the data routed?
1. Range splitting
The sharding key is routed by contiguous ranges. This is generally used where IDs are strictly auto-incrementing, such as userid. A small userid range example: split every 30 million userids, so that cluster No. 1 holds userid 1 to 30 million and cluster No. 2 holds userid 30,000,001 to 60 million.
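With fixed-size ranges, the routing rule reduces to integer division. The sketch below assumes the range size of 30 million from the example. The function name and the 1-based cluster numbering are illustrative choices.

```python
RANGE_SIZE = 30_000_000  # userids per cluster, as in the example above


def cluster_for_userid(userid: int) -> int:
    """Return the 1-based number of the cluster that holds this userid."""
    if userid < 1:
        raise ValueError("userid must be a positive auto-increment ID")
    # IDs 1..30,000,000 map to cluster 1, the next 30 million to cluster 2, ...
    return (userid - 1) // RANGE_SIZE + 1
```

New userids always land in the highest-numbered cluster, so capacity is added by bringing up the next cluster before the current range fills up.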
2. List splitting
List splitting follows the same idea as range splitting: both route to different clusters according to the sharding key, but the method differs. List splitting is mainly used when the sharding key values do not form a continuous sequence, so each cluster is assigned an explicit list of values.
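List routing can be sketched as an explicit lookup table. The sharding key used here (a region code) and the assignment of values to clusters are hypothetical, since the article does not name a concrete key for this case.

```python
# Hypothetical assignment: each cluster owns an explicit list of key values.
KEYS_BY_CLUSTER = {
    1: ["cn", "jp"],
    2: ["us", "ca"],
    3: ["de", "fr"],
}

# Invert the lists once so that routing is a single dictionary lookup.
CLUSTER_BY_KEY = {
    key: cluster
    for cluster, keys in KEYS_BY_CLUSTER.items()
    for key in keys
}


def cluster_for_key(sharding_key: str) -> int:
    """Return the cluster whose list contains the sharding key."""
    try:
        return CLUSTER_BY_KEY[sharding_key]
    except KeyError:
        raise ValueError(f"sharding key {sharding_key!r} is not assigned to any cluster")
```

Unlike range splitting, a value that is not on any list has nowhere to go, so every new key value must be assigned to a cluster before data for it is written.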