Detailed explanation of how PHP optimizes MySQL tables with tens of millions of rows

藏色散人
2020-01-25 14:16:01

First of all, storing hundreds of billions of rows in MySQL is a real challenge. A single MySQL table can hold on the order of 1 billion rows, but performance at that size is very poor. Extensive testing in our projects has shown that a single MySQL table performs best at around 5 million rows.

Large tables are optimized mainly by splitting them up, across databases and across tables. There are currently three common solutions: partitioning, database and table sharding, and NoSQL/NewSQL. Real projects usually combine all three. At present, most systems keep their core data in an RDBMS and use NoSQL/NewSQL storage as a supplement.

Partition

First let’s understand the partition scheme.

A partitioned table is implemented as a set of related underlying tables. These underlying tables are also represented by handler objects, so each partition can be accessed directly. The storage engine manages the underlying tables of the partitions in the same way it manages ordinary tables (all underlying tables must use the same storage engine). An index on a partitioned table is simply an identical index added to each underlying table. This solution hides the details of the split from users: a query still works even if its conditions do not include the partition column, although its performance is only mediocre in that case.

However, its shortcomings are obvious: many resources are still limited by a single machine, such as the number of connections and network throughput. Choosing how to partition is one of the key decisions in practice.

Let's start with an example. Take customer information: there are more than 50 million customers, and the project needs to store each customer's bank card bindings, identity document bindings, and business bindings.

Given this business background, how should the database be designed? In the first phase of the project, we created a single customer business binding table that redundantly stores the business information bound to each customer.

The basic structure is roughly as follows:

[Figure: structure of the customer business binding table]

For queries, the bank card number, the business number, and the document number are each indexed. As requirements grow, the number of indexes on this table will exceed 10. Moreover, when a customer cancels a contract and then signs it again, two rows are saved that differ only in binding status.

Assuming 50 million customers, 5 business types, and an average of 2 cards per customer, this table would reach an astonishing 500 million rows (50 million × 5 × 2). In fact, our system was already struggling before the number of users had even reached one million. Such a design is simply not workable: both inserts and queries would bring the system down.

MySQL stores its data on disk as files. By default they are placed under /mysql/data (the location can be checked through datadir in my.cnf). One table corresponds mainly to three files: an .frm file that stores the table structure, an .myd file that stores the table data, and an .myi file that stores the table indexes (this is the layout used by the MyISAM storage engine). These three files were very large, especially the .myd file, which was almost 5 GB.

The first optimization was partitioning. MySQL supports four partitioning methods: RANGE, LIST, HASH, and KEY.

In our project, RANGE and LIST partitioning have no usable scenario. A range or list partition would have to be built on the binding number, but the binding number has no business meaning and is never used in queries. That leaves HASH and KEY partitioning. HASH partitioning only supports columns of type int, and only a single column.

KEY partitioning can support multiple columns, but it also requires that one of the columns be of type int. Looking at our table structure, we found that none of the columns was of type int. So how do we partition? We added a binding time column of type int and partitioned by it, so that customers bound on the same day land in the same partition.

After this optimization, our insertion is much faster, but the query is still very slow. Why?

Because our queries filter only by bank card number or ID number, never by time. As a result, every query makes MySQL search all partitions.
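The effect described above can be illustrated with a small simulation. This is only a sketch, not MySQL's actual implementation: the partition count, the integer date encoding, and the column names `binding_day` and `card_no` are assumptions made for illustration. It models HASH-style placement as a modulo of the integer value and shows that a lookup can be narrowed to one partition only when it filters on the partition column.

```python
from datetime import date

NUM_PARTITIONS = 8  # hypothetical partition count, not taken from the article


def binding_day_as_int(d: date) -> int:
    """Encode a binding date as an integer such as 20200125 (assumed encoding)."""
    return d.year * 10000 + d.month * 100 + d.day


def hash_partition(value: int, num_partitions: int = NUM_PARTITIONS) -> int:
    """HASH-style placement: the value modulo the number of partitions."""
    return value % num_partitions


def partitions_to_scan(predicate: dict, num_partitions: int = NUM_PARTITIONS) -> list:
    """Return the partitions a lookup has to touch.

    Narrowing is only possible when the predicate constrains the
    partition column (here the hypothetical column "binding_day").
    """
    if "binding_day" in predicate:
        return [hash_partition(predicate["binding_day"], num_partitions)]
    return list(range(num_partitions))


day = binding_day_as_int(date(2020, 1, 25))
by_day = partitions_to_scan({"binding_day": day})
by_card = partitions_to_scan({"card_no": "6222020200112345678"})
print(by_day)   # a single partition
print(by_card)  # every partition
```

Inserts always know the binding day, so they go straight to one partition, which matches the faster inserts; lookups by card number carry no binding day and fall back to all partitions.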

So we carried out a second optimization. Since HASH and KEY partitioning require one of the columns to be of type int, could we create an int column specifically for partitioning?

Analysis showed that the bank card number itself holds the answer. A bank card number is generally a string of 16 to 19 digits. Could we take one of those digits and use it as the partition key? We found that one digit in the string is indeed effectively random, from 0 to 9, so we applied KEY partitioning on that digit. For every query, we extract this digit from the card number and query on it together with the card number, so that only one partition is searched. Note that after partitioning, the indexes must also be built on the partition column; otherwise MySQL will still search all partitions.
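A minimal sketch of that lookup might look as follows. The article does not say which digit is the random one, so `CARD_DIGIT_INDEX` is a placeholder; the table name `card_binding` and the columns `card_part` and `card_no` are likewise hypothetical.

```python
CARD_DIGIT_INDEX = 10  # hypothetical position; the article does not name the digit


def card_partition_key(card_no: str, digit_index: int = CARD_DIGIT_INDEX) -> int:
    """Extract the single digit (0 to 9) used as the partition key."""
    digits = card_no.replace(" ", "")
    if not (digits.isascii() and digits.isdigit() and 16 <= len(digits) <= 19):
        raise ValueError("expected a card number of 16 to 19 digits")
    return int(digits[digit_index])


def build_lookup(card_no: str):
    """Build a parameterized query that filters on both the partition
    column and the card number (hypothetical table and column names)."""
    digits = card_no.replace(" ", "")
    sql = "SELECT * FROM card_binding WHERE card_part = %s AND card_no = %s"
    return sql, (card_partition_key(digits), digits)


sql, params = build_lookup("6222 0202 0011 2345 678")
print(sql)
print(params)
```

Because the partition key is derived from the card number itself, callers never need to store or remember it; it is recomputed on every insert and every query.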

That solves looking up the binding relationship by bank card number. But what about looking it up by document number?

As mentioned above, the index must be built on the partition key; otherwise the query turns into a full table scan. We created a new table to store each customer's ID number binding. Each customer's ID number is unique, so in this new table the ID number is the primary key. But how do we compute the partition key? Customer documents are fairly varied: ID cards, Hong Kong, Macao and Taiwan travel permits, driving licenses, and so on. How do we find a partition key in such heterogeneous document numbers?

To solve this problem, we split the document binding table in two: one table stores only ID card numbers, and the other stores all other document types. In the ID card table, we take the birth month from the ID number as the partition key, so that customers born in the same month are stored in the same partition, giving 12 partitions. The other document types amount to no more than 100,000 rows, so that table does not need to be partitioned.
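The split described above could be routed roughly like this. The sketch assumes 18-digit resident ID numbers, in which characters 7 to 14 hold the birth date as YYYYMMDD; the type code `id_card` and both table names are hypothetical, and the check digit is not validated.

```python
ID_CARD = "id_card"  # hypothetical document type code


def birth_month_from_id(id_no: str) -> int:
    """Return the birth month (1 to 12) encoded in an 18-digit ID number."""
    body = id_no[:17]  # the last character may be a check character such as X
    if len(id_no) != 18 or not (body.isascii() and body.isdigit()):
        raise ValueError("expected an 18-character ID number")
    month = int(id_no[10:12])  # the MM part of YYYYMMDD
    if not 1 <= month <= 12:
        raise ValueError("ID number does not contain a valid birth month")
    return month


def route_document(doc_type: str, doc_no: str):
    """Return (table, partition_key); the key is None for the
    unpartitioned table that holds all other document types."""
    if doc_type == ID_CARD:
        return "id_card_binding", birth_month_from_id(doc_no)
    return "other_document_binding", None


# A made-up ID number used purely as sample input.
print(route_document(ID_CARD, "110101199003071234"))
print(route_document("driving_license", "D1234567"))
```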

In this way, every query first determines which table to use based on the document type, and then computes the partition key for the lookup. With this partition design, when storing 20 million users, the data file of the bank card table was split into 10 small files and the data file of the document table into 12 small files. This solved the two query problems, but one question remains: what about the business number?

A customer can have several contracted services. How should they be stored? Partitioning is no longer suitable here; what is needed is to split the data into separate tables.

Sub-table

As mentioned earlier, MySQL stores its data on disk as files. When a data file grows too large, operating on it becomes cumbersome and slow for the operating system, and some operating systems do not support large files at all. At that point the table has to be split.

In addition, the storage engine most commonly used with MySQL is InnoDB, whose underlying data structure is a B+ tree. When the data file is too large, finding a node may mean descending through many levels, and each level can require another I/O operation to load a page into memory, which is bound to be slow.
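A rough back-of-the-envelope estimate shows how the number of levels grows with row count. The 16 KB page size is InnoDB's default; everything else here (the 14-byte index entry, the 1 KB average row, ignoring page headers and fill factor) is an assumption chosen to keep the arithmetic simple, so the results are orders of magnitude, not measurements.

```python
PAGE_SIZE = 16 * 1024    # InnoDB's default page size in bytes
INDEX_ENTRY_BYTES = 14   # assumed: 8-byte BIGINT key plus 6-byte child pointer
ROW_BYTES = 1024         # assumed average row size

FANOUT = PAGE_SIZE // INDEX_ENTRY_BYTES  # child pointers per internal page
ROWS_PER_LEAF = PAGE_SIZE // ROW_BYTES   # rows stored in one leaf page


def capacity(height: int) -> int:
    """Approximate number of rows a tree of the given height can hold."""
    return FANOUT ** (height - 1) * ROWS_PER_LEAF


def min_height(rows: int) -> int:
    """Smallest tree height whose capacity covers the given row count."""
    height = 1
    while capacity(height) < rows:
        height += 1
    return height


for rows in (5_000_000, 500_000_000):
    print(rows, "rows need a tree of height", min_height(rows))
```

Under these assumptions a three-level tree holds roughly 20 million rows, so a table of a few million rows stays at three levels while the 500-million-row design from earlier needs a fourth.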

Then there is InnoDB's locking of the B+ tree. Locks are taken on individual nodes, and when the table structure is changed, the whole tree is locked. For a very large table, such a change is practically infeasible. To sum up, we have to split both tables and databases.

There are many database and table sharding solutions available. Some of the better-known ones are Alibaba's TDDL, DRDS, and Cobar; JD Finance's sharding-jdbc; the community project MyCAT; 360's Atlas; and Meituan's Zebra. Other companies such as NetEase, 58, and JD.com have also developed their own middleware.

All of these sharding middleware solutions fall into two categories: client mode and proxy mode.

[Figure: client mode]

[Figure: proxy mode]

Whether client mode or proxy mode is used, the core steps are the same: SQL parsing, rewriting, routing, execution, and result merging. Personally, I prefer client mode: it has a simple architecture, relatively low performance overhead, and low operation and maintenance costs.
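The five steps can be sketched in a few lines of client-side code. This is a toy illustration, not how any of the middleware named above is implemented: an in-memory SQLite database stands in for MySQL, the "parser" is a single regular expression, and the table name `binding`, the column `customer_id`, and the shard count are all hypothetical. The sketch picks the target table before rewriting the SQL, because the physical table name has to be known first.

```python
import re
import sqlite3

SHARD_COUNT = 2  # hypothetical number of physical tables

# An in-memory SQLite database stands in for the MySQL server.
conn = sqlite3.connect(":memory:")
for i in range(SHARD_COUNT):
    conn.execute(f"CREATE TABLE binding_{i} (customer_id INTEGER, card_no TEXT)")


def physical_table(customer_id: int) -> str:
    """Map a sharding-column value to a physical table name."""
    return f"binding_{customer_id % SHARD_COUNT}"


def insert(customer_id: int, card_no: str) -> None:
    table = physical_table(customer_id)
    conn.execute(f"INSERT INTO {table} VALUES (?, ?)", (customer_id, card_no))


def query(sql: str, params: tuple = ()) -> list:
    # Parse: a toy check for "WHERE customer_id = ?" as the first condition.
    has_shard_key = re.search(r"WHERE\s+customer_id\s*=\s*\?", sql, re.IGNORECASE)
    # Route: one table if the sharding column is present, otherwise all tables.
    if has_shard_key:
        targets = [physical_table(params[0])]
    else:
        targets = [f"binding_{i}" for i in range(SHARD_COUNT)]
    rows = []
    for table in targets:
        # Rewrite: replace the logical table name with the physical one.
        rewritten = re.sub(r"\bbinding\b", table, sql)
        # Execute against the chosen table.
        rows.extend(conn.execute(rewritten, params).fetchall())
    # Merge: concatenate the partial results and sort them.
    return sorted(rows)


insert(1, "A")
insert(2, "B")
insert(3, "C")
one = query("SELECT customer_id, card_no FROM binding WHERE customer_id = ?", (3,))
everything = query("SELECT customer_id, card_no FROM binding")
print(one)
print(everything)
```

In client mode this logic lives inside the application process, as here; in proxy mode the same steps run in a separate service that sits between the application and the databases.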

How do we shard by business type? The most important step in sharding databases and tables is choosing the sharding column. The quality of that choice directly determines whether the whole sharding scheme succeeds, and the choice is strongly tied to the business.

In our project, the best choice of sharding column is undoubtedly the business number. A customer's contracted services are stored in different tables according to business number, and queries are routed to the corresponding table by business number, which further optimizes the SQL.
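Routing by business number can be as simple as deriving the table name from it. The sketch below is an illustration under assumptions: the business codes `B01` to `B05` and the `customer_binding_` table prefix are invented, with five codes chosen to match the five business types assumed earlier.

```python
from collections import defaultdict

# Hypothetical business numbers; the article does not list the real ones.
BUSINESS_NUMBERS = ("B01", "B02", "B03", "B04", "B05")


def table_for_business(business_no: str) -> str:
    """Return the table that stores bindings for one business number."""
    if business_no not in BUSINESS_NUMBERS:
        raise ValueError(f"unknown business number: {business_no}")
    return f"customer_binding_{business_no.lower()}"


def group_by_table(contracts) -> dict:
    """Group (customer_id, business_no) pairs by their target table."""
    grouped = defaultdict(list)
    for customer_id, business_no in contracts:
        grouped[table_for_business(business_no)].append(customer_id)
    return dict(grouped)


contracts = [(1, "B01"), (1, "B02"), (2, "B01")]
print(group_by_table(contracts))
```

Validating the business number against a fixed list keeps arbitrary input from ending up in a table name, which matters because table names cannot be passed as bound query parameters.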

Statement:
This article is reproduced from cnblogs.com. If there is any infringement, please contact admin@php.cn to have it removed.