Summary of splitting MySQL data into multiple databases and tables
Data storage evolution idea 1: single database and single table
A single database with a single table is the most common database design. For example, the database db contains one user table, and every user can be found in that table.
Data storage evolution idea 2: single database with multiple tables
As the number of users grows, the user table holds more and more data. Once the data volume reaches a certain level, queries on the user table gradually slow down, which affects the performance of the entire DB. With MySQL there is a more serious problem: adding a column can lock the table, and all read and write operations have to wait until it finishes.
The user table can be split horizontally by some rule into several tables with exactly the same structure: user_0000, user_0001, and so on. The data of user_0000 + user_0001 + ... together makes up the complete data set.
Data storage evolution idea 3: multiple databases and multiple tables
As the amount of data grows, the storage space of a single DB may no longer be enough, and as the number of queries grows, a single database server can no longer keep up. At this point the database itself can be split horizontally.
MySQL database sharding rules
When designing a table, you need to decide the rule by which it will be split across databases and tables. For example, when a new user registers, the program must determine which table the user information goes into. Similarly, when a user logs in, the program must find the corresponding record through the user's account. Both operations must follow the same rule.
Routing
Routing is the process of finding the target database and table by applying the sharding rule. For example, suppose the rule is user_id mod 4. When a user registers a new account with ID 123, 123 mod 4 = 3, so the account is saved in the user_0003 table. When user 123 logs in, the same calculation, 123 mod 4, locates the record in user_0003.
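The routing rule above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function names `route_table` and `route`, the `db0`, `db1` naming, and the two-level rule (database first, then table) are assumptions made for the example. Only the `user_id mod 4` rule and the four-digit table suffix come from the text.

```python
from typing import Tuple


def route_table(user_id: int, table_count: int = 4, base: str = "user") -> str:
    """Return the shard table name for a user, using user_id mod table_count."""
    if table_count <= 0:
        raise ValueError("table_count must be positive")
    shard = user_id % table_count
    # Zero-pad to four digits to match names like user_0000, user_0001.
    return f"{base}_{shard:04d}"


def route(user_id: int, db_count: int, tables_per_db: int) -> Tuple[str, str]:
    """Hypothetical two-level rule: pick the database, then the table inside it."""
    slot = user_id % (db_count * tables_per_db)
    db_index = slot // tables_per_db
    table_index = slot % tables_per_db
    return f"db{db_index}", f"user_{table_index:04d}"


print(route_table(123))  # user_0003
```

Registration and login must call the same function with the same table count. If the count changes, existing rows no longer sit where the rule points and have to be migrated.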
Problems caused by splitting databases and tables, and precautions
1. The problem of choosing the sharding dimension
Suppose a user purchases a product and the transaction record must be saved and retrieved. If the tables are split by the user dimension, all of a user's transaction records are stored in the same table, so looking up a user's purchases is quick and convenient, but the purchase records for a given product are likely to be spread across multiple tables, which makes them troublesome to find. Conversely, if the tables are split by the product dimension, the purchases of a product are easy to find, but a buyer's transaction records are not.
So common solutions are:
a. Scan all the tables. This is basically impractical because the efficiency is too low.
b. Store two copies of the data, one split into tables by the user dimension and one split by the product dimension.
c. Use a search engine. If the real-time requirements are very high, however, this turns into a real-time search problem.
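Option b can be illustrated with a small in-memory sketch. Everything named here is hypothetical: the `trade_by_user_` and `trade_by_product_` table prefixes, the shard count of 4, and the dictionary standing in for real tables are assumptions chosen only to show how each lookup touches a single shard once the data is written twice.

```python
from collections import defaultdict

TABLE_COUNT = 4  # assumed shard count, for illustration only

# In-memory stand-in for the sharded tables: table name -> list of rows.
tables = defaultdict(list)


def save_trade(user_id: int, product_id: int, amount: float) -> None:
    """Write the same trade twice: once routed by user, once routed by product."""
    row = {"user_id": user_id, "product_id": product_id, "amount": amount}
    tables[f"trade_by_user_{user_id % TABLE_COUNT:04d}"].append(row)
    tables[f"trade_by_product_{product_id % TABLE_COUNT:04d}"].append(row)


def trades_of_user(user_id: int) -> list:
    """Read one shard only: all trades of this user live in the same table."""
    shard = tables[f"trade_by_user_{user_id % TABLE_COUNT:04d}"]
    return [r for r in shard if r["user_id"] == user_id]


def trades_of_product(product_id: int) -> list:
    """Read one shard only: all trades of this product live in the same table."""
    shard = tables[f"trade_by_product_{product_id % TABLE_COUNT:04d}"]
    return [r for r in shard if r["product_id"] == product_id]


save_trade(user_id=123, product_id=10, amount=9.5)
save_trade(user_id=456, product_id=10, amount=20.0)
print(len(trades_of_user(123)), len(trades_of_product(10)))  # 1 2
```

The price of this design is double storage and two writes per trade. In a real system the two writes may land in different databases, so the application has to handle the case where one succeeds and the other fails.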
2. The problem of join queries
Join queries are basically impossible, because the related tables may not be in the same database.
3. Avoid cross-database transactions
Avoid modifying a table in db1 in the same transaction that modifies a table in db0. Such a transaction is more complicated to carry out, and efficiency suffers to some extent.
4. Try to put the same set of data on the same DB server
For example, if seller A's products and transaction information are all placed in db0, then when db1 goes down, everything related to seller A still works normally. In other words, avoid making data in one database depend on data in another database.
One master, multiple slaves
In real applications, reads usually far outnumber writes. MySQL supports read-write separation: all write operations must go to the Master, while read operations can be served by either the Master or a Slave. The Slaves have exactly the same structure as the Master. A Master can have multiple Slaves, and a Slave can even have its own Slaves attached, which effectively raises the QPS of the DB cluster.
All write operations are performed on the Master first and then synchronized to the Slaves, so there is some delay between the Master and the Slave machines. When the system is very busy the delay becomes more serious, and increasing the number of Slave machines makes it worse. In addition, the Master is the bottleneck of the cluster. Too many write operations seriously affect the stability of the Master, and if the Master goes down, the entire cluster stops working properly.
In summary: (1) when read pressure is very high, consider adding Slave machines, but once the number of Slaves reaches a certain point you have to consider splitting the database; (2) when write pressure is very high, you must split the database.
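The read-write separation described above can be sketched as a small router that sends writes to the Master and rotates reads across the Slaves. This is an application-side illustration under stated assumptions: the class `ReadWriteRouter`, the `need_fresh` flag, and the host names are hypothetical, and statement classification by the first keyword is deliberately simplistic.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the master and spread reads across the slaves."""

    def __init__(self, master: str, slaves: list):
        self.master = master
        # Fall back to the master for reads when no slave is configured.
        self._read_cycle = itertools.cycle(slaves or [master])

    def pick(self, sql: str, need_fresh: bool = False) -> str:
        """Return the host that should run this statement."""
        words = sql.split()
        first_word = words[0].upper() if words else ""
        # Reads that cannot tolerate replication delay go to the master.
        if first_word == "SELECT" and not need_fresh:
            return next(self._read_cycle)
        return self.master


# Hypothetical host names, for illustration only.
router = ReadWriteRouter("master:3306", ["slave1:3306", "slave2:3306"])
print(router.pick("SELECT * FROM user_0003 WHERE id = 123"))  # slave1:3306
print(router.pick("UPDATE user_0003 SET name = 'x' WHERE id = 123"))  # master:3306
```

Adding Slaves to the list raises read capacity, but every write still reaches the single Master, which is why heavy write load calls for splitting the database instead.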
Why does MySQL need to be divided into databases and tables?
Wherever MySQL is used, once the amount of data becomes large you quickly run into the same question: splitting databases and tables.
Here is a question: Why do we need to divide databases and tables? Can’t MySQL handle large tables?
In fact, it can. In projects I have worked on, a single table had a physical file larger than 80 GB and more than 500 million records, and it was a heavily used table: the friend relationship table.
But this is not the best approach. File systems such as ext3 have many problems handling very large files. This layer can be addressed by switching to the XFS file system. One problem, however, is hard to solve when a single MySQL table is too large: operations that adjust the table structure become basically impossible. For this reason, large projects end up having to split databases and tables.
As for InnoDB itself, the B-tree of a data file has only two kinds of locks, leaf node locks and child node locks. You can imagine that when a page splits or a new leaf is added, data cannot be written to the table.
Therefore, splitting databases and tables is still the better choice.
So how many databases and tables are appropriate?
In testing, a single table with 10 million records still has relatively good write and read performance. Leaving some buffer, a single table whose columns are all numeric should be kept below 8 million records, and a single table with character-type columns should be kept below 5 million.
If you plan for 100 databases with 100 tables each, for example for the user business:
5 million * 100 * 100 = 50,000,000,000 = 50 billion records.
With this number in mind, it is relatively easy to plan according to business needs.
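The capacity arithmetic for 100 databases with 100 tables each can be checked with a short script. The per-table ceiling of 5 million rows comes from the text. The function names and the 1.2 billion row example are assumptions added for illustration.

```python
ROWS_PER_TABLE = 5_000_000    # per-table ceiling for tables with character columns
DATABASES = 100
TABLES_PER_DATABASE = 100


def total_capacity(rows_per_table: int, databases: int, tables_per_database: int) -> int:
    """Total rows the whole layout can hold at the per-table ceiling."""
    return rows_per_table * databases * tables_per_database


def tables_needed(expected_rows: int, rows_per_table: int) -> int:
    """Smallest number of tables that keeps every table under the ceiling."""
    return (expected_rows + rows_per_table - 1) // rows_per_table


capacity = total_capacity(ROWS_PER_TABLE, DATABASES, TABLES_PER_DATABASE)
print(f"{capacity:,}")  # 50,000,000,000

# Hypothetical planning example: 1.2 billion expected rows.
print(tables_needed(1_200_000_000, ROWS_PER_TABLE))  # 240
```

Working backwards from the expected row count with `tables_needed` gives a lower bound on the table count. How those tables are spread over databases then depends on storage and query load per server.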