Home  >  Article  >  Database  >  Today I finally figured out the MySQL sub-database and sub-tables, so I can brag about it in the interview!

Today I finally figured out the MySQL sub-database and sub-tables, so I can brag about it in the interview!

Java学习指南
Java学习指南forward
2023-07-26 15:02:561045browse

Preface

The company has recently been engaged in service separation and data segmentation, because the amount of data in a single package table is really too large, and It is still growing at 60W per day.

I have learned about database sub-databases and sub-tables before, and have read a few blog posts, but I only know a vague concept, and now that I think about it, everything is vague.

I spent the whole afternoon reading database sub-tables and reading a lot of articles. Now I will make a summary:

Part 1: Problems faced in the actual website development process.

Part 2: What are the different ways of segmentation, the differences and applicable aspects between vertical and horizontal.

Part 3: Some open source products and technologies currently on the market, and what are their advantages and disadvantages.

Part 4: Perhaps the most important thing, why is it not recommended to split databases and tables horizontally! ? This allows you to treat it carefully in the early stages of planning and avoid problems caused by segmentation.

Explanation of terms

Library: database; table: table; sub-database and sub-table: sharding

The database architecture has just evolved At first, it was enough for us to use a single-machine database. Later, when faced with more and more requests, we separated the write operations and read operations of the database, using multiple slave database copies (Slaver Replication) to be responsible for reading, and using the master database (Master ) is responsible for writing, and the slave library updates data synchronously from the main library to keep the data consistent. Architecturally, it is database master-slave synchronization. The slave library can be scaled horizontally, so more read requests are not a problem.

But when the number of users increases and there are more and more write requests, what should we do? Adding a Master cannot solve the problem, because the data needs to be consistent and the write operation requires synchronization between the two masters, which is equivalent to duplication and is more complicated.

At this time, you need to use sharding to split the write operations.

Problems before sharding databases and tables

Any problem is too big or too small. The amount of data we face here Too big a problem.

The volume of user requests is too large

Because single server TPS, memory, and IO are limited.

Solution: Distribute requests to multiple servers; in fact, user requests and executing a SQL query are essentially the same, both requesting a resource, but user requests will also go through gateways, routing, http servers, etc. .

The single database is too large

The processing capacity of a single database is limited;

Insufficient disk space on the server where the single database is located;

Operation on a single database IO bottleneck

Solution: Split into more smaller libraries

A single table is too large

CRUD is a problem;

Index expansion, query timeout

Solution: Split into multiple tables with smaller data sets.

Methods for sharding databases and tables

Generally, vertical slicing and horizontal slicing are used, which is a result set description The way of segmentation is physical space segmentation.

We start from the problems we face and solve them.

Explanation:
First of all, the number of user requests is too large, so we pile up machines to handle it (this is not the focus of this article)
Then the single library is too large. At this time, we need to see why Too many tables lead to too much data, or because there is too much data in a single table.
If there are many tables and a lot of data, use vertical segmentation to divide it into different libraries according to the business.
If the amount of data in a single table is too large, horizontal segmentation must be used, that is, the data in the table is divided into multiple tables according to certain rules, or even multiple tables on multiple databases.

The order of database and table partitioning should be vertical partitioning first, and then horizontal partitioning. Because vertical division is simpler and more consistent with the way we deal with real-world problems.

Vertical split

Vertical splitting

That is, "split a large table into a small table", which is based on column fields. Generally, there are many fields in the table, and those that are not commonly used, have large data, and are long in length (such as text type fields) are split into "extended tables". It is generally aimed at large tables with hundreds of columns, and also avoids the "cross-page" problem caused by too much data when querying.

Vertical sub-library

Vertical sub-library is aimed at splitting different businesses in a system, such as a database for users, a database for products, and a database for orders. After splitting, it should be placed on multiple servers instead of one server. Why? Let's imagine that a shopping website provides services to the outside world and has CRUD for users, products, orders, etc. Before the split, everything fell into a single database, which would make the database's single database processing capability become a bottleneck. After dividing the database vertically, if it is still placed on a database server, as the number of users increases, the processing power of a single database will become a bottleneck, and the disk space, memory, TPS, etc. of a single server will be very tight. . Therefore, we need to split it into multiple servers, so that the above problems are solved and we will not face single-machine resource problems in the future.

The splitting of the database business level is similar to the

governance and degradation mechanism of the service. It can also manage, maintain and monitor the data of different businesses separately. Extensions etc. The database is often the most likely to become the bottleneck of the application system, and the database itself is stateful. Compared with the Web and application servers, it is more difficult to achieve horizontal expansion. Database connection resources are precious and single-machine processing capabilities are limited. In high-concurrency scenarios, vertical sub-databases can break through the bottlenecks of IO, number of connections, and single-machine hardware resources to a certain extent.

Horizontal split

Horizontal split table

For a single table with a huge amount of data (such as an order table), according to a certain Rules (RANGE, HASH modulus , etc.) are divided into multiple tables. However, these tables are still in the same library, so database operations at the library level still have IO bottlenecks. Not recommended.

Horizontal database and table partitioning

Split the data of a single table into multiple servers. Each server has a corresponding library and table, but the data collection in the table is different. Horizontal sub-database and sub-table can effectively alleviate the performance bottlenecks and pressures of single machines and single databases, and break through the bottlenecks of IO, number of connections, hardware resources, etc.

Horizontal database sharding and table sharding rules

  • RANGE

    One table from 0 to 10000, one table from 10001 to 20000;

  • HASH model

    A shopping mall system generally uses users and orders as the main table, and then uses the related tables as supplementary tables , this will not cause problems such as cross-database transactions. Get the user ID, then take the modulus of hash and distribute it to different databases.

  • Geographic Region

    For example, if we divide our business according to East China, South China, and North China, Qiniu Cloud should be like this.

  • Time

    Split by time, that is, cut out the data 6 months ago or even a year ago and put it in another Tables, because as time goes by, the probability of data in these tables being queried becomes smaller, so there is no need to put them together with "hot data". This is also "separation of hot and cold data".

Problems faced after sharding databases and tables

Transaction support

Sub-databases and tables After that, it became distributed transaction.

If you rely on the distributed transaction management function of the database itself to execute transactions, you will pay a high performance price; If the application assists in controlling it, forming a program logic transaction, it will also cause programming problems burden.

Multiple database result set merging (group by, order by)

Similar to group by, order bySuch grouping and sorting statements cannot be used

Cross-database join

After the database is divided into tables, the association operations between the tables will be restricted. We cannot join tables located in different databases, nor can we join tables with different granularity. The result is originally The business that can be completed with one query may require multiple queries to complete. Rough solution: global table: basic data, all libraries have a copy. Field redundancy: In this way, some fields do not need to be queried by join. System layer assembly: Query everything separately and then assemble it, which is more complicated.

Sub-database and sub-table solution products

#There are relatively many sub-database and sub-table middleware on the market, among which those based on the proxy method MySQL Proxy and Amoeba, based on Hibernate framework is Hibernate Shards, based on jdbc is Dangdangsharding-jdbc, based on mybatis similar maven plug-in There are Mogujie’s MogujieTSharding, and Cobar Client by rewriting spring’s ibatis template class.

There are also open source products from some big companies:

Today I finally figured out the MySQL sub-database and sub-tables, so I can brag about it in the interview!


I am programmer Qingge, a person who loves life and A post-90s programmer who loves to share.


This issue’s introduction and solutions about Mysql sub-database and sub-table are introduced here. I hope it can help everyone. Please continue to pay attention to the public account Java learning for more Java interview articles in the future. guide.

The above is the detailed content of Today I finally figured out the MySQL sub-database and sub-tables, so I can brag about it in the interview!. For more information, please follow other related articles on the PHP Chinese website!

Statement:
This article is reproduced at:Java学习指南. If there is any infringement, please contact admin@php.cn delete