The company has recently been engaged in service separation and data segmentation, because the amount of data in a single package table is really too large, and It is still growing at 60W per day.
I have learned about database sub-databases and sub-tables before, and have read a few blog posts, but I only know a vague concept, and now that I think about it, everything is vague.
I spent the whole afternoon reading database sub-tables and reading a lot of articles. Now I will make a summary:
Part 1: Problems faced in the actual website development process.
Part 2: What are the different ways of segmentation, the differences and applicable aspects between vertical and horizontal.
Part 3: Some open source products and technologies currently on the market, and what are their advantages and disadvantages.
Part 4: Perhaps the most important thing, why is it not recommended to split databases and tables horizontally! ? This allows you to treat it carefully in the early stages of planning and avoid problems caused by segmentation.
Explanation of terms
Library: database; table: table; sub-database and sub-table: sharding
The database architecture has just evolved At first, it was enough for us to use a single-machine database. Later, when faced with more and more requests, we separated the write operations and read operations of the database, using multiple slave database copies (Slaver Replication) to be responsible for reading, and using the master database (Master ) is responsible for writing, and the slave library updates data synchronously from the main library to keep the data consistent. Architecturally, it is database master-slave synchronization. The slave library can be scaled horizontally, so more read requests are not a problem.
But when the number of users increases and there are more and more write requests, what should we do? Adding a Master cannot solve the problem, because the data needs to be consistent and the write operation requires synchronization between the two masters, which is equivalent to duplication and is more complicated.
At this time, you need to use sharding to split the write operations.
Any problem is too big or too small. The amount of data we face here Too big a problem.
Because single server TPS, memory, and IO are limited.
Solution: Distribute requests to multiple servers; in fact, user requests and executing a SQL query are essentially the same, both requesting a resource, but user requests will also go through gateways, routing, http servers, etc. .
The processing capacity of a single database is limited;
Insufficient disk space on the server where the single database is located;
Operation on a single database IO bottleneck
Solution: Split into more smaller libraries
CRUD is a problem;
Index expansion, query timeout
Solution: Split into multiple tables with smaller data sets.
Generally, vertical slicing and horizontal slicing are used, which is a result set description The way of segmentation is physical space segmentation.
We start from the problems we face and solve them.
Explanation:
First of all, the number of user requests is too large, so we pile up machines to handle it (this is not the focus of this article)
Then the single library is too large. At this time, we need to see why Too many tables lead to too much data, or because there is too much data in a single table.
If there are many tables and a lot of data, use vertical segmentation to divide it into different libraries according to the business.
If the amount of data in a single table is too large, horizontal segmentation must be used, that is, the data in the table is divided into multiple tables according to certain rules, or even multiple tables on multiple databases.
The order of database and table partitioning should be vertical partitioning first, and then horizontal partitioning
. Because vertical division is simpler and more consistent with the way we deal with real-world problems.
That is, "split a large table into a small table", which is based on column fields. Generally, there are many fields in the table, and those that are not commonly used, have large data, and are long in length (such as text type fields) are split into "extended tables". It is generally aimed at large tables with hundreds of columns, and also avoids the "cross-page" problem caused by too much data when querying.
Vertical sub-library is aimed at splitting different businesses in a system, such as a database for users, a database for products, and a database for orders. After splitting, it should be placed on multiple servers instead of one server. Why? Let's imagine that a shopping website provides services to the outside world and has CRUD for users, products, orders, etc. Before the split, everything fell into a single database, which would make the database's single database processing capability become a bottleneck
. After dividing the database vertically, if it is still placed on a database server, as the number of users increases, the processing power of a single database will become a bottleneck, and the disk space, memory, TPS, etc. of a single server will be very tight. . Therefore, we need to split it into multiple servers, so that the above problems are solved and we will not face single-machine resource problems in the future.
governance and
degradation mechanism of the service. It can also manage, maintain and monitor the data of different businesses separately. Extensions etc. The database is often the most likely to become the bottleneck of the application system, and the database itself is
stateful. Compared with the Web and application servers, it is more difficult to achieve
horizontal expansion. Database connection resources are precious and single-machine processing capabilities are limited. In high-concurrency scenarios, vertical sub-databases can break through the bottlenecks of IO, number of connections, and single-machine hardware resources to a certain extent.
For a single table with a huge amount of data (such as an order table), according to a certain Rules (RANGE, HASH modulus
, etc.) are divided into multiple tables. However, these tables are still in the same library, so database operations at the library level still have IO bottlenecks
. Not recommended.
Split the data of a single table into multiple servers. Each server has a corresponding library and table, but the data collection in the table is different. Horizontal sub-database and sub-table can effectively alleviate the performance bottlenecks and pressures of single machines and single databases, and break through the bottlenecks of IO, number of connections, hardware resources, etc.
RANGE
One table from 0 to 10000, one table from 10001 to 20000;
HASH model
A shopping mall system generally uses users and orders as the main table, and then uses the related tables as supplementary tables , this will not cause problems such as cross-database transactions. Get the user ID, then take the modulus of hash and distribute it to different databases.
Geographic Region
For example, if we divide our business according to East China, South China, and North China, Qiniu Cloud should be like this.
Time
Split by time, that is, cut out the data 6 months ago or even a year ago and put it in another Tables, because as time goes by, the probability of data in these tables being queried becomes smaller, so there is no need to put them together with "hot data". This is also "separation of hot and cold data".
Sub-databases and tables After that, it became distributed transaction
.
If you rely on the distributed transaction management function of the database itself to execute transactions, you will pay a high performance price; If the application assists in controlling it, forming a program logic transaction, it will also cause programming problems burden.
Similar to group by, order by
Such grouping and sorting statements cannot be used
After the database is divided into tables, the association operations between the tables will be restricted. We cannot join tables located in different databases, nor can we join tables with different granularity. The result is originally The business that can be completed with one query may require multiple queries to complete. Rough solution: global table: basic data, all libraries have a copy. Field redundancy: In this way, some fields do not need to be queried by join. System layer assembly: Query everything separately and then assemble it, which is more complicated.
#There are relatively many sub-database and sub-table middleware on the market, among which those based on the proxy method MySQL Proxy
and Amoeba
, based on Hibernate framework is Hibernate Shards
, based on jdbc is Dangdangsharding-jdbc
, based on mybatis similar maven plug-in There are Mogujie’s MogujieTSharding
, and Cobar Client
by rewriting spring’s ibatis template class.
There are also open source products from some big companies:
I am programmer Qingge
, a person who loves life and A post-90s programmer who loves to share.
This issue’s introduction and solutions about Mysql sub-database and sub-table are introduced here. I hope it can help everyone. Please continue to pay attention to the public account Java learning for more Java interview articles in the future. guide
.
The above is the detailed content of Today I finally figured out the MySQL sub-database and sub-tables, so I can brag about it in the interview!. For more information, please follow other related articles on the PHP Chinese website!