


Today I finally figured out the MySQL sub-database and sub-tables, so I can brag about it in the interview!
Preface
The company has recently been engaged in service separation and data segmentation, because the amount of data in a single package table is really too large, and It is still growing at 60W per day.
I have learned about database sub-databases and sub-tables before, and have read a few blog posts, but I only know a vague concept, and now that I think about it, everything is vague.
I spent the whole afternoon reading database sub-tables and reading a lot of articles. Now I will make a summary:
Part 1: Problems faced in the actual website development process.
Part 2: What are the different ways of segmentation, the differences and applicable aspects between vertical and horizontal.
Part 3: Some open source products and technologies currently on the market, and what are their advantages and disadvantages.
Part 4: Perhaps the most important thing, why is it not recommended to split databases and tables horizontally! ? This allows you to treat it carefully in the early stages of planning and avoid problems caused by segmentation.
Explanation of terms
Library: database; table: table; sub-database and sub-table: sharding
The database architecture has just evolved At first, it was enough for us to use a single-machine database. Later, when faced with more and more requests, we separated the write operations and read operations of the database, using multiple slave database copies (Slaver Replication) to be responsible for reading, and using the master database (Master ) is responsible for writing, and the slave library updates data synchronously from the main library to keep the data consistent. Architecturally, it is database master-slave synchronization. The slave library can be scaled horizontally, so more read requests are not a problem.
But when the number of users increases and there are more and more write requests, what should we do? Adding a Master cannot solve the problem, because the data needs to be consistent and the write operation requires synchronization between the two masters, which is equivalent to duplication and is more complicated.
At this time, you need to use sharding to split the write operations.
Problems before sharding databases and tables
Any problem is too big or too small. The amount of data we face here Too big a problem.
The volume of user requests is too large
Because single server TPS, memory, and IO are limited.
Solution: Distribute requests to multiple servers; in fact, user requests and executing a SQL query are essentially the same, both requesting a resource, but user requests will also go through gateways, routing, http servers, etc. .
The single database is too large
The processing capacity of a single database is limited;
Insufficient disk space on the server where the single database is located;
Operation on a single database IO bottleneck
Solution: Split into more smaller libraries
A single table is too large
CRUD is a problem;
Index expansion, query timeout
Solution: Split into multiple tables with smaller data sets.
Methods for sharding databases and tables
Generally, vertical slicing and horizontal slicing are used, which is a result set description The way of segmentation is physical space segmentation.
We start from the problems we face and solve them.
Explanation:
First of all, the number of user requests is too large, so we pile up machines to handle it (this is not the focus of this article)
Then the single library is too large. At this time, we need to see why Too many tables lead to too much data, or because there is too much data in a single table.
If there are many tables and a lot of data, use vertical segmentation to divide it into different libraries according to the business.
If the amount of data in a single table is too large, horizontal segmentation must be used, that is, the data in the table is divided into multiple tables according to certain rules, or even multiple tables on multiple databases.
The order of database and table partitioning should be vertical partitioning first, and then horizontal partitioning
. Because vertical division is simpler and more consistent with the way we deal with real-world problems.
Vertical split
Vertical splitting
That is, "split a large table into a small table", which is based on column fields. Generally, there are many fields in the table, and those that are not commonly used, have large data, and are long in length (such as text type fields) are split into "extended tables". It is generally aimed at large tables with hundreds of columns, and also avoids the "cross-page" problem caused by too much data when querying.
Vertical sub-library
Vertical sub-library is aimed at splitting different businesses in a system, such as a database for users, a database for products, and a database for orders. After splitting, it should be placed on multiple servers instead of one server. Why? Let's imagine that a shopping website provides services to the outside world and has CRUD for users, products, orders, etc. Before the split, everything fell into a single database, which would make the database's single database processing capability become a bottleneck
. After dividing the database vertically, if it is still placed on a database server, as the number of users increases, the processing power of a single database will become a bottleneck, and the disk space, memory, TPS, etc. of a single server will be very tight. . Therefore, we need to split it into multiple servers, so that the above problems are solved and we will not face single-machine resource problems in the future.
governance and
degradation mechanism of the service. It can also manage, maintain and monitor the data of different businesses separately. Extensions etc. The database is often the most likely to become the bottleneck of the application system, and the database itself is
stateful. Compared with the Web and application servers, it is more difficult to achieve
horizontal expansion. Database connection resources are precious and single-machine processing capabilities are limited. In high-concurrency scenarios, vertical sub-databases can break through the bottlenecks of IO, number of connections, and single-machine hardware resources to a certain extent.
Horizontal split
Horizontal split table
For a single table with a huge amount of data (such as an order table), according to a certain Rules (RANGE, HASH modulus
, etc.) are divided into multiple tables. However, these tables are still in the same library, so database operations at the library level still have IO bottlenecks
. Not recommended.
Horizontal database and table partitioning
Split the data of a single table into multiple servers. Each server has a corresponding library and table, but the data collection in the table is different. Horizontal sub-database and sub-table can effectively alleviate the performance bottlenecks and pressures of single machines and single databases, and break through the bottlenecks of IO, number of connections, hardware resources, etc.
Horizontal database sharding and table sharding rules
RANGE
One table from 0 to 10000, one table from 10001 to 20000;
HASH model
A shopping mall system generally uses users and orders as the main table, and then uses the related tables as supplementary tables , this will not cause problems such as cross-database transactions. Get the user ID, then take the modulus of hash and distribute it to different databases.
Geographic Region
For example, if we divide our business according to East China, South China, and North China, Qiniu Cloud should be like this.
Time
Split by time, that is, cut out the data 6 months ago or even a year ago and put it in another Tables, because as time goes by, the probability of data in these tables being queried becomes smaller, so there is no need to put them together with "hot data". This is also "separation of hot and cold data".
Problems faced after sharding databases and tables
Transaction support
Sub-databases and tables After that, it became distributed transaction
.
If you rely on the distributed transaction management function of the database itself to execute transactions, you will pay a high performance price; If the application assists in controlling it, forming a program logic transaction, it will also cause programming problems burden.
Multiple database result set merging (group by, order by)
Similar to group by, order by
Such grouping and sorting statements cannot be used
Cross-database join
After the database is divided into tables, the association operations between the tables will be restricted. We cannot join tables located in different databases, nor can we join tables with different granularity. The result is originally The business that can be completed with one query may require multiple queries to complete. Rough solution: global table: basic data, all libraries have a copy. Field redundancy: In this way, some fields do not need to be queried by join. System layer assembly: Query everything separately and then assemble it, which is more complicated.
Sub-database and sub-table solution products
#There are relatively many sub-database and sub-table middleware on the market, among which those based on the proxy method MySQL Proxy
and Amoeba
, based on Hibernate framework is Hibernate Shards
, based on jdbc is Dangdangsharding-jdbc
, based on mybatis similar maven plug-in There are Mogujie’s MogujieTSharding
, and Cobar Client
by rewriting spring’s ibatis template class.
There are also open source products from some big companies:
I am programmer Qingge
, a person who loves life and A post-90s programmer who loves to share.
This issue’s introduction and solutions about Mysql sub-database and sub-table are introduced here. I hope it can help everyone. Please continue to pay attention to the public account Java learning for more Java interview articles in the future. guide
.
The above is the detailed content of Today I finally figured out the MySQL sub-database and sub-tables, so I can brag about it in the interview!. For more information, please follow other related articles on the PHP Chinese website!

InnoDB uses redologs and undologs to ensure data consistency and reliability. 1.redologs record data page modification to ensure crash recovery and transaction persistence. 2.undologs records the original data value and supports transaction rollback and MVCC.

Key metrics for EXPLAIN commands include type, key, rows, and Extra. 1) The type reflects the access type of the query. The higher the value, the higher the efficiency, such as const is better than ALL. 2) The key displays the index used, and NULL indicates no index. 3) rows estimates the number of scanned rows, affecting query performance. 4) Extra provides additional information, such as Usingfilesort prompts that it needs to be optimized.

Usingtemporary indicates that the need to create temporary tables in MySQL queries, which are commonly found in ORDERBY using DISTINCT, GROUPBY, or non-indexed columns. You can avoid the occurrence of indexes and rewrite queries and improve query performance. Specifically, when Usingtemporary appears in EXPLAIN output, it means that MySQL needs to create temporary tables to handle queries. This usually occurs when: 1) deduplication or grouping when using DISTINCT or GROUPBY; 2) sort when ORDERBY contains non-index columns; 3) use complex subquery or join operations. Optimization methods include: 1) ORDERBY and GROUPB

MySQL/InnoDB supports four transaction isolation levels: ReadUncommitted, ReadCommitted, RepeatableRead and Serializable. 1.ReadUncommitted allows reading of uncommitted data, which may cause dirty reading. 2. ReadCommitted avoids dirty reading, but non-repeatable reading may occur. 3.RepeatableRead is the default level, avoiding dirty reading and non-repeatable reading, but phantom reading may occur. 4. Serializable avoids all concurrency problems but reduces concurrency. Choosing the appropriate isolation level requires balancing data consistency and performance requirements.

MySQL is suitable for web applications and content management systems and is popular for its open source, high performance and ease of use. 1) Compared with PostgreSQL, MySQL performs better in simple queries and high concurrent read operations. 2) Compared with Oracle, MySQL is more popular among small and medium-sized enterprises because of its open source and low cost. 3) Compared with Microsoft SQL Server, MySQL is more suitable for cross-platform applications. 4) Unlike MongoDB, MySQL is more suitable for structured data and transaction processing.

MySQL index cardinality has a significant impact on query performance: 1. High cardinality index can more effectively narrow the data range and improve query efficiency; 2. Low cardinality index may lead to full table scanning and reduce query performance; 3. In joint index, high cardinality sequences should be placed in front to optimize query.

The MySQL learning path includes basic knowledge, core concepts, usage examples, and optimization techniques. 1) Understand basic concepts such as tables, rows, columns, and SQL queries. 2) Learn the definition, working principles and advantages of MySQL. 3) Master basic CRUD operations and advanced usage, such as indexes and stored procedures. 4) Familiar with common error debugging and performance optimization suggestions, such as rational use of indexes and optimization queries. Through these steps, you will have a full grasp of the use and optimization of MySQL.

MySQL's real-world applications include basic database design and complex query optimization. 1) Basic usage: used to store and manage user data, such as inserting, querying, updating and deleting user information. 2) Advanced usage: Handle complex business logic, such as order and inventory management of e-commerce platforms. 3) Performance optimization: Improve performance by rationally using indexes, partition tables and query caches.


Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

SublimeText3 Chinese version
Chinese version, very easy to use

SublimeText3 Mac version
God-level code editing software (SublimeText3)

SecLists
SecLists is the ultimate security tester's companion. It is a collection of various types of lists that are frequently used during security assessments, all in one place. SecLists helps make security testing more efficient and productive by conveniently providing all the lists a security tester might need. List types include usernames, passwords, URLs, fuzzing payloads, sensitive data patterns, web shells, and more. The tester can simply pull this repository onto a new test machine and he will have access to every type of list he needs.

Dreamweaver Mac version
Visual web development tools

PhpStorm Mac version
The latest (2018.2.1) professional PHP integrated development tool