Introducing the MySQL large table optimization solution

coldplay.xixi

Jan 28, 2021 am 09:28 AM

Background

Our business tables on Alibaba Cloud RDS for MySQL (MySQL 5.7) grow by more than 10 million rows every month. As the data volume keeps increasing, slow queries on large tables have started to appear. During peak business periods, slow queries on the main business table take tens of seconds, which seriously affects the business.

Solution Overview

[Figure: overview diagram of the optimization solution]

1. Database design and index optimization

MySQL itself is highly flexible, which means its performance depends heavily on the developer's table design and index optimization skills. Here are some optimization suggestions:

Convert time values to timestamps, store them as INT, and index the column to speed up queries
Define fields as NOT NULL where possible; NULL values are hard to query and optimize, and they take up extra index space
Use TINYINT instead of ENUM
To store exact decimal values, use DECIMAL rather than FLOAT or DOUBLE
Size field lengths strictly according to business needs; do not set them too large
Avoid the TEXT type where possible. If it is required, split large, rarely used fields into a separate table
MySQL limits the length of index keys. In InnoDB, each index column is limited to 767 bytes by default in older versions and row formats (REDUNDANT, COMPACT), and the combined length of all columns in an index cannot exceed 3072 bytes. In MySQL 8.0, a single index column can cover up to 1024 characters of a 3-byte character set
If a large table needs DDL changes, contact the DBA
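
Three of the suggestions above can be checked with a short Python sketch. It is only an illustration: it assumes UTC for the timestamp conversion, assumes 3-byte and 4-byte character sets for the index-length arithmetic, and the function names are made up for this example.

```python
from datetime import datetime, timezone
from decimal import Decimal


def to_epoch_seconds(dt: datetime) -> int:
    """Convert a timezone-aware datetime to an integer Unix timestamp."""
    return int(dt.timestamp())


def max_indexable_chars(limit_bytes: int, bytes_per_char: int) -> int:
    """How many characters of a string column fit under an index byte limit."""
    return limit_bytes // bytes_per_char


# Time stored as an integer: cheap to compare, sort, and index.
created_at = to_epoch_seconds(datetime(2021, 1, 28, 9, 28, tzinfo=timezone.utc))

# Binary floats (like FLOAT/DOUBLE) cannot represent 0.1 exactly;
# decimal arithmetic (like DECIMAL) can.
float_total = 0.1 + 0.2
decimal_total = Decimal("0.1") + Decimal("0.2")

print(created_at)
print(float_total == 0.3, decimal_total == Decimal("0.3"))
print(max_indexable_chars(767, 4), max_indexable_chars(3072, 3))
```

The last line shows why the byte limit matters in practice: the number of characters that can be indexed depends on the character set, not only on the declared column length.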

Leftmost index matching rule

As the name suggests, the leftmost column takes priority. When creating a composite index, place the column that is used most often in the WHERE clause on the far left, according to business needs. An important question for a composite index is how to order its columns. For example, if both c1 and c2 appear in the WHERE clause, should the index be (c1, c2) or (c2, c1)? The right approach is to put the column with fewer duplicate values (higher selectivity) first. For example, if 95% of the values in a column are distinct, that column can generally go first.
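
The "fewer duplicates first" rule can be made concrete by measuring selectivity, that is, the share of distinct values in a column. The sketch below works on small in-memory samples; the column names `status` and `user_id` and their values are hypothetical.

```python
from typing import Dict, List, Sequence


def selectivity(values: Sequence) -> float:
    """Fraction of distinct values in a sample: 1.0 means every value is unique."""
    if not values:
        return 0.0
    return len(set(values)) / len(values)


def order_by_selectivity(columns: Dict[str, Sequence]) -> List[str]:
    """Return column names from most selective to least selective."""
    return sorted(columns, key=lambda name: selectivity(columns[name]), reverse=True)


sample = {
    "status": [0, 1, 1, 0, 1, 1, 0, 1],           # only 2 distinct values
    "user_id": [11, 12, 13, 14, 15, 16, 17, 11],  # 7 distinct values out of 8
}
print(order_by_selectivity(sample))
```

On a real table the same ratio can be estimated in SQL by dividing COUNT(DISTINCT column) by COUNT(*). Selectivity is only one input: a column that most queries filter on with equality may still belong first even if it is less selective.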

Composite index: index(a,b,c)
where a=3: uses only a
where a=3 and b=5: uses a, b
where a=3 and b=5 and c=4: uses a, b, c
where b=3, or where c=4: no index is used
where a=3 and c=4: uses only a
where a=3 and b>10 and c=7: uses a, b
where a=3 and b like 'xx%' and c=7: uses a, b
For leftmost-prefix lookups, this is in effect equivalent to creating several indexes: key(a), key(a,b), key(a,b,c)
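
The cases listed above follow from one simple rule, which the following Python sketch encodes. It is a simplified model rather than the MySQL optimizer: it assumes all predicates are combined with AND, treats `>`, `<`, BETWEEN, and `LIKE 'xx%'` alike as range predicates, and ignores features such as index condition pushdown.

```python
from typing import Dict, List, Sequence

EQ, RANGE = "eq", "range"


def used_index_columns(index: Sequence[str], predicates: Dict[str, str]) -> List[str]:
    """Return the leading index columns that AND-ed WHERE predicates can use.

    Walk the index from left to right: stop at the first column that has no
    predicate, and stop right after the first range predicate.
    """
    used = []
    for column in index:
        kind = predicates.get(column)
        if kind is None:
            break
        used.append(column)
        if kind == RANGE:
            break
    return used


idx = ("a", "b", "c")
print(used_index_columns(idx, {"a": EQ, "c": EQ}))
print(used_index_columns(idx, {"a": EQ, "b": RANGE, "c": EQ}))
print(used_index_columns(idx, {"b": EQ}))
```

The three calls correspond to `where a=3 and c=4`, `where a=3 and b>10 and c=7`, and `where b=3` from the list.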

2. Switch the database to PolarDB with read/write splitting

PolarDB is a next-generation cloud-native relational database developed in-house by Alibaba Cloud. It is 100% compatible with MySQL, offers up to 100 TB of storage, and a single cluster can scale out to 16 nodes, which makes it suitable for a wide range of enterprise database scenarios. PolarDB separates storage from compute: all compute nodes share one copy of the data, and the service provides configuration upgrades and downgrades within minutes, failure recovery within seconds, global data consistency, and free data backup and disaster recovery.

Cluster architecture with separated compute and storage
PolarDB uses a multi-node cluster architecture. A cluster has one Writer node (the primary node) and multiple Reader nodes (read-only nodes). All nodes share the underlying storage (PolarStore) through a distributed file system (PolarFileSystem).
Read/write splitting
When the application connects through the cluster address, PolarDB serves requests through an internal proxy layer (Proxy): every request goes to the proxy first and then to a database node. The proxy handles security authentication and protection, and it also parses SQL, sending write operations (such as transactions, UPDATE, INSERT, DELETE, and DDL) to the primary node and distributing read operations (such as SELECT) evenly across the read-only nodes. This gives automatic read/write splitting, and for the application it is as simple as using a single database instance.
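
The routing idea can be illustrated with a toy proxy in Python. This is a deliberately naive sketch and not how PolarDB's proxy is implemented: it looks only at the first keyword, sends plain SELECT statements to read-only nodes in round-robin order, and sends everything else to the writer. It does not track transactions or handle cases such as SELECT ... FOR UPDATE, and the node names are hypothetical.

```python
import itertools
from typing import Iterator, List


class ToyProxy:
    """Route statements: plain SELECTs to readers (round-robin), the rest to the writer."""

    def __init__(self, writer: str, readers: List[str]) -> None:
        self.writer = writer
        self._readers: Iterator[str] = itertools.cycle(readers)

    def route(self, sql: str) -> str:
        words = sql.split()
        if words and words[0].upper() == "SELECT":
            return next(self._readers)
        # INSERT, UPDATE, DELETE, DDL, BEGIN/COMMIT, and anything unknown
        # go to the writer, which is always a safe choice.
        return self.writer


proxy = ToyProxy("writer", ["reader-1", "reader-2"])
for statement in ("SELECT * FROM t", "UPDATE t SET x = 1", "select 1", "SELECT 2"):
    print(statement, "->", proxy.route(statement))
```

Defaulting unknown statements to the writer keeps the sketch correct at the cost of some read capacity, which is the conservative choice for any router of this kind.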

In mixed online/offline scenarios, different services use different connection addresses and therefore different data nodes, so they do not affect each other.

Sysbench performance stress test report (PolarDB, 2 nodes with 4 cores and 16 GB each):

[Figure: Sysbench stress test results]

3. Migrate historical data from the split tables to the MySQL 8.0 X-Engine storage engine

After splitting, the business table keeps 3 months of data (based on our company's needs). Historical data is split by month into X-Engine tables in a history database. Why choose X-Engine tables, and what are their advantages?
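
The retention rule can be sketched as a small routing function. Everything specific here is an assumption made for illustration: the table name `orders`, the `_history_YYYYMM` naming scheme, and the interpretation of "3 months" as the current calendar month plus the two before it.

```python
from datetime import date


def months_between(earlier: date, later: date) -> int:
    """Number of calendar-month boundaries between two dates."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def target_table(row_date: date, today: date, base: str = "orders", keep_months: int = 3) -> str:
    """Pick the table a row belongs to: the live table or a monthly history table."""
    if months_between(row_date, today) < keep_months:
        return base
    return f"{base}_history_{row_date:%Y%m}"


today = date(2021, 1, 28)
print(target_table(date(2021, 1, 5), today))
print(target_table(date(2020, 11, 30), today))
print(target_table(date(2020, 10, 31), today))
```

A scheduled job could apply the same rule at the start of each month to move the oldest retained month into its history table.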

Cost savings: the storage cost of X-Engine is about half that of InnoDB.

Tiered storage improves QPS: X-Engine uses a tiered storage structure that keeps hot data and cold data at different levels, and the levels that hold cold data are compressed by default.

X-Engine is an online transaction processing (OLTP) database storage engine developed in-house by the Alibaba Cloud Database Product Division.
X-Engine is seamlessly compatible with MySQL (thanks to MySQL's Pluggable Storage Engine feature) and uses a tiered storage architecture. Its goals are to store very large volumes of data, provide highly concurrent transaction processing, and reduce storage costs. In most large-data scenarios, data is not accessed evenly: the hot data that is accessed frequently makes up only a small share of the total. X-Engine therefore divides data into multiple levels by access frequency, designs a storage structure suited to the access pattern of each level, and writes each level to an appropriate storage device.

X-Engine uses an LSM-tree as the architectural basis for tiered storage, with a redesigned implementation:

The hot data tier and data updates are kept in memory, and in-memory database techniques (lock-free index structures, append-only writes) improve transaction processing performance.

A pipelined transaction processing mechanism runs several stages of transaction processing in parallel, greatly improving throughput.

Data that is accessed less often is gradually evicted or merged into the persistent storage tiers, which are backed by multiple kinds of storage devices (NVM/SSD/HDD).

The compaction process, which has a large impact on performance, has been heavily optimized:

Data is stored at a finer granularity, and because update hot spots are relatively concentrated, as much data as possible is reused during merges.

The shape of the LSM-tree is finely controlled to reduce I/O and computation costs and to limit space amplification during merges.

Finer-grained access control and caching mechanisms are also used to optimize read performance.
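
For readers unfamiliar with LSM-trees, the following toy Python class shows the basic mechanics described above: writes land in memory, full memtables are frozen into sorted runs, reads check the newest data first, and compaction merges runs. It is a generic teaching sketch, not X-Engine's design, and it leaves out everything that makes a real engine fast or durable (write-ahead logging, levels, lock-free structures, compression).

```python
from typing import Dict, List, Optional


class ToyLSM:
    """Minimal LSM-tree: an in-memory memtable plus immutable sorted runs."""

    def __init__(self, memtable_limit: int = 2) -> None:
        self.memtable_limit = memtable_limit
        self.memtable: Dict[str, str] = {}
        self.runs: List[Dict[str, str]] = []  # runs[0] is the newest flushed run

    def put(self, key: str, value: str) -> None:
        # Writes only touch memory; a full memtable is flushed to a run.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # Freeze the memtable into a key-sorted run and start a new memtable.
        if self.memtable:
            self.runs.insert(0, dict(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key: str) -> Optional[str]:
        # Check the hot in-memory data first, then runs from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:
            if key in run:
                return run[key]
        return None

    def compact(self) -> None:
        # Merge all runs into one; applying oldest first lets newer values win.
        merged: Dict[str, str] = {}
        for run in reversed(self.runs):
            merged.update(run)
        self.runs = [dict(sorted(merged.items()))] if merged else []


db = ToyLSM(memtable_limit=2)
for key, value in [("a", "1"), ("b", "2"), ("a", "3"), ("c", "4")]:
    db.put(key, value)
print(len(db.runs), db.get("a"), db.get("b"))
db.compact()
print(len(db.runs), db.get("a"), db.get("b"), db.get("c"))
```

Before compaction the key `a` exists in two runs, so the old value wastes space and reads may scan several runs; compaction removes the stale copy, which is why its cost and scheduling matter so much for overall performance.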

4. Parallel query in Alibaba Cloud PolarDB for MySQL 8.0

After splitting the tables, our data volume was still very large. Splitting only reduced the size of the business tables and did not fully solve the slow query problem. For these slow queries we use PolarDB's parallel query optimization.

PolarDB for MySQL 8.0 introduces a parallel query framework. When the amount of data a query touches reaches a certain threshold, the framework starts automatically and can cut query time several-fold.
At the storage layer, the data is split across different threads, which compute in parallel. Their results are pipelined to a leader thread, which performs a simple merge and returns the result to the user.
Parallel query takes advantage of the parallel processing capability of multi-core CPUs. The original illustrates this with a schematic for an 8-core, 32 GB configuration (figure not reproduced here).
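
The split, compute, and merge pattern described above can be sketched in Python with a thread pool. This only illustrates the structure and says nothing about PolarDB's internals. In CPython, threads do not speed up CPU-bound work because of the global interpreter lock, so the example demonstrates correctness of the pattern rather than a performance gain.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def split_into_chunks(rows: Sequence[int], workers: int) -> List[Sequence[int]]:
    """Split rows into at most `workers` contiguous chunks of similar size."""
    size = max(1, -(-len(rows) // workers))  # ceiling division, at least 1
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def parallel_sum(rows: Sequence[int], workers: int = 4) -> int:
    """Each worker sums one chunk; the calling thread merges the partial sums."""
    chunks = split_into_chunks(rows, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum, chunks))
    return sum(partials)


rows = list(range(1, 101))
print(parallel_sum(rows, workers=4) == sum(rows))
```

SUM merges trivially because partial sums can simply be added. Aggregates such as AVG need each worker to return both a sum and a count, which is one reason the final merge step exists as a separate stage.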

Parallel query suits most SELECT statements, such as queries on large tables, multi-table joins, and computation-heavy queries. For very short queries the benefit is less noticeable.

Using parallel query: hint syntax controls it for a single statement. For example, if parallel query is turned off by default at the system level but a frequently run slow query needs to be faster, a hint can accelerate that specific SQL statement.

SELECT /*+ PARALLEL(x) */ … FROM …; -- x > 0

SELECT /*+ SET_VAR(max_parallel_degree=n) */ * FROM …; -- n > 0
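
When only a few statements need the hint, an application can add it programmatically. The helper below is a hypothetical sketch: it inserts an optimizer hint comment of the form `/*+ PARALLEL(n) */` after the leading SELECT keyword, using the hint name given in this article. The table name `orders` is made up, and the exact hint syntax should be checked against the PolarDB documentation for the version in use.

```python
import re


def with_parallel_hint(select_sql: str, degree: int) -> str:
    """Insert a /*+ PARALLEL(n) */ hint right after the leading SELECT keyword."""
    if degree <= 0:
        raise ValueError("degree must be greater than 0")
    hinted, count = re.subn(
        r"^\s*SELECT\b",
        f"SELECT /*+ PARALLEL({degree}) */",
        select_sql,
        count=1,
        flags=re.IGNORECASE,
    )
    if count == 0:
        raise ValueError("only SELECT statements can take this hint")
    return hinted


print(with_parallel_hint("SELECT COUNT(*) FROM orders", 8))
```

Keeping the hint in one helper makes it easy to change the degree, or to drop the hint, without editing every query string.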

Query test: the database is configured with 16 cores and 32 GB of memory, and the single table holds more than 30 million rows.

The query took 4326 ms without parallel query and 525 ms with it, a speedup of about 8.24x.
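
The speedup figure follows directly from the two timings. A short Python check, using only the numbers reported above:

```python
def speedup(before_ms: float, after_ms: float) -> float:
    """How many times faster the query became."""
    return before_ms / after_ms


ratio = speedup(4326, 525)
saved_pct = (1 - 525 / 4326) * 100  # share of the original run time that was removed

print(f"{ratio:.2f}x faster, {saved_pct:.1f}% less time")
```

Put another way, the parallel run needs roughly one eighth of the original time.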

5. Interactive analytics with Hologres
Although parallel query improved the efficiency of slow queries on large tables, it still cannot meet some specific requirements such as real-time reports and real-time dashboards. These can only be handled with big data tooling.
For this we recommend Alibaba Cloud's interactive analytics service, Hologres (https://help.aliyun.com/product/113622.html).

6. Postscript
Optimizing tables with tens of millions of rows has to start from the business scenario, and it comes at a cost. Do not jump straight to horizontal sharding and scaling out the database: that poses huge challenges for operations and for the business, and in many cases the results are not good. First check whether database design, index optimization, and table partitioning strategies are in place, and then choose the appropriate technology based on business needs.

Statement

This article is reproduced from CSDN. If there is any infringement, please contact admin@php.cn to have it removed.
