Home >headlines >How is MySQL optimized? Let's talk about performance optimization from 5 dimensions

How is MySQL optimized? Let's talk about performance optimization from 5 dimensions

青灯夜游
青灯夜游forward
2022-06-14 11:10:353797browse

If the interviewer asks you: From what dimensions will you optimize MySQL performance? How would you answer?

The so-called performance optimization generally targets the optimization of MySQL queries. Since we are optimizing the query, we naturally need to first know what links the query operation goes through, and then think about which links can be optimized.

I use a picture to show the basic steps that the query operation needs to go through.

How is MySQL optimized? Lets talk about performance optimization from 5 dimensions

The following introduces some strategies for MySQL optimization from 5 perspectives.

How is MySQL optimized? Lets talk about performance optimization from 5 dimensions

1. Connection configuration optimization

Processing connections is the first step in the relationship between the MySQL client and the MySQL server. The first If you can't even walk well, let's not talk about the subsequent story.

Since the connection is a matter for both parties, we naturally optimize it from both the server side and the client side.

1.1 Server configuration

What the server needs to do is to accept as many client connections as possible. Maybe you have encountered error 1040: Too many Error in connections? It's because the server's mind is not broad enough, and the layout is too small!

How is MySQL optimized? Lets talk about performance optimization from 5 dimensions

We can solve the problem of insufficient connections from two aspects:

1. Increase the number of available connections and modify the environment variable max_connections, By default, the maximum number of connections on the server is 151

mysql> show variables like 'max_connections';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| max_connections | 151   |
+-----------------+-------+
1 row in set (0.01 sec)

2. Release inactive connections in a timely manner. The system default client timeout is 28800 seconds (8 hours). We can adjust this value a little smaller

mysql> show variables like 'wait_timeout';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| wait_timeout  | 28800 |
+---------------+-------+
1 row in set (0.01 sec)

MySQL has a lot of configuration parameters, and most parameters provide default values. The default values ​​are carefully designed by the MySQL author and can fully satisfy everyone. Due to the needs of some situations, it is not recommended to modify it rashly without knowing the meaning of the parameters.

1.2 Client Optimization

What the client can do is to minimize the number of times it establishes a connection with the server. The established connections can be used as much as possible. Use it. Don't create a new connection every time you execute a SQL statement. The resources of both the server and the client will be overwhelmed.

The solution is to use Connection pool to reuse connections.

Common database connection pools include DBCP, C3P0, Alibaba’s Druid, Hikari, the first two are used There are very few, and the latter two are currently at their peak.

But it should be noted that the bigger the connection pool, the better. For example, the default maximum connection pool size of Druid is 8, and the default maximum connection pool size of Hikari is 10 , blindly increasing the size of the connection pool may reduce system execution efficiency. Why?

For each connection, the server will create a separate thread to process it. The more connections there are, the more threads the server will create. When the number of threads exceeds the number of CPUs, the CPU must allocate time slices to perform context switching of threads. Frequent context switching will cause a lot of performance overhead.

Hikari official gives a recommended value formula for PostgreSQL database connection pool size, Number of CPU cores*2 1. Assuming that the number of CPU cores of the server is 4, just set the connection pool to 9. This formula is also applicable to other databases to a certain extent, and you can brag about it during interviews.

2. Architecture optimization

2.1 Using cache

It is inevitable that some slow queries will appear in the system. These queries Either the amount of data is large, or the query is complex (many associated tables or complex calculations), causing the query to occupy the connection for a long time.

If the effectiveness of this kind of data is not particularly strong (it does not change every moment, such as daily reports), we can put this kind of data into the cache system. During the cache validity period of the data, Get data directly from the cache system, which can reduce the pressure on the database and improve query efficiency.

How is MySQL optimized? Lets talk about performance optimization from 5 dimensions

2.2 Read-write separation (cluster, master-slave replication)

In the early stages of the project, the database usually runs on a On the server, all read and write requests from users will directly affect this database server. After all, the amount of concurrency that a single server can bear is limited.

To address this problem, we can use multiple database servers at the same time, set one of them as the team leader, called the master node, and the remaining nodes as team members, called slave. Users write data only to the master node, and read requests are distributed to various slave nodes. This solution is called read-write separation. Give the small group composed of the group leader and group members a name, cluster.

How is MySQL optimized? Lets talk about performance optimization from 5 dimensions

Note: Many developers are dissatisfied with the offensive words master-slave (because they think it will be associated with racial discrimination, black slaves, etc.), so a movement to change the name was launched.

Affected by this, MySQL will gradually stop using terms such as master and slave, and instead use source and replicaReplacement, just understand it when everyone encounters it.

One problem that must be faced when using a cluster is how to maintain data consistency between multiple nodes. After all, the write request is only sent to the master node. Only the data of the master node is the latest data. How to synchronize the write operation to the master node to all nodes? What about the slave node?

Master-slave replication Technology is here! I briefly introduced the binlog log in my previous article, so I moved it directly.

binlog is the core component that implements the MySQL master-slave replication function. masterThe node will record all write operations into the binlog. slaveThe node will have a dedicated I/O thread to read the binlog of the master node and write The operation is synchronized to the current slave node.

How is MySQL optimized? Lets talk about performance optimization from 5 dimensions

This cluster architecture has a very good effect on reducing the pressure on the main database server. However, as business data increases, if the data volume of a certain table If the query performance of a single table increases sharply, the query performance of a single table will drop significantly. This problem cannot be solved even if the separation of reading and writing is done. After all, all nodes store exactly the same data. The query performance of a single table is poor. Naturally, the performance of all nodes is poor. Difference.

At this time, we can disperse the data of a single node to multiple nodes for storage, which is sub-database and sub-table.

2.3 Sub-database and sub-table

The meaning of nodes in sub-database and sub-table is relatively broad. If the database is used as a node, it is a sub-database; if the leaflet is The table serves as a node, that is, a sub-table.

Everyone knows that sub-databases and tables are divided into vertical sub-databases, vertical sub-tables, horizontal sub-databases and horizontal sub-tables, but every time they fail to remember these concepts, I will explain them in detail to help. Everyone understands.

2.3.1 Vertical sub-database

How is MySQL optimized? Lets talk about performance optimization from 5 dimensions

Based on the single database, make several vertical cuts and split according to business logic into different databases, this is vertical sub-database.

How is MySQL optimized? Lets talk about performance optimization from 5 dimensions

2.3.2 Vertical table partitioning

How is MySQL optimized? Lets talk about performance optimization from 5 dimensions

Vertical table partitioning is in a single table Basically, make a vertical cut (or several cuts) to split multiple words in a table into several small tables. This operation needs to be judged based on the specific business. Usually, frequently used fields (hot fields) are divided into Table, fields that are not used frequently or are not used immediately (cold fields) are divided into one table to improve query speed.

How is MySQL optimized? Lets talk about performance optimization from 5 dimensions

Take the picture above as an example: Usually the product details are relatively long, and when viewing the product list, it is often not necessary to display the product details immediately (usually click the details button will be displayed), but will display the more important information of the product (price, etc.). According to this business logic, we made the original product table into vertical sub-tables.

2.3.3 Horizontal table sharding

Save the data of a single table to multiple data tables according to certain rules (called sharding rules in jargon), horizontally Give the data table a knife (or several knifes), and it will be horizontal table.

1How is MySQL optimized? Lets talk about performance optimization from 5 dimensions

1How is MySQL optimized? Lets talk about performance optimization from 5 dimensions

2.3.4 水平分库

水平分库就是对单个数据库水平切一刀,往往伴随着水平分表。

1How is MySQL optimized? Lets talk about performance optimization from 5 dimensions

1How is MySQL optimized? Lets talk about performance optimization from 5 dimensions

2.3.5 总结

水平分,主要是为了解决存储的瓶颈;垂直分,主要是为了减轻并发压力。

2.4 消息队列削峰

通常情况下,用户的请求会直接访问数据库,如果同一时刻在线用户数量非常庞大,极有可能压垮数据库(参考明星出轨或公布恋情时微博的状态)。

这种情况下可以通过使用消息队列降低数据库的压力,不管同时有多少个用户请求,先存入消息队列,然后系统有条不紊地从消息队列中消费请求。

1How is MySQL optimized? Lets talk about performance optimization from 5 dimensions

3. 优化器——SQL分析与优化

处理完连接、优化完缓存等架构的事情,SQL查询语句来到了解析器和优化器的地盘了。在这一步如果出了任何问题,那就只能是SQL语句的问题了。

只要你的语法不出问题,解析器就不会有问题。此外,为了防止你写的SQL运行效率低,优化器会自动做一些优化,但如果实在是太烂,优化器也救不了你了,只能眼睁睁地看着你的SQL查询沦为慢查询

3.1 慢查询

慢查询就是执行地很慢的查询(这句话说得跟废话似的。。。),只有知道MySQL中有哪些慢查询我们才能针对性地进行优化。

因为开启慢查询日志是有性能代价的,因此MySQL默认是关闭慢查询日志功能,使用以下命令查看当前慢查询状态

mysql> show variables like 'slow_query%';
+---------------------+--------------------------------------+
| Variable_name       | Value                                |
+---------------------+--------------------------------------+
| slow_query_log      | OFF                                  |
| slow_query_log_file | /var/lib/mysql/9e74f9251f6c-slow.log |
+---------------------+--------------------------------------+
2 rows in set (0.00 sec)

slow_query_log表示当前慢查询日志是否开启,slow_query_log_file表示慢查询日志的保存位置。

除了上面两个变量,我们还需要确定“慢”的指标是什么,即执行超过多长时间才算是慢查询,默认是10S,如果改成0的话就是记录所有的SQL。

mysql> show variables like '%long_query%';
+-----------------+-----------+
| Variable_name   | Value     |
+-----------------+-----------+
| long_query_time | 10.000000 |
+-----------------+-----------+
1 row in set (0.00 sec)

3.1.1 打开慢日志

有两种打开慢日志的方式

1、修改配置文件my.cnf

此种修改方式系统重启后依然有效

# 是否开启慢查询日志
slow_query_log=ON
# 
long_query_time=2
slow_query_log_file=/var/lib/mysql/slow.log

2、动态修改参数(重启后失效)

mysql> set @@global.slow_query_log=1;
Query OK, 0 rows affected (0.06 sec)

mysql> set @@global.long_query_time=2;
Query OK, 0 rows affected (0.00 sec)

3.1.2 慢日志分析

MySQL不仅为我们保存了慢日志文件,还为我们提供了慢日志查询的工具mysqldumpslow,为了演示这个工具,我们先构造一条慢查询:

mysql> SELECT sleep(5);

然后我们查询用时最多的1条慢查询:

[root@iZ2zejfuakcnnq2pgqyzowZ ~]# mysqldumpslow -s t -t 1 -g 'select' /var/lib/mysql/9e74f9251f6c-slow.log

Reading mysql slow query log from /var/lib/mysql/9e74f9251f6c-slow.log
Count: 1  Time=10.00s (10s)  Lock=0.00s (0s)  Rows=1.0 (1), root[root]@localhost
  SELECT sleep(N)

其中,

  • Count:表示这个SQL执行的次数
  • Time:表示执行的时间,括号中的是累积时间
  • Locks:表示锁定的时间,括号中的是累积时间
  • Rows:表示返回的记录数,括号中的是累积数

更多关于mysqldumpslow的使用方式,可以查阅官方文档,或者执行mysqldumpslow --help寻求帮助。

3.2 查看运行中的线程

我们可以运行show full processlist查看MySQL中运行的所有线程,查看其状态和运行时间,找到不顺眼的,直接kill。

1How is MySQL optimized? Lets talk about performance optimization from 5 dimensions

其中,

  • Id:线程的唯一标志,可以使用Id杀死指定线程
  • User:启动这个线程的用户,普通账户只能查看自己的线程
  • Host:哪个ip和端口发起的连接
  • db:线程操作的数据库
  • Command:线程的命令
  • Time:操作持续时间,单位秒
  • State:线程的状态
  • Info:SQL语句的前100个字符

3.3 查看服务器运行状态

使用SHOW STATUS查看MySQL服务器的运行状态,有sessionglobal两种作用域,一般使用like+通配符进行过滤。

-- 查看select的次数
mysql> SHOW GLOBAL STATUS LIKE 'com_select';
+---------------+--------+
| Variable_name | Value  |
+---------------+--------+
| Com_select    | 168241 |
+---------------+--------+
1 row in set (0.05 sec)

3.4 查看存储引擎运行信息

SHOW ENGINE用来展示存储引擎的当前运行信息,包括事务持有的表锁、行锁信息;事务的锁等待情况;线程信号量等待;文件IO请求;Buffer pool统计信息等等数据。

例如:

SHOW ENGINE INNODB STATUS;

上面这条语句可以展示innodb存储引擎的当前运行的各种信息,大家可以据此找到MySQL当前的问题,限于篇幅不在此意义说明其中信息的含义,大家只要知道MySQL提供了这样一个监控工具就行了,等到需要的时候再来用就好。

3.5 EXPLAIN执行计划

通过慢查询日志我们可以知道哪些SQL语句执行慢了,可是为什么慢?慢在哪里呢?

MySQL提供了一个执行计划的查询命令EXPLAIN,通过此命令我们可以查看SQL执行的计划,所谓执行计划就是:优化器会不会优化我们自己书写的SQL语句(比如外连接改内连接查询,子查询优化为连接查询...)、优化器针对此条SQL的执行对哪些索引进行了成本估算,并最终决定采用哪个索引(或者最终选择不用索引,而是全表扫描)、优化器对单表执行的策略是什么,等等等等。

EXPLAIN在MySQL5.6.3之后也可以针对UPDATE、DELETE和INSERT语句进行分析,但是通常情况下我们还是用在SELECT查询上。

这篇文章主要是从宏观上多个角度介绍MySQL的优化策略,因此这里不详细说明EXPLAIN的细节,之后单独成篇。

3.6 SQL与索引优化

3.6.1 SQL优化

SQL优化指的是SQL本身语法没有问题,但是有实现相同目的的更好的写法。比如:

  • 使用小表驱动大表;用join改写子查询;or改成union
  • 连接查询中,尽量减少驱动表的扇出(记录数),访问被驱动表的成本要尽量低,尽量在被驱动表的连接列上建立索引,降低访问成本;被驱动表的连接列最好是该表的主键或者是唯一二级索引列,这样被驱动表的成本会降到更低
  • 大偏移量的limit,先过滤再排序

针对最后一条举个简单的例子,下面两条语句能实现同样的目的,但是第二条的执行效率比第一条执行效率要高得多(存储引擎使用的是InnoDB),大家感受一下:

-- 1. 大偏移量的查询
mysql> SELECT * FROM user_innodb LIMIT 9000000,10;
Empty set (8.18 sec)

-- 2.先过滤ID(因为ID使用的是索引),再limit
mysql> SELECT * FROM user_innodb WHERE id > 9000000 LIMIT 10;
Empty set (0.02 sec)

3.6.2 索引优化

为慢查询创建适当的索引是个非常常见并且非常有效的方法,但是索引是否会被高效使用又是另一门学问了。

推荐阅读:《如何用好MySQL索引?你必须了解这些事!》,感兴趣的读者可以看一下。
https://www.php.cn/mysql-tutorials-493147.html

4. 存储引擎与表结构

4.1 选择存储引擎

一般情况下,我们会选择MySQL默认的存储引擎存储引擎InnoDB,但是当对数据库性能要求精益求精的时候,存储引擎的选择也成为一个关键的影响因素。

建议根据不同的业务选择不同的存储引擎,例如:

  • 查询操作、插入操作多的业务表,推荐使用MyISAM
  • 临时表使用Memory
  • 并发数量大、更新多的业务选择使用InnoDB
  • 不知道选啥直接默认。

4.2 优化字段

字段优化的最终原则是:使用可以正确存储数据的最小的数据类型

4.2.1 整数类型

MySQL提供了6种整数类型,分别是

  • tinyint
  • smallint
  • mediumint
  • int
  • integer
  • bigint

不同的存储类型的最大存储范围不同,占用的存储的空间自然也不同。

例如,是否被删除的标识,建议选用tinyint,而不是bigint

4.2.2 字符类型

你是不是直接把所有字符串的字段都设置为varchar格式了?甚至怕不够,还会直接设置成varchar(1024)的长度?

如果不确定字段的长度,肯定是要选择varchar,但是varchar需要额外的空间来记录该字段目前占用的长度;因此如果字段的长度是固定的,尽量选用char,这会给你节约不少的内存空间。

4.2.3 Non-null

Try to set non-null fields to NOT NULL, and provide a default value, or use a special value instead of NULL.

Because NULL type storage and optimization will have poor performance problems, the specific reasons will not be discussed here.

4.2.4 Do not use foreign keys, triggers and view functions

This is also a principle mentioned in the "Alibaba Development Manual". There are three reasons:

  • Reduces readability, and you have to check the database code while checking the code;

  • Hands over the calculation work For the program, the database only does the storage work and does this well;

  • The work of data integrity verification should be completed by the developer rather than relying on external Key, once you use foreign keys, you will find that it becomes extremely difficult to delete some junk data during testing.

4.2.5 Image, audio and video storage

Do not store large files directly, but store the access address of large files.

4.2.6 Large field splitting and data redundancy

Large field splittingIn fact, it is the vertical table partitioning mentioned before. Split infrequently used fields or fields with large amounts of data to avoid too many columns and too large amounts of data, especially if you are used to writing SELECT * . The problem will be seriously magnified!

Field redundancyIn principle, it does not conform to the database design paradigm, but it is very conducive to fast retrieval. For example, when the customer ID is stored in the contract table, the customer name can be stored redundantly, so that there is no need to obtain the user name based on the customer ID when querying. Therefore, it is also a better optimization technique to make a certain degree of redundancy for business logic.

5. Business Optimization

Strictly speaking, business optimization is no longer a means of MySQL tuning, but business optimization can very effectively alleviate the problem. Database access pressure. A typical example in this regard is Taobao. Here are a few simple examples to give you some ideas:

  • In the past, shopping started on the night of Double 11. Recently, In the past few years, the pre-sale front for Double 11 has become longer and longer, starting more than half a month in advance, and various deposit red envelope models have emerged in endlessly. This method is called Pre-sale diversion. This can divert customer service requests, and you don’t have to wait until the early morning of Double Eleven to place orders collectively;

  • In the early morning of Double Eleven, you may want to check orders outside of that day, but But the query failed; even the chicken rations in Alipay were delayed. This is a downgrade strategy, which gathers computing resources for unimportant services to ensure the current core business;

  • During Double Eleven, Alipay strongly recommends using Huabei for payment instead of bank card payment. Although part of the consideration is to improve the stickiness of the software, on the other hand, using Yu'e Bao actually uses Alipay The internal server has fast access speed, but using a bank card requires calling the bank interface, which is much slower in comparison.


The summary of MySQL optimization ends here. There are many details not mentioned, which makes me feel that this article is not perfect. However, there are too many knowledge points to cover in detail. It is impossible to write them all down at once. Let’s write them down later.

[Related recommendations: mysql video tutorial]

Statement:
This article is reproduced at:juejin.cn. If there is any infringement, please contact admin@php.cn delete