Interview question: How to optimize MySQL in daily work?

Java后端技术全栈
2023-08-17 16:26:10

Preface

Common MySQL optimization methods fall into several areas, mainly SQL optimization, design optimization and hardware optimization, and each area contains a number of smaller optimization points

Let’s take a look in detail

SQL optimization

SQL optimization means improving the efficiency of the MySQL database by optimizing SQL statements and indexes. The details are as follows:

Paging optimization

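For example, take a query with a deep offset (table stands in for a real table name). MySQL has to read and discard the first 190289 matching rows before it can return the 10 that are wanted, so the cost grows with the offset: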

select * from table where type = 2 and level = 9 order by id asc limit 190289,10;

Optimization plan:

  • Delayed association

    First fetch only the primary keys that match the where condition, then join that result back to the original table and fetch the full rows by primary key id, instead of fetching every skipped row through the secondary index

    For example:

select a.* from table a, (select id from table where type = 2 and level = 9 order by id asc limit 190289,10 ) b where a.id = b.id
  • Bookmark method

    Put simply, the bookmark method first finds the primary key value of the row at the offset given by the first parameter of limit, then filters and limits based on that primary key value

    For example:

select * from table where type = 2 and level = 9 and id >= (select id from table where type = 2 and level = 9 order by id asc limit 190289, 1) order by id asc limit 10;
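
When the application can remember the last id it returned on the previous page, the bookmark can be passed in directly and the offset disappears altogether. A minimal sketch, assuming a hypothetical table `items` standing in for the article's `table`, a hypothetical index `idx_type_level`, and a made-up last id of 5120:

```sql
-- Hypothetical table standing in for the article's `table` placeholder.
create table items (
  id    bigint unsigned not null auto_increment primary key,
  type  int not null,
  level int not null,
  name  varchar(64) not null default '',
  key idx_type_level (type, level)
);

-- Next page: continue after the last id the client saw (5120 is a made-up value).
select * from items
where type = 2 and level = 9 and id > 5120
order by id asc
limit 10;
```

This form only suits sequential next-page navigation; jumping straight to an arbitrary page number still needs an offset.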

Index optimization

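Use indexes correctly

Without an index, a query triggers a full table scan: many rows are read and the query is slow. To improve query performance, add indexes on the columns that are used most often in query conditions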
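
A quick way to see the difference is to compare the execution plan before and after adding an index. A sketch, assuming a hypothetical table `user_demo` with a frequently filtered `phone` column; the exact plan depends on the data and the MySQL version:

```sql
create table user_demo (
  id    int not null auto_increment primary key,
  phone varchar(20) not null,
  name  varchar(64) not null
);

-- Without an index on phone, the type column of the plan is typically ALL (full table scan).
explain select * from user_demo where phone = '13800000000';

alter table user_demo add index idx_phone (phone);

-- With the index, the plan typically shows type = ref and key = idx_phone.
explain select * from user_demo where phone = '13800000000';
```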

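Create covering indexes

When InnoDB queries data through a secondary index, it normally has to go back to the table (the clustered index) to fetch the row. If the leaf nodes of the index already contain every column the query needs, that extra lookup is unnecessary. This is called a covering index

For example, for the following query: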

select name from test where city='上海'

We put the queried column into a composite index so that the result can be read directly from the index

alter table test add index idx_city_name (city, name);
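
To check that the index really covers the query, look at the Extra column of the execution plan. A sketch, assuming the `test` table and `idx_city_name` index from above exist; the `age` column is hypothetical:

```sql
-- "Using index" in Extra means the query was answered from the index alone.
explain select name from test where city = '上海';

-- Asking for a column outside the index (a hypothetical age column)
-- forces the lookup back to the table.
explain select name, age from test where city = '上海';
```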

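Avoid or queries in versions before MySQL 5.0

In versions before MySQL 5.0, avoid or queries where possible and use union or a subquery instead, because in those early versions an or condition could prevent indexes from being used. MySQL 5.0 introduced index merge

Put simply, index merge takes a multi-condition query, such as an or query or an and query, scans each of the indexes involved separately, and then merges the results, so the or condition no longer defeats the indexes

If the type column of the Explain execution plan shows index_merge, MySQL is using index merge to execute the query on that table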
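
A sketch of both forms, assuming a hypothetical table `member` with separate single-column indexes on `phone` and `email`:

```sql
create table member (
  id    int not null auto_increment primary key,
  phone varchar(20) not null,
  email varchar(64) not null,
  key idx_phone (phone),
  key idx_email (email)
);

-- With index merge the optimizer may report type = index_merge here;
-- whether it does depends on the data and on the optimizer's cost estimate.
explain select * from member where phone = '13800000000' or email = 'a@example.com';

-- Equivalent rewrite with union: each branch can use its own index,
-- and union removes the rows that match both conditions twice.
select * from member where phone = '13800000000'
union
select * from member where email = 'a@example.com';
```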

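Avoid the != or <> operators in where conditions

In SQL, a not-equal condition can make the query engine give up on an index and fall back to a full table scan, even when the compared column is indexed

Workaround: rewriting the not-equal condition as an or of two range conditions may let the index be used and avoid the full table scan

For example, change column<>'aaa' to column>'aaa' or column<'aaa'. Whether the index is actually used still depends on how many rows match, so confirm the result with Explain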

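Use prefix indexes where appropriate

MySQL supports prefix indexes, which means we can index only the leading part of a string column

The longer an index entry is, the more disk space it takes and the fewer entries fit in one data page, so searching the index takes longer and queries get slower. Using a prefix index where appropriate reduces the space used and improves query efficiency

For example, if every email address ends with the same fixed suffix "@xxx.com", the values differ only in their leading characters, so a column like this, whose trailing characters are fixed, is well suited to a prefix index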

alter table test add index index2(email(6));

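With a prefix index of a well-chosen length, we save space without adding much extra query cost

Note that prefix indexes also have drawbacks: MySQL cannot use a prefix index for order by or group by operations, and a prefix index cannot serve as a covering index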
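
One common way to choose the length is to compare the selectivity of candidate prefixes with that of the full column. A sketch, assuming the `test` table and `email` column from above; the candidate lengths 4, 6 and 8 are arbitrary:

```sql
-- Selectivity = distinct values / total rows. The closer a prefix gets to full_sel,
-- the fewer rows share the same index entry.
select
  count(distinct email) / count(*)          as full_sel,
  count(distinct left(email, 4)) / count(*) as sel_4,
  count(distinct left(email, 6)) / count(*) as sel_6,
  count(distinct left(email, 8)) / count(*) as sel_8
from test;
```

The shortest length whose selectivity is close to full_sel is usually a reasonable choice.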

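Query specific columns rather than all columns

Avoid select * where possible and query only the columns that are needed. This speeds up the query and reduces the network bandwidth used

Optimize subqueries

Prefer join statements over subqueries where possible. A subquery is a nested query, and a nested query may create a temporary table; creating and destroying it takes system resources and time. The larger the result set the subquery returns, the greater the impact on query performance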
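
A sketch of such a rewrite, assuming hypothetical tables `customer` and `orders`. Newer MySQL versions often perform a similar transformation on their own, so it is worth comparing the two plans with Explain before rewriting:

```sql
create table customer (
  id   int not null primary key,
  city varchar(32) not null
);
create table orders (
  id          int not null primary key,
  customer_id int not null,
  amount      decimal(10, 2) not null,
  key idx_customer (customer_id)
);

-- Subquery form
select o.id, o.amount
from orders o
where o.customer_id in (select c.id from customer c where c.city = '上海');

-- Join form; equivalent here because customer.id is unique,
-- so the join cannot duplicate order rows.
select o.id, o.amount
from orders o
join customer c on c.id = o.customer_id
where c.city = '上海';
```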

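Let the small table drive the large table

Try to query so that the smaller table drives the larger one. That is, if table B holds less data than table A, the order of execution should be B first and then A. The query looks like this: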

select name from A where id in (select id from B);
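
The opposite case is commonly written with exists. A sketch using the same placeholder tables A and B, assuming A is now the smaller table and B.id is indexed; the actual execution order is still decided by the optimizer:

```sql
-- For each row of the (small) table A, probe the (large) table B through its index on id.
select name from A where exists (select 1 from B where B.id = A.id);
```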

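Do not perform operations on columns

Do not apply arithmetic or other expressions to a column in a condition, otherwise the query engine may be unable to use the index properly, which hurts query efficiency. The following two queries are examples to avoid: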

select * from test where id + 1 = 50;
select * from test where month(updateTime) = 7;
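
Both conditions can usually be rewritten so that the bare column stands alone on one side of the comparison. A sketch; the year 2023 is an assumption, since month(updateTime) = 7 matches July of every year while a range covers only one July at a time:

```sql
-- Move the arithmetic to the constant side.
select * from test where id = 50 - 1;

-- Replace the function call with a half-open range on the column itself.
select * from test
where updateTime >= '2023-07-01' and updateTime < '2023-08-01';
```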

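A trap that is easy to fall into is implicit type conversion:

select * from test where skuId=123456

The skuId column has an index, but the explain result shows that this statement does a full table scan

The reason is that skuId is of type varchar(32) while the compared value is an integer, so MySQL has to convert the column value of every row before comparing, and the index cannot be used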
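
The fix is to compare against a value of the column's own type. A sketch:

```sql
-- Quoting the literal makes it a string, so no conversion is applied to skuId
-- and the index on it can be used.
select * from test where skuId = '123456';
```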

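Add redundant columns where appropriate

Redundant columns can eliminate a lot of multi-table joins. Because joining many tables performs poorly, adding a few redundant columns to reduce such joins is a space-for-time optimization strategy

Use composite indexes correctly

A MySQL storage engine that uses B+ trees, such as InnoDB, matches the columns of a composite index from left to right, so pay attention to the column order when creating a composite index

For example, suppose we create the composite index idx(name,age,sex). Conditions that follow the leftmost prefix, such as name + age + sex, name + age, or name alone, will use the composite index. Conditions that leave out the leftmost column, such as age + sex, will not. The order in which the conditions are written in the where clause does not matter; what matters is which columns are present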
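
A sketch of which conditions can use the index, assuming a hypothetical table `person` that carries the idx(name,age,sex) index described above:

```sql
create table person (
  id   int not null auto_increment primary key,
  name varchar(32) not null,
  age  int not null,
  sex  tinyint not null,
  key idx (name, age, sex)
);

-- Can use the index: the conditions start from the leftmost column.
explain select * from person where name = 'Tom' and age = 20 and sex = 1;
explain select * from person where name = 'Tom' and age = 20;
explain select * from person where name = 'Tom';

-- Can also use the index: the written order of the conditions does not matter.
explain select * from person where age = 20 and name = 'Tom';

-- Cannot seek into the index: the leftmost column name is missing.
explain select * from person where age = 20 and sex = 1;
```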

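Join optimization

MySQL joins tables with the nested-loop join algorithm. Put simply, it iterates over the driving table (the outer table) and, for each row read, takes the join column value and looks up the matching rows in the driven table (the inner table) to build the result rows

To improve join performance, reduce the number of iterations of the nested loop as much as possible

One very effective optimization is to index the join column of the driven table. With the index, matching rows are found quickly without comparing against every row of the inner table, which greatly reduces the total number of iterations. Another is to let the small result set drive the large one, which reduces the number of iterations further on top of the index optimization

If it is hard to tell which table is large and which is small, use inner join and MySQL will choose the driving table itself, normally the smaller one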
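
A sketch of the index optimization, assuming hypothetical tables `dept` (small) and `emp` (large):

```sql
create table dept (
  id   int not null primary key,
  name varchar(32) not null
);
create table emp (
  id      int not null primary key,
  dept_id int not null,
  name    varchar(32) not null
);

-- Index the join column of the table expected to be driven (the inner table).
alter table emp add index idx_dept_id (dept_id);

-- Explain lists tables in the order they are read, so the first row is the driving table.
-- With realistic data the emp row would typically show key = idx_dept_id
-- rather than a full scan per outer row.
explain select d.name, e.name
from dept d
inner join emp e on e.dept_id = d.id;
```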

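Avoid joining too many tables

MySQL has a join buffer, whose size is set by the join_buffer_size parameter

Within one SQL statement, each additional joined table may be allocated one more join buffer, so the more tables a statement joins, the more memory it uses

If the application makes heavy use of multi-table joins and join_buffer_size is set unreasonably, the server can easily run out of memory, which affects the stability of database performance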
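
A sketch of inspecting and adjusting the setting; the 4 MB value is an arbitrary example, not a recommendation:

```sql
-- Current size of one join buffer, in bytes.
show variables like 'join_buffer_size';

-- Change it for the current session only, leaving the global default untouched.
set session join_buffer_size = 4 * 1024 * 1024;
```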

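Sorting optimization

Use index scans for sorting

MySQL can produce ordered results in two ways: by sorting the result set, or by scanning in index order, in which case the result is naturally ordered

However, if the index does not cover the columns the query needs, every index entry scanned requires a lookup back to the table. These reads are random I/O and are usually slower than a sequential full table scan

So when designing indexes, try to let the same index serve both sorting and row lookup

For example: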

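-- index on (date, staff_id, customer_id)
select staff_id, customer_id from test where date = '2010-01-01' order by staff_id,customer_id;

An index can be used to sort the result only when the column order of the index matches the order of the ORDER BY clause exactly and all columns are sorted in the same direction. The leading column date may be left out of ORDER BY here because the where condition fixes it to a constant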
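
A sketch of how this plays out, assuming the `test` table above with its (date, staff_id, customer_id) index. "Using filesort" in the Extra column of Explain indicates that the index could not provide the order:

```sql
-- Can use the index for ordering: same column order, same direction.
explain select staff_id, customer_id from test
where date = '2010-01-01' order by staff_id, customer_id;

-- Typically needs a filesort: mixed sort directions.
explain select staff_id, customer_id from test
where date = '2010-01-01' order by staff_id asc, customer_id desc;

-- Typically needs a filesort: staff_id is skipped, so customer_id is not in index order.
explain select staff_id, customer_id from test
where date = '2010-01-01' order by customer_id;
```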

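UNION optimization

MySQL executes a union by creating a temporary table, filling it with the results of each query, and then querying the temporary table. Many optimization strategies do not work for union queries because the temporary table has no indexes to use

It is best to push where, limit and similar clauses down into each subquery of the union by hand so that the optimizer can make full use of these conditions

Also, unless the server really has to remove duplicates, always use union all. Without the all keyword, MySQL adds the distinct option to the temporary table, which forces a uniqueness check across the whole temporary table and is very expensive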
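
A sketch of pushing limit down into each branch, assuming hypothetical tables `actor` and `director` that share first_name and last_name columns:

```sql
create table actor (
  id         int not null primary key,
  first_name varchar(32) not null,
  last_name  varchar(32) not null
);
create table director (
  id         int not null primary key,
  first_name varchar(32) not null,
  last_name  varchar(32) not null
);

-- Each branch sorts and keeps only 20 rows, so the temporary table holds
-- at most 40 rows instead of every row of both tables.
(select first_name, last_name from actor order by last_name limit 20)
union all
(select first_name, last_name from director order by last_name limit 20)
order by last_name
limit 20;
```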

Slow query log

The usual way to troubleshoot slow queries is to first use the slow query log to find the slow SQL statements, then use Explain to inspect their execution plans, and finally analyze and locate the root cause and fix it

The slow query log is a MySQL feature, turned on through configuration, that records every SQL statement whose execution time exceeds the long_query_time value

We can turn on the slow query log by setting "slow_query_log=1"

Note that turning on the slow query log has some impact on MySQL performance, so use it with caution in a production environment
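
A sketch of turning it on at runtime. The 1-second threshold is an arbitrary example value, and settings changed with set global are lost on restart unless they are also written to the configuration file:

```sql
-- Check the current state and the log file location.
show variables like 'slow_query_log%';
show variables like 'long_query_time';

-- Enable the log and record statements that run longer than 1 second.
-- The new long_query_time applies to connections opened after the change.
set global slow_query_log = 1;
set global long_query_time = 1;
```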

Design optimization

Try to avoid using NULL

NULL is hard to handle in MySQL: storing it requires extra space, operations on it require special operators, and columns that contain NULL are harder to optimize. Use a default value instead of NULL where possible, for example by defining the column as int not null default 0
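
A sketch of the difference, assuming a hypothetical table `null_demo` with one not null column and one nullable column:

```sql
create table null_demo (
  id    int not null primary key,
  score int not null default 0,  -- recommended form
  bonus int default null         -- nullable, for comparison
);
insert into null_demo (id) values (1);

-- Returns no rows: a comparison with = null is never true.
select * from null_demo where bonus = null;

-- Returns the row: NULL needs the special is null operator.
select * from null_demo where bonus is null;

-- Returns the row: the default value is an ordinary integer.
select * from null_demo where score = 0;
```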

Minimum data length

Smaller data types usually need less space on disk, in memory and in the CPU cache, and are faster to process

Use the simplest data type

Operations on simple data types are cheaper. For example, use int rather than varchar when you can, because int columns are more efficient to query than varchar columns

Define the text type as little as possible

Text columns are very inefficient to query. If a text column is unavoidable, move it into a separate sub-table and join to that table only when the column is needed, which improves the query efficiency of the main table

Appropriate table-splitting and database-splitting strategy

Table splitting means that when a table has many columns, you can split it into several sub-tables: put the frequently used core columns in the main table and the rest in sub-tables. Most queries then only touch the main table, which has fewer columns, and so run more efficiently
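
A sketch of such a split, assuming a hypothetical `article` table whose large body column is rarely needed:

```sql
-- Main table: small, frequently queried columns.
create table article (
  id         int not null auto_increment primary key,
  title      varchar(128) not null,
  author_id  int not null,
  created_at datetime not null
);

-- Sub-table: the large, rarely read column, keyed by the same id.
create table article_body (
  article_id int not null primary key,
  body       text not null
);

-- List pages only touch the main table.
select id, title, created_at from article order by id desc limit 20;

-- The detail page joins in the body only when it is actually needed.
select a.title, b.body
from article a
join article_body b on b.article_id = a.id
where a.id = 1;
```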

Database splitting means dividing one database into several. For example, one primary database handles writes and modifications, while the others synchronize data from the primary and serve client queries. The read and write load of a single database is thus shared among several, which improves the overall efficiency of the database

Common type selection

Integer type width setting

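MySQL lets you specify a display width for integer types, as in int(11), but the width is effectively meaningless: it does not limit the range of values, and for storage and calculation int(1) and int(20) are the same. The display width attribute for integer types is deprecated as of MySQL 8.0.17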
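
A sketch that shows the width has no effect on the stored range, assuming a hypothetical table `width_demo`:

```sql
create table width_demo (
  a int(1),
  b int(20)
);

-- Both columns accept the full signed int range; the width does not truncate anything.
insert into width_demo values (2147483647, 2147483647);
select a, b from width_demo;
```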

VARCHAR and CHAR types

The char type is fixed-length, while varchar stores variable-length strings and so uses less space than a fixed-length type. However, varchar needs an extra 1 or 2 bytes to record the length of the string, and because an update can change the row length, it is prone to fragmentation

The choice depends on the usage scenario. varchar is the better fit if the maximum length of the column is much larger than the average length, or if the column is rarely updated. Choose char for very short strings, for values that all have the same length, such as MD5 values, or for columns whose data changes frequently

DATETIME and TIMESTAMP types

datetime has the wider range: it can represent the years 1000 to 9999, while timestamp only covers 1970 to 2038. datetime is independent of time zone, whereas the displayed value of a timestamp depends on the session time zone. In most scenarios both types work well, but timestamp is often recommended because it is more space-efficient: it takes 4 bytes, while datetime takes 8 bytes (5 bytes since MySQL 5.6.4, with extra storage for fractional seconds on both types)
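
A sketch of the time zone difference, assuming a hypothetical table `time_demo`; the sample value is arbitrary:

```sql
create table time_demo (
  dt datetime  not null,
  ts timestamp not null
);

-- Insert the same literal into both columns while the session is in UTC.
set time_zone = '+00:00';
insert into time_demo values ('2023-08-17 16:26:10', '2023-08-17 16:26:10');

-- Switch the session to UTC+8 and read the same row back.
set time_zone = '+08:00';
-- dt still reads 2023-08-17 16:26:10; ts now reads 2023-08-18 00:26:10.
select dt, ts from time_demo;
```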

BLOB and TEXT types

blob and text are string data types designed to store large amounts of data, stored as binary and character data respectively

In practice, use these two types with care because they are very inefficient to query. If a column must use one of them, move it into a sub-table and join to that table only when the column is needed, which improves the query efficiency of the main table

Normalization

When data is well normalized, updates touch less data, and normalized tables are usually smaller, so more of the data fits in memory and operations run faster

The disadvantage is that queries need more joins

First normal form: fields are indivisible. The database supports this by default

Second normal form: eliminate partial dependencies on the primary key. One way is to add a column unrelated to business logic as the primary key, such as an auto-increment id

Third normal form: eliminate transitive dependencies of non-key fields on the primary key. Splitting the table achieves this and reduces data redundancy

Hardware optimization

The hardware requirements of MySQL mainly concern three things: disk, network and memory

Disk

Use disks with high read and write performance, such as solid-state drives, to reduce I/O time and improve the overall efficiency of MySQL

You can also try using several small disks instead of one large disk. Because the rotation speed of a disk is fixed, several small disks amount to several disks working in parallel

Network

A smooth, low-latency network with sufficient bandwidth is a basic requirement for MySQL to run normally. If conditions permit, multiple network cards can also be installed to improve the efficiency of the MySQL server during network peaks

Memory

The more memory the MySQL server has, the more data it can store and cache. Since memory is very fast, this improves the efficiency of MySQL as a whole

Statement:
This article is reproduced from Java后端技术全栈