Learn how MySQL optimizes query speed

coldplay.xixi
2020-08-17 16:20:35

In the previous chapter, we introduced how to choose optimized data types and how to use indexes efficiently, which are essential for high-performance MySQL. But these are not enough, and reasonable design of queries is also required. If the query is poorly written, no matter how reasonable the table structure is and how appropriate the index is, high performance cannot be achieved.

In MySQL performance tuning, query optimization is where the work starts, and it is also where improvements show most directly in how fast a system feels. This chapter and the following ones focus on query performance. They introduce query optimization techniques that give you a deeper understanding of how MySQL actually executes queries, where it is slow, and how to make it faster. Understanding why a query is efficient or inefficient will help you write better SQL.

This chapter starts from the question "Why is the query so slow?" so that you know where a query is likely to be slow and can stay one step ahead when optimizing it.

1. Where is the slowness?

The real measure of query speed is response time. If you think of a query as a task, it consists of a series of subtasks, each of which takes a certain amount of time. Optimizing a query really means optimizing its subtasks: eliminate some of them, reduce the number of times they are executed, or make them run faster.

When MySQL executes a query, what subtasks are there, and which take the most time? Answering this requires tools or methods (such as execution plans) that analyze the query and locate where the slowness lies.

Generally speaking, the life cycle of a query runs roughly in this order: the query is sent from the client to the server, parsed on the server, turned into an execution plan, executed, and the results are returned to the client. Of these, "execution" can be considered the most important stage of the whole life cycle. It includes a large number of calls to the storage engine to retrieve data, plus the processing done after those calls, such as sorting and grouping.

While completing these tasks, the query spends time in different places at different stages: network transfer, CPU computation, generating statistics and execution plans, waiting for locks, and so on. Retrieving data from the underlying storage engine is especially costly. These calls require memory and CPU operations and may also generate a large number of context switches and system calls.

All of these operations consume time, and some of them are unnecessary extra work: some are repeated many times, and some execute very slowly. This is where a query may really be slow. The purpose of optimizing a query is to reduce or eliminate the time spent on these operations.

Through the above analysis, we have an overall picture of the query process and know where a query may run into problems that make the whole query slow. This gives direction to actual query optimization.

In other words, query optimization can be approached from the following two perspectives:

  • Reduce the number of subtasks
  • Reduce additional and repeated operations

A common reason for poor query performance is accessing too much data. With a small amount of data, queries are fast enough. Once the data grows, query speed can degrade dramatically and the user experience suffers. When optimizing a query, check the following:

  • Whether unnecessary data is queried
  • Whether additional records are scanned

2. Whether unnecessary data is queried

In practice, queries often request more data than is actually needed, and the application then throws the excess away. This is extra overhead for MySQL and also consumes CPU and memory on the application server.
Some typical cases are as follows:

1. Querying unnecessary records

This is a common mistake. People often assume that MySQL will return only the data that is needed, but in fact MySQL returns the entire result set first, and only then can the application work on it.

Developers habitually use a SELECT statement to fetch a large number of rows and then have the application or the front-end display layer take only the first N rows. For example, a news website queries 100 records but displays only the first 10 on the page.

The most effective solution is to query only as many records as you need. This is usually done by adding LIMIT to the query, that is, a paginated query.
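
As a rough sketch of the LIMIT approach for the news example, the queries might look like the following. The table news and its columns id, title, and published_at are assumptions made for illustration and do not come from the article.

```sql
-- Hypothetical table: news(id, title, published_at)

-- First page: ask the server for only the 10 rows the page will display
SELECT id, title, published_at
FROM news
ORDER BY published_at DESC, id DESC
LIMIT 10;

-- Second page: skip the first 10 rows and return the next 10
SELECT id, title, published_at
FROM news
ORDER BY published_at DESC, id DESC
LIMIT 10 OFFSET 10;
```

The ORDER BY includes id as a tiebreaker so that rows with the same published_at keep a stable order from one page to the next.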

2. Returning all columns when joining multiple tables

If you want to find all the actors who appeared in the movie Academy Dinosaur, do not write the query like this:

select * from actor a
inner join film_actor fa on fa.actorId = a.actorId
inner join film f on f.filmId = fa.filmId
where f.title = 'Academy Dinosaur';

This returns all the columns of all three tables, while the actual requirement is only the actor information. The correct way to write it is:

select a.* from actor a
inner join film_actor fa on fa.actorId = a.actorId
inner join film f on f.filmId = fa.filmId
where f.title = 'Academy Dinosaur';

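3. Always querying all columns

Every time you see select *, look at it with suspicion: do you really need every column returned?

In most cases you don't. select * prevents the optimizer from applying optimizations such as covering index scans, and the extra columns cost the server additional I/O, memory, and CPU. Even when you really do need every column, list them one by one instead of using *.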
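
As a minimal sketch of listing columns explicitly, the actor query from the previous case could be narrowed as follows. The columns firstName and lastName are assumptions made for illustration; the article does not show the columns of the actor table.

```sql
-- Name only the columns the application actually uses
-- (firstName and lastName are hypothetical column names)
SELECT a.actorId, a.firstName, a.lastName
FROM actor a
INNER JOIN film_actor fa ON fa.actorId = a.actorId
INNER JOIN film f ON f.filmId = fa.filmId
WHERE f.title = 'Academy Dinosaur';
```

An explicit list also keeps the result shape stable if columns are later added to the table.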

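4. Repeatedly querying the same data

If you are not careful, it is easy to make this mistake: executing the same query over and over and getting exactly the same data back each time.

For example, the comments section needs the URL of each user's avatar, so when a user comments multiple times the same data may be queried again and again. A better approach is to cache the data on the first query and read it from the cache afterwards.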

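3. Whether additional records are scanned

Once you are sure the query only requests the data it needs, the next step is to check whether it scans too much data along the way. For MySQL, the three simplest metrics of query cost are:

  • Response time
  • Number of rows scanned
  • Number of rows returned

None of these metrics measures query cost perfectly, but they roughly reflect how much data MySQL has to access internally to execute the query, and they let you estimate roughly how long the query takes to run. All three are recorded in the MySQL slow query log, so checking the slow log is a good way to find queries that scan too many rows.

Slow query log: records statements whose response time exceeds a threshold (long_query_time, 10 seconds by default). It is enabled with the slow_query_log variable and is off by default. Slow queries can be logged to the slow_log table or to a file for later inspection and analysis.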
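
A minimal sketch of checking and enabling the slow query log at runtime is given here. The 1-second threshold is an arbitrary value chosen for illustration, and the SET GLOBAL statements assume an account with the privileges needed to change global system variables.

```sql
-- Inspect the current settings
SHOW VARIABLES LIKE 'slow_query_log';
SHOW VARIABLES LIKE 'long_query_time';
SHOW VARIABLES LIKE 'log_output';

-- Turn the slow query log on
SET GLOBAL slow_query_log = 'ON';

-- Log statements that take longer than 1 second (illustrative value);
-- a changed global value applies to connections opened afterwards
SET GLOBAL long_query_time = 1;

-- Write entries to the mysql.slow_log table instead of a file;
-- 'FILE' or 'FILE,TABLE' are the other common choices
SET GLOBAL log_output = 'TABLE';
```

Settings changed this way do not survive a server restart unless they are also placed in the server configuration.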

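1. Response time

Response time is the sum of two parts: service time and queue time. Service time is how long the database actually spends processing the query. Queue time is the time the server is not really executing the query because it is waiting for some resource, such as an I/O operation or a row lock.

Under different kinds of application load, response time follows no consistent rule or formula. Storage engine locks (table locks, row locks), contention for resources under high concurrency, hardware response, and many other factors all affect it. Response time can therefore be either the symptom of a problem or the cause of one, depending on the case.

When you look at a query's response time, first ask yourself whether it is a reasonable value.

2. Rows scanned and rows returned

When analyzing a query, it is very helpful to look at the number of rows it scans, and from that to judge whether extra records were scanned.

As a way to find bad queries this metric is not perfect, because not all rows cost the same to access. Shorter rows are faster to access, and rows in memory are much faster to access than rows on disk.

Ideally, the number of rows scanned equals the number of rows returned. In practice this is rare. In a join, for example, the ratio of rows scanned to rows returned is usually small, typically between 1:1 and 10:1, but it can sometimes be very large.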
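
If the slow query log is written to a table (log_output includes TABLE), the ratio of rows scanned to rows returned can be checked with a query along these lines. This is only a sketch: the factor of 100 and the limit of 20 are arbitrary thresholds chosen for illustration.

```sql
-- Find logged statements that examined far more rows than they returned
SELECT start_time,
       query_time,
       rows_examined,
       rows_sent,
       sql_text
FROM mysql.slow_log
-- GREATEST(rows_sent, 1) keeps statements that returned 0 rows comparable
WHERE rows_examined > 100 * GREATEST(rows_sent, 1)
ORDER BY rows_examined DESC
LIMIT 20;
```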

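3. Rows scanned and access types

When estimating the cost of a query, consider the cost of finding a single row in a table. MySQL has several access methods for finding and returning a row. Some need to examine many rows to return one result, while others can return a result without scanning at all.

The type column in the output of EXPLAIN shows the access type. There are many access types, including full table scan, index scan, range scan, unique index lookup, and constant. They are listed here from slowest to fastest, and the number of rows scanned also goes from most to least.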
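
To see the access type for a concrete query, EXPLAIN can be placed in front of it. The sketch below reuses the actor query from earlier in the article; the output depends on the indexes that exist on these tables, which the article does not describe, so no expected result is shown.

```sql
EXPLAIN
SELECT a.*
FROM actor a
INNER JOIN film_actor fa ON fa.actorId = a.actorId
INNER JOIN film f ON f.filmId = fa.filmId
WHERE f.title = 'Academy Dinosaur';

-- Values to look for in the type column, roughly from slowest to fastest:
--   ALL     full table scan
--   index   full index scan
--   range   range scan on an index
--   ref     lookup on a non-unique index
--   eq_ref  lookup on a unique index or primary key in a join
--   const   at most one matching row, read once as a constant
-- The rows column gives the optimizer's estimate of rows to examine.
```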

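If a query cannot find a suitable access type, the best fix is usually to add an appropriate index, which is the topic we discussed earlier. It should now be clear why indexes matter so much for query optimization: an index lets MySQL find the rows it needs in the most efficient way, scanning the fewest rows.

If you find that a query scans a large amount of data but returns only a few rows, you can usually try the following techniques to optimize it:

  • Use a covering index: put all the required columns in the index so that the storage engine can return results without going back to the table for the corresponding rows.
  • Change the table structure. For example, use a separate summary table to answer the query.
  • Rewrite complex queries so that the MySQL optimizer can execute them in a more efficient way.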
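
The first technique, a covering index, can be sketched as follows, using the film table from earlier in the article. The index name idx_film_title is an assumption made for illustration, and whether the optimizer actually uses the index depends on the data and the server version.

```sql
-- Index that contains every column the query below needs
-- (idx_film_title is a hypothetical name)
CREATE INDEX idx_film_title ON film (title, filmId);

-- The query reads only title and filmId, so it can be answered
-- from the index without fetching the full rows from the table
EXPLAIN
SELECT filmId
FROM film
WHERE title = 'Academy Dinosaur';

-- When the index covers the query, the Extra column of the EXPLAIN
-- output typically contains "Using index".
```

Adding a column to the SELECT list that is not in the index, or switching to select *, would force MySQL to go back to the table rows again.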

Statement:
This article is reproduced from jb51.net. If there is any infringement, please contact admin@php.cn to have it removed.