MySQL Logical Architecture
MySQL logical architecture is divided into three layers. The top layer is the client layer, which is not unique to MySQL. Functions such as connection processing, authorization authentication, and security are all handled in this layer. .
Most of MySQL’s core services are in the middle layer, including query parsing, analysis, optimization, caching, built-in functions (time, mathematics, encryption, etc.), and all cross-storage engine functions are also in this layer. Layer implementation: stored procedures, triggers, views, etc.
The bottom layer is the storage engine, which is responsible for data storage and retrieval in MySQL. The middle service layer communicates with the storage engine through APIs. These API interfaces shield the differences between different storage engines.
MySQL query process
When sending a request to MySQL:
1. Client/server communication protocol
MySQL client/server communication protocol is "half-duplex": at any time, either The server sends data to the client, or the client sends data to the server. These two actions cannot occur at the same time. Once one end starts sending a message, the other end must receive the entire message before it can respond to it, so we cannot and do not need to cut a message into small pieces and send them independently, and there is no way to control the flow.
The client sends the query request to the server in a separate data packet, so when the query statement is very long, the max_allowed_packet parameter needs to be set. However, it should be noted that if the query is too large, the server will refuse to accept more data and throw an exception.
On the contrary, the data that the server responds to the user is usually a lot of data, consisting of multiple data packets. But when the server responds to the client's request, the client must accept the entire returned result completely, instead of simply taking the first few results and then asking the server to stop sending. Therefore, in actual development, it is a very good habit to keep queries as simple as possible and only return necessary data, and to reduce the size and number of data packets during communication. This is also the reason why we try to avoid using SELECT * and adding LIMIT restrictions in queries. one.
2. Query cache
Before parsing a query statement, if the query cache is turned on, MySQL will check whether the query statement hits the data in the query cache. . If the current query happens to hit the query cache, the results in the cache will be returned directly after checking the user permissions once. In this case, the query will not be parsed, an execution plan will not be generated, and it will not be executed.
MySQL stores the cache in a reference table (a data structure similar to HashMap), indexed by a hash value. This hash value is determined by the query itself, the database currently being queried, the client protocol version number, etc. Some information is calculated that may affect the results. Therefore, any difference in characters between the two queries (spaces, comments) will cause the cache to miss.
If the query contains any user-defined functions, stored functions, user variables, temporary tables, or system tables in the mysql library, the query results will not be cached. For example, the function NOW() or CURRENT_DATE() will return different query results due to different query times. Another example is that a query statement containing CURRENT_USER or CONNECION_ID() will return different results due to different users. Cache such query results. It doesn't make any sense.
3. Cache invalidation
MySQL's query cache system keeps track of each table involved in the query. If these tables (data or structure) change, then and All cached data related to this table will be invalidated. Because of this, MySQL must invalidate all caches for the corresponding table during any write operation. If the query cache is very large or fragmented, this operation may cause a lot of system consumption and even cause the system to freeze for a while. Moreover, the additional consumption of the query cache on the system is not only for write operations, but also for read operations:
1. Any query statement must be checked before starting, even if this SQL statement will never hit Caching
2. If the query results can be cached, then after the execution is completed, the results will be stored in the cache, which will also bring additional system consumption
Based on this, you must know what it is not In this case, query caching will improve system performance, and caching and invalidation will bring additional consumption. Only when the resource savings brought by caching are greater than the resources consumed by itself, will it bring performance improvements to the system. But it is very difficult to evaluate whether turning on cache can bring performance improvements. If the system does have some performance problems, you can try to turn on the query cache and make some optimizations in the database design: For example:
1. Use multiple small tables instead of one large table, and be careful not to over-design
2. Batch insertion instead of circular single insertion
3. Reasonably control the size of the cache space. Generally speaking, it is more appropriate to set the size to tens of megabytes
4. You can use SQL_CACHE and SQL_NO_CACHE To control whether a certain query statement needs to be cached
Do not turn on the query cache easily, especially for write-intensive applications. If you really can't help it, you can set query_cache_type to DEMAND. At this time, only queries that add SQL_CACH will be cached, and other queries will not. This way you can freely control which queries need to be cached.
4. Syntax parsing and preprocessing
MySQL parses SQL statements through keywords and generates a corresponding parse tree. This process parser mainly verifies and parses through grammar rules. For example, whether the wrong keywords are used in SQL or whether the order of keywords is correct, etc. Preprocessing will further check whether the parse tree is legal according to MySQL rules. For example, check whether the data table and data column to be queried exist, etc.
5. Query optimization
After the syntax tree is considered legal, and the optimizer converts it into a query plan, in most cases, a query can have many All execution methods will eventually return corresponding results. The role of the optimizer is to find the best execution plan among them.
MySQL's query optimizer is a very complex component. It uses a lot of optimization strategies to generate an optimal execution plan:
1. Redefine the association order of the table ( When multiple tables are associated with queries, they do not necessarily follow the order specified in SQL, but there are some techniques to specify the association order)
2. Optimize the MIN() and MAX() functions (find the minimum value of a column) value, if the column has an index, you only need to find the leftmost end of the B Tree index, otherwise you can find the maximum value)
3. Terminate the query early (when using Limit, it will be immediately after finding a result set that meets the number of Terminate the query)
4. Optimize sorting (in the old version, MySQL will use two transfer sorting, that is, first read the row pointer and the fields that need to be sorted, sort them in memory, and then read them according to the sorting results. Fetch data rows, and the new version uses single-transfer sorting, that is, reading all data rows at one time and then sorting them according to the given columns)
6. Query execution engine
After completing the parsing and optimization stages, MySQL will generate the corresponding execution plan, and the query execution engine will gradually execute the instructions according to the execution plan to obtain the results. Most operations in the entire execution process are completed by calling interfaces implemented by the storage engine. These interfaces are called handler APIs. Each table in the query process is represented by a handler instance. In fact, MySQL creates a handler instance for each table during the query optimization phase. The optimizer can obtain table-related information based on the interfaces of these instances, including All column names, index statistics, etc. of the table. The storage engine interface provides very rich functions, but there are only dozens of interfaces at the bottom. These interfaces, like tower blocks, complete most of the operations of a query.
7. Return the results to the client
The last stage of query execution is to return the results to the client. Even if no data can be queried, MySQL will still return information related to the query, such as the number of rows affected by the query, execution time, etc.
If the query cache is turned on and the query can be cached, MySQL will also store the results in the cache.
Returning the result set to the client is an incremental and gradual return process. It is possible that MySQL begins to gradually return the result set to the client when it generates the first result. In this way, the server does not need to store too many results and consume too much memory, and the client can also get the returned results as soon as possible. It should be noted that each row in the result set will be sent as a data packet that meets the communication protocol described in ①, and then transmitted through the TCP protocol. During the transmission process, MySQL data packets may be cached and then sent in batches.
MySQL entire query execution process
1. The client sends a query request to the MySQL server
2. The server first checks the query cache. If If the cache is hit, the result stored in the cache will be returned immediately. Otherwise, enter the next level section
3. The server performs SQL parsing and preprocessing, and then the optimizer generates the corresponding execution plan
4.MySQL calls the API of the storage engine based on the execution plan. Execute query
The above is the detailed content of how mysql works. For more information, please follow other related articles on the PHP Chinese website!