Home >Database >Mysql Tutorial >In-depth understanding of MySQL temporary tables
Overview
There are two main types of temporary tables in MySQL, including external temporary tables and internal temporary tables. The external temporary table is a temporary table created through the statement create temporary table.... The temporary table is only valid in this session. After the session is disconnected, the temporary table data will be automatically cleared. There are two main types of internal temporary tables. One is the temporary table in information_schema, and the other is when the session executes a query. If the execution plan contains "Using temporary", a temporary table will be generated. One difference between internal temporary tables and external temporary tables is that we cannot see the table structure definition file frm of the internal temporary table. The table definition file frm of the external temporary table is generally composed of #sql{process id}_{thread id}_serial number, so different sessions can create temporary tables with the same name.
Temporary table
The main difference between temporary tables and ordinary tables is whether the data is automatically cleaned after the instance, session, or statement ends. For example, in the internal temporary table, if we want to store the intermediate result set during a query, the temporary table will be automatically recycled after the query is completed, without affecting the user table structure and data. In addition, temporary tables in different sessions can have the same name. When multiple sessions execute queries, if you want to use temporary tables, there will be no worries about duplicate names. After the temporary table space was introduced in 5.7, all temporary tables are stored in the temporary table space (non-compressed), and the data in the temporary table space can be reused. Temporary tables not only support Innodb engine, but also support myisam engine, memory engine, etc. Therefore, we cannot see the entity (idb file) in the temporary table, but it is not necessarily a memory table and may also be stored in a temporary table space.
Temporary table VS memory table
The temporary table can be either an innodb engine table or a memory engine table. The so-called memory table here refers to the memory engine table. Through the table creation statement create table...engine=memory, all the data is in the memory. The table structure is managed through frm. For the same internal memory engine table, the frm file cannot be seen. , can't even see the information_schema directory on disk. Within MySQL, the temporary tables in information_schema include two types: temporary tables of the innodb engine and temporary tables of the memory engine. For example, the TABLES table belongs to the memory temporary table, while columns and processlist belong to the innodb engine temporary table. All data in the memory table is in the memory. The data structure in the memory is an array (heap table). All data operations are completed in the memory. For small data volume scenarios, the speed is relatively fast (no physical IO operations are involved). But memory is a limited resource after all. Therefore, if the amount of data is relatively large, it is not suitable to use a memory table. Instead, choose to use a disk temporary table (innodb engine). This temporary table uses a B-tree storage structure (innodb engine). Innodb The bufferpool resource is shared, and the data in the temporary table may have a certain impact on the hot data of the bufferpool. In addition, the operation may involve physical IO. Memory engine tables can actually create indexes, including Btree indexes and Hash indexes, so the query speed is very fast. The main drawback is limited memory resources.
Scenarios of using temporary tables
As mentioned earlier, when the execution plan contains "Using temporary", the temporary table will be used. Here are two main scenarios.
The test table structure is as follows:
mysql> show create table t1_normal\G *************************** 1. row *************************** Table: t1_normal Create Table: CREATE TABLE `t1_normal` ( `id` int(11) NOT NULL AUTO_INCREMENT, `c1` int(11) DEFAULT NULL, `c2` int(11) DEFAULT NULL, `c3` int(11) DEFAULT NULL, `c4` int(11) DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=770023 DEFAULT CHARSET=utf8
Scenario 1: union
mysql> explain select * from t1_normal union select * from t1_normal; +----+--------------+------------+------------+------+---------------+------+---------+------+--------+----------+-----------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+--------------+------------+------------+------+---------------+------+---------+------+--------+----------+-----------------+ | 1 | PRIMARY | t1_normal | NULL | ALL | NULL | NULL | NULL | NULL | 523848 | 100.00 | NULL | | 2 | UNION | t1_normal | NULL | ALL | NULL | NULL | NULL | NULL | 523848 | 100.00 | NULL | | NULL | UNION RESULT | <union1,2> | NULL | ALL | NULL | NULL | NULL | NULL | NULL | NULL | Using temporary | +----+--------------+------------+------------+------+---------------+------+---------+------+--------+----------+-----------------+
The meaning of the union operation is to take the union of the two subquery results, and only keep one row of duplicate data , by creating a temporary table with a primary key, you can solve the "duplication" problem and store the final result set through the temporary table, so you can see "Using temporary" in the Extra item in the execution plan. An operation related to union is union all, which also merges the results of two subqueries, but does not solve the duplication problem. Therefore, for union all, there is no meaning of "duplication removal", so there is no need for a temporary table.
mysql> explain select * from t1_normal union all select * from t1_normal; +----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------+ | 1 | PRIMARY | t1_normal | NULL | ALL | NULL | NULL | NULL | NULL | 523848 | 100.00 | NULL | | 2 | UNION | t1_normal | NULL | ALL | NULL | NULL | NULL | NULL | 523848 | 100.00 | NULL | +----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------+
Scenario 2: group by
mysql> explain select c1,count(*) as count from t1_normal group by c1; +----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+---------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+---------------------------------+ | 1 | SIMPLE | t1_normal | NULL | ALL | NULL | NULL | NULL | NULL | 523848 | 100.00 | Using temporary; Using filesort | +----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+---------------------------------+
The meaning of group by is to group by the specified column and be ordered by the specified column by default. The meaning of the above SQL statement is to group the data in t1_normal by the value of column c1 and count the number of records for each column value of c1. From the execution plan, we see "Using temporary;Using filesort". For group by, we first need to count the number of occurrences of each value. This requires the use of a temporary table to quickly locate it. If it does not exist, insert a record. , if it exists, and the count is accumulated, so you see "Using temporary"; and because group by implies sorting, you need to sort the records according to column c1, so you see "Using filesort".
1). Eliminate filesort
In fact, group by can also display the elimination of "sort meaning".
mysql> explain select c1,count(*) as count from t1_normal group by c1 order by null; +----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-----------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-----------------+ | 1 | SIMPLE | t1_normal | NULL | ALL | NULL | NULL | NULL | NULL | 523848 | 100.00 | Using temporary | +----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-----------------+
You can see that after adding "order by null" to the statement, "Using filesort" no longer appears in the execution plan.
2). Eliminate temporary tables
mysql> explain select SQL_BIG_RESULT c1,count(*) as count from t1_normal group by c1; +----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+----------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+----------------+ | 1 | SIMPLE | t1_normal | NULL | ALL | NULL | NULL | NULL | NULL | 523848 | 100.00 | Using filesort | +----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+----------------+
可以看到执行计划中已经没有了“Using temporary”,所以group by并非一定依赖临时表,临时表在group by中的作用主要是“去重”。所以,实际上有另外一种方式,不使用临时表,直接利用sort_buffer排序(sort_buffer不够时,进行文件排序,具体而言是每一个有序数组作为一个单独文件,然后进行外排归并),然后再扫描得到聚合后的结果集。
3).SQL_BIG_RESULT
同时我们语句中用到了“SQL_BIG_RESULT”这个hint,正是因为这个hint导致了我们没有使用临时表,先说说SQL_BIG_RESULT和SQL_SMALL_RESULT的含义。
SQL_SMALL_RESULT:显示指定用内存表(memory引擎)
SQL_BIG_RESULT:显示指定用磁盘临时表(myisam引擎或innodb引擎)
两者区别在于,使用磁盘临时表可以借助主键做去重排序,适合大数据量;使用内存表写入更快,然后在内存中排序,适合小数据量。下面是从MySQL手册中摘录的说明。
SQL_BIG_RESULT or SQL_SMALL_RESULT can be used with GROUP BY or DISTINCT to tell the optimizer that the result set has many rows or is small, respectively.
For SQL_BIG_RESULT, MySQL directly uses disk-based temporary tables if needed, and prefers sorting to using a temporary table with a key on the GROUP BY elements.
For SQL_SMALL_RESULT, MySQL uses fast temporary tables to store the resulting table instead of using sorting.
This should not normally be needed.
回到问题本身,这里MySQL优化器根据hint知道需要使用磁盘临时表,而最终直接选择了数组存储+文件排序这种更轻量的方式。
如何避免使用临时表
通常的SQL优化方式是让group by 的列建立索引,那么执行group by时,直接按索引扫描该列,并统计即可,也就不需要temporary和filesort了。
mysql> alter table t1_normal add index idx_c1(c1); Query OK, 0 rows affected (1 min 23.82 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> explain select c1,count(*) as count from t1_normal group by c1 order by null; +----+-------------+-----------+------------+-------+---------------+--------+---------+------+--------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-----------+------------+-------+---------------+--------+---------+------+--------+----------+-------------+ | 1 | SIMPLE | t1_normal | NULL | index | idx_c1 | idx_c1 | 5 | NULL | 523848 | 100.00 | Using index | +----+-------------+-----------+------------+-------+---------------+--------+---------+------+--------+----------+-------------+
相关参数与状态监控
1).参数说明
max_heap_table_size
This variable sets the maximum size to which user-created MEMORY tables are permitted to grow,The value of the variable is used to calculate MEMORY table MAX_ROWS values.
这个参数主要针对用户创建的MEMORY表,限制内存表最大空间大小,注意不是记录数目,与单条记录的长度有关。如果超出阀值,则报错。ERROR 1114 (HY000): The table 'xxx' is full
tmp_table_size
The maximum size of internal in-memory temporary tables.
对于用户手工创建的内存表,只有参数max_heap_table_size起作用;对于内部产生的内存表,则参数max_heap_table_size和tmp_table_size同时起作用。对于内部产生的内存表(比如union,group by等产生的临时表),先是采用内存表(memory表),然后超过设置的阀值(max_heap_table_size,tmp_table_size)就会转为磁盘表,使用innodb引擎或者myisam引擎,通过参数internal_tmp_disk_storage_engine指定。
tmpdir
如果内存临时表超出了限制,MySQL就会自动地把它转化为基于磁盘的MyISAM表,存储在指定的tmpdir目录下
2.状态监控
Created_tmp_tables,内部临时表数目
Created_tmp_disk_tables,磁盘临时表数目
3.information_schema相关
mysql> create temporary table t1_tmp(id int primary key,c1 int); Query OK, 0 rows affected (0.02 sec) mysql> SELECT * FROM information_schema.INNODB_TEMP_TABLE_INFO; +----------+---------------+--------+-------+----------------------+---------------+ | TABLE_ID | NAME | N_COLS | SPACE | PER_TABLE_TABLESPACE | IS_COMPRESSED | +----------+---------------+--------+-------+----------------------+---------------+ | 10063 | #sql693d_29_0 | 5 | 45 | FALSE | FALSE | +----------+---------------+--------+-------+----------------------+---------------+
总结
本文详细介绍了MySQL中临时表的核心特征,按需创建并且自动销毁,对于纯内存的数据特别适合,但为了避免内存不可控,实际上不仅仅有内存临时表,还有磁盘临时表。临时表和内存表本没有直接关联,因为临时表既可以是memory引擎,又可以innodb引擎将两者联系到了一起,实际上不同类别的临时表也是用到了不同引擎的优势。临时表使用的典型场景是union和group by。为了消除临时表,我们需要对group by列添加索引,或者对于大结果集,使用SQL_BIG_RESULT等。最后本文介绍了临时表相关的参数和状态变量,以及information_schema中的临时表信息。
推荐教程:《MySQL教程》
The above is the detailed content of In-depth understanding of MySQL temporary tables. For more information, please follow other related articles on the PHP Chinese website!