Home  >  Article  >  Database  >  Summary of temporary tables for MySQL learning

Summary of temporary tables for MySQL learning

little bottle
little bottleforward
2019-04-29 12:03:382289browse

Compared with ordinary user data tables, temporary tables in MySQL/InnoDB should be much unfamiliar to everyone. In addition, the timing and location of creation of different temporary tables are not fixed, which further increases the mystery. The most elusive thing is that temporary tables often create files first, and then delete the files without doing anything, leaving a handle for reading and writing. It gives people the impression that the dragon has seen the beginning but not the end. This article analyzes in detail the processing methods of temporary tables in various versions of MySQL. I hope it will be helpful to everyone.

Overview

To be precise, there are two types of temporary tables that we often talk about. One is really a table, used to store the data sent by the user. The table reading and writing interface is used for writing. The table must exist on the file system when reading and writing. The other type should be a temporary file used to store data in the intermediate process of SQL calculation. The reading and writing is done through file reading and writing. interface, the file may have been deleted during reading and writing, leaving a file handle for operation.

Related tutorials: mysql video tutorial

Temporary table

Temporary tables can be divided into disk temporary tables and memory temporary tables , and temporary files will only exist on the disk, not in the memory. Specifically, the memory forms of temporary tables include Memory engine and Temptable engine. The main difference is the storage method of character types (varchar, blob, text type). The former uses fixed-length space to store regardless of the actual number of characters. The latter The operator will use variable-length space storage, which improves the storage efficiency in the memory. More data can be processed in the memory instead of being converted into a disk temporary table. The Memory engine has been available since the early 5.6, and Temptable is a new engine introduced in 8.0. On the other hand, there are three forms of disk temporary tables, one is a MyISAM table, one is an InnoDB temporary table, and the other is a Temptable file map table. The last method is provided by 8.0.

In 5.6 and previous versions, the disk temporary table is placed in the temporary directory of the database configuration, and the undolog of the disk temporary table is placed together with the undo of the ordinary table (note that the disk temporary table will be restarted after the database is restarted. It was deleted later. There is no need for redolog to ensure the integrity of the transaction through crash recovery, so there is no need to write redolog, but undolog is still needed because it needs to support rollback).

After MySQL 5.7, the data and undo of the disk temporary table are separated and placed in a separate table space ibtmp1. The reason why temporary tables are separated is mainly to reduce the overhead of maintaining metadata when creating and deleting tables.

After MySQL 8.0, the data of the disk temporary table is placed separately in the Session temporary table space pool (ibt file in the #innodb_temp directory), and the undo of the temporary table is placed in the global table space ibtmp1. Another big improvement is that the space occupied by the disk temporary table data in 8.0 can be released to the operating system after the connection is disconnected, while in version 5.7 it needs to be restarted before it can be released.

Currently, there are two situations where temporary tables will be used:

Users explicitly create temporary tables

This is done by users explicitly Execute the command create temporary table to create a table. The engine type is either explicitly specified or the default configuration value (default_tmp_storage_engine) is used. Memory usage follows the memory management method of the specified engine. For example, InnoDB tables will be cached in the Buffer Pool and then written back to the disk file through the dirty thread.

In 5.6, the disk temporary table is located under tmpdir, and the file name is similar to #sql4d2b_8_0.ibd, where #sql is a fixed prefix, 4d2b is the hexadecimal representation of the process number, 8 is the hexadecimal representation of the MySQL thread number (the id in show processlist), 0 is the starting point for each connection from 0 The starting incremental value, ibd is the disk temporary table of innodb (controlled by the parameter default_tmp_storage_engine). In 5.6, after the disk temporary table is created, the corresponding frm and engine files are created under tmpdir and can be viewed through the file system ls command. After the connection is closed, the corresponding files are automatically deleted. Therefore, if we see many similar format file names in 5.6's tmpdir, we can use the file names to determine which process and which connection use the temporary table. This technique is especially applicable when troubleshooting the problem that the tmpdir directory takes up too much space. . This kind of temporary table explicitly created by the user will be automatically released and the space will be released back to the operating system when the connection is released. The undolog of the temporary table is stored in the undo table space and placed together with the undo of the ordinary table. With the undo rollback segment, temporary tables created by users can also support rollback.

In 5.7, the temporary disk table is located in the ibtmp file, and the ibtmp file location and size control method are controlled by the parameter innodb_temp_data_file_path. The data and undo of explicitly created tables are in ibtmp. After the user connection is disconnected, the temporary table will be released, but just by marking it in the ibtmp file, the space will not be released back to the operating system. If you want to free up space, you need to restart the database. In addition, one thing to note is that in 5.6 you can directly see the created file under tmpdir, but in 5.7 it is created in the ibtmp table space, so you cannot see the specific table file. If you need to check it, you need to check the INFORMATION_SCHEMA.INNODB_TEMP_TABLE_INFO table. There is a column name in it. You can see the table name here. The naming specification is similar to that of 5.6, so connections that take up a lot of space can also be found quickly.

In 8.0, the data and undo of the temporary table are further separated. The data is stored in the ibt file (controlled by the parameter innodb_temp_tablespaces_dir), and the undo is still stored in the ibtmp file (still controlled by the parameter innodb_temp_tablespaces_dir). Parameter innodb_temp_data_file_path control). The one that stores ibt files is called Session temporary table space, and the ibtmp that stores undo is called Global temporary table space. Here is an introduction to the Session temporary table space that stores data. Session temporary table space appears on disk as a set of file pools composed of ibt files. When starting, the database will be re-created in the configured directory and deleted when the database is closed. When starting up, 10 ibt files will be created by default, and each connection will use up to two, one for the temporary table created by the user, and the other for the implicit temporary table created by the optimizer described below. Of course, the temporary table will only be created when it is needed. If it is not needed, the ibt file will not be occupied. When all 10 ibts are used, the database will continue to be created, with a maximum of 400,000 created. When the connection is released, the ibt file used by the connection will be automatically released and the space will be reclaimed. If you want to reclaim the Global temporary table space, you still need to restart. However, since the files storing data have been separated and support dynamic recycling (that is, space is released when the connection is disconnected), the space occupation problem that has troubled everyone for a long time on 5.7 has been greatly alleviated. Of course, there is still room for optimization. For example, the space needs to be released after the connection is disconnected. In theory, a lot of space can be released after certain SQL (such as the user dropping an explicitly created temporary table) is executed. freed. In addition, if you need to check the table name, still check the

INFORMATION_SCHEMA.INNODB_TEMP_TABLE_INFO

table. It should be noted that on 8.0, explicit temporary tables cannot be compressed tables, but 5.6 and 5.7 can.

The optimizer implicitly creates a temporary table

This kind of temporary table is an auxiliary table created by the database to assist the execution of some complex SQL. Is a temporary table required? , generally determined by the optimizer. Different from the direct creation of disk files for temporary tables explicitly created by users, if the optimizer feels that SQL needs temporary table assistance, it will first use the memory temporary table. If it exceeds the configured memory (min(tmp_table_size, max_heap_table_siz)), it will be converted into Disk temporary table, this kind of disk temporary table is similar to one explicitly created by the user. The engine type is controlled by the parameter

internal_tmp_disk_storage_engine

. Generally, slightly more complex queries, including but not limited to order by, group by, distinct, etc., will use this implicitly created temporary table. Users can use the explain command to see if there is a word like "Using temporary" in the Extra column. If so, a temporary table must be used. In 5.6, the implicit temporary table is still under tmpdir. During the execution of complex SQL, you can see this temporary table. Once the execution is completed, it will be deleted. It is worth noting that in 5.6, this implicitly created temporary table can only be used with the MyISAM engine, that is, there is no

internal_tmp_disk_storage_engine

parameter to control. Therefore, when there is only the innodb table in our system, we will also see some indicators of MyISAM changing. In this case, it is usually due to the implicit temporary table. In 5.7, the implicit temporary table is created in the ibtmp file. After the SQL is completed, it will be marked for deletion, but the space will still not be returned to the operating system. If it needs to be returned, the database needs to be restarted. In addition, 5.7 supports the parameter

internal_tmp_disk_storage_engine###, and users can choose InnoDB or MYISAM tables as disk temporary tables. ###

In 8.0, implicit temporary tables are created in the Session temporary table space, that is, placed together with the data of temporary tables explicitly created by the user. If a connection requires an implicit temporary table for the first time, the database will take one from the pool composed of ibt files for the connection to use until the connection is released. As mentioned above, we also mentioned that in 8.0, temporary tables explicitly created by users will also be allocated an ibt from the pool for use. Each connection uses up to two ibt files to store temporary tables. We can query INFORMATION_SCHEMA.INNODB_SESSION_TEMP_TABLESPACES to determine the whereabouts of the ibt file. In this table, each ibt file is one row, and there are several rows for several ibt files in the current system. There is a column called ID. If this column is 0, it means that this ibt is not used. If it is non-0, it means that the connection with this ID is in use. For example, if the ID is 8, it means that the connection with process_id 8 is using this ibt file. In addition, there is a purpose column. The value INTRINSIC indicates that the implicit temporary table is using this ibt, and USER indicates that the displayed temporary table is in use. In addition, there is a column size, indicating the current size. Users can query this table to determine the usage of temporary tables in the entire database, which is very convenient.

In 5.6 and 5.7, memory temporary tables can only use the Memory engine. In 8.0, there is an additional choice of Temptable engine. Temptable uses variable-length storage in its storage format, which can save storage space, further improve memory usage, and reduce the number of conversions to disk temporary tables. If the disk temporary table set is InnoDB or MYISAM, a conversion copy is required. In order to reduce consumption as much as possible, Temptable proposes an overflow mechanism, that is, if the memory temporary table exceeds the configured size, the disk space map method is used, that is, a file is opened and then deleted, leaving a handle for read and write operations. The format of reading and writing files is the same as the format in memory, so the conversion step is skipped and performance is further improved. Note that this feature is only available in version 8.0.16, which has not yet been released. Because the code cannot be seen yet, we can only guess its implementation through the documentation. In 8.0.16, the parameter internal_tmp_disk_storage_engine has been removed, and the disk temporary table can only use the InnoDB form or the overflow form of TempTable. From the documentation, we seem to see that the official recommendation is to use the new engine TempTable. Specific performance improvements will need to be tested after the code is released before we can draw conclusions.

Temporary files

Compared with temporary tables, temporary files may be more unfamiliar to everyone. Temporary files are more used in scenarios of caching data and sorting data. Under normal circumstances, cached or sorted data is first placed in memory. If the memory cannot be stored, temporary files on disk will be used. The use of temporary files is different from that of general tables. After a general table is created, it starts reading and writing data, and the file is deleted after use. However, the use of temporary files is different. After the table is created ( Use the mkstemp system function), immediately call unlink to delete the file, but do not close the file, and then use the original handle to operate the file. The advantage of this is that when the process crashes abnormally, no temporary files will remain because they have not been deleted, but the disadvantages are also obvious. We cannot see this file when we use the ls command on the file system. We need to use lsof L1 to view this file. A file with a deleted attribute.

Currently, we mainly use temporary files in the following scenarios:

Temporary files in DDL

In the process of doing online DDL, there are many operations The original table needs to be reconstructed. Before rebuilding the table, various secondary indexes need to be sorted. Sorting of a large amount of data is unlikely to be completed in memory and requires an external sorting algorithm. MySQL uses merge sort. During this process, temporary files need to be created. Generally, the space required is about the same as the original table. However, it will be cleaned up immediately after use, so when doing DDL, you need to reserve enough space. Users can specify the path to this sorting file by specifying innodb_tmpdir. This parameter can be modified dynamically, and is generally set on a path with sufficient disk space. The name of the temporary file is generally similar to ibXXXXXX, where ib is a fixed prefix, and XXXXXX is a random combination of uppercase and lowercase letters and numbers.

When doing online DDL, we allow users to perform DML operations on the original table, that is, add, delete, modify, and query. We cannot insert directly into the original table, so we need a place to record the modification operations on the original table, and then apply them to the new table after the DDL is completed. The place where this is recorded is the online log. Of course, if there are few changes, it can be stored directly in the memory (the parameter innodb_sort_buffer_size can be controlled, and this parameter also controls the size of each read and write block of the online log). This onlinelog is also stored in a temporary file, created in innodb_tmpdir. The maximum size is controlled by the parameter innodb_online_alter_log_max_size. If this size is exceeded, DDL will fail. The name of the temporary file is also similar to the name of the sorting temporary file mentioned above.

In the final stage of online DDL, all sorted files and DML generated during the process need to be applied to an intermediate file. The name of the intermediate file is similar to #sql-ib53-522550444.ibd, where #sql-ib is a fixed prefix, 53 is the table id of the InnoDB layer, 522550444 is a randomly generated number. At the same time, a frm file will also be generated in the server layer (not available in 8.0). The file name is similar to #sql-4d2b_2a.frm, where #sql is a fixed prefix, 4d2b is the hexadecimal representation of the process number, 2a is the hexadecimal representation of the thread number (the id in show processlist). Therefore, we can also use this naming rule to find which thread is doing DDL. One thing to note here is that the intermediate file mentioned here is actually a temporary table, not the temporary file mentioned above. These intermediate files can be viewed through ls. In the final step of the DDL, the two temporary files will be renamed back to their original table names. Because of this feature, when the database crashes midway, residual useless files may be left on the disk. In this case, you can first rename the frm file to the same name as the ibd file, and then use DROP TABLE#mysql50##sql-ib53-522550444` to clean up the remaining files. Note that if you delete the ibd file directly without using the drop command, there may still be residual information in the data dictionary, which is not very elegant. Of course, in 8.0, due to the use of atomic data dictionary, such residual files will not appear.

Cache operation in BinLog

BinLog will only be written to the file when the transaction is submitted. Before submission, it will be placed in the memory ( Controlled by parameter binlog_cache_size), if the memory slows down, a temporary file will be created. The method of use is to create it first through mkstemp, and then unlink directly, leaving a handle for reading and writing. The temporary file name is similar to MLXXXXXX, where ML is a fixed prefix and XXXXXX is a random combination of uppercase and lowercase letters and numbers. If the BinLog of a single transaction is too large, it may cause the size of the entire BinLog to be too large, thus affecting synchronization. Therefore, we need to control the transaction size as much as possible.

Optimize the temporary files created

Some operations, in addition to relying on implicit temporary tables at the engine layer to assist in the calculation of complex SQL, will also be created at the server layer Temporary files are used to assist, such as order by operation, which will call the filesort function. This function will also first use memory (sort_buffer_size) to sort. If it is not enough, it will create a temporary file to assist sorting. The file name is similar to MYXXXXXX, where MY is a fixed prefix and XXXXXX is a random combination of uppercase and lowercase letters and numbers.

Temporary files used in Load data

In BinLog replication, if the Load Data command is used on the main database, that is, data is imported from the file, the database will Write the entire file into the RelayLog, and then transfer it to the standby database. The standby database parses the RelayLog, extracts the corresponding Load file, and then applies it on the standby database. The location where this file is stored on the standby database is controlled by the parameter slave_load_tmpdir. The document recommends that this directory should not be configured in the memory directory of the physical machine or a directory that will be deleted after restarting. Because replication relies on this file, if it is accidentally deleted, replication will be interrupted.

Others

In addition to the several places mentioned above, there are several other places where temporary files are also used:

  • In the InnoDB layer, multiple temporary files will be created during startup to store: the last foreign key or unique key error; the last deadlock information; the last innodb status information. The reason for using temporary files instead of memory is that the memory usage will not fluctuate due to writing these indicators.
  • At the Server layer, temporary files will be used when show create table is used for partition tables. In addition, temporary files are also used when sorting internally in MYISAM tables.

Related parameters

*** tmpdir: *** This parameter is the configuration of the temporary directory. In 5.6 and previous versions, temporary tables/files will be placed here by default. This parameter can be configured with multiple directories, so that temporary tables/files can be created in different directories in turn. If different directories point to different disks, the purpose of offloading can be achieved.
*** innodb_tmpdir: *** This parameter is only used by sorting temporary files in DDL. It takes up a lot of space, so it is recommended to configure it separately. This parameter can be set dynamically and is also a Session variable.
*** slave_load_tmpdir: *** This parameter is mainly used when configuring the temporary file location of the backup library when Load Data is being replicated in BinLog. Because the database still needs to rely on Load data files after a crash, it is recommended not to configure a directory that will delete data after restarting.
*** internal_tmp_disk_storage_engine: *** When an implicit temporary table is converted to a disk temporary table, which engine is used, the default is only MyISAM and InnoDB. Only supported by versions 5.7 and later. This parameter was canceled after version 8.0.16.
*** internal_tmp_mem_storage_engine: *** The storage engine used when the implicit temporary table is in memory. You can choose Memory or Temptable engine. It is recommended to choose the new Temptable engine.
*** default_tmp_storage_engine: *** The default explicit temporary table engine, that is, the engine of temporary tables created by users through SQL statements.
*** tmp_table_size: *** min(tmp_table_size,max_heap_table_size) is the memory size of the implicit temporary table. If it exceeds this value, it will be converted into a disk temporary table.
*** max_heap_table_size: *** The memory limit size of the Memory memory table created by the user.
*** big_tables: *** Converting a memory temporary table to a disk temporary table requires a conversion operation, which needs to be converted in different engine formats. This is consumed. If we can know in advance that a disk temporary table is needed to execute a certain SQL, that is, the memory is definitely not enough, we can set this parameter so that the optimizer will skip using the memory temporary table and directly use the disk temporary table to reduce overhead.
*** temptable_max_ram: *** This parameter is only available after 8.0. It is mainly used to specify the memory size for the Temptable engine. After exceeding this, it will either be converted to a disk temporary table or use the built-in overflow mechanism.
*** temptable_use_mmap: *** Whether to use Temptable's overflow mechanism.

Summary and Suggestions

MySQL's temporary tables and temporary files are actually a relatively complex topic, involving many modules, and the timing of occurrence is more difficult to grasp, resulting in troubleshooting problems compared to ordinary tables. It's quite difficult. It is recommended that readers study the code carefully in order to locate difficult problems that may arise online.

The above is the detailed content of Summary of temporary tables for MySQL learning. For more information, please follow other related articles on the PHP Chinese website!

Statement:
This article is reproduced at:cnblogs.com. If there is any infringement, please contact admin@php.cn delete