Why can MySQL temporary tables have duplicate names?
Today we will start with this question: What are the characteristics of temporary tables and which scenarios are they suitable for?
Here, I need to help you clarify an easily misunderstood issue first: Some people may think that temporary tables are memory tables. However, these two concepts are completely different.
A memory table is a table that uses the Memory engine; it is created with create table … engine=memory. **The data of such a table is kept in memory and is cleared when the system restarts, but the table structure remains.** Apart from these two features, which look "strange", it behaves like a normal table.
A temporary table, by contrast, can use any of several engines. A temporary table that uses the InnoDB or MyISAM engine writes its data to disk. Of course, a temporary table can also use the Memory engine.
After clarifying the difference between memory tables and temporary tables, let’s take a look at the characteristics of temporary tables.
For ease of understanding, let’s take a look at the following sequence of operations:
As you can see, temporary tables have the following characteristics in use:
The syntax for creating a table is create temporary table ….
Other threads cannot access the temporary table created by a session, and it is only visible to that session. Therefore, the temporary table t created by session A in the figure is invisible to session B.
Temporary tables can have the same name as ordinary tables.
When session A has both a temporary table and an ordinary table with the same name, show create statements and insert, delete, update and select statements all access the temporary table.
The show tables command does not display temporary tables.
Since the temporary table can only be accessed by the session that created it, the temporary table will be automatically deleted when the session ends.
Because temporary tables have these characteristics, they are particularly well suited to the join optimization scenario in the previous article. Why? There are two main reasons:
Temporary tables in different sessions can have the same name. If several sessions run the join optimization at the same time, there is no need to worry that duplicate table names will cause table creation to fail.
There is no need to worry about cleaning up data. With a normal table, if the client disconnects abnormally during the process, or the database restarts abnormally, the tables generated along the way have to be cleaned up separately. Temporary tables are reclaimed automatically, so this extra step is not needed.
Because there is no need to worry about name conflicts between threads, temporary tables are often used to optimize complex queries. Cross-database queries in a sharded database and table system are a typical usage scenario.
A common sharding scenario is to spread one logically large table across different database instances. For example, split the large table ht by a field f into 1024 sub-tables and distribute those sub-tables across 32 database instances, as shown in the figure below:
Generally, this kind of sharded system has a proxy layer in the middle. However, some solutions let the client connect directly to the database, with no proxy layer.
In this architecture, the partition key is chosen on the principle of "reducing cross-database and cross-table operations". If most statements contain an equality condition on f, then f should be the partition key. After parsing a SQL statement, the proxy can then decide which sub-table to route the query to.
For example, the following statement:
select v from ht where f=N;
Here we can use the sharding rule (for example, N%1024) to determine which sub-table holds the required data. A statement like this only needs to access one sub-table, and it is the most welcome form of statement in a sharding scheme.
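As a rough illustration of the routing rule, the Python sketch below maps a value of f to a sub-table with N % 1024 and rewrites the point query against that sub-table. The ht_<index> naming follows the ht_x pattern used later in this article. The placement of sub-tables on the 32 instances (consecutive blocks of 32) and the function names are assumptions made only for this example; the article does not say how sub-tables are distributed.

```python
SUB_TABLES = 1024  # number of sub-tables of ht (from the article)
INSTANCES = 32     # number of database instances (from the article)
TABLES_PER_INSTANCE = SUB_TABLES // INSTANCES  # 32 sub-tables per instance


def route(n):
    """Map a value of the partition key f to (instance index, sub-table name)."""
    table_index = n % SUB_TABLES
    # Assumption: consecutive blocks of sub-tables live on the same instance.
    instance_index = table_index // TABLES_PER_INSTANCE
    return instance_index, f"ht_{table_index}"


def rewrite_point_query(n):
    """Rewrite `select v from ht where f=N` for the one sub-table that holds N."""
    instance_index, table = route(n)
    return instance_index, f"select v from {table} where f={n};"


if __name__ == "__main__":
    print(rewrite_point_query(2049))
```

Because the rule depends only on f, a proxy using something like this never has to touch more than one sub-table for an equality lookup on f.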
However, if there is another index k on this table, and the query statement is like this:
select v from ht where k >= M order by t_modified desc limit 100;
Here, because the query condition does not use the partition field f, the only option is to find all matching rows in every partition and then perform the order by over the combined result. There are two common approaches to this.
The first approach is to implement the sort in the proxy layer's process code. Its advantage is speed: once the data has been fetched from the shards, it can take part in the computation directly in memory. However, the drawbacks are also obvious:
It requires a relatively large amount of development work. The example statement is fairly simple; if complex operations such as group by or even join are involved, the demands on the middle layer's development capability are high;
It puts considerable pressure on the proxy, which can easily run short of memory or hit a CPU bottleneck.
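The other approach is to collect the data fetched from each shard into a single table on one MySQL instance, and then perform the logical operations on that aggregation instance.
For the statement above, the execution flow could look like this:
Create a temporary table temp_ht on the aggregation instance, with three columns: v, k and t_modified;
On each shard, execute select v,k,t_modified from ht_x where k >= M order by t_modified desc limit 100;
Insert the results returned by the shards into temp_ht;
Execute select v from temp_ht order by t_modified desc limit 100;
Return the result.
The flowchart for this process is shown below:
In practice, we often find that the computing load on each shard is far from saturated, so the temporary table temp_ht is simply placed on one of the 32 shards.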
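The scatter-and-gather flow with temp_ht can be sketched in Python as follows. This is only an in-memory simulation under stated assumptions: each shard is a plain list of dict rows, a Python list plays the role of the temporary table, and the function names query_shard and gather are made up for the example.

```python
LIMIT = 100


def query_shard(rows, m, limit=LIMIT):
    """Stand-in for running, on one shard:
    select v,k,t_modified from ht_x where k >= M
    order by t_modified desc limit 100;"""
    hits = [r for r in rows if r["k"] >= m]
    hits.sort(key=lambda r: r["t_modified"], reverse=True)
    return hits[:limit]


def gather(shards, m, limit=LIMIT):
    """Collect each shard's rows into temp_ht, then sort and cut once more."""
    temp_ht = []  # plays the role of the temporary table temp_ht
    for rows in shards:
        temp_ht.extend(query_shard(rows, m, limit))
    # Stand-in for: select v from temp_ht order by t_modified desc limit 100;
    temp_ht.sort(key=lambda r: r["t_modified"], reverse=True)
    return [r["v"] for r in temp_ht[:limit]]


if __name__ == "__main__":
    shards = [
        [{"v": "a", "k": 5, "t_modified": 3}, {"v": "b", "k": 1, "t_modified": 9}],
        [{"v": "c", "k": 7, "t_modified": 5}],
    ]
    print(gather(shards, m=2, limit=2))
```

Applying the limit on every shard is safe because the global top 100 can contain at most 100 rows from any single shard, so each shard's own top 100 is enough for the final sort.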
You may ask: different threads can create temporary tables with the same name, so how is this done?
When we execute
create temporary table temp_t(id int primary key)engine=innodb;
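MySQL has to create an frm file for this InnoDB table to store the table structure definition, and it also needs somewhere to store the table data.
The frm file is placed in the temporary file directory. Its suffix is .frm and its prefix is "#sql{process id}_{thread id}_sequence number".
From the prefix rule we can see that when we create an InnoDB temporary table named t1, MySQL's storage layer treats it as having a different name from the ordinary table t1. So even when an ordinary table t1 already exists in the same database, a temporary table t1 can still be created.
Let's look at an example first.
The process id of this process is 1234; session A has thread id 4 and session B has thread id 5. So the temporary tables created by session A and session B have different file names on disk and do not conflict.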
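To make the naming rule concrete, here is a small Python sketch that builds file names from the prefix pattern described above. The function name is hypothetical, and rendering the ids in hexadecimal is an assumption on my part (the article only gives the "#sql{process id}_{thread id}_sequence number" pattern); the point is simply that different thread ids give different names.

```python
def temp_table_file_name(process_id, thread_id, sequence, suffix=".frm"):
    """Build a temporary table file name from the prefix rule in the article.

    Assumption: the ids are rendered in hexadecimal. The article does not
    state the number format, so treat this detail as illustrative only.
    """
    return f"#sql{process_id:x}_{thread_id:x}_{sequence:x}{suffix}"


if __name__ == "__main__":
    # Session A (thread id 4) and session B (thread id 5) in process 1234.
    print(temp_table_file_name(1234, 4, 0))
    print(temp_table_file_name(1234, 5, 0))
```

Whatever the exact number format, the thread id is part of the name, so two sessions in the same server process cannot produce the same file name.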
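To maintain tables, MySQL needs more than physical files: it also has an in-memory mechanism for telling tables apart, in which each table corresponds to a table_def_key.
The table_def_key of an ordinary table is derived from "database name + table name". So if you try to create two ordinary tables with the same name in the same database, the second creation finds that the table_def_key already exists.
For a temporary table, the table_def_key adds "server_id + thread_id" on top of "database name + table name".
In other words, the two temporary tables t1 created by session A and session B have different table_def_key values and different file names on disk, so they can coexist.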
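The effect of the two key shapes can be sketched in Python. This is a toy model, not MySQL's actual data structure: the key is represented as a tuple, and the TableRegistry class, the database name test and the server_id value 1 are assumptions introduced for the example.

```python
def table_def_key(db, table, server_id=None, thread_id=None):
    """Ordinary table: (db, table). Temporary table: adds server_id and thread_id."""
    if server_id is None and thread_id is None:
        return (db, table)
    return (db, table, server_id, thread_id)


class TableRegistry:
    """Hypothetical in-memory registry that rejects duplicate keys."""

    def __init__(self):
        self._keys = set()

    def create(self, key):
        if key in self._keys:
            raise ValueError(f"table already exists: {key}")
        self._keys.add(key)
        return key


if __name__ == "__main__":
    registry = TableRegistry()
    registry.create(table_def_key("test", "t1"))        # ordinary table t1
    registry.create(table_def_key("test", "t1", 1, 4))  # session A's temporary t1
    registry.create(table_def_key("test", "t1", 1, 5))  # session B's temporary t1
    try:
        registry.create(table_def_key("test", "t1"))    # second ordinary t1
    except ValueError as exc:
        print(exc)
```

The three t1 tables coexist because their keys differ; only the second ordinary t1 collides.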
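In the implementation, each thread maintains its own linked list of temporary tables. Whenever a session operates on a table, it first walks this list to check whether a temporary table with that name exists. If one exists, the temporary table takes priority; if not, the ordinary table is used. When the session ends, "DROP TEMPORARY TABLE + table name" is executed for each temporary table in the list.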
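The lookup order and the cleanup at session end can be modeled with a short Python sketch. The Server and Session classes are hypothetical stand-ins, a Python list replaces the linked list, and tables are reduced to row lists; none of this mirrors MySQL's real internals.

```python
class Server:
    """Holds the ordinary tables, which every session can see."""

    def __init__(self):
        self.normal_tables = {}  # table name -> rows


class Session:
    def __init__(self, server):
        self.server = server
        self.temp_tables = []  # per-session list of (name, rows)

    def create_temporary_table(self, name):
        self.temp_tables.append((name, []))

    def resolve(self, name):
        """Check this session's temporary tables first, then ordinary tables."""
        for temp_name, rows in self.temp_tables:
            if temp_name == name:
                return "temporary", rows
        if name in self.server.normal_tables:
            return "normal", self.server.normal_tables[name]
        raise LookupError(f"table {name} does not exist")

    def close(self):
        """Drop every temporary table; return the statements that were run."""
        dropped = [f"DROP TEMPORARY TABLE {name}" for name, _ in self.temp_tables]
        self.temp_tables.clear()
        return dropped


if __name__ == "__main__":
    server = Server()
    server.normal_tables["t"] = []
    session_a, session_b = Session(server), Session(server)
    session_a.create_temporary_table("t")
    print(session_a.resolve("t")[0], session_b.resolve("t")[0])
    print(session_a.close())
```

Session B never sees session A's list, which is why the temporary table t is invisible to it and the name resolves to the ordinary table.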
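You will notice that the binlog also contains DROP TEMPORARY TABLE records. This may seem odd: a temporary table can only be accessed within its own thread, so why does it need to be written to the binlog? To answer that, we need to talk about primary-standby replication.
Writing to the binlog means the standby needs it. Imagine executing the following sequence of statements on the primary: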
create table t_normal(id int primary key, c int)engine=innodb;/*Q1*/
create temporary table temp_t like t_normal;/*Q2*/
insert into temp_t values(1,1);/*Q3*/
insert into t_normal select * from temp_t;/*Q4*/
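If none of the operations on the temporary table were logged, the standby would only have the binlog entries for create table t_normal and insert into t_normal select * from temp_t. When the standby reached insert into t_normal, it would report the error "table temp_t does not exist".
You might say: wouldn't setting the binlog to row format solve this? With row format, the binlog entry for insert into t_normal records the data of the operation; that is, the write_row event records "insert one row (1,1)".
That is indeed the case. If binlog_format=row, statements related to temporary tables are not written to the binlog. In other words, temporary table operations are recorded in the binlog only when binlog_format=statement/mixed.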
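The three cases discussed here (statement format with temporary table operations logged, statement format with them omitted, and row format) can be replayed on a toy standby in Python. The event tuples and the replay function are hypothetical simplifications of binlog events, written only to show why the standby either succeeds or fails.

```python
# Hypothetical event tuples standing in for the binlog entries of Q1..Q4.
Q1 = ("create", "t_normal")
Q2 = ("create_temporary", "temp_t")
Q3 = ("insert_values", "temp_t", [(1, 1)])
Q4 = ("insert_select", "t_normal", "temp_t")
Q4_ROW = ("write_rows", "t_normal", [(1, 1)])  # row-format version of Q4


def replay(events):
    """Apply events in order on a toy standby and return its tables."""
    tables = {}
    for event in events:
        kind, table = event[0], event[1]
        if kind in ("create", "create_temporary"):
            tables[table] = []
        elif kind in ("insert_values", "write_rows"):
            tables[table].extend(event[2])
        elif kind == "insert_select":
            source = event[2]
            if source not in tables:
                raise LookupError(f"table {source} does not exist")
            tables[table].extend(tables[source])
        else:
            raise ValueError(f"unknown event kind: {kind}")
    return tables


if __name__ == "__main__":
    print(replay([Q1, Q2, Q3, Q4])["t_normal"])  # statement format, all logged
    print(replay([Q1, Q4_ROW])["t_normal"])      # row format, temp ops not logged
    try:
        replay([Q1, Q4])                         # statement format, temp ops dropped
    except LookupError as exc:
        print(exc)
```

In row format the standby never needs temp_t because the row data travels with the event; in statement format it has to rebuild temp_t itself before Q4 can run.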
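In that case, the create temporary table statement is sent to the standby and executed there, so the standby's sync thread creates the corresponding temporary table. When a thread on the primary exits, its temporary tables are dropped automatically, but the sync thread on the standby keeps running. The primary therefore has to write an additional DROP TEMPORARY TABLE statement to the binlog for the standby to execute.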
Now, let me give you an example. In the following sequence, instance S is the standby database of M.
Two sessions on the primary M each create a temporary table named t1. Both create temporary table t1 statements are sent to the standby S.
However, the standby's log-applying thread is shared, which means the create statement is executed twice in that one thread. (Even with multi-threaded replication, both statements may be assigned to the same worker on the standby.) So, will this cause the sync thread to report an error?
Obviously not; otherwise temporary tables would be a buggy feature. In other words, the standby's thread has to treat the two t1 tables as separate temporary tables when executing. How is this achieved? When MySQL writes the binlog, it records the id of the primary's thread that executed the statement. The applying thread on the standby therefore knows which primary thread executed each statement, and uses this thread id to construct the temporary table's table_def_key:
For session A's temporary table t1, the table_def_key on the standby is: database name + t1 + "M's server_id" + "session A's thread_id";
For session B's temporary table t1, the table_def_key on the standby is: database name + t1 + "M's server_id" + "session B's thread_id".
Because table_def_key is different, these two tables will not conflict in the application thread of the standby database.
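As a minimal sketch of this, the Python model below lets a single applying thread on the standby process create and drop events from two primary sessions. The StandbyApplier class, the event dictionaries, the database name test and the server_id value 1 are all assumptions for illustration; only the idea of keying by the primary's thread id comes from the article.

```python
class StandbyApplier:
    """Toy model of one log-applying thread on the standby."""

    def __init__(self):
        self.temp_tables = {}  # table_def_key -> rows

    @staticmethod
    def _key(event):
        # The key uses the primary's server_id and the primary thread id
        # carried in the binlog event, not the standby's own thread id.
        return (event["db"], event["table"], event["server_id"], event["thread_id"])

    def apply(self, event):
        key = self._key(event)
        if event["op"] == "create_temporary":
            if key in self.temp_tables:
                raise ValueError(f"temporary table already exists: {key}")
            self.temp_tables[key] = []
        elif event["op"] == "drop_temporary":
            del self.temp_tables[key]
        else:
            raise ValueError(f"unknown op: {event['op']}")
        return key


def event(op, thread_id, table="t1", db="test", server_id=1):
    """Build a hypothetical binlog event as a dict."""
    return {"op": op, "db": db, "table": table,
            "server_id": server_id, "thread_id": thread_id}


if __name__ == "__main__":
    applier = StandbyApplier()
    applier.apply(event("create_temporary", thread_id=4))  # session A on M
    applier.apply(event("create_temporary", thread_id=5))  # session B on M
    applier.apply(event("drop_temporary", thread_id=4))    # session A exits
    print(list(applier.temp_tables))
```

Both create events succeed in the same applier, and dropping session A's table leaves session B's t1 untouched.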