Most MySQL users are familiar with the binlog. It is used for master-slave replication and, in some cases, for data recovery. Because the binlog is binary data, you normally need the mysqlbinlog tool to read it. This note analyzes the binlog format in order to understand what mysqlbinlog does behind the scenes.
Before explaining when the binlog is written, here is a brief introduction to its purpose. The binlog is a binary log file that records MySQL data updates and potential updates (for example, a DELETE statement that runs but matches no rows). MySQL master-slave replication relies on the binlog. To enable the binlog, set the log_bin parameter in my.cnf. You can also use binlog_do_db to specify the databases whose changes are logged and binlog_ignore_db to specify the databases that are not logged. On a server that already runs with log_bin enabled, the command SET SQL_LOG_BIN=1 turns binlog recording on for the current session (and SET SQL_LOG_BIN=0 turns it off); it cannot switch the binlog on for a server that was started without log_bin. Once the setup is complete, we can test the binlog.
Note that the InnoDB redo/undo logs and the MySQL binlog are completely different logs. The main differences are:
a) Different layers. The redo/undo logs are maintained by the InnoDB layer, while the binlog is maintained by the MySQL server layer and is independent of the storage engine; it records the update operations of all engines. For a more detailed description of InnoDB's redo/undo logs, see the relevant chapters of Jiang Chengyao's book "MySQL Technology Insider - Innodb Storage Engine".
b) Different content. The redo/undo logs record the modification of each page and combine physical and logical logging (the redo log is physical to the page and logical within the page; the undo log is a logical log). Their purpose is to guarantee data safety and consistency. The binlog records the operations of every transaction, such as the statement DELETE FROM TABLE WHERE i > 1, regardless of which engine is used. The format is binary; you can parse the log content with the command mysqlbinlog -vv BINLOG.
c) Different timing. The redo/undo logs are written continuously while a transaction executes, whereas the binlog is written at commit time, just before the transaction is finally committed (an earlier version of this note said it was written after the commit; thanks to anti-semicolon for pointing out the mistake). When the binlog is actually flushed to disk depends on the sync_binlog parameter.
Statements that do not modify data, such as SELECT, are not recorded in the binlog; statements that involve data updates are. Note that for transactional engines such as InnoDB, changes are written to the binlog only when the transaction commits.
When the binlog is flushed to disk depends on the sync_binlog parameter. If it is set to 0, MySQL does not control the flushing of the binlog and leaves it to the file system to flush its cache. If it is set to a non-zero value N, MySQL asks the file system to flush the binlog to disk once every N transactions. Setting it to 1 is the safest: in the event of a system failure, at most one transaction's updates are lost, but it costs performance. It is commonly set to 100 or 0, giving up some consistency for better performance.
You can list the current binlog files with the command SHOW MASTER LOGS. For example, the following is the binlog listing of MySQL on my machine. The first column is the binlog file name, and the second column is the binlog file size.
......
| mysql-bin.000018 |       515 |
| mysql-bin.000019 |       504 |
| mysql-bin.000020 |       107 |
+------------------+-----------+
You can set the binlog retention time with expire_logs_days. To purge binlogs manually, specify either a binlog name or a cutoff date. The commands are: purge master logs to BINLOGNAME; and purge master logs before DATE;.
There are three binlog formats: statement, row and mixed. MySQL 5.5 defaults to statement mode. We generally do not recommend statement mode for master-slave replication, because some statements are not replicated safely, for example statements containing the UUID function and LOAD DATA INFILE statements. The mixed format is generally recommended. Leaving aside the differences between these three formats for now, let's look at the storage format of the binlog. The binlog is a set of binary files. Besides the binlog files mysql-bin.xxxxxx, there is also a binlog index file, mysql-bin.index. As described in the official documentation, the binlog format is as follows:
The binlog file starts with a 4-byte magic number, 0xfe62696e, which corresponds to the bytes 0xfe 'b' 'i' 'n'.
A binlog consists of a series of binlog events. Each binlog event contains two parts: header and data.
The header part provides the common information of the event, including its creation time, the server it came from, etc.
The data part provides information specific to the event, such as the actual data modification.
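Starting with MySQL 5.0, the binlog uses format version v4. The first event is always a format_desc event, which describes the format version of the binlog file, that is, the format in which events are written to the file. For earlier binlog versions, see http://dev.mysql.com/doc/internals/en/binary-log-versions.html
The events that follow are written in the format version described by that first event.
The last event is a rotate event, which names the next binlog file.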
The binlog index file is a text file that contains the current list of binlog files. For example, here is the content of a mysql-bin.index file:
/var/log/mysql/mysql-bin.000019
/var/log/mysql/mysql-bin.000020
/var/log/mysql/mysql-bin.000021
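Since the index file is plain text with one path per line, reading it takes only a few lines. The sketch below is only an illustration: the helper name read_binlog_index is made up for this note, and the demonstration writes a temporary copy of the listing above instead of touching a real server directory.

```python
import tempfile
from pathlib import Path


def read_binlog_index(index_path):
    """Return the binlog file paths listed in an index file, in order."""
    text = Path(index_path).read_text()
    # One path per line; blank lines are skipped.
    return [line.strip() for line in text.splitlines() if line.strip()]


# Demonstration on a temporary copy of the listing shown above.
with tempfile.TemporaryDirectory() as tmp:
    index_file = Path(tmp) / "mysql-bin.index"
    index_file.write_text(
        "/var/log/mysql/mysql-bin.000019\n"
        "/var/log/mysql/mysql-bin.000020\n"
        "/var/log/mysql/mysql-bin.000021\n"
    )
    binlogs = read_binlog_index(index_file)

print(binlogs[-1])  # the last entry is the newest binlog in the list
```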
Next we analyze several common events; for the other event types, see the official documentation. The event data structure is as follows:
+=====================================+
| event  | timestamp         0 : 4    |
| header +----------------------------+
|        | type_code         4 : 1    |
|        +----------------------------+
|        | server_id         5 : 4    |
|        +----------------------------+
|        | event_length      9 : 4    |
|        +----------------------------+
|        | next_position    13 : 4    |
|        +----------------------------+
|        | flags            17 : 2    |
|        +----------------------------+
|        | extra_headers    19 : x-19 |
+=====================================+
| event  | fixed part        x : y    |
| data   +----------------------------+
|        | variable part              |
+=====================================+
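The header layout in the diagram maps directly onto a fixed-size little-endian record, so it can be decoded with Python's struct module. This is a minimal sketch, not how mysqlbinlog itself is implemented: the names EventHeader and parse_event_header are invented here, extra_headers is assumed to be empty (header length 19), and the sample bytes are the header of the format_desc event from the mysql-bin.000053 dump analyzed in this note.

```python
import struct
from collections import namedtuple

# Layout from the diagram: timestamp(4) type_code(1) server_id(4)
# event_length(4) next_position(4) flags(2), all little-endian.
EventHeader = namedtuple(
    "EventHeader",
    "timestamp type_code server_id event_length next_position flags",
)
HEADER_FORMAT = "<IBIIIH"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 4+1+4+4+4+2 = 19


def parse_event_header(buf, offset=0):
    """Decode the 19-byte common header that starts at `offset` in `buf`."""
    return EventHeader(*struct.unpack_from(HEADER_FORMAT, buf, offset))


# Header bytes of the format_desc event in mysql-bin.000053.
sample = bytes.fromhex("b8b27f56 0f 04000000 67000000 6b000000 0100")
header = parse_event_header(sample)
print(header.type_code, header.event_length, header.next_position)
```

For the sample this gives type 15, length 103 and next position 107, the same values the byte-by-byte walkthrough arrives at.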
Below is a brand-new binlog file, mysql-bin.000053, created after I ran FLUSH LOGS. We start the analysis with the first event of the binlog, the format_desc event (the MySQL binlog is little-endian):
root@ubuntu:/var/log/mysql# hexdump -C mysql-bin.000053
00000000  fe 62 69 6e b8 b2 7f 56  0f 04 00 00 00 67 00 00  |.bin...V.....g..|
00000010  00 6b 00 00 00 01 00 04  00 35 2e 35 2e 34 36 2d  |.k.......5.5.46-|
00000020  30 75 62 75 6e 74 75 30  2e 31 34 2e 30 34 2e 32  |0ubuntu0.14.04.2|
00000030  2d 6c 6f 67 00 00 00 00  00 00 00 00 00 00 00 00  |-log............|
00000040  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 13  |................|
00000050  38 0d 00 08 00 12 00 04  04 04 04 12 00 00 54 00  |8.............T.|
00000060  04 1a 08 00 00 00 08 08  08 02 00                 |...........|
Compare this with the format_desc event layout given in the official documentation:
+=====================================+
| event  | timestamp         0 : 4    |
| header +----------------------------+
|        | type_code         4 : 1    | = FORMAT_DESCRIPTION_EVENT = 15
|        +----------------------------+
|        | server_id         5 : 4    |
|        +----------------------------+
|        | event_length      9 : 4    | >= 91
|        +----------------------------+
|        | next_position    13 : 4    |
|        +----------------------------+
|        | flags            17 : 2    |
+=====================================+
| event  | binlog_version   19 : 2    | = 4
| data   +----------------------------+
|        | server_version   21 : 50   |
|        +----------------------------+
|        | create_timestamp 71 : 4    |
|        +----------------------------+
|        | header_length    75 : 1    |
|        +----------------------------+
|        | post-header      76 : n    | = array of n bytes, one byte per event
|        | lengths for all            |   type that the server knows about
|        | event types                |
+=====================================+
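The first 4 bytes are the fixed magic number: the bytes fe 62 69 6e, which give 0x6e6962fe when read as a little-endian 32-bit integer. Next comes a format_desc event. Look first at its 19-byte header. The first 4 bytes, 0x567fb2b8, are the timestamp; the 5th byte, 0x0f, is the event type; the next 4 bytes, 0x00000004, are the server_id; the next 4 bytes, 0x00000067, are the event length, 103; the next 4 bytes, 0x0000006b, are the start position of the next event, 107; and the last 2 bytes, 0x0001, are the flags (1 is LOG_EVENT_BINLOG_IN_USE_F, which marks a binlog that has not been closed yet; once the binlog is closed, the flag is set to 0). That completes the 4+1+4+4+4+2=19 bytes of the common header (extra_headers is not used for now). Then comes the data part of the event, which is divided into Fixed data and Variable data. Fixed data is the part of the event with a fixed length and layout, while Variable data varies in length. For example, the Fixed data of the format_desc event is 0x54=84 bytes long (0x54 is the 15th entry of the post-header length array at the end of this event, and 15 is the type code of the format_desc event). These 84=2+50+4+1+27 bytes are laid out as follows: the first 2 bytes, 0x0004, are the binlog version, 4; the next 50 bytes are the MySQL server version, in my case 5.5.46-0ubuntu0.14.04.2-log, which matches the output of SELECT version();. The next 4 bytes are the binlog creation time, which is 0 here; the next byte, 0x13, is the common header length of all subsequent events, 19 here; and each of the remaining 27 bytes is the Fixed data length of one of the event types known to this MySQL server (27 in total). The Variable data part of the format_desc event itself is empty.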
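The 84-byte data part described above can be decoded the same way. The following is a sketch under stated assumptions: parse_format_desc is a name made up for this note, the layout is the MySQL 5.5 one analyzed here (newer servers append checksum information that this code does not handle), and the sample bytes are rebuilt from the hexdump above rather than read from a file.

```python
import struct


def parse_format_desc(data):
    """Decode the data part (after the 19-byte header) of a format_desc event."""
    (binlog_version,) = struct.unpack_from("<H", data, 0)
    # The server version is a 50-byte field padded with 0x00.
    server_version = data[2:52].rstrip(b"\x00").decode("ascii")
    (create_timestamp,) = struct.unpack_from("<I", data, 52)
    header_length = data[56]
    # One byte per event type: entry i holds the Fixed data length of type i + 1.
    post_header_lengths = list(data[57:])
    return (binlog_version, server_version, create_timestamp,
            header_length, post_header_lengths)


# Data part of the format_desc event in mysql-bin.000053 (2+50+4+1+27 bytes).
data = (
    b"\x04\x00"
    + b"5.5.46-0ubuntu0.14.04.2-log".ljust(50, b"\x00")
    + b"\x00\x00\x00\x00"
    + b"\x13"
    + bytes.fromhex(
        "38 0d 00 08 00 12 00 04 04 04 04 12 00 00 54 00"
        " 04 1a 08 00 00 00 08 08 08 02 00"
    )
)

version, server, created, header_len, post_lens = parse_format_desc(data)
print(version, server, header_len)
print("query:", post_lens[2 - 1], "rotate:", post_lens[4 - 1],
      "format_desc:", post_lens[15 - 1])
```

Indexing the array by type code minus one is how the Fixed data lengths used in the rest of this note (13 for a query event, 8 for a rotate event, 84 for format_desc) can be looked up.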
Next, without doing anything else, run FLUSH LOGS directly. A rotate event appears, and the binlog content is now as follows:
......
00000060  ................................. c2 b3 7f 56 04  |..............V.|
00000070  04 00 00 00 2b 00 00 00  96 00 00 00 00 00 04 00  |....+...........|
00000080  00 00 00 00 00 00 6d 79  73 71 6c 2d 62 69 6e 2e  |......mysql-bin.|
00000090  30 30 30 30 35 34                                 |000054|
00000096
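The earlier content is almost identical to before, except that the flags of the format_desc event changed from 0x0001 to 0x0000. A rotate event then starts at the timestamp 0x567fb3c2. Following the earlier analysis, the first 19 bytes are the event header: the event type is 0x04, the length is 0x2b=43, the next event starts at 0x96=150, and the flags are 0x0000. Then comes the event data. The first 8 bytes are the Fixed data part, which records the start offset in the next binlog, 4. The remaining 43-19-8=16 bytes are the Variable data part, which records the file name of the next binlog, mysql-bin.000054. This can be verified against the output of mysqlbinlog -vv mysql-bin.000053.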
ssj@ubuntu:/var/log/mysql$ mysqlbinlog -vv mysql-bin.000053
...
# at 4
#151227 17:43:20 server id 4 end_log_pos 107 Start: binlog v 4, server v 5.5.46-0ubuntu0.14.04.2-log created 151227 17:43:20
BINLOG '
uLJ/Vg8EAAAAZwAAAGsAAAAAAAQANS41LjQ2LTB1YnVudHUwLjE0LjA0LjItbG9nAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA==
'/*!*/;
# at 107
#151227 17:47:46 server id 4 end_log_pos 150 Rotate to mysql-bin.000054 pos: 4
...
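The "# at 4" and "# at 107" positions and the rotate target in this output can be reproduced by walking the file event by event. The sketch below is an illustration only: iter_events and parse_rotate are hypothetical helper names, the header length is assumed to be the 19 bytes announced by the format_desc event, and the in-memory stream is reassembled from the two hexdumps of mysql-bin.000053 (after the file was closed) instead of being read from disk.

```python
import io
import struct


def iter_events(stream, header_length=19):
    """Yield (position, type_code, body) for each event of a v4 binlog stream."""
    if stream.read(4) != b"\xfebin":
        raise ValueError("bad magic number, not a binlog file")
    position = 4
    while True:
        header = stream.read(header_length)
        if len(header) < header_length:
            return  # end of file
        _ts, type_code, _server_id, event_length, _next_pos, _flags = (
            struct.unpack_from("<IBIIIH", header)
        )
        if event_length < header_length:
            raise ValueError("corrupt event length at position %d" % position)
        body = stream.read(event_length - header_length)
        yield position, type_code, body
        position += event_length


def parse_rotate(body):
    """Return (start position in the next binlog, next binlog file name)."""
    (next_position,) = struct.unpack_from("<Q", body, 0)
    return next_position, body[8:].decode("ascii")


format_desc = (
    bytes.fromhex("b8b27f56 0f 04000000 67000000 6b000000 0000")
    + b"\x04\x00"
    + b"5.5.46-0ubuntu0.14.04.2-log".ljust(50, b"\x00")
    + b"\x00\x00\x00\x00\x13"
    + bytes.fromhex(
        "38 0d 00 08 00 12 00 04 04 04 04 12 00 00 54 00"
        " 04 1a 08 00 00 00 08 08 08 02 00"
    )
)
rotate = (
    bytes.fromhex("c2b37f56 04 04000000 2b000000 96000000 0000")
    + bytes.fromhex("0400000000000000")
    + b"mysql-bin.000054"
)
binlog = io.BytesIO(b"\xfebin" + format_desc + rotate)

events = list(iter_events(binlog))
for position, type_code, body in events:
    print("at", position, "type", type_code, "data bytes", len(body))
print(parse_rotate(events[-1][2]))
```

A real reader would take header_length from the format_desc event rather than hard-coding it; the default of 19 here simply matches the files in this note.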
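Rotate the binlog, set binlog_format=statement, and create a table with CREATE TABLE `tt` (`i` varchar(100) DEFAULT NULL) ENGINE=InnoDB. Then insert one row into the test table tt with insert into tt values('abc'). This produces 3 events: 2 query events and 1 xid event. The 2 query events are the BEGIN and the INSERT statement, and the xid event is the transaction commit (an xid event is only written for storage engines that support XA; the test table tt uses InnoDB, so it appears here. For a MyISAM table there would also be a BEGIN and a COMMIT, but the COMMIT would be a query event rather than an xid event).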
mysql> show binlog events in 'mysql-bin.000060';
+------------------+-----+-------------+-----------+-------------+--------------------------------------------------------+
| Log_name         | Pos | Event_type  | Server_id | End_log_pos | Info                                                   |
+------------------+-----+-------------+-----------+-------------+--------------------------------------------------------+
| mysql-bin.000060 |   4 | Format_desc |         4 |         107 | Server ver: 5.5.46-0ubuntu0.14.04.2-log, Binlog ver: 4 |
| mysql-bin.000060 | 107 | Query       |         4 |         175 | BEGIN                                                  |
| mysql-bin.000060 | 175 | Query       |         4 |         266 | use `test`; insert into tt values('abc')               |
| mysql-bin.000060 | 266 | Xid         |         4 |         293 | COMMIT /* xid=138 */                                   |
+------------------+-----+-------------+-----------+-------------+--------------------------------------------------------+
The binlog content is as follows:
.......
0000006b  01 9d 82 56 02 04 00 00  00 44 00 00 00 af 00 00  |...V.....D......|
0000007b  00 08 00 26 00 00 00 00  00 00 00 04 00 00 1a 00  |...&............|
0000008b  00 00 00 00 00 01 00 00  00 00 00 00 00 00 06 03  |................|
0000009b  73 74 64 04 21 00 21 00  08 00 74 65 73 74 00 42  |std.!.!...test.B|
000000ab  45 47 49 4e                                       |EGIN|
000000af  01 9d 82 56 02 04 00 00  00 5b 00 00 00 0a 01 00  |...V.....[......|
000000bf  00 00 00 26 00 00 00 00  00 00 00 04 00 00 1a 00  |...&............|
000000cf  00 00 00 00 00 01 00 00  00 00 00 00 00 00 06 03  |................|
000000df  73 74 64 04 21 00 21 00  08 00 74 65 73 74 00 69  |std.!.!...test.i|
000000ef  6e 73 65 72 74 20 69 6e  74 6f 20 74 74 20 76 61  |nsert into tt va|
000000ff  6c 75 65 73 28 27 61 62  63 27 29                 |lues('abc')|
0000010a  01 9d 82 56 10 04 00 00  00 1b 00 00 00 25 01 00  |...V.........%..|
0000011a  00 00 00 8a 00 00 00 00  00 00 00                 |...........|
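Leaving the format_desc event aside, start with the first query event at 0000006b. The header has the same layout as in the previous events; the type of a query event is 0x02, the length is 0x44=68, and the next event starts at 0xaf=175. The flags are 8. Then comes the data part. From the format_desc event we know that the Fixed data part of a query event is 13 bytes, so the Variable data part is 68-19-13=36 bytes.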
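Fixed data: the first 4 bytes, 0x00000026, are the id of the thread that executed the statement. The next 4 bytes are the execution time, 0 (in seconds). The next byte, 0x04, is the length of the name of the default database when the statement was executed; my database is test, so the length is 4. The next 2 bytes, 0x0000, are the error code. (Note: the error code is normally 0, meaning no error. However, an INSERT...SELECT on a non-transactional table such as a MyISAM table may insert part of the data and then hit a duplicate-key error, which produces error code 1062. On a transactional table a failed INSERT...SELECT does not insert partial data, but interrupting the statement with CTRL+C during execution can also record an error code. When replicating, the slave executes the statement and then checks whether the error codes match; if they do not, replication stops.) The last 2 bytes, 0x001a, are the length of the status variable block, 26.
Variable data: the 26 bytes after 0x001a are the status variable block (ignored for now), followed by the default database name test, terminated by 0x00, and then the SQL statement BEGIN. After that comes the content of the second query event.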
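The second query event has the same layout as the first; only the statement becomes insert into tt values('abc').
The third event, the xid event, is the COMMIT. The first 19 bytes are the common header, with type 16. In the data part the Fixed data is empty and the Variable data is 8 bytes; these 8 bytes, 0x000000000000008a, are the transaction id, 138 (note that the transaction id is not necessarily little-endian: it is copied from memory to disk, so its byte order depends on the machine).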
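The query event and xid event layouts walked through above can be checked with a short decoder. This is a sketch, not a complete implementation: parse_query_event and parse_xid_event are names invented for this note, the status variable block is returned undecoded, and the xid is assumed to be little-endian because the dump was taken on an x86 machine. The sample bytes are the data parts of the BEGIN event and the xid event in the hexdump above.

```python
import struct


def parse_query_event(body):
    """Decode the data part (after the 19-byte header) of a query event."""
    # Fixed data, 13 bytes: thread id, execution time, database name length,
    # error code, length of the status variable block.
    thread_id, exec_time, db_len, error_code, status_len = struct.unpack_from(
        "<IIBHH", body, 0
    )
    offset = 13
    status_vars = body[offset:offset + status_len]  # left undecoded here
    offset += status_len
    database = body[offset:offset + db_len].decode("ascii")
    offset += db_len + 1  # skip the 0x00 terminator after the database name
    statement = body[offset:].decode("utf-8")
    return {
        "thread_id": thread_id,
        "exec_time": exec_time,
        "error_code": error_code,
        "status_vars": status_vars,
        "database": database,
        "statement": statement,
    }


def parse_xid_event(body):
    """Decode the 8-byte transaction id (assumed little-endian here)."""
    (xid,) = struct.unpack("<Q", body)
    return xid


begin_body = (
    bytes.fromhex("26000000 00000000 04 0000 1a00")
    + bytes.fromhex(
        "00 00 00 00 00 01 00 00 00 00 00 00 00 00 06 03"
        " 73 74 64 04 21 00 21 00 08 00"
    )
    + b"test\x00BEGIN"
)
query = parse_query_event(begin_body)
print(query["thread_id"], query["database"], query["statement"])

xid = parse_xid_event(bytes.fromhex("8a00000000000000"))
print(xid)
```

The decoded xid agrees with the xid=138 shown by SHOW BINLOG EVENTS for mysql-bin.000060.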
The table_map event and the write_rows event are used when binlog_format=row. Set binlog_format=row, then create a test table:
CREATE TABLE `trow` ( `i` int(11) NOT NULL, `c` varchar(10) DEFAULT NULL, PRIMARY KEY (`i`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1
Run the statement INSERT INTO trow VALUES(1, NULL), (2, 'a'). It produces a query event, a table_map event, a write_rows event and an xid event.
mysql> show binlog events in 'mysql-bin.000074';
+------------------+-----+-------------+-----------+-------------+--------------------------------------------------------+
| Log_name         | Pos | Event_type  | Server_id | End_log_pos | Info                                                   |
+------------------+-----+-------------+-----------+-------------+--------------------------------------------------------+
| mysql-bin.000074 |   4 | Format_desc |         4 |         107 | Server ver: 5.5.46-0ubuntu0.14.04.2-log, Binlog ver: 4 |
| mysql-bin.000074 | 107 | Query       |         4 |         175 | BEGIN                                                  |
| mysql-bin.000074 | 175 | Table_map   |         4 |         221 | table_id: 50 (test.trow)                               |
| mysql-bin.000074 | 221 | Write_rows  |         4 |         262 | table_id: 50 flags: STMT_END_F                         |
| mysql-bin.000074 | 262 | Xid         |         4 |         289 | COMMIT /* xid=245 */                                   |
+------------------+-----+-------------+-----------+-------------+--------------------------------------------------------+
The corresponding data in mysql-bin.000074 is as follows:
...
#query event (BEGIN)
0000006b  95 2a 85 56 02 04 00 00  00 44 00 00 00 af 00 00  |.*.V.....D......|
0000007b  00 08 00 26 00 00 00 00  00 00 00 04 00 00 1a 00  |...&............|
0000008b  00 00 00 00 00 01 00 00  00 00 00 00 00 00 06 03  |................|
0000009b  73 74 64 04 21 00 21 00  08 00 74 65 73 74 00 42  |std.!.!...test.B|
000000ab  45 47 49 4e                                       |EGIN|
#table_map event
000000af  95 2a 85 56 13 04 00 00  00 2e 00 00 00 dd 00 00  |.*.V............|
000000bf  00 00 00 32 00 00 00 00  00 01 00 04 74 65 73 74  |...2........test|
000000cf  00 04 74 72 6f 77 00 02  03 0f 02 0a 00 02        |..trow........|
#write_rows event
000000dd  95 2a 85 56 17 04 00 00  00 29 00 00 00 06 01 00  |.*.V.....)......|
000000ed  00 00 00 32 00 00 00 00  00 01 00 02 ff fe 01 00  |...2............|
000000fd  00 00 fc 02 00 00 00 01  61                       |........a|
#xid event
00000106  95 2a 85 56 10 04 00 00  00 1b 00 00 00 21 01 00  |.*.V.........!..|
00000116  00 00 00 f5 00 00 00 00  00 00 00                 |...........|
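0x0000006b-0x000000ae is the query event; the statement is BEGIN, which was analyzed above.
The table_map event starts at 0x000000af. After the 19-byte header, the Fixed data is 8 bytes: the first 6 bytes, 0x32=50, are the table id, and the next 2 bytes, 0x0001, are the flags.
In the Variable data part, the first byte, 0x04, is the length of the database name test, and the next 5 bytes are the database name test plus its terminator. The next byte, 0x04, is the length of the table name, and the next 5 bytes are the table name trow plus its terminator. The next byte, 0x02, is the number of columns. Then come the type definitions of the 2 columns, 0x03 and 0x0f (the column type MYSQL_TYPE_LONG is 0x03 and MYSQL_TYPE_VARCHAR is 0x0f). Next is the column metadata: the byte 0x02 says that the metadata is 2 bytes long, because MYSQL_TYPE_LONG has no metadata and MYSQL_TYPE_VARCHAR has 2 bytes of metadata. The following 0x000a is the metadata of MYSQL_TYPE_VARCHAR and means that the varchar column c was defined with length 10. The final byte, 0x02, is a bitmask of the columns that may be NULL: bit 0 is clear, so the first column i cannot be NULL, and bit 1 is set, so the second column c can. For more details on column types and metadata, see http://dev.mysql.com/doc/internals/en/table-map-event.html.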
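The table_map layout above can be decoded as follows. This is a deliberately narrow sketch: parse_table_map is a hypothetical name, only the two column types that occur in trow are handled, and the column count and metadata length are assumed to fit in a single byte (true for fewer than 251 columns, larger values use a multi-byte packed integer that is not implemented here). The sample bytes are the data part of the table_map event at 0x000000af.

```python
import struct

MYSQL_TYPE_LONG = 0x03
MYSQL_TYPE_VARCHAR = 0x0F


def parse_table_map(body):
    """Decode the data part (after the 19-byte header) of a table_map event."""
    table_id = int.from_bytes(body[0:6], "little")
    (flags,) = struct.unpack_from("<H", body, 6)

    def read_name(offset):
        # 1-byte length, the name itself, then a 0x00 terminator.
        length = body[offset]
        name = body[offset + 1:offset + 1 + length].decode("ascii")
        return name, offset + 1 + length + 1

    database, offset = read_name(8)
    table, offset = read_name(offset)
    column_count = body[offset]  # assumption: single-byte packed integer
    offset += 1
    column_types = list(body[offset:offset + column_count])
    offset += column_count
    metadata_len = body[offset]  # assumption: single-byte packed integer
    offset += 1
    metadata = body[offset:offset + metadata_len]
    offset += metadata_len
    null_bits = body[offset:offset + (column_count + 7) // 8]

    column_meta = []
    meta_offset = 0
    for column_type in column_types:
        if column_type == MYSQL_TYPE_LONG:
            column_meta.append(None)  # no metadata for this type
        elif column_type == MYSQL_TYPE_VARCHAR:
            (max_length,) = struct.unpack_from("<H", metadata, meta_offset)
            column_meta.append(max_length)
            meta_offset += 2
        else:
            raise NotImplementedError("column type 0x%02x" % column_type)
    nullable = [
        bool((null_bits[i // 8] >> (i % 8)) & 1) for i in range(column_count)
    ]
    return {
        "table_id": table_id, "flags": flags,
        "database": database, "table": table,
        "column_types": column_types, "column_meta": column_meta,
        "nullable": nullable,
    }


table_map_body = (
    bytes.fromhex("320000000000 0100")
    + b"\x04test\x00"
    + b"\x04trow\x00"
    + bytes.fromhex("02 03 0f 02 0a00 02")
)
info = parse_table_map(table_map_body)
print(info["table_id"], info["database"], info["table"])
print(info["column_types"], info["column_meta"], info["nullable"])
```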
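The write_rows event starts at 0x000000dd. After the 19-byte header, the first 6 bytes, 0x32, are again the table id, and the next two bytes, 0x0001, are the flags. The next byte, 0x02, is the number of columns in the table. Then comes one byte, 0xff, whose bits say which columns have a value in the event; here all of them do.
After that come the inserted rows. The first byte, 0xfe, is a bitmap in which each bit says whether the corresponding column of the row after the change is NULL (1 means NULL). Here it says that column 1 is not NULL and column 2 is NULL, because the first inserted row is (1, NULL). Next come the column values: the first column is MYSQL_TYPE_LONG, which is 4 bytes long, so 0x00000001 is its value. The second column is NULL and takes no bytes. Then comes the second row: first 0xfc, which says that neither column is NULL; then the 4 bytes of the first column, 0x00000002, the number 2 we inserted; then the second column, a one-byte length 0x01 followed by the content 0x61, the character 'a'. That completes the write_rows event. Other row events include the update_rows event and the delete_rows event; see the official documentation for details.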
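The row decoding just described can be sketched in a few lines. The code makes several simplifying assumptions that hold for this example but not in general: parse_write_rows is a made-up name, the column types are passed in by the caller (a real reader takes them from the preceding table_map event), every column is present in the event (the 0xff bitmap), the column count fits in one byte, and VARCHAR values use a 1-byte length prefix because the column is at most 255 bytes long.

```python
import struct

MYSQL_TYPE_LONG = 0x03
MYSQL_TYPE_VARCHAR = 0x0F


def parse_write_rows(body, column_types):
    """Decode the data part (after the 19-byte header) of a write_rows event."""
    table_id = int.from_bytes(body[0:6], "little")
    (flags,) = struct.unpack_from("<H", body, 6)
    column_count = body[8]  # assumption: single-byte packed integer
    bitmap_len = (column_count + 7) // 8
    offset = 9 + bitmap_len  # skip the columns-present bitmap (all set here)

    rows = []
    while offset < len(body):
        null_bitmap = body[offset:offset + bitmap_len]
        offset += bitmap_len
        row = []
        for i, column_type in enumerate(column_types):
            if (null_bitmap[i // 8] >> (i % 8)) & 1:
                row.append(None)  # NULL columns take no bytes
            elif column_type == MYSQL_TYPE_LONG:
                (value,) = struct.unpack_from("<i", body, offset)
                offset += 4
                row.append(value)
            elif column_type == MYSQL_TYPE_VARCHAR:
                length = body[offset]  # 1-byte length for short varchars
                offset += 1
                row.append(body[offset:offset + length].decode("latin-1"))
                offset += length
            else:
                raise NotImplementedError("column type 0x%02x" % column_type)
        rows.append(tuple(row))
    return table_id, flags, rows


# Data part of the write_rows event at 0x000000dd.
write_rows_body = bytes.fromhex(
    "320000000000 0100 02 ff fe 01000000 fc 02000000 01 61"
)
table_id, flags, rows = parse_write_rows(
    write_rows_body, [MYSQL_TYPE_LONG, MYSQL_TYPE_VARCHAR]
)
print(table_id, flags, rows)
```

The decoded rows are (1, None) and (2, 'a'), the two rows inserted by the test statement.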
Last comes the xid event, which was analyzed above and is not repeated here.
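The intvar event is used when binlog_format=statement, for auto_increment columns, and it is very important. The Fixed data part of an intvar event is empty, and its Variable data part is 9 bytes: the first byte identifies the kind of auto-increment event, LAST_INSERT_ID_EVENT = 1 or INSERT_ID_EVENT = 2, and the remaining 8 bytes are the auto-increment ID.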
Create a test table with create table tinc (i int auto_increment primary key, c varchar(10)) engine=innodb; and then run the insert statement INSERT INTO tinc(c) values('abc'); to see an intvar event; here the auto-increment event type is INSERT_ID_EVENT. With the statement INSERT INTO tinc(i, c) VALUES(LAST_INSERT_ID()+1, 'abc') you instead get an intvar event of type LAST_INSERT_ID_EVENT.
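+------------------+-----+-------------+-----------+-------------+--------------------------------------------------------+
| Log_name         | Pos | Event_type  | Server_id | End_log_pos | Info                                                   |
+------------------+-----+-------------+-----------+-------------+--------------------------------------------------------+
| mysql-bin.000079 |   4 | Format_desc |         4 |         107 | Server ver: 5.5.46-0ubuntu0.14.04.2-log, Binlog ver: 4 |
| mysql-bin.000079 | 107 | Query       |         4 |         175 | BEGIN                                                  |
| mysql-bin.000079 | 175 | Intvar      |         4 |         203 | INSERT_ID=1                                            |
| mysql-bin.000079 | 203 | Query       |         4 |         299 | use `test`; insert into tinc(c) values('abc')          |
| mysql-bin.000079 | 299 | Xid         |         4 |         326 | COMMIT /* xid=263 */                                   |
+------------------+-----+-------------+-----------+-------------+--------------------------------------------------------+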
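The 9-byte intvar payload (203-175=28 bytes for the whole event, minus the 19-byte header) is simple enough to decode in one call. No hexdump of this event appears in this note, so the sample bytes below are hypothetical: they are rebuilt with struct.pack from the INSERT_ID=1 shown in the listing, and the 8-byte ID is assumed to be little-endian like the other integer fields. parse_intvar is a name invented for this sketch.

```python
import struct

INTVAR_TYPES = {1: "LAST_INSERT_ID_EVENT", 2: "INSERT_ID_EVENT"}


def parse_intvar(body):
    """Decode the 9-byte data part of an intvar event."""
    subtype, value = struct.unpack("<BQ", body)
    return INTVAR_TYPES.get(subtype, "UNKNOWN"), value


# Hypothetical payload matching the INSERT_ID=1 row in the listing above.
intvar_body = struct.pack("<BQ", 2, 1)
kind, value = parse_intvar(intvar_body)
print(len(intvar_body), kind, value)
```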
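As mentioned above, the binlog has three formats, each with its own advantages and disadvantages:
statement: statement-based logging. Some statements and functions, such as UUID and LOAD DATA INFILE, can cause data inconsistency or even errors during replication.
row: row-based logging, which records the row changes and is very safe. However, the binlog is much larger than in the other two modes; deleting a large amount of data from a big table generates a very large number of row records in the binlog, which can increase the lag of the slave.
mixed: mixed mode, which chooses statement or row mode depending on the statement.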
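Be careful when replicating between different MySQL versions. Although every version since MySQL 5.0 uses the v4 binlog, there are probably still pitfalls, especially when the master runs a newer version than the slave. For replication it is best to keep the master and the slave on the same version, or at least on the same major version. MySQL replication is a big topic; I hope to find time to write a separate note on it.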