1 Introduction
Choosing a character set for MySQL backups is a hard problem, especially for businesses whose data uses varying character sets. mysqldump uses utf8 by default, and utf8 is also the officially recommended choice. For Chinese data, however, a considerable number of gbk-encoded characters have no corresponding Unicode encoding, which means that backing up this part of the character set with utf8 causes data loss. Is there a solution?
The most direct approach is to add mappings for these code points. However, there are quite a few of them, and, more annoyingly, there seems to be no authoritative mapping standard for them. Is there another way?
In fact, if you back up with binary, no character set conversion takes place, and the problem above does not arise. So does binary solve all of the gbk problems? The answer is no.
Before discussing the problem with binary, two things need to be clarified. A MySQL backup consists of two parts: schema information and the actual data. Schema information is always encoded in utf8, except for default values. This is where the problem comes from.
2.1 utf8 backup
(1) The .frm file stores the table's schema information and stores the default value of each column as an actual record. Schema information (including comments) is stored in utf8, but default values are stored in the character set specified by the table.
(2) When executing a SHOW CREATE TABLE statement, mysqld converts the default values in the .frm file from the table's character set to utf8.
(3) When executing a CREATE TABLE statement, mysqld converts the default values from utf8 to the table's character set.
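The round trip in (1) to (3) can be sketched in Python. This is only an illustration: Python's codecs stand in for MySQL's converters (their mappings may differ for some gbk code points), and the function names and the sample default value are hypothetical.

```python
# Sketch of the default-value round trip described in (1)-(3).
# Python codecs are used as a stand-in for MySQL's own conversion tables.

TABLE_CHARSET = "gbk"      # character set declared on the table
SCHEMA_CHARSET = "utf-8"   # character set used for schema text


def show_create_table(stored_default: bytes) -> bytes:
    """Step (2): convert the stored default from the table charset to utf8."""
    return stored_default.decode(TABLE_CHARSET).encode(SCHEMA_CHARSET)


def create_table(default_in_statement: bytes) -> bytes:
    """Step (3): convert the default in the statement from utf8 to the table charset."""
    return default_in_statement.decode(SCHEMA_CHARSET).encode(TABLE_CHARSET)


# Hypothetical non-ASCII default value ("网吧"), written with escapes.
stored = "\u7f51\u5427".encode(TABLE_CHARSET)   # step (1): bytes kept in .frm
dumped = show_create_table(stored)              # bytes that end up in the dump
restored = create_table(dumped)                 # bytes written back on import

print(stored == restored)   # True: the round trip preserves the default
print(stored == dumped)     # False: the dump holds utf8 bytes, not gbk bytes
```

The point of the sketch is that the import only restores the original bytes if step (3) actually runs; the dump itself never contains the gbk bytes.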
2.2 Binary backup
If binary is specified for the backup, then on import, before the table is created, character_set_client is set to utf8 but collation_connection is still binary. As a result, the default value is not converted from utf8 to the table's character set when it is stored. If the table is declared with gbk encoding, the import will inevitably fail.
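To see why skipping the conversion matters, here is a minimal Python sketch. It assumes a hypothetical non-ASCII default value and again uses Python's codecs in place of MySQL's; the helper `is_well_formed` is illustrative, not part of any MySQL API.

```python
def is_well_formed(raw: bytes, charset: str) -> bool:
    """Return True if raw decodes cleanly in the given charset."""
    try:
        raw.decode(charset)
        return True
    except UnicodeDecodeError:
        return False


default_text = "\u7f51"                      # hypothetical default value ("网")
utf8_bytes = default_text.encode("utf-8")    # bytes as they appear in the dump
gbk_bytes = default_text.encode("gbk")       # bytes a gbk table expects

# With conversion (utf8 connection): the statement text is re-encoded to gbk.
converted = utf8_bytes.decode("utf-8").encode("gbk")

# Without conversion (binary connection): the utf8 bytes are used unchanged.
unconverted = utf8_bytes

print(converted == gbk_bytes)               # True
print(is_well_formed(unconverted, "gbk"))   # False: not a valid gbk sequence
```

In this sketch the unconverted bytes are rejected because a three-byte utf8 sequence leaves a dangling lead byte when read as gbk. Some utf8 byte sequences do happen to be well-formed gbk, in which case the value would be garbled rather than rejected.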
Example:
CREATE TABLE `t1` (
  `iNetbarId` int(11) NOT NULL DEFAULT '0 ',
  `iUin` bigint(20) NOT NULL DEFAULT '0',
  `vNetbarName` varchar(80) NOT NULL DEFAULT '"-"',
  PRIMARY KEY (`iNetbarId`)
) ENGINE=InnoDB DEFAULT CHARSET=gbk;
INSERT INTO t1 VALUES (1, 1, 'xxxx');
You can see that a table that was exported normally fails on import with error 1067 (Invalid default value).
The fix is for mysqldump to add a character_set_connection setting before each CREATE TABLE statement:
/*!40101 SET character_set_connection = utf8 */
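For dumps that have already been produced, the same statement could in principle be added by post-processing the dump file. The following Python sketch is one hedged way to do that; it assumes each CREATE TABLE starts at the beginning of a line (as in typical mysqldump output), and the function name `add_connection_charset` and the sample dump text are hypothetical. The dump is handled as bytes because a binary backup may mix encodings.

```python
import re

# Statement to place before every CREATE TABLE (taken from the text above).
SET_CONNECTION = b"/*!40101 SET character_set_connection = utf8 */;"

# Matches the start of a CREATE TABLE statement at the beginning of a line.
CREATE_TABLE = re.compile(rb"^CREATE TABLE\b", re.MULTILINE)


def add_connection_charset(dump_sql: bytes) -> bytes:
    """Insert the SET statement on its own line before every CREATE TABLE."""
    return CREATE_TABLE.sub(
        lambda m: SET_CONNECTION + b"\n" + m.group(0), dump_sql
    )


# Hypothetical fragment of a dump file.
dump = (
    b"/*!40101 SET character_set_client = utf8 */;\n"
    b"CREATE TABLE `t1` (\n"
    b"  `vNetbarName` varchar(80) NOT NULL DEFAULT '-'\n"
    b") ENGINE=InnoDB DEFAULT CHARSET=gbk;\n"
)

patched = add_connection_charset(dump)
print(patched.decode("ascii"))
```

The function is not idempotent: running it twice on the same dump would insert the statement twice, so it should be applied once to the original file.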
This can also be considered a MySQL bug: since schema information uses utf8 from beginning to end, the connection character set variable should be set to utf8 before CREATE TABLE is executed, rather than only the client character set variable.