This article introduces MySQL character sets and related topics. It is intended as a reference and will hopefully be helpful.
Character set introduction
gbk/gb2312
gbk and gb2312 are double-byte character sets: each Chinese character is stored in two bytes, while ASCII letters and digits still take one byte. To distinguish Chinese characters from ASCII, the highest bit of the lead byte is set to 1.
gb2312 is a subset of gbk, and gbk is a subset of gb18030; gb2312 can only store simplified Chinese characters.
gbk is a larger character set that also covers the Chinese characters used in Japanese and Korean.
For Chinese text, the gbk character set is usually sufficient.
gbk is less internationally portable than utf8, but a utf8 database takes up more space than a gbk one, because utf8 stores each Chinese character in three bytes.
utf8/utf8mb4
UTF stands for Unicode Transformation Format. UTF-8 is a variable-length character encoding used to store Unicode. Prefer utf8 for the database character set, and keep the connection, the result set, and the final HTML page consistent with it.
UTF8 uses a variable number of bytes to store Unicode characters. For example, ASCII letters still use 1 byte; accented characters, Greek letters, and Cyrillic letters use 2 bytes; and commonly used Chinese characters use 3 bytes. That is, one English character takes one byte, and one Chinese character (including traditional Chinese) takes three bytes.
utf8mb4 uses up to 4 bytes per character, so it can store characters that MySQL's utf8 (at most 3 bytes per character) cannot; in projects, utf8mb4 is commonly used to store emoji.
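The byte counts above can be checked with a short Python sketch. It relies on Python's own codecs rather than MySQL: the `utf-8` codec encodes up to 4 bytes per character, so it corresponds to utf8mb4, and the sample characters and the `byte_length` helper are illustrative choices, not part of the original article.

```python
# Byte length of sample characters under the encodings discussed above.
# Assumption: Python's "utf-8" codec stands in for MySQL's utf8mb4.
samples = {
    "ascii_letter": "A",
    "greek_letter": "α",
    "chinese_char": "中",
    "emoji": "😀",
}


def byte_length(text, encoding):
    """Return how many bytes `text` needs in `encoding`."""
    return len(text.encode(encoding))


utf8_sizes = {name: byte_length(ch, "utf-8") for name, ch in samples.items()}
gbk_sizes = {
    name: byte_length(samples[name], "gbk")
    for name in ("ascii_letter", "chinese_char")
}

# A character fits MySQL's 3-byte utf8 only if it needs at most 3 bytes.
fits_3_byte_utf8 = {name: size <= 3 for name, size in utf8_sizes.items()}

print(utf8_sizes)
print(gbk_sizes)
print(fits_3_byte_utf8)
```

The emoji is the only sample that needs 4 bytes, which is why it calls for utf8mb4.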
latin1 is an 8-bit (1-byte) character set, but it cannot cover Asian and African languages.
unicode is an extension of latin1 that adds support for the common languages of Asia and Africa, but it still does not support all languages, and it is inefficient for representing ASCII (converting from a large character set to a smaller one often loses characters).
utf8 is an encoding of Unicode; character sets such as gbk, gb2312 and utf8 are converted to one another by way of Unicode.
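Conversion by way of Unicode can be sketched in Python, where a `str` is a sequence of Unicode characters. This only illustrates the idea with Python's codecs and is not a description of MySQL's internal conversion; the `transcode` helper is a hypothetical name introduced for this example.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Convert bytes between encodings by way of a Unicode string."""
    text = data.decode(source)   # bytes in the source charset -> Unicode
    return text.encode(target)   # Unicode -> bytes in the target charset


gbk_bytes = "中文".encode("gbk")
utf8_bytes = transcode(gbk_bytes, "gbk", "utf-8")

# latin1 contains no Chinese characters, so Python cannot encode them.
try:
    transcode(gbk_bytes, "gbk", "latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

print(len(gbk_bytes), len(utf8_bytes), latin1_ok)
```

The same two characters take 4 bytes in gbk and 6 bytes in utf8, and the attempt to move them into the smaller latin1 character set fails.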
Character set usage suggestions
1. When you are certain that all end users are Chinese, you can choose gbk / gb2312.
2. To ease data migration and display on multiple terminals, it is best to use utf8.
3. When characters do not need to be case-sensitive, the default xx_ci collation can be used; otherwise, select the xx_bin collation (in a production environment, try not to change the collation).
4. The default character set is latin1 (in MySQL 5.7 and earlier). This character set stores each Chinese character split into separate bytes, so retrieval results are not accurate enough. Its advantage is that it saves space; using it is not recommended.
MySQL character set range
Server layer (server) > Database (database) > Data table (table) > Field (column) > Connection (connection) | Result set (result)
MySQL character set priority
Connection (connection) | Result set (result) > Field (column) > Data table (table) > Database (database) > Server layer (server)
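One way to read the priority order for stored data is that the most specific level with an explicit setting wins, and anything left unset falls back to the next wider level. The following Python sketch is a toy model of that rule only; `effective_charset` and its parameters are hypothetical names, and the connection and result-set levels are deliberately left out.

```python
from typing import Optional


def effective_charset(server: str,
                      database: Optional[str] = None,
                      table: Optional[str] = None,
                      column: Optional[str] = None) -> str:
    """Pick the most specific character set that was explicitly set."""
    # Checked from the narrowest scope to the widest.
    for charset in (column, table, database):
        if charset is not None:
            return charset
    return server  # nothing narrower was set, so the server default applies


print(effective_charset("latin1"))
print(effective_charset("latin1", database="utf8"))
print(effective_charset("latin1", database="utf8", column="gbk"))
```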
char type
char(N): N represents the number of characters (also called character length), not bytes.
char(N): fixed-length storage that always occupies the same amount of space; values shorter than N are padded with spaces, and MySQL strips the trailing spaces when it returns a char(N) value.
Storage space: the storage space of a char(N) column depends on the character set. As covered above, a Chinese character occupies 3 bytes in utf8 and 2 bytes in gbk, while digits and letters each take one byte.
For char(30), the maximum number of letters or Chinese characters that can be stored under different character sets, and the space occupied:
gbk: can store 30 characters, occupying up to 30*2 bytes
utf8: can store 30 characters, occupying up to 30*3 bytes
varchar type
varchar(N): N represents the number of characters (also called character length), not bytes.
varchar(N): variable-length storage that uses only the space it needs.
Storage space: the storage space of a varchar(N) column depends on the character set. As covered above, a Chinese character occupies 3 bytes in utf8 and 2 bytes in gbk, while digits and letters each take one byte.
Storage mechanism: a varchar(N) value is actually stored starting from the second byte, because the first byte flags whether the value is NULL. The next 1 to 2 bytes record the actual length, and the rest is available for data, so the maximum usable storage is 65535-3=65532 bytes. (If the length is no more than 255 bytes, one byte records the length; if it is greater than 255 bytes, two bytes are used.)
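The arithmetic above can be turned into a small Python calculation of the largest N that varchar(N) could take under each character set. It is a sketch that assumes a table with a single nullable varchar column and uses the 65535-byte limit and 1-byte NULL flag exactly as stated in the text; the function names are hypothetical, and a length of exactly 255 bytes is assumed to need a 1-byte prefix.

```python
ROW_LIMIT = 65535       # byte limit used in the text
NULL_FLAG_BYTES = 1     # first byte: flags whether the value is NULL
MAX_PREFIX_BYTES = 2    # worst-case length prefix


def length_prefix_bytes(data_bytes):
    """Bytes used to record the length of a value of data_bytes bytes."""
    return 1 if data_bytes <= 255 else 2


def max_varchar_chars(bytes_per_char):
    """Largest N for varchar(N) when each character may need bytes_per_char."""
    usable = ROW_LIMIT - NULL_FLAG_BYTES - MAX_PREFIX_BYTES  # 65532
    return usable // bytes_per_char


for charset, width in (("gbk", 2), ("utf8", 3), ("utf8mb4", 4)):
    print(charset, max_varchar_chars(width))
```

Because N counts characters while the limit counts bytes, the wider the character set, the smaller the largest permitted N.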
Modify the database instance character set
Temporary effect
mysql> set character_set = 'gbk';
mysql> set character_set_client = 'gbk';
Globally effective
mysql> set global character_set_client = 'gbk';
Query OK, 0 rows affected (0.00 sec)
Permanent effect
vim /etc/my.cnf
[mysqld]
character-set-server=utf8