
What type does MySQL use for Chinese characters?

青灯夜游
Original, 2023-02-09 13:59:10

In MySQL, Chinese characters can be stored in CHAR and VARCHAR columns. The length declared for a CHAR or VARCHAR column is the maximum number of characters (not bytes) to store. CHAR(M) is a fixed-length string whose length M is set when the column is defined and ranges from 0 to 255 characters. VARCHAR(M) is a variable-length string where M is the maximum length. M can be declared from 0 to 65,535, but the effective maximum is lower because of the 65,535-byte row size limit and the number of bytes per character in the character set.


Operating environment for this tutorial: Windows 7, MySQL 8, Dell G3 computer.

How MySQL defines the storage type for Chinese characters

The MySQL manual says:

In MySQL 5.x, the declared length of a CHAR or VARCHAR column is the maximum number of characters you want to store. For example, CHAR(30) can hold 30 characters. With GBK encoding, one Chinese character takes two bytes; with UTF-8 encoding, one Chinese character takes three bytes (rarer characters outside the Basic Multilingual Plane take four bytes and require utf8mb4).
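A quick way to see the character-versus-byte distinction is to encode the same text both ways. The Python sketch below only illustrates the encodings and does not connect to MySQL; the helper `byte_length` is a hypothetical name, and Python's `gbk` and `utf-8` codecs stand in for MySQL's gbk and utf8 character sets.

```python
def byte_length(text: str, encoding: str) -> int:
    """Return the number of bytes `text` occupies in the given encoding."""
    return len(text.encode(encoding))

ten_chars = "一二三四五六七八九十"

print(len(ten_chars))                   # 10 characters
print(byte_length(ten_chars, "gbk"))    # 20 bytes: 2 bytes per Chinese character
print(byte_length(ten_chars, "utf-8"))  # 30 bytes: 3 bytes per Chinese character
```

In both cases the value is still 10 characters long; only the number of bytes behind those characters changes.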

What are characters?

Baidu Encyclopedia says:

Characters are the letters, digits, words, and symbols used in computers, for example 1, 2, 3, A, B, C, ~, !, ·, #, ¥, %, ……, *, (, ), and so on. Storing one Chinese character takes 2 bytes [in a double-byte encoding such as GBK], storing one English character takes 1 byte, and two digits take one byte. For example, when finding the length of a string in VB, Len(Str(1234)) = 4 and Len(1234) = 2.
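The byte counts in this quote depend on the encoding. As an illustrative sketch (the sample string is made up for this example), Python can count the same mixed string in characters and in bytes:

```python
mixed = "MySQL中文"  # 5 ASCII letters + 2 Chinese characters

print(len(mixed))                  # 7 characters
print(len(mixed.encode("gbk")))    # 9 bytes: 5 x 1 + 2 x 2
print(len(mixed.encode("utf-8")))  # 11 bytes: 5 x 1 + 2 x 3
```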

Characters are abstract entities that can be represented using many different character schemes or code pages. For example, Unicode UTF-16 encoding represents characters as a sequence of 16-bit integers, while Unicode UTF-8 encoding represents the same characters as a sequence of 8-bit bytes. The common language runtime uses Unicode UTF-16 (Unicode Transformation Format, a 16-bit encoding) to represent characters.
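To illustrate that the same characters map to different code-unit sequences, here is a small Python sketch. It uses `utf-16-le` to avoid the byte-order mark that Python's plain `utf-16` codec prepends.

```python
text = "A中"

utf16 = text.encode("utf-16-le")  # little-endian UTF-16, no byte-order mark
utf8 = text.encode("utf-8")

print(len(utf16) // 2)  # 2 sixteen-bit code units
print(len(utf8))        # 4 bytes: 1 for "A", 3 for "中"
print(utf16.hex())      # 41002d4e
print(utf8.hex())       # 41e4b8ad
```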

Applications targeting the common language runtime use encoding to map characters from the native character scheme (UTF-16) to other schemes, and decoding to map characters from non-native schemes back to the native scheme.
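Encoding and decoding can be sketched in Python, with `str` playing the role of the native form. Note the assumption: Python strings are sequences of Unicode code points rather than UTF-16 as in the common language runtime, but the encode/decode round trip is the same idea.

```python
text = "中文"

gbk_bytes = text.encode("gbk")        # encode: native str -> GBK bytes
decoded = gbk_bytes.decode("gbk")     # decode: GBK bytes -> native str
utf8_bytes = decoded.encode("utf-8")  # re-encode the same characters as UTF-8

print(gbk_bytes.hex())   # d6d0cec4
print(utf8_bytes.hex())  # e4b8ade69687
print(decoded == text)   # True
```

Converting data between character sets (for example from GBK to UTF-8) is exactly this pattern: decode with the source encoding, then encode with the target one.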

Computers and communication equipment use character encoding to represent characters: each character is assigned to something, traditionally an integer stored as a sequence of bits, so that it can be transmitted over a network and stored easily. Two commonly used examples are ASCII and UTF-8, an encoding of Unicode. According to Google statistics, UTF-8 is currently the most commonly used encoding for web pages. [1] Whereas most character encodings map characters to numbers or bit strings, Morse code represents characters with variable-length sequences of electrical pulses.
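Python's built-in `ord` exposes the integer that Unicode assigns to each character, which makes the "character to number to bits" idea easy to see. A minimal sketch:

```python
for ch in ["A", "中"]:
    code_point = ord(ch)  # the integer Unicode assigns to this character
    print(ch, code_point, hex(code_point), format(code_point, "b"))
# A 65 0x41 1000001
# 中 20013 0x4e2d 100111000101101
```

An encoding such as UTF-8 then decides how those bits are laid out in bytes.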

What are bytes?

The byte is often described as short for "Binary Term". One byte consists of eight bits. It is the common unit for measuring computer information, regardless of the type of data being stored. In many programming languages, byte is also a basic integer data type.

Byte can be abbreviated as B (for example, MB means megabyte), and bit can be abbreviated as b (for example, Mb means megabit).
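Because one byte is eight bits, converting between the B and b units is a division or multiplication by 8. A minimal Python sketch; `megabits_to_megabytes` is a hypothetical helper, and it assumes both units use the same prefix (it ignores the 1000 versus 1024 question):

```python
BITS_PER_BYTE = 8

def megabits_to_megabytes(megabits: float) -> float:
    """Convert Mb (megabits) to MB (megabytes) using 8 bits per byte."""
    return megabits / BITS_PER_BYTE

print(megabits_to_megabytes(100))  # 12.5: a 100 Mb/s link moves about 12.5 MB/s
print(2 ** BITS_PER_BYTE)          # 256 distinct values fit in one byte
```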

So how should we define a field that can store up to 10 Chinese characters?

Given the explanation above, the answer is CHAR(10) or VARCHAR(10). Let's verify it:

CREATE TABLE `t1` (
  `str` varchar(10) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

Insert the following data into the table:

insert  into `t1`(`str`) values ('一二三四五六七八九十');
insert  into `t1`(`str`) values ('一二三四五六七八九十十一');
insert  into `t1`(`str`) values ('abcdefghijklmnopqrst');
insert  into `t1`(`str`) values ('1234567890123456');

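Then query the table and check the results.

The results confirm the explanation above: the declared length counts characters, so all ten Chinese characters in the first row fit in VARCHAR(10). Values longer than 10 characters are automatically truncated to 10 characters (with a warning) only when strict SQL mode is disabled. With strict mode enabled, which is the default since MySQL 5.7, such an INSERT fails with a "Data too long for column" error instead. Keep this in mind in real applications.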
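The truncation rule can be modeled in a few lines of Python. This is a simplified, hypothetical model (`store_varchar` is not a MySQL API): it counts characters rather than bytes, truncates only in non-strict mode, and ignores MySQL's special handling of trailing spaces. Under strict mode (STRICT_TRANS_TABLES, on by default since MySQL 5.7) an over-long value is rejected instead.

```python
def store_varchar(value: str, max_chars: int, strict: bool = True) -> str:
    """Hypothetical model of storing `value` in a VARCHAR(max_chars) column.

    The declared length counts characters, not bytes.
    strict=True mimics strict SQL mode: an over-long value is rejected.
    strict=False mimics non-strict mode: the value is cut to max_chars
    (real MySQL also raises a warning, which this model does not).
    """
    if len(value) <= max_chars:
        return value
    if strict:
        raise ValueError(f"Data too long: {len(value)} characters > {max_chars}")
    return value[:max_chars]

rows = ["一二三四五六七八九十", "一二三四五六七八九十十一",
        "abcdefghijklmnopqrst", "1234567890123456"]

for row in rows:
    # In non-strict mode every stored value has at most 10 characters.
    print(store_varchar(row, 10, strict=False))
```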

CHAR and VARCHAR types

CHAR(M) is a fixed-length string type whose length is specified when the column is defined. When a value is stored, it is right-padded with spaces to the specified length. M is the column length and ranges from 0 to 255 characters.

For example, CHAR(4) defines a fixed-length string column containing a maximum of 4 characters. When a CHAR value is retrieved, trailing spaces are removed.
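CHAR's pad-on-store, strip-on-retrieve behavior can be sketched as follows. The helpers `char_store` and `char_retrieve` are hypothetical, and the model assumes the default SQL mode (PAD_CHAR_TO_FULL_LENGTH not enabled):

```python
def char_store(value: str, m: int) -> str:
    """Hypothetical model of CHAR(m) storage: right-pad with spaces to m characters."""
    if len(value) > m:
        raise ValueError("value longer than the column (strict mode rejects it)")
    return value.ljust(m, " ")

def char_retrieve(stored: str) -> str:
    """Model of CHAR retrieval: trailing spaces are stripped."""
    return stored.rstrip(" ")

stored = char_store("ab", 4)
print(repr(stored))                                # 'ab  '
print(repr(char_retrieve(stored)))                 # 'ab'
print(repr(char_retrieve(char_store("ab  ", 4))))  # 'ab': the original trailing spaces are lost
```

The last line shows the practical consequence: CHAR cannot round-trip trailing spaces, whereas VARCHAR preserves them.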

VARCHAR(M) is a variable-length string type where M is the maximum column length in characters; M can be declared from 0 to 65,535. The effective maximum length is determined by the maximum row size (65,535 bytes, shared among all columns) and the character set used. The space actually occupied is the string's length in bytes plus a length prefix: 1 byte if the column's values need no more than 255 bytes, otherwise 2 bytes.
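Since the 65,535-byte row limit is counted in bytes, the largest usable M shrinks as the maximum bytes per character grows. A rough Python estimate, assuming a table whose only column is one NOT NULL VARCHAR (so no NULL-flag byte) with a 2-byte length prefix; `max_varchar_chars` is a hypothetical helper:

```python
ROW_SIZE_LIMIT = 65_535  # maximum row size in bytes, shared by all columns
LENGTH_PREFIX = 2        # a VARCHAR this long needs a 2-byte length prefix

def max_varchar_chars(max_bytes_per_char: int) -> int:
    """Rough upper bound on M for a single NOT NULL VARCHAR column."""
    return (ROW_SIZE_LIMIT - LENGTH_PREFIX) // max_bytes_per_char

for charset, width in [("latin1", 1), ("utf8mb3 / utf8", 3), ("utf8mb4", 4)]:
    print(charset, max_varchar_chars(width))
# latin1 65533
# utf8mb3 / utf8 21844
# utf8mb4 16383
```

The utf8mb3 figure matches the 21,844-character limit given in the MySQL manual; tables with additional columns leave less room for each VARCHAR.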

For example, VARCHAR(50) defines a string with a maximum length of 50 characters. If the inserted string has only 10 characters, only those 10 characters are stored, plus the length prefix; MySQL does not store an end-of-string marker. Trailing spaces in VARCHAR values are preserved when values are stored and retrieved.
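In MySQL the extra storage is a length prefix, not a terminator: 1 byte when the column's maximum byte length is at most 255, otherwise 2 bytes. A simplified Python model of that rule follows; `varchar_storage_bytes` is hypothetical, and it takes the character set's maximum bytes per character as an explicit parameter (for example 1 for latin1, 3 for utf8mb3, 4 for utf8mb4):

```python
def varchar_storage_bytes(value: str, declared_chars: int,
                          encoding: str, max_bytes_per_char: int) -> int:
    """Bytes needed to store `value` in a VARCHAR(declared_chars) column (simplified model).

    Storage = length prefix + encoded bytes. The prefix is 1 byte when the column's
    maximum byte length (declared_chars * max_bytes_per_char) is at most 255, else 2.
    """
    data = len(value.encode(encoding))
    prefix = 1 if declared_chars * max_bytes_per_char <= 255 else 2
    return prefix + data

print(varchar_storage_bytes("ab", 4, "latin-1", 1))    # 3 (1 + 2)
print(varchar_storage_bytes("中文", 50, "utf-8", 3))   # 7 (1 + 6): 50 x 3 = 150 <= 255
print(varchar_storage_bytes("中文", 100, "utf-8", 4))  # 8 (2 + 6): 100 x 4 = 400 > 255
```

The first call reproduces the 3-byte VARCHAR(4) figure for 'ab' in the latin1 comparison table.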

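[Example] The following table shows the result of storing different strings in CHAR(4) and VARCHAR(4) columns to illustrate the difference between CHAR and VARCHAR. The storage requirements assume a single-byte character set such as latin1, and the last row applies only when strict SQL mode is disabled (strict mode rejects the over-long value).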

Insert value   CHAR(4)   Storage required   VARCHAR(4)   Storage required
''             '    '    4 bytes            ''           1 byte
'ab'           'ab  '    4 bytes            'ab'         3 bytes
'abc'          'abc '    4 bytes            'abc'        4 bytes
'abcd'         'abcd'    4 bytes            'abcd'       5 bytes
'abcdef'       'abcd'    4 bytes            'abcd'       5 bytes

