In MySQL, the number of bytes a Chinese character occupies depends on the character encoding: under GBK, one Chinese character occupies 2 bytes; under UTF-8, one Chinese character occupies 3 bytes. English letters occupy 1 byte in both encodings.
How many bytes does a Chinese character occupy in MySQL?
1. How many bytes a Chinese character occupies depends on the encoding:
GBK: a Chinese character takes 2 bytes, an English letter 1 byte
UTF-8: a Chinese character takes 3 bytes, an English letter 1 byte
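The byte counts stated above can be checked outside MySQL. The following is a minimal Python sketch, assuming Python's built-in `gbk` and `utf-8` codecs produce the same bytes MySQL stores for these character sets; the helper name `byte_length` is made up for illustration.

```python
def byte_length(text: str, encoding: str) -> int:
    """Return how many bytes `text` occupies once encoded."""
    return len(text.encode(encoding))


# One Chinese character and one English letter, under both encodings.
for ch in ("中", "a"):
    for enc in ("gbk", "utf-8"):
        print(f"{ch!r} in {enc}: {byte_length(ch, enc)} byte(s)")
```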
2. How many Chinese characters can varchar(n) store?
varchar(n) means n characters. Whether the characters are Chinese or English, MySQL can store n of them; only the actual byte length differs.
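To make the difference between character count and byte length concrete, here is a rough Python sketch. It assumes Python's `len` on a string matches MySQL's character count for these common characters; `fits_varchar` is a hypothetical helper, not a MySQL function.

```python
def fits_varchar(text: str, n: int) -> bool:
    """Hypothetical check: varchar(n) limits characters, not bytes."""
    return len(text) <= n


chinese = "数据库编码"  # 5 Chinese characters
english = "mysql"       # 5 English letters

for value in (chinese, english):
    print(
        value,
        "fits varchar(5):", fits_varchar(value, 5),
        "| utf-8 bytes:", len(value.encode("utf-8")),
        "| gbk bytes:", len(value.encode("gbk")),
    )
```

Both strings fit in a varchar(5) column, yet the Chinese one needs three times as many bytes as the English one under UTF-8.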
3. How does MySQL check the length (number of bytes occupied)?
Use the LENGTH function, which returns the length of a value in bytes (CHAR_LENGTH returns the number of characters):
SELECT LENGTH(fieldname) FROM tablename;
Description:
UTF-8 (8-bit Unicode Transformation Format) is a variable-length, multi-byte encoding designed for international text. A BOM is allowed but usually omitted. It uses 8 bits (one byte) for English letters and 24 bits (three bytes) for most Chinese characters. UTF-8 can represent the characters used in every country, which makes it an international and highly portable encoding. UTF-8 text displays correctly in any browser that supports the UTF-8 character set. For example, Chinese text encoded as UTF-8 also displays in a foreign user's English-language IE, without downloading IE's Chinese language support package.
GBK is an extension of the national standard GB2312 and is backward compatible with it. GBK represents each Chinese character with two bytes, while ASCII characters such as English letters keep their single-byte form. To distinguish a Chinese character from ASCII, the highest bit of its first byte is set to 1. GBK covers the Chinese characters in common use and is a national encoding. It is less universal than UTF-8, but Chinese text stored as UTF-8 takes more database space than the same text stored as GBK.
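The two points in this description, the high bit that marks a Chinese character in GBK and the larger size of UTF-8 for Chinese text, can be observed with a short Python sketch. It assumes Python's `gbk` codec matches the encoding described here; `has_high_bit` is a hypothetical helper.

```python
def has_high_bit(byte: int) -> bool:
    """True when the most significant bit of an 8-bit value is 1."""
    return (byte & 0x80) != 0


gbk_bytes = "中".encode("gbk")
print(gbk_bytes.hex())                  # d6d0
print(has_high_bit(gbk_bytes[0]))       # first byte of a Chinese character
print(has_high_bit("a".encode("gbk")[0]))  # an ASCII letter stays below 0x80

# Same Chinese text, two encodings: compare the storage size.
text = "中文编码"
gbk_size = len(text.encode("gbk"))
utf8_size = len(text.encode("utf-8"))
print(f"gbk: {gbk_size} bytes, utf-8: {utf8_size} bytes")
```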
GBK, GB2312, and similar encodings must be converted to and from UTF-8 by way of Unicode:
GBK, GB2312 --> Unicode --> UTF8
UTF8 --> Unicode --> GBK, GB2312
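The two-step conversion can be sketched in Python, where a `str` plays the role of the intermediate Unicode form. This is an illustrative sketch rather than how MySQL converts internally; `transcode` is a made-up helper name.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Convert bytes from `source` encoding to `target` via Unicode text."""
    text = data.decode(source)   # step 1: source bytes -> Unicode
    return text.encode(target)   # step 2: Unicode -> target bytes


gbk_data = "中文".encode("gbk")
utf8_data = transcode(gbk_data, "gbk", "utf-8")   # GBK -> Unicode -> UTF-8
round_trip = transcode(utf8_data, "utf-8", "gbk")  # UTF-8 -> Unicode -> GBK

print(len(gbk_data), len(utf8_data), round_trip == gbk_data)
```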
- GB2312 is a subset of GBK, and GBK is a subset of GB18030
- GBK is a large character set that includes the Chinese characters (CJK ideographs) used in Chinese, Japanese, and Korean
- To avoid garbled characters, prefer UTF-8; it also makes future internationalization much easier
- UTF8 can be regarded as a large character set that contains encodings for most of the world's text
- One benefit of UTF8 is that users in other regions (such as Hong Kong and Taiwan) can read your text normally, without garbled characters and without installing Simplified Chinese support
Summary:
- gb2312 is the encoding for Simplified Chinese
- gbk supports Simplified Chinese and Traditional Chinese
- big5 supports Traditional Chinese
- utf8 supports almost all characters
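The coverage differences in this summary can be probed with a small Python sketch. It assumes Python's codecs of the same names follow these standards closely enough for illustration; `can_encode` is a hypothetical helper, and the two sample characters are chosen only as examples.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Report whether every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


samples = {"simplified": "体", "traditional": "體"}

for label, ch in samples.items():
    for enc in ("gb2312", "gbk", "gb18030", "big5", "utf-8"):
        print(f"{label} {ch} in {enc}: {can_encode(ch, enc)}")
```

The traditional character is rejected by gb2312 but accepted by gbk and gb18030, which matches the subset relationship among the three.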