Easy to understand! The difference between utf8 and utf8mb4

silencement
2020-01-25 00:14:03

1. Introduction

MySQL added the utf8mb4 character set in version 5.5.3. The "mb4" suffix means a maximum of four bytes per character: the character set was designed to handle Unicode characters that need four bytes in UTF-8. utf8mb4 is a superset of utf8, so no conversion is required other than changing the encoding to utf8mb4. Of course, to save space, utf8 is usually sufficient.
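As a rough illustration of "changing the encoding", the sketch below builds the MySQL statement that converts an existing table to utf8mb4. The helper name `build_convert_statement`, the table name `comments`, and the choice of the `utf8mb4_unicode_ci` collation are assumptions made for this example; they are not taken from the article.

```python
def build_convert_statement(table_name: str,
                            collation: str = "utf8mb4_unicode_ci") -> str:
    """Return an ALTER TABLE statement that converts a table to utf8mb4.

    Hypothetical helper: it only builds the SQL text, it does not run it.
    """
    # Quote the identifier with backticks, doubling any embedded backtick.
    quoted = "`" + table_name.replace("`", "``") + "`"
    return (
        f"ALTER TABLE {quoted} "
        f"CONVERT TO CHARACTER SET utf8mb4 COLLATE {collation};"
    )


# "comments" is a made-up table name used only for illustration.
print(build_convert_statement("comments"))
```

In practice the client connection usually has to use utf8mb4 as well, otherwise four-byte characters can still be mangled on the way to the server.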

2. Content description

As mentioned above, utf8 can already store most Chinese characters, so why use utf8mb4? The reason is that the utf8 encoding supported by MySQL allows at most 3 bytes per character, so an error occurs when a 4-byte wide character is inserted. The largest Unicode code point that three-byte UTF-8 can encode is 0xFFFF, which is the upper bound of the Basic Multilingual Plane (BMP). In other words, any Unicode character outside the Basic Multilingual Plane cannot be stored with MySQL's utf8 character set. This includes emoji (a special group of Unicode characters, common on iOS and Android phones), many uncommon Chinese characters, and any newly added Unicode characters. This is the shortcoming of utf8.
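The 0xFFFF boundary can be checked directly in Python. The following is a minimal sketch, not MySQL's own logic: the function names `needs_utf8mb4` and `utf8_width` are made up for this example, and the sample characters are chosen only to show one case of each width.

```python
def utf8_width(ch: str) -> int:
    """Number of bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def needs_utf8mb4(text: str) -> bool:
    """True if any character lies outside the BMP (code point > 0xFFFF).

    Such characters need four bytes in UTF-8, which MySQL's 3-byte utf8
    character set cannot store.
    """
    return any(ord(ch) > 0xFFFF for ch in text)


# A Latin letter, a common Chinese character, an emoji (U+1F600),
# and a rare Chinese character from an extension area (U+20BB7).
for sample in ("A", "\u4e2d", "\U0001F600", "\U00020BB7"):
    print(f"U+{ord(sample):04X}", utf8_width(sample), needs_utf8mb4(sample))
```

A check like `needs_utf8mb4` could be run on user input before writing to a column that still uses utf8, so the problem shows up in the application rather than as a database error.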

Usually, when a computer stores characters, the space it allocates depends on the character and on the encoding used. For example:


①In ASCII encoding, one English letter (upper or lower case) occupies one byte. ASCII itself cannot represent Chinese characters; in ASCII-compatible double-byte encodings such as GB2312 and GBK, one Chinese character occupies two bytes. A byte is an 8-bit binary number; read as an unsigned decimal value, its minimum is 0 and its maximum is 255.

②In UTF-8 encoding, one English character occupies one byte, and one common Chinese character (simplified or traditional) occupies three bytes.

③In the fixed-width two-byte Unicode encoding (UCS-2), an English character occupies two bytes, and a Chinese character (simplified or traditional) also occupies two bytes.

④In UTF-16 encoding, an English letter or a common Chinese character occupies 2 bytes (some Chinese characters in the Unicode extension areas need 4 bytes).

⑤In UTF-32 encoding, every character occupies 4 bytes.
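The sizes listed above can be reproduced with Python's built-in codecs. This is only an illustrative sketch: the sample characters and the helper name `encoded_length` are choices made for this example, GBK stands in for the double-byte Chinese encodings, and the little-endian codec variants are used so that no byte order mark is added to the count.

```python
SAMPLES = {
    "English letter": "A",
    "Chinese character": "\u4e2d",
    "emoji": "\U0001F600",
}
ENCODINGS = ["ascii", "gbk", "utf-8", "utf-16-le", "utf-32-le"]


def encoded_length(ch: str, encoding: str):
    """Bytes needed for ch in the given encoding, or None if unencodable."""
    try:
        return len(ch.encode(encoding))
    except UnicodeEncodeError:
        return None


for label, ch in SAMPLES.items():
    sizes = {enc: encoded_length(ch, enc) for enc in ENCODINGS}
    print(label, sizes)
```

A `None` in the output means the encoding has no representation for that character at all, which is the case for a Chinese character in plain ASCII.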

Since utf8 is compatible with most characters, why was utf8mb4 added?

As the Internet developed, many new kinds of characters appeared, such as emoji, the little yellow faces we often send when chatting. These characters lie outside the Basic Multilingual Plane, so they cannot be stored with utf8 in MySQL. MySQL therefore extended utf8 and added the utf8mb4 encoding.

So, if users should be allowed to enter special symbols, it is best to design the database with utf8mb4 encoding, which gives the database better compatibility, at the cost of potentially more storage space.
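To put a rough number on the storage cost, the sketch below compares the worst-case byte budget of a column under a 3-byte and a 4-byte limit, and the bytes a given string actually needs. The function names and the 255-character column length are assumptions for illustration. For text that stays inside the BMP, the bytes actually used are the same either way; what grows is the worst-case size that has to be budgeted.

```python
UTF8_MAX_BYTES = 3     # MySQL utf8: at most 3 bytes per character
UTF8MB4_MAX_BYTES = 4  # MySQL utf8mb4: at most 4 bytes per character


def worst_case_bytes(char_length: int, max_bytes_per_char: int) -> int:
    """Upper bound on the bytes needed for a column of char_length characters."""
    return char_length * max_bytes_per_char


def actual_bytes(text: str) -> int:
    """Bytes the text really occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


# Hypothetical 255-character column.
print(worst_case_bytes(255, UTF8_MAX_BYTES))     # 765
print(worst_case_bytes(255, UTF8MB4_MAX_BYTES))  # 1020

# BMP-only text needs the same bytes under either character set.
print(actual_bytes("\u4e2d\u6587"))  # 6
```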

Statement:
This article is reproduced from www.liqingbo.cn. If there is any infringement, please contact admin@php.cn to have it removed.