Easy to understand! The difference between utf8 and utf8mb4 - PHP Problem - php.cn


silencement

Jan 25, 2020 am 12:14 AM


1. Introduction

MySQL added the utf8mb4 character set in version 5.5.3. "mb4" means "most bytes 4": each character may take up to four bytes, which lets the character set cover four-byte Unicode characters. utf8mb4 is a superset of utf8, so no other conversion is required apart from changing the encoding to utf8mb4. Of course, if you only need the characters that utf8 already covers, utf8 is usually enough and saves space.
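As a minimal sketch of what "changing the encoding to utf8mb4" can look like in practice, the helper below builds MySQL `ALTER TABLE ... CONVERT TO CHARACTER SET` statements. The function name `convert_statement`, the table names, and the choice of the `utf8mb4_unicode_ci` collation are assumptions made for illustration. The sketch only produces SQL text and does not connect to a database.

```python
def convert_statement(table, collation="utf8mb4_unicode_ci"):
    """Build an ALTER TABLE statement that converts one table to utf8mb4.

    The table name and default collation are illustrative assumptions.
    """
    # Backticks quote the identifier; a backtick inside a name is doubled.
    quoted = "`" + table.replace("`", "``") + "`"
    return (
        f"ALTER TABLE {quoted} "
        f"CONVERT TO CHARACTER SET utf8mb4 COLLATE {collation};"
    )


# Hypothetical table names, used only to show the generated SQL.
for name in ["users", "comments"]:
    print(convert_statement(name))
```

The client connection usually has to use utf8mb4 as well (for example via `SET NAMES utf8mb4`), otherwise four-byte characters can still be lost before they reach the table.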

2. Content description

As mentioned above, utf8 can store most Chinese characters, so why use utf8mb4? It turns out that MySQL's utf8 character set supports at most 3 bytes per character, and inserting a 4-byte character typically fails with an error. The largest Unicode code point that three-byte UTF-8 can encode is 0xFFFF, which is the end of the Basic Multilingual Plane (BMP). In other words, any Unicode character outside the BMP cannot be stored with MySQL's utf8 character set. That includes emoji (a special group of Unicode characters, common on iOS and Android phones), many uncommon Chinese characters, and many newly added Unicode characters. This is the shortcoming of utf8.
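The 0xFFFF boundary described above can be checked in application code before text reaches the database. The following is a small sketch, not part of the original article. The function names `needs_utf8mb4` and `max_utf8_bytes` are hypothetical helpers introduced here.

```python
def needs_utf8mb4(text):
    """Return True if any character lies outside the BMP (code point > 0xFFFF).

    Such characters take 4 bytes in UTF-8 and do not fit MySQL's 3-byte utf8.
    """
    return any(ord(ch) > 0xFFFF for ch in text)


def max_utf8_bytes(text):
    """Return the largest per-character UTF-8 byte length in the text."""
    return max((len(ch.encode("utf-8")) for ch in text), default=0)


for sample in ["hello", "中文", "hi 😀", "𠮷"]:
    print(repr(sample), needs_utf8mb4(sample), max_utf8_bytes(sample))
```

Here "𠮷" (U+20BB7) stands in for the uncommon Chinese characters mentioned above: it lies outside the BMP, so it needs four bytes just like an emoji.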

Usually, when a computer stores characters, the storage space it allocates depends on the type of character and the encoding used. For example:

①In ASCII encoding, one English letter (upper or lower case) occupies one byte of space. ASCII itself cannot represent Chinese characters; in ASCII-compatible double-byte encodings such as GBK, one Chinese character occupies two bytes of space. A byte, the basic storage unit in a computer, is an 8-bit binary number; converted to decimal, its minimum value is 0 and its maximum value is 255.

②In UTF-8 encoding, one English character occupies one byte of storage space, and a common Chinese character (simplified or traditional) occupies three bytes; characters outside the BMP occupy four.

③In "Unicode" encoding as the term is often used (fixed-width UCS-2), an English character occupies two bytes of storage space, and a Chinese character (including traditional Chinese) also occupies two bytes.

④In UTF-16 encoding, an English letter or a common Chinese character requires 2 bytes of storage space (characters outside the BMP, such as some Chinese characters in the Unicode extension blocks, require 4 bytes).

⑤In UTF-32 encoding, every character requires 4 bytes of storage space.
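The byte counts in the list above can be checked with Python's built-in codecs. This is an illustrative sketch: the sample characters and the helper name `byte_lengths` are choices made here, and the little-endian codec variants are used so that no byte-order mark is counted.

```python
# Sample characters: an ASCII letter, a common Chinese character, an emoji.
SAMPLES = ["A", "中", "😀"]
# The "-le" variants avoid counting a byte-order mark in the length.
ENCODINGS = ["utf-8", "utf-16-le", "utf-32-le"]


def byte_lengths(ch):
    """Return {encoding name: number of bytes} for a single character."""
    return {enc: len(ch.encode(enc)) for enc in ENCODINGS}


for ch in SAMPLES:
    print(ch, byte_lengths(ch))

# GBK, a legacy double-byte Chinese encoding, is shown separately
# because it cannot encode emoji.
print("中 in gbk:", len("中".encode("gbk")), "bytes")
```

The emoji is the only sample that needs four bytes in UTF-8, which is exactly the case MySQL's three-byte utf8 cannot hold.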

Since utf8 is compatible with most characters, why did MySQL add utf8mb4?

As the Internet developed, many new kinds of characters appeared, such as emoji, the little yellow faces we usually send when chatting. These characters are not in the Basic Multilingual Plane, so they cannot be stored with MySQL's utf8. MySQL therefore extended utf8 and added the utf8mb4 encoding.

So if you want to let users enter special symbols, it is best to design the database to store text as utf8mb4, which gives better compatibility, though this design can cost more storage space.
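To put a rough number on the storage cost, the sketch below compares the worst-case bytes per column under each character set with the bytes a given string actually needs. It assumes the per-character maximums stated earlier (3 bytes for utf8, 4 for utf8mb4). The helper names and the 255-character column width are hypothetical.

```python
# Maximum bytes per character, as described in the article.
MAX_BYTES_PER_CHAR = {"utf8": 3, "utf8mb4": 4}


def worst_case_bytes(n_chars, charset):
    """Upper bound on the bytes needed for n_chars characters in a charset."""
    return n_chars * MAX_BYTES_PER_CHAR[charset]


def actual_bytes(text):
    """Bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


# Hypothetical 255-character column.
for charset in MAX_BYTES_PER_CHAR:
    print(charset, worst_case_bytes(255, charset))

# BMP text has the same UTF-8 length either way; only characters
# outside the BMP use the fourth byte.
print(actual_bytes("中文"), actual_bytes("😀"))
```

The extra cost shows up mainly where space is reserved per character (fixed-width columns, length limits); text made only of BMP characters encodes to the same bytes under either character set.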


Statement

This article is reproduced from www.liqingbo.cn. If there is any infringement, please contact admin@php.cn to have it removed.
