
What is the relationship between UTF-8 and Unicode encoding? What's the difference?

WBOY
2016-05-16 12:09:42

UTF-8 stands for Unicode Transformation Format, 8-bit.
It is a transformation format for Unicode: it converts Unicode code points into a stream of bytes for storage or transport.

UTF-8 stream conversion program:
Input: unsigned integer c - the code point of the character to be encoded (a Unicode value)
Output: bytes b1, b2, b3, b4 - the encoded sequence of bytes (up to four byte values; unused ones are null)
Algorithm:
if (c < 0x80)
    b1 = c>>0 & 0x7F | 0x00
    b2 = null
    b3 = null
    b4 = null
else if (c < 0x0800)
    b1 = c>>6 & 0x1F | 0xC0
    b2 = c>>0 & 0x3F | 0x80
    b3 = null
    b4 = null
else if (c < 0x010000)
    b1 = c>>12 & 0x0F | 0xE0
    b2 = c>>6 & 0x3F | 0x80
    b3 = c>>0 & 0x3F | 0x80
    b4 = null
else if (c < 0x110000)
    b1 = c>>18 & 0x07 | 0xF0
    b2 = c>>12 & 0x3F | 0x80
    b3 = c>>6 & 0x3F | 0x80
    b4 = c>>0 & 0x3F | 0x80
end if
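
The four branches of the algorithm can be sketched in Python as follows. This is a minimal illustration, not a production encoder: the function name `utf8_encode` is hypothetical, the range limits 0x80, 0x800, 0x10000 and 0x110000 are the standard UTF-8 boundaries, and the rejection of surrogate code points is an addition that the pseudocode above does not include.

```python
def utf8_encode(c: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (hypothetical helper)."""
    if c < 0:
        raise ValueError("code point must be non-negative")
    if c < 0x80:
        # One byte: 0xxxxxxx
        return bytes([c])
    if c < 0x800:
        # Two bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (c >> 6), 0x80 | (c & 0x3F)])
    if c < 0x10000:
        # Addition not in the pseudocode: surrogates are not valid in UTF-8.
        if 0xD800 <= c <= 0xDFFF:
            raise ValueError("surrogate code points cannot be encoded")
        # Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (c >> 12),
            0x80 | ((c >> 6) & 0x3F),
            0x80 | (c & 0x3F),
        ])
    if c < 0x110000:
        # Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xF0 | (c >> 18),
            0x80 | ((c >> 12) & 0x3F),
            0x80 | ((c >> 6) & 0x3F),
            0x80 | (c & 0x3F),
        ])
    raise ValueError("code point is outside the Unicode range")


# Compare against Python's built-in encoder for one character per branch.
for ch in ["A", "\u00e9", "\u4e2d", "\U0001F600"]:
    assert utf8_encode(ord(ch)) == ch.encode("utf-8")
```

Checking the result against `str.encode("utf-8")` is a convenient way to confirm that each branch produces the expected bytes.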
=====================
Unicode is a coded character set: a table that assigns a code to each character, for example to each Chinese character. In this respect it is similar to GB2312-1980, GB18030, and so on, but it covers a different set of characters.
=====================
A Unicode code is converted into a UTF-8 sequence of one, two, three, or four bytes, depending on its value. Because the Unicode codes of English letters are all less than 0x80, each one needs only a single UTF-8 byte, which is faster to transmit than the two bytes needed by a fixed two-byte Unicode encoding such as UCS-2.
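
As a rough illustration of how the byte count depends on the code value, the sketch below uses a hypothetical helper `utf8_length` and compares the size of a short English string in UTF-8 and in two-byte units. It assumes the standard UTF-8 range limits and uses Python's `utf-16-le` codec as the two-byte encoding.

```python
def utf8_length(code_point: int) -> int:
    """Number of UTF-8 bytes needed for one code point (hypothetical helper)."""
    if code_point < 0x80:
        return 1
    if code_point < 0x800:
        return 2
    if code_point < 0x10000:
        return 3
    return 4


text = "hello"
utf8_size = sum(utf8_length(ord(ch)) for ch in text)
# "utf-16-le" writes two bytes per character for this text and adds no byte-order mark.
utf16_size = len(text.encode("utf-16-le"))
print(utf8_size, utf16_size)  # 5 10
```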
UTF-8 is just a "re-encoding" method devised for transmitting Unicode.
To convert UTF-8 back to Unicode, run the algorithm given above in reverse.
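
One way to run the calculation in reverse is sketched below. The names `utf8_decode_one` and `utf8_decode` are hypothetical, and the sketch is deliberately simplified: it checks lead and continuation bytes, but it does not reject overlong encodings or surrogates, which a strict decoder would.

```python
from typing import List, Tuple


def utf8_decode_one(data: bytes) -> Tuple[int, int]:
    """Decode the first character of data; return (code_point, bytes_used)."""
    b1 = data[0]
    if b1 < 0x80:
        return b1, 1                      # 0xxxxxxx
    if (b1 & 0xE0) == 0xC0:
        length, cp = 2, b1 & 0x1F         # 110xxxxx
    elif (b1 & 0xF0) == 0xE0:
        length, cp = 3, b1 & 0x0F         # 1110xxxx
    elif (b1 & 0xF8) == 0xF0:
        length, cp = 4, b1 & 0x07         # 11110xxx
    else:
        raise ValueError("invalid lead byte")
    if len(data) < length:
        raise ValueError("truncated sequence")
    for b in data[1:length]:
        if (b & 0xC0) != 0x80:            # continuation bytes look like 10xxxxxx
            raise ValueError("invalid continuation byte")
        cp = (cp << 6) | (b & 0x3F)       # append six payload bits
    return cp, length


def utf8_decode(data: bytes) -> List[int]:
    """Decode a whole byte string into a list of code points."""
    code_points = []
    i = 0
    while i < len(data):
        cp, used = utf8_decode_one(data[i:])
        code_points.append(cp)
        i += used
    return code_points


assert utf8_decode("a\u4e2d".encode("utf-8")) == [0x61, 0x4E2D]
```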

UTF-8 is a transitional solution from the existing ASCII system to the Unicode system. It guarantees ASCII compatibility first and then extends toward large character sets, and it is the solution recommended by Unicode. However, because it approaches the problem from that angle, it is not a good solution for existing Chinese systems. The following link provides detailed background on UTF-8 encoding: http://www.acnis.com/modules.php?name=ArticlE&file=article&sid=102
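
The ASCII compatibility mentioned here can be checked with a short sketch that relies only on Python's built-in codecs. The sample strings are arbitrary.

```python
ascii_text = "Hello, UTF-8!"
# Pure ASCII text produces exactly the same bytes in both encodings.
assert ascii_text.encode("ascii") == ascii_text.encode("utf-8")

mixed = "abc\u4e2d\u6587".encode("utf-8")  # "abc" followed by two Chinese characters
# Every byte of a non-ASCII character has its high bit set, so bytes below 0x80
# in a UTF-8 stream are always genuine ASCII characters.
ascii_only = bytes(b for b in mixed if b < 0x80)
print(ascii_only)  # b'abc'
```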

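What is Unicode? The basic goal of Unicode is to unify all encodings, that is, to contain all character sets, so that any system that supports Unicode can handle all of them. Unicode is commonly stored as two bytes per character (UCS-2, or UTF-16 for characters in the Basic Multilingual Plane), although its codes run up to U+10FFFF and those above U+FFFF need more than two bytes. All current Windows operating systems support Unicode.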

What is UTF-8? UTF-8 is an encoding of Unicode: the character set it encodes is the same as Unicode's, but the encoding method is different. For English characters, UTF-8 is the same as ASCII and uses one byte. A commonly used Chinese character, however, needs three bytes, in memory as well as in transit; rarer characters outside the Basic Multilingual Plane need four.
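
The byte counts for English and Chinese characters can be compared with a small sketch. The helper `encoded_sizes` is hypothetical, and the sample characters (U+0041, U+4E2D, and U+20000 from CJK Extension B) are chosen only for illustration.

```python
from typing import Tuple


def encoded_sizes(ch: str) -> Tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 byte count) for one character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


for ch in ["A", "\u4e2d", "\U00020000"]:
    utf8_size, utf16_size = encoded_sizes(ch)
    print(f"U+{ord(ch):04X}: UTF-8 {utf8_size} bytes, UTF-16 {utf16_size} bytes")
# U+0041: UTF-8 1 bytes, UTF-16 2 bytes
# U+4E2D: UTF-8 3 bytes, UTF-16 2 bytes
# U+20000: UTF-8 4 bytes, UTF-16 4 bytes
```

For common Chinese text, then, UTF-8 is larger than a two-byte encoding, while for English text it is smaller.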

A disadvantage of UTF-8, and of Unicode generally, is that operations such as lookup and search seem to need more complex and less efficient algorithms when working on text in memory.
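
One illustration of this extra work, offered as a sketch rather than a benchmark: because UTF-8 characters vary in width, finding the n-th character means scanning from the start of the buffer instead of computing an offset directly. The function `char_offset` is hypothetical and assumes its input is valid UTF-8.

```python
def char_offset(data: bytes, index: int) -> int:
    """Byte offset at which the index-th character starts (assumes valid UTF-8)."""
    seen = -1
    for offset, b in enumerate(data):
        # Continuation bytes look like 10xxxxxx; any other byte starts a character.
        if (b & 0xC0) != 0x80:
            seen += 1
            if seen == index:
                return offset
    raise IndexError("character index out of range")


data = "a\u4e2db".encode("utf-8")  # bytes: 61 E4 B8 AD 62
print(char_offset(data, 0), char_offset(data, 1), char_offset(data, 2))  # 0 1 4
```

With a fixed-width encoding the same lookup is a single multiplication, which is the kind of difference this paragraph refers to.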
