
Detailed introduction to the correct use of GBK and UTF-8 encoding

黄舟 (Original)
2017-07-26 13:28:09

Web page encoding is the character encoding in which a web page's text is stored. The page declares it so that the browser knows which character set to use when turning the page's bytes back into characters.

GBK is an extension of the national standard GB2312 and remains backward compatible with it. GBK is a double-byte encoding for Chinese: English (ASCII) characters take a single byte, as in GB2312, while each Chinese character takes two bytes. To distinguish Chinese characters from ASCII, the highest bit of the first (lead) byte is set to 1; lead bytes range from 0x81 to 0xFE. GBK covers more than 20,000 Chinese characters, including traditional forms, and is a Chinese national encoding. It is less universal than UTF-8, but UTF-8 takes more database space than GBK for Chinese text.
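
The byte layout described above can be checked with Python's built-in "gbk" codec. This is a minimal sketch; the helper name gbk_byte_layout is hypothetical. For each character it reports the GBK bytes in hex, the byte count, and whether the lead byte has its high bit set. Note that in GBK only the lead byte is guaranteed to have the high bit set; the second byte may be as low as 0x40.

```python
def gbk_byte_layout(text: str) -> list:
    """For each character: GBK bytes as hex, byte count, and whether the lead byte's high bit is 1."""
    rows = []
    for ch in text:
        encoded = ch.encode("gbk")             # ASCII -> 1 byte, Chinese -> 2 bytes
        high_bit = bool(encoded[0] & 0x80)     # set only for the lead byte of a Chinese character
        rows.append((ch, encoded.hex(), len(encoded), high_bit))
    return rows


for row in gbk_byte_layout("Hi中文"):
    print(row)
```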

UTF-8 (Unicode Transformation Format, 8-bit) permits a BOM (byte order mark), but UTF-8 files usually omit it. It is a variable-length, multi-byte encoding designed for international text: English (ASCII) characters take 8 bits (one byte) and most Chinese characters take 24 bits (three bytes); in general a character takes one to four bytes. UTF-8 can encode every character defined in Unicode, which covers the writing systems used around the world, so it is an international encoding with broad compatibility. UTF-8 text can be displayed by any browser, in any country, that supports the UTF-8 character set. For example, if a page is encoded in UTF-8, its Chinese text can be displayed even in an English-language IE, without downloading IE's Chinese language support pack.
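
To make the one-to-four-byte layout concrete, here is a hand-written sketch of the UTF-8 bit patterns, checked against Python's built-in encoder. The function name utf8_encode_char is hypothetical, and the sketch skips validation (it does not reject surrogates or values above U+10FFFF), so it is for illustration, not production use.

```python
def utf8_encode_char(cp: int) -> bytes:
    """Encode one Unicode code point using the UTF-8 bit patterns (no validation)."""
    if cp < 0x80:        # 1 byte:  0xxxxxxx (ASCII)
        return bytes([cp])
    if cp < 0x800:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx (most Chinese characters)
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx (e.g. CJK Extension B characters)
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


for ch in ["A", "é", "中", "\U00020000"]:
    mine = utf8_encode_char(ord(ch))
    assert mine == ch.encode("utf-8")      # agrees with Python's built-in encoder
    print(f"U+{ord(ch):04X} -> {mine.hex(' ')} ({len(mine)} byte(s))")

# The UTF-8 BOM is the three bytes EF BB BF; Python's "utf-8-sig" codec writes it.
print("A".encode("utf-8-sig").hex(" "))
```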

Although the UTF-8 version has good international compatibility, its Chinese text needs 50% more database storage than the GBK/BIG5 version (three bytes per Chinese character instead of two), so it is not recommended except for users who have specific international-compatibility requirements. In short: for websites with mostly Chinese text, GBK saves database space; for websites with mostly English text, UTF-8 costs no extra space, because ASCII characters take one byte in both encodings, so UTF-8 is the better choice there.
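
The 50% figure follows directly from the byte counts. A minimal sketch (the helper name encoded_sizes is hypothetical) counts only the raw encoded bytes; actual database storage also depends on column types and the storage engine.

```python
def encoded_sizes(text: str) -> dict:
    """Raw byte counts of `text` in GBK and in UTF-8."""
    return {enc: len(text.encode(enc)) for enc in ("gbk", "utf-8")}


chinese = "中文网页编码"          # 6 Chinese characters
english = "web page encoding"     # 17 ASCII characters

zh = encoded_sizes(chinese)
print(zh)                          # {'gbk': 12, 'utf-8': 18}
print(zh["utf-8"] / zh["gbk"] - 1) # 0.5, i.e. 50% more space in UTF-8
print(encoded_sizes(english))      # {'gbk': 17, 'utf-8': 17}
```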

How do you convert GBK, GB2312, and similar encodings to UTF-8? The conversion has to go through Unicode: GBK/GB2312 → Unicode → UTF-8, and in the other direction UTF-8 → Unicode → GBK/GB2312. In Windows Notepad, "Save As" converts between ANSI (which is GBK on a Simplified Chinese system), Unicode, Unicode big endian, and UTF-8.
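
In Python the "through Unicode" route is explicit: decode the source bytes into a str (Unicode), then encode that str into the target encoding. This is a minimal sketch; the helper name transcode is hypothetical. Going to GBK can fail for characters GBK does not contain, which is what the errors argument handles.

```python
def transcode(data: bytes, source: str, target: str, errors: str = "strict") -> bytes:
    """Convert bytes between encodings by going through Unicode (a Python str)."""
    text = data.decode(source)            # step 1: e.g. GBK bytes -> Unicode
    return text.encode(target, errors)    # step 2: Unicode -> e.g. UTF-8 bytes


gbk_bytes = "中文".encode("gbk")
utf8_bytes = transcode(gbk_bytes, "gbk", "utf-8")
print(utf8_bytes.hex(" "))
print(transcode(utf8_bytes, "utf-8", "gbk") == gbk_bytes)   # True: lossless round trip here

# GB2312 data can be decoded with the "gbk" codec, because GBK is a superset of GB2312.
print(transcode("中".encode("gb2312"), "gbk", "utf-8") == "中".encode("utf-8"))  # True

# Characters outside GBK (such as emoji) cannot be encoded; errors="replace" substitutes "?".
print(transcode("A😀".encode("utf-8"), "utf-8", "gbk", errors="replace"))  # b'A?'
```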

How can you make the browser identify a page's encoding correctly? Generally, the page should include a <meta> declaration of its character set, for example <meta http-equiv="Content-Type" content="text/html; charset=gb2312" />, which tells the browser that the page is encoded in GB2312 (use charset=utf-8 for UTF-8).
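
A browser finds such a declaration by scanning the start of the document before decoding it. The sketch below is a simplified stand-in for that scan, assuming a regular expression is good enough for illustration; the HTML standard's real prescan algorithm (which looks at the first 1024 bytes) handles many more cases. The names META_CHARSET and sniff_meta_charset are hypothetical.

```python
import re
from typing import Optional

# Matches charset=... inside a <meta ...> tag, in both the http-equiv form and <meta charset="...">.
META_CHARSET = re.compile(rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_\-]+)', re.IGNORECASE)


def sniff_meta_charset(head: bytes, limit: int = 1024) -> Optional[str]:
    """Return the charset declared in a <meta> tag within the first `limit` bytes, lowercased."""
    match = META_CHARSET.search(head[:limit])
    return match.group(1).decode("ascii").lower() if match else None


page = b'<head><meta http-equiv="Content-Type" content="text/html; charset=gb2312" /><title>x</title>'
print(sniff_meta_charset(page))                      # gb2312
print(sniff_meta_charset(b'<meta charset="UTF-8">'))  # utf-8
```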

Why does garbled text sometimes appear even though the page declares its encoding? This can happen when the encoding declared in the page differs from the encoding the file was actually saved in. More often, the file was opened with the wrong encoding and then saved, or it was edited directly on the server with FTP software such as CuteFTP whose encoding setting was wrong, so the text was converted incorrectly. In that case, open the file in Windows Notepad and use "Save As" to save it in the encoding the page declares.
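
A declared-versus-actual mismatch can often be spotted by trying to decode the file strictly with the declared encoding. This is a minimal sketch; the helper name decodes_as is hypothetical. A clean decode does not prove the declaration is right, because some byte sequences (pure ASCII, for instance) are valid in more than one encoding, but a failure is a strong hint. The last lines do in code what Notepad's "Save As" does.

```python
def decodes_as(data: bytes, declared: str) -> bool:
    """Return True if `data` decodes without errors in the declared encoding."""
    try:
        data.decode(declared)
    except UnicodeDecodeError:
        return False
    return True


page = "<p>中文</p>".encode("gbk")                 # file actually saved as GBK
print(decodes_as(page, "utf-8"))                   # False: but the page declares UTF-8
print(page.decode("utf-8", errors="replace"))      # roughly what a UTF-8 reader would show
print(decodes_as(page, "gbk"))                     # True

# Re-save in the declared encoding: decode with the real encoding, encode with the declared one.
fixed = page.decode("gbk").encode("utf-8")
print(decodes_as(fixed, "utf-8"))                  # True
```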

When IE is used as the browser on Windows, a common problem is that IE fails to recognize the encoding of a UTF-8 page automatically, even though the page declares its encoding with a UTF-8 <meta> tag, and as a result some pages containing Chinese text in UTF-8 come out blank. Firefox and Safari do not have this problem. The reason is that when IE determines a page's encoding, it gives priority to the tags in the HTML and only then consults the HTTP header, while Mozilla-family browsers do the opposite.
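
For comparison, here is a minimal sketch of the header-first order that today's WHATWG HTML standard specifies, simplified (it leaves out the user override and later sniffing steps): a byte order mark wins first, then the charset parameter of the HTTP Content-Type header, then the <meta> declaration, then a fallback. It does not model old IE. The names header_charset and resolve_charset, and the fallback default "gbk", are illustrative assumptions; real browsers pick the fallback from the user's locale. The practical lesson is to send the same charset in the HTTP header and in the <meta> tag.

```python
from email.message import Message
from typing import Optional


def header_charset(content_type: Optional[str]) -> Optional[str]:
    """Extract the charset parameter from a Content-Type header value, lowercased."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()   # None if there is no charset parameter


def resolve_charset(body: bytes, content_type: Optional[str] = None,
                    meta_charset: Optional[str] = None, fallback: str = "gbk") -> str:
    """Pick a page encoding: BOM, then HTTP header, then <meta>, then fallback (simplified)."""
    boms = [(b"\xef\xbb\xbf", "utf-8"), (b"\xfe\xff", "utf-16be"), (b"\xff\xfe", "utf-16le")]
    for bom, name in boms:
        if body.startswith(bom):
            return name
    from_header = header_charset(content_type)
    if from_header:
        return from_header
    if meta_charset:
        return meta_charset.lower()
    return fallback


# Header and <meta> disagree: the header wins in this order.
print(resolve_charset(b"<html>", "text/html; charset=UTF-8", "gb2312"))
```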

UTF-8 uses three bytes for a Chinese character, whereas ordinary GB2312 or BIG5 uses two. Because of the behavior described above, IE may parse the text inside the <title> element before it has switched to UTF-8, decoding the bytes with a double-byte encoding, two at a time. If the title contains an odd number of full-width characters, its UTF-8 bytes do not pair up evenly, and the last character leaves half a double-byte character behind. That stray byte combines with the < of </title> to form a garbled character, so IE never recognizes the </title> tag and renders the entire page blank. If you view the source at this point, you will find that the whole page was actually delivered; the browser simply does not display it. The simplest solution is to put the <meta> charset declaration before <title>.
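
The odd-count effect can be reproduced with a deliberately simplified pairing rule: treat any byte at or above 0x80 as the start of a two-byte character. This is a sketch of the mechanism the paragraph describes, not a model of IE's actual parser; the names pair_like_double_byte and closing_tag_survives are hypothetical.

```python
def pair_like_double_byte(data: bytes) -> list:
    """Group bytes like a naive double-byte decoder: a byte >= 0x80 starts a two-byte pair."""
    chunks, i = [], 0
    while i < len(data):
        step = 2 if data[i] >= 0x80 else 1
        chunks.append(data[i:i + step])
        i += step
    return chunks


def closing_tag_survives(title_text: str) -> bool:
    """Encode a <title> as UTF-8, misread it pairwise, and check whether </title> is still intact."""
    data = f"<title>{title_text}</title>".encode("utf-8")
    single_bytes = b"".join(c for c in pair_like_double_byte(data) if len(c) == 1)
    return b"</title>" in single_bytes


print(closing_tag_survives("中文"))    # True: 6 bytes of Chinese pair up evenly
print(closing_tag_survives("中文网"))  # False: the 9th byte swallows the "<" of </title>
```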


Statement:
The content of this article was voluntarily contributed by netizens, and the copyright belongs to the original author. This site assumes no legal responsibility for it. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn