Interpreting HTML: Namespaces and Character Encodings
When working on projects, we often establish specifications so that teams can cooperate better and deliver the project more smoothly. Similarly, we often hear about various protocols: for example, Google's IM software Gtalk uses the open XMPP protocol, and delivering a web document to the user requires the HTTP protocol.
For the same reason, because browsers are built on different engines and apply different default styles, a common set of rules is needed so that the same web document renders consistently across browsers. The DOCTYPE declaration tells the browser which set of rules to follow.
Because the Internet is interconnected, any two or more web documents may exchange data, and because XML lets users define their own tags, two exchanged documents may use the same tag name for different things. A namespace is therefore needed to tell apart identically named tags that come from different documents.
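XHTML, as a transitional language from HTML to XML, does not support user-defined tags the way XML does, so every XHTML document uses the same namespace, http://www.w3.org/1999/xhtml, declared on the root element as <html xmlns="http://www.w3.org/1999/xhtml">.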
xmlns is short for "XML namespace". Like the DOCTYPE declaration, xmlns is also a kind of declaration. Unlike the DOCTYPE declaration, which also appears in HTML documents, xmlns is generally not used in HTML documents; the xmlns declarations we usually see appear in XHTML documents.
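To make the idea of a namespace concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The furniture namespace URI and the f: prefix are hypothetical, chosen only to create a second "table" tag that would otherwise clash with the XHTML one.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Hypothetical namespace for a furniture vocabulary (illustration only).
FURNITURE_NS = "http://example.com/furniture"

doc = f"""<html xmlns="{XHTML_NS}" xmlns:f="{FURNITURE_NS}">
  <body>
    <table><tr><td>cell</td></tr></table>
    <f:table><f:legs>4</f:legs></f:table>
  </body>
</html>"""

root = ET.fromstring(doc)

# ElementTree expands every tag name to "{namespace}localname".
print(root.tag)

# Both elements are called "table", but their namespaces keep them apart.
html_tables = root.findall(f".//{{{XHTML_NS}}}table")
furniture_tables = root.findall(f".//{{{FURNITURE_NS}}}table")
print(len(html_tables), len(furniture_tables))
```

Without the two xmlns declarations, a program reading this document would have no reliable way to tell the layout table from the piece of furniture.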
When building a web page, besides declaring the DOCTYPE (document type) at the top and, for an XHTML document, the namespace, the third thing to declare is the character encoding of the document, typically with a meta element in the head such as <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />.
To be interpreted correctly by browsers and to pass W3C validation, every XHTML document should declare the character encoding it uses. Garbled characters in web documents are most often caused by an incorrect character encoding.
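The following sketch illustrates why the declaration matters: a reader of the raw bytes cannot know how to decode them until it has found the declared encoding. It assumes the commonly used meta http-equiv form of the declaration; the MetaCharsetFinder class and the sample page are hypothetical, and real browsers use a considerably more elaborate detection procedure.

```python
from html.parser import HTMLParser


class MetaCharsetFinder(HTMLParser):
    """Records the charset named in the first Content-Type <meta> element."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        # Attribute values may be None when an attribute has no value.
        attr = {name: (value or "") for name, value in attrs}
        content = attr.get("content", "").lower()
        if attr.get("http-equiv", "").lower() == "content-type" and "charset=" in content:
            self.charset = content.split("charset=", 1)[1].split(";")[0].strip()


# Hypothetical page, stored on disk as gb2312 bytes.
page = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><head>'
    '<meta http-equiv="Content-Type" content="text/html; charset=gb2312" />'
    "<title>中文</title></head><body></body></html>"
)
raw = page.encode("gb2312")

# The markup itself is ASCII, so a first pass can ignore the non-ASCII bytes.
finder = MetaCharsetFinder()
finder.feed(raw.decode("ascii", errors="replace"))
declared = finder.charset

# Only now can the bytes be decoded into the intended text.
text = raw.decode(declared)
print(declared, "中文" in text)
```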
utf-8 is a variable-length encoding of Unicode. As a universal character encoding, it is being adopted by more and more web documents. Pages encoded in utf-8 do the most to avoid the garbled characters that appear when users in different regions, with different local encodings, visit the same page.
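"Variable-length" can be seen directly by encoding a few characters and counting the bytes. This is a small sketch; the sample characters are arbitrary choices for illustration.

```python
samples = [
    "A",           # ASCII letter
    "\u00e9",      # é, Latin letter with accent
    "\u4e2d",      # 中, common Chinese character
    "\U0001F600",  # emoji outside the Basic Multilingual Plane
]

# Map each character to the number of bytes UTF-8 needs for it.
lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, n in lengths.items():
    print(f"U+{ord(ch):04X} -> {n} byte(s)")
```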
Yet on most websites in China, especially the large portals, the declared character encoding is not utf-8 but gb2312.
Besides gb2312, some websites use gbk or gb18030. All three are Simplified Chinese encodings. In other words, if a computer has no Simplified Chinese support installed, a Chinese page encoded in gb2312 will show up as garbled characters.
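A short sketch of how garbled characters arise: the same bytes are decoded with the right codec family and then with the wrong one. The sample string is arbitrary, and latin-1 merely stands in for "a system that assumes a Western encoding".

```python
text = "中文网页"
raw = text.encode("gb2312")

# gbk and gb18030 extend gb2312, so they read gb2312 bytes correctly.
as_gbk = raw.decode("gbk")
as_gb18030 = raw.decode("gb18030")

# A reader that assumes a Western encoding produces unrelated characters.
wrong_western = raw.decode("latin-1")

# The reverse mistake: UTF-8 bytes read as gb2312 are just as garbled.
wrong_gb = text.encode("utf-8").decode("gb2312", errors="replace")

print(as_gbk, as_gb18030)
print(wrong_western)
print(wrong_gb)
```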
If gb2312 can produce garbled characters for users in other regions, why not simply use utf-8?
One reason is probably historical. The other, more important reason is the difference in document size that results from how the two encodings store characters.
With gb2312, a Chinese character occupies 2 bytes, whereas in UTF-8 a Chinese character usually occupies 3 bytes, and some rarer characters need 4. For the same Chinese document, the gb2312-encoded file is therefore smaller than the UTF-8-encoded one.
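The size difference is easy to check. The sketch below encodes the same short Chinese string both ways, then repeats the measurement with a little ASCII markup around it; the strings are arbitrary examples.

```python
chinese = "汉字编码"  # four Chinese characters
gb_size = len(chinese.encode("gb2312"))
utf8_size = len(chinese.encode("utf-8"))
print(gb_size, utf8_size)

# ASCII markup costs one byte per character in both encodings,
# so only the Chinese text contributes to the difference.
fragment = "<p>" + chinese + "</p>"
gb_fragment = len(fragment.encode("gb2312"))
utf8_fragment = len(fragment.encode("utf-8"))
print(gb_fragment, utf8_fragment)
```

Because tags, attributes, scripts, and styles are mostly ASCII, the saving on a real page is smaller than the one-third suggested by the per-character figures.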
For Chinese websites with large amounts of text and high traffic, documents encoded in gb2312 can save considerable bandwidth in download and transmission. In addition, the audience of a Chinese website consists almost entirely of Chinese users. These are the reasons why many websites use gb2312 rather than utf-8.
However, few websites in China actually combine large volumes of text with high traffic, and gb2312 carries the risk of garbled characters, so UTF-8 is the recommended encoding when building web pages.
Whichever encoding is chosen, the most important thing is that the whole site uses the same one consistently.
Besides the method above, you may also come across another way of declaring the character encoding.
That form of declaration was aimed at older browsers. Now that browsers have generally been updated, it is no longer recommended.