HTML5 Standards Learning: A Detailed Explanation of Encoding

Every front-end engineer has probably run into garbled text (mojibake) at one time or another. No matter how solid your fundamentals are, it will occasionally turn up during development. So, as a front-end engineer, how do you specify the encoding of a page? And do you know how the browser determines that encoding?

First, a very simple example: take a minimal HTML page and see how different browsers treat it:

<!DOCTYPE html>

This is the simplest possible HTML: both <head> and <body> have no content, and the server sends no encoding declaration. Open it directly from the local disk and check which encoding each browser reports for the page:

Browser        Displayed encoding        Remarks
IE6            UTF-8
IE8            UTF-8
IE9            GB2312                    System default character set
Firefox 3.5    GBK2312                   System default character set
Firefox 4.0    ISO-8859-1                Western European languages, the default encoding for English
Chrome         GBK                       System default character set
Opera          Chinese (auto-detect)     Presumably also GB2312

The table shows that browsers disagree on how to interpret a page that declares no encoding at all. For a page this simple the choice makes no difference, provided the encoding is a superset of ASCII, but the result is enough to show why setting the encoding correctly matters.

Encoding Declaration

HTML4 and HTML5 each devote a section to how the encoding is declared; see the corresponding sections of the HTML4 and HTML5 specifications.

First of all, what is an encoding? An encoding tells the browser (or user agent) which algorithm to use to turn the byte stream into characters so that the content comes out correctly. In the HTML standard, encodings are referred to by name or alias. These names come from the IANA registry, and only encodings that appear in that list are guaranteed to be recognized by browsers. An unregistered spelling may therefore be ignored completely; the original example was UTF8 written without the hyphen, although the current WHATWG Encoding Standard does list utf8 as a label for UTF-8. Encoding names are also case-insensitive.
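To make the idea concrete, here is a minimal sketch in Python of what happens when the same bytes are interpreted with the right and the wrong encoding. The sample string is arbitrary, and the name lookup uses Python's own codec registry, which is only an analogy for the browser's table of encoding names, not the same list.

```python
import codecs

# The same characters, stored as UTF-8 bytes.
text = "编码"
raw = text.encode("utf-8")

# Decoding with the encoding that was used to write the bytes
# restores the original text.
as_utf8 = raw.decode("utf-8")

# Decoding with a different encoding does not fail here, but each
# byte becomes its own character: this is the garbled text.
as_latin1 = raw.decode("iso-8859-1")

# Python's codec registry resolves names case-insensitively,
# similar in spirit to encoding names in HTML.
canonical_names = {codecs.lookup(name).name for name in ("UTF-8", "utf-8", "Utf-8")}
```

Both decodes succeed, which is why a wrong encoding tends to show up as garbled characters rather than as an error.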

In HTML4, there are three ways to specify the encoding of a page. In order of priority, they are:

  1. The charset parameter of the Content-Type field in the HTTP header.

  2. A <meta http-equiv="Content-Type"> tag in the page.

  3. For some external resources, such as JS files loaded through a <script></script> tag, the charset attribute on the tag that references them.
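The priority order above can be sketched as a small Python function. The function names and parameters here are hypothetical, chosen for illustration; the header parsing relies on the standard library's `email.message.Message`, which understands `Content-Type` parameters.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns the charset in lower case,
    # or None when the header carries no charset parameter.
    return msg.get_content_charset()


def choose_encoding(http_content_type=None, meta_charset=None, charset_attr=None):
    """Pick an encoding using the HTML4 priority order (hypothetical helper).

    1. charset parameter of the HTTP Content-Type header
    2. encoding declared by a <meta> tag
    3. charset attribute on the tag that links the resource
    """
    if http_content_type:
        from_header = charset_from_content_type(http_content_type)
        if from_header:
            return from_header
    for candidate in (meta_charset, charset_attr):
        if candidate:
            return candidate.lower()
    return None
```

For example, `choose_encoding("text/html; charset=UTF-8", "gbk")` gives the header's value precedence, while `choose_encoding("text/html", "GBK")` falls through to the `<meta>` declaration.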

None of this is controversial. Note, however, that when the page is declared through a <meta http-equiv="Content-Type"> tag and the browser finds that the encoding it has been using does not match the declaration, it goes back to the beginning and parses the page again. Part of the page is therefore parsed twice, so if you declare the encoding with a <meta> tag, put the tag as early as possible. A best practice is to write it immediately after the opening <head> tag, before any other tag. Google PageSpeed makes the same recommendation.

Evolution Over Time

As time went by, however, developers gradually noticed something. Just as with the minimal DOCTYPE declaration, browsers do not strictly follow the standard when they read the encoding from a <meta> tag. Because the encoding of the page must be known before the tokenizer stage of HTML parsing, the browser cannot wait until the DOM tree is built, break the <meta> tag down into its structure, extract the http-equiv and content attributes, and only then determine the encoding.

In reality, the browser does something very simple to read the encoding declared by a <meta> tag:

  1. Make sure this is a <meta> tag. According to the state machine of HTML parsing, encountering the string "<meta" is enough to be sure.

  2. Search the string that follows (there is no concept of a tag here, just a string) for the substring "charset".

  3. Read forward, ignoring all whitespace, until the first meaningful character c.

  4. If c is not the character "=", return to step 2 and continue searching.

  5. If c is the character "=", continue.

  6. Skip any whitespace, single quotes, and double quotes, then scan forward until a character that should end the value appears (a single quote, double quote, whitespace, the end of the tag, and so on), and take the scanned string s.

  7. Analyze s to obtain the encoding alias.

From this algorithm, it is easy to see that forms such as the following also let the browser identify the encoding correctly:

  • <meta charset="utf-8" />

  • <meta charset="utf-8">

  • ...and many other odd ways of writing it.
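The steps above can be sketched in Python roughly as follows. This is a simplified illustration of the string scan described in the text, not the algorithm from the specification or from any browser; the function names and the exact set of terminating characters (quotes, whitespace, ";", "/") are assumptions.

```python
WHITESPACE = " \t\n\r\f"
QUOTES = "\"'"
# Assumed terminators for the scanned value; the text only says
# "single quotes, double quotes, space characters, end tags, etc."
TERMINATORS = WHITESPACE + QUOTES + ";/"


def _charset_in(segment):
    """Scan one '<meta ...' string for charset=VALUE and return VALUE or None."""
    lower = segment.lower()
    i = 0
    while True:
        # Find the substring "charset"; there is no notion of attributes here.
        i = lower.find("charset", i)
        if i == -1:
            return None
        i += len("charset")
        # Skip whitespace to reach the first meaningful character c.
        while i < len(segment) and segment[i] in WHITESPACE:
            i += 1
        # c is not "=": go back and look for the next "charset".
        if i >= len(segment) or segment[i] != "=":
            continue
        i += 1
        # Skip whitespace and quotes, then scan up to a terminator.
        while i < len(segment) and segment[i] in WHITESPACE + QUOTES:
            i += 1
        j = i
        while j < len(segment) and segment[j] not in TERMINATORS:
            j += 1
        if j > i:
            return segment[i:j]


def sniff_meta_charset(html):
    """Return the first encoding name declared by a '<meta' string, or None."""
    lower = html.lower()
    pos = 0
    while True:
        # Seeing the string "<meta" is treated as enough.
        start = lower.find("<meta", pos)
        if start == -1:
            return None
        end = lower.find(">", start)
        if end == -1:
            end = len(lower)
        label = _charset_in(html[start:end])
        if label:
            return label
        pos = end
```

Because the scan works on plain strings, `<meta charset=utf-8>`, `<meta charset = 'gb2312'>`, and the long `http-equiv` form all yield an encoding name, while `<meta name="charset" content="x">` yields nothing because the character after `charset` is not `=`.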
So, as history moved on, one day the browser vendors finally sat down together to discuss this issue, and were surprised to find that their implementations were very similar (perhaps they had simply learned from each other). They decided to turn this behavior into a standard, and after a long discussion the widely loved encoding declaration of HTML5 was born. In HTML5 it is called the "meta charset element", and its simplest form is:

<meta charset=utf-8>

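Of course, this is HTML syntax. If you follow XHTML and find it more comfortable, writing <meta charset="utf-8" /> is fine as well.

The algorithm for obtaining the encoding described above has also been written down in detail in the specification.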

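In the HTML5 era, the standard again revised and refined how the encoding is declared. Overall, the differences are:

  • HTML5 allows a BOM to determine the encoding, but at the time of writing only the UTF-16 BOM (i.e. U+FEFF) is supported, and the priority of an encoding specified by a BOM is not stated.

  • HTML5 adds the meta charset tag.

  • HTML5 specifies that a page with no declared encoding uses ASCII as its encoding, whereas HTML4 lets the browser choose on its own according to its environment.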

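Miscellaneous

Beyond the basic ways of declaring an encoding, the standard contains quite a few details worth noting:

  • If a <meta> tag is used to declare the encoding, that encoding must be a superset of ASCII. Put simply, an ASCII superset is an encoding that represents the 128 ASCII characters with the same byte values as ASCII.

  • HTML5 strongly recommends UTF-8.

  • The standard advises against UTF-32, JIS_C6226-1983, JIS_X0212-1990, HZ-GB-2312, JOHAB and similar character sets, and forbids CESU-8, UTF-7, BOCU-1 and SCSU. In practice, however, browsers at the time could recognize at least UTF-7.

  • Developers who want to follow XHTML strictly should specify the encoding with an XML declaration, i.e. <?xml version="1.0" encoding="UTF-8" standalone="no" ?>. In IE6, however, this interferes with the DOCTYPE, so developers have to compromise on this point and fall back to the HTML way of declaring the encoding.

  • For the priority among the declaration methods in practice, and other details that deserve attention, there is a separate article on the topic that is worth reading.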
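The ASCII-superset requirement in the first point can be checked mechanically. The sketch below is an illustration using Python's codecs, whose behavior may differ in details from a browser's decoders; the function name is hypothetical.

```python
def is_ascii_superset(encoding):
    """Return True if the 128 ASCII characters keep their byte values."""
    ascii_bytes = bytes(range(128))
    ascii_text = ascii_bytes.decode("ascii")
    try:
        # Both directions must be the identity on the ASCII range.
        return (ascii_bytes.decode(encoding) == ascii_text
                and ascii_text.encode(encoding) == ascii_bytes)
    except (UnicodeError, LookupError):
        # Undecodable input or an unknown encoding name.
        return False
```

With this check, UTF-8, GBK and ISO-8859-1 qualify, while UTF-16 does not, because it stores each ASCII character in two bytes. That is why a `<meta>` declaration cannot usefully announce UTF-16: the browser would have to read the tag before knowing how to read it.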

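Best Practices

  • Specify the encoding in the HTTP header whenever possible.

  • Use UTF-8 whenever possible, or at least use one encoding for all resources across the site.

  • If you want to use UTF-16, add a BOM to the file so that it is clear whether it is little-endian or big-endian.

  • If you declare the encoding with a <meta> tag, the http-equiv form is not required, but put the tag as early as possible, and at least before any non-ASCII character.

  • For linked external scripts, add the charset attribute if you cannot be sure that they use the same encoding.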
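To illustrate the UTF-16 point above, here is a minimal Python sketch of how a BOM reveals the byte order. The function name is hypothetical, and the check deliberately ignores other BOMs (for example UTF-32, whose little-endian BOM begins with the same two bytes).

```python
import codecs


def sniff_utf16_bom(data):
    """Return 'utf-16-le' or 'utf-16-be' based on the leading BOM, else None."""
    if data.startswith(codecs.BOM_UTF16_LE):   # bytes FF FE
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):   # bytes FE FF
        return "utf-16-be"
    return None


# The same text written in both byte orders, each prefixed with its BOM.
sample = "编码"
little = codecs.BOM_UTF16_LE + sample.encode("utf-16-le")
big = codecs.BOM_UTF16_BE + sample.encode("utf-16-be")

# Strip the two BOM bytes and decode with the detected byte order.
decoded_little = little[2:].decode(sniff_utf16_bom(little))
decoded_big = big[2:].decode(sniff_utf16_bom(big))
```

Without the BOM, the two byte sequences are equally plausible UTF-16, and a reader has no reliable way to tell which order was intended.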

