HTML5 Standard Study: Encoding Explained - H5 Tutorial - PHP中文網

黄舟

Mar 21, 2017 03:14 PM

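Every front-end engineer has probably run into garbled text (mojibake) at one time or another. However solid your fundamentals, you will still end up having tea with it now and then in production. As a front-end engineer, how do you specify the encoding of a page? Do you know how the browser determines the encoding?

First, a very simple example: use the simplest possible HTML page to see how the browsers differ: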

<!DOCTYPE html>

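This is a minimal HTML page: <head> and <body> are both empty, and the server sends no specific encoding declaration. Opening the file directly from the local disk and checking the page encoding in each browser gives: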

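Browser	Displayed encoding	Remarks
IE6	UTF-8	
IE8	UTF-8	
IE9	GB2312	System default
Firefox 3.5	GB2312	System default character set
Firefox 4.0	ISO-8859-1	Western European; the default encoding for English
Chrome	GBK	System default character set
Opera	Chinese - auto-detect	Probably also GB2312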

The table shows that each browser interprets a page differently when nothing declares its encoding. For a minimal page like this one the choice makes no difference, provided the encoding is a superset of ASCII, but the result is enough to show why setting the encoding correctly matters.
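The point above can be illustrated with a small sketch. The sample strings and the helper name `decode_as` are made up for this illustration; the idea is only that ASCII-only bytes decode identically under several ASCII-superset encodings, while non-ASCII bytes come out garbled when the assumed encoding is wrong.

```python
# Hypothetical sample strings, chosen only for illustration.
ascii_only = "<!DOCTYPE html><title>hello</title>"
with_chinese = "<title>编码</title>"


def decode_as(data: bytes, encoding: str) -> str:
    """Decode bytes the way a browser would if it assumed `encoding`."""
    # errors="replace" stands in for a browser showing a placeholder glyph.
    return data.decode(encoding, errors="replace")


ascii_bytes = ascii_only.encode("utf-8")
chinese_bytes = with_chinese.encode("utf-8")

# ASCII-only content reads the same under these ASCII-superset encodings.
same_ascii = {
    enc: decode_as(ascii_bytes, enc) for enc in ("utf-8", "gbk", "iso-8859-1")
}

# Non-ASCII content only survives when the assumed encoding matches.
right_guess = decode_as(chinese_bytes, "utf-8")
wrong_guess = decode_as(chinese_bytes, "iso-8859-1")

if __name__ == "__main__":
    print(same_ascii)
    print(right_guess)
    print(wrong_guess)  # mojibake
```

The ISO-8859-1 case mirrors the Firefox 4.0 row of the table: a UTF-8 page read with that default would display garbled characters as soon as it contained any non-ASCII text.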

Encoding Declaration

HTML4 and HTML5 each devote a chapter to how the encoding is declared; the relevant chapters of both specifications are worth consulting.

First of all, what is an encoding? Declaring an encoding tells the browser (or user agent) which algorithm to use to interpret the byte stream so that it recovers the intended content. In the HTML standard, encodings can be referred to by aliases. The aliases come from the IANA registry, and only encodings that appear in that list are meant to be recognized by browsers. Therefore, if UTF-8 is written as UTF8, which is not an IANA-registered alias, the browser may ignore the declaration entirely (browsers that follow the WHATWG Encoding Standard do accept utf8 as a label). Encoding aliases are also case-insensitive.
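A minimal sketch of case-insensitive alias lookup follows. The table `IANA_ALIASES` is a small illustrative subset of the IANA registry, not the full list, and the function name `resolve_alias` is hypothetical.

```python
from typing import Optional

# Illustrative subset of IANA charset names and aliases (keys lower-cased).
IANA_ALIASES = {
    "utf-8": "UTF-8",
    "csutf8": "UTF-8",
    "gb2312": "GB2312",
    "csgb2312": "GB2312",
    "gbk": "GBK",
    "cp936": "GBK",
    "iso-8859-1": "ISO-8859-1",
    "latin1": "ISO-8859-1",
}


def resolve_alias(label: str) -> Optional[str]:
    """Return the canonical name for a label, or None if it is not listed."""
    # Lower-casing the label makes the lookup case-insensitive.
    return IANA_ALIASES.get(label.strip().lower())


if __name__ == "__main__":
    print(resolve_alias("Utf-8"))
    print(resolve_alias("LATIN1"))
    print(resolve_alias("no-such-charset"))
```

A label missing from the table yields `None`, which corresponds to the browser ignoring a declaration it does not recognize.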

In HTML4, there are three ways to specify the encoding of a page. In descending order of priority, they are:

The charset parameter of the Content-Type field in the HTTP header.
A <meta http-equiv="Content-Type"> tag in the page.
For some external resources, such as JS files loaded by a <script></script> tag, the charset attribute on the tag itself.
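The priority order above can be sketched as a simple fallback chain. The function names are hypothetical; the parsing relies on Python's standard `email.message.Message`, whose `get_content_charset()` returns the lower-cased charset parameter or `None`.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(value: Optional[str]) -> Optional[str]:
    """Extract the charset parameter from a Content-Type value, if any."""
    if not value:
        return None
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()  # lower-cased, or None when absent


def pick_resource_encoding(
    http_content_type: Optional[str],
    meta_content: Optional[str],
    charset_attr: Optional[str] = None,
) -> Optional[str]:
    """Return the first declared encoding, walking the HTML4 priority order."""
    candidates = (
        charset_from_content_type(http_content_type),  # 1. HTTP header
        charset_from_content_type(meta_content),       # 2. <meta http-equiv>
        charset_attr.lower() if charset_attr else None,  # 3. charset attribute
    )
    for candidate in candidates:
        if candidate:
            return candidate
    return None


if __name__ == "__main__":
    print(pick_resource_encoding("text/html; charset=GBK",
                                 "text/html; charset=utf-8"))
```

Here `meta_content` stands for the value of the `content` attribute of the `<meta http-equiv="Content-Type">` tag, which has the same syntax as the HTTP header value.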

None of this is controversial. One point does deserve attention: if the page declares its encoding through a <meta http-equiv="Content-Type"> tag and the browser, on reaching that tag, finds that the encoding it has been using does not match the declaration, it goes back to the beginning and parses the page again. Part of the page is therefore parsed twice, so if you declare the encoding with a tag, write the tag as early as possible. A best practice is to place it immediately after the opening <head> tag and before any other tag. Google PageSpeed gives the same advice.

How Things Evolved

As time went by, developers gradually noticed something. Just as with the simplified DOCTYPE declaration, browsers do not strictly follow the standard when they read the encoding from a <meta> tag. In the HTML parsing pipeline the encoding of the page must be known before the tokenizer stage, so the browser cannot wait until the DOM tree is built, analyze the structure of the <meta> element, extract its http-equiv and content attributes, and only then decide on the encoding.

In reality, the browser does something quite simple to read the encoding declared in a <meta> tag:

1. Make sure this is a <meta> tag. Given the HTML parsing state machine, seeing the string "<meta" is enough to be sure.
2. Within that string (there is no notion of a tag here, only a string), look for the substring "charset".
3. Read forward, skipping all whitespace characters, to find the first meaningful character c.
   If c is not the character "=", return to step 2 and continue searching.
   If c is the character "=", continue.
4. Skip all whitespace characters, single quotes, double quotes, and so on, then scan forward until a single quote, double quote, whitespace character, end of tag, or any other character that cannot belong to an encoding name is reached, and take the string s that was scanned.
5. Parse s to obtain the encoding alias.
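The steps above can be sketched roughly as follows. This is a simplified illustration of the description in this article, not the exact algorithm of any browser or of the specification. The function name `sniff_meta_charset`, the choice to bound each search at the next `>`, and the set of terminator characters are assumptions made for the sketch.

```python
from typing import Optional

WHITESPACE = " \t\r\n\f"
QUOTES = "'\""
# Characters assumed here to end an encoding name.
TERMINATORS = WHITESPACE + QUOTES + ">/;"


def sniff_meta_charset(head: bytes) -> Optional[str]:
    """Scan raw bytes for a charset label declared inside a <meta ...> string."""
    # ISO-8859-1 maps every byte to one character, so decoding never fails.
    text = head.decode("iso-8859-1").lower()
    pos = 0
    while True:
        # Step 1: find the next "<meta".
        start = text.find("<meta", pos)
        if start == -1:
            return None
        end = text.find(">", start)
        if end == -1:
            end = len(text)
        i = start + len("<meta")
        while True:
            # Step 2: look for the substring "charset".
            i = text.find("charset", i, end)
            if i == -1:
                break
            i += len("charset")
            # Step 3: skip whitespace and expect "=".
            while i < end and text[i] in WHITESPACE:
                i += 1
            if i >= end or text[i] != "=":
                continue  # back to step 2
            i += 1
            # Step 4: skip whitespace and quotes, then scan to a terminator.
            while i < end and text[i] in WHITESPACE + QUOTES:
                i += 1
            j = i
            while j < end and text[j] not in TERMINATORS:
                j += 1
            if j > i:
                # Step 5: the scanned string is the encoding label.
                return text[i:j]
        pos = end


if __name__ == "__main__":
    print(sniff_meta_charset(b'<meta charset="utf-8" />'))
    print(sniff_meta_charset(
        b'<meta http-equiv="Content-Type" content="text/html; charset=gb2312">'
    ))
```

Because the scan works on a plain string, it does not care whether "charset" appears as an attribute of its own or inside the value of a content attribute.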

From the algorithm above, it is easy to see that all of the following forms let the browser identify the encoding correctly:

<meta charset="utf-8" />
<meta charset="utf-8">
<meta charset=utf-8>

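These are written in HTML syntax, of course. If you follow XHTML and feel more at home with it, writing <meta charset="utf-8" /> is fine too.

The concrete algorithm for obtaining the encoding, as described above, has also been written down in detail in the specification.

In the HTML5 era, the standard once again revised and refined how the encoding is declared. Broadly, the differences are: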

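HTML5 allows a BOM to determine the encoding, but supports only the UTF-16 BOM (that is, U+FEFF), and it does not state what priority a BOM-specified encoding has.
HTML5 adds the meta charset tag.
HTML5 specifies that a page with no declared encoding uses ASCII as its encoding, whereas HTML4 lets the browser choose according to its environment.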
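BOM-based detection can be sketched in a few lines. The function name `sniff_bom` is hypothetical. The sketch follows the text above in handling only UTF-16 BOMs; the current HTML standard also recognizes a UTF-8 BOM, which is deliberately left out here.

```python
import codecs
from typing import Optional

# U+FEFF serialized in each UTF-16 byte order.
UTF16_BOMS = (
    (codecs.BOM_UTF16_BE, "utf-16be"),  # bytes FE FF
    (codecs.BOM_UTF16_LE, "utf-16le"),  # bytes FF FE
)


def sniff_bom(data: bytes) -> Optional[str]:
    """Return the UTF-16 variant indicated by a leading BOM, or None."""
    for bom, name in UTF16_BOMS:
        if data.startswith(bom):
            return name
    return None


if __name__ == "__main__":
    page = "\ufeff<!DOCTYPE html>".encode("utf-16-le")
    print(sniff_bom(page))
    print(sniff_bom(b"<!DOCTYPE html>"))
```

The same check also answers the byte-order question for UTF-16 files, since the two BOM byte sequences differ.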

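Miscellaneous

Beyond the basic ways of declaring an encoding, the standard contains a number of details worth noting:

If a <meta> tag is used to declare the encoding, the encoding must be a superset of ASCII. Put simply, an ASCII superset is an encoding that represents the 128 ASCII characters with the same byte values as ASCII.
HTML5 strongly recommends UTF-8.
The standard advises against character sets such as UTF-32, JIS_C6226-1983, JIS_X0212-1990, HZ-GB-2312, and JOHAB, and prohibits CESU-8, UTF-7, BOCU-1, and SCSU. In practice, however, browsers recognize at least UTF-7.
Developers who want to follow XHTML strictly should specify the encoding with an XML declaration, that is, <?xml version="1.0" encoding="UTF-8" standalone="no" ?>. In IE6, however, this interferes with the DOCTYPE, so developers have to compromise on this point and fall back on the HTML way of declaring the encoding.
For the real-world priority of the various declaration methods, and some other details that need attention, the article linked from the original post is worth reading.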
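The ASCII-superset requirement in the list above can be checked mechanically. The sketch below is one possible test, assuming Python's built-in codecs as a stand-in for the encodings a browser knows; the function name `is_ascii_superset` is hypothetical.

```python
def is_ascii_superset(encoding: str) -> bool:
    """True if bytes 0x00-0x7F decode to the same 128 ASCII characters."""
    ascii_bytes = bytes(range(128))
    ascii_text = "".join(chr(i) for i in range(128))
    try:
        return ascii_bytes.decode(encoding) == ascii_text
    except (UnicodeDecodeError, LookupError):
        # Undecodable input or an unknown codec name: not usable here.
        return False


if __name__ == "__main__":
    for name in ("utf-8", "gbk", "iso-8859-1", "utf-16-le", "cp037"):
        print(name, is_ascii_superset(name))
```

UTF-16 fails the check because it uses two bytes per ASCII character, which is why a <meta> tag, itself read as ASCII-compatible bytes, cannot usefully declare it.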

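Best Practices

Specify the encoding in the HTTP header whenever possible.
Use UTF-8 whenever possible, or at least use one encoding consistently for every resource on the site.
If you want to use UTF-16, add a BOM to the file so that it is clear whether it is little-endian or big-endian.
If you specify the encoding with a <meta> tag, the http-equiv form is not required, but place the tag as early as possible, and at the very least before any non-ASCII character.
For linked external scripts, add the charset attribute if you cannot be sure they use the same encoding.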
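The advice about placing the declaration before any non-ASCII character can be turned into a rough lint check. This is only a sketch: it looks for the byte string "charset" rather than parsing the markup, and the function names are hypothetical.

```python
from typing import Optional


def first_non_ascii_offset(data: bytes) -> Optional[int]:
    """Return the offset of the first byte above 0x7F, or None."""
    for offset, byte in enumerate(data):
        if byte > 0x7F:
            return offset
    return None


def declaration_precedes_non_ascii(data: bytes) -> bool:
    """True when "charset" appears before the first non-ASCII byte."""
    non_ascii_at = first_non_ascii_offset(data)
    if non_ascii_at is None:
        return True  # pure ASCII: nothing can be misread
    declared_at = data.lower().find(b"charset")
    return declared_at != -1 and declared_at < non_ascii_at


if __name__ == "__main__":
    good = '<meta charset="utf-8"><title>编码</title>'.encode("utf-8")
    bad = '<title>编码</title><meta charset="utf-8">'.encode("utf-8")
    print(declaration_precedes_non_ascii(good))
    print(declaration_precedes_non_ascii(bad))
```

A check like this could run over a site's templates as part of a build step to catch pages where the title or other text precedes the declaration.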

