Detailed explanation of language encoding of charset in html

Pay attention to the importance of HTML language encoding

  • Directory

  1. Importance of charset encoding
  2. Where is charset in html
  3. charset tag
  4. Encoding type
  5. charset utf-8 introduction
  6. Introduction to charset GB2312
  7. Recommended web page encoding
  8. Web page compatibility due to encoding
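1. The importance of encoding

A wrong or missing encoding declaration can leave a page garbled when visitors open it in browsers such as IE, and it can also cause DIV+CSS layout compatibility problems that then get patched with CSS hacks.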

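2. Encoding position

The encoding is declared in a <meta> tag inside the <head> of the HTML document, as early as possible and before any content that contains non-ASCII text.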

3. HTML encoding style

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

Changing the utf-8 in charset=utf-8 changes the encoding that the page declares. The declared value must match the encoding in which the file is actually saved.
When writing a CSS file, we generally also put
@charset "utf-8"; at the top of the file to declare the encoding of that CSS file. The HTML source and the CSS files should normally use the same encoding. If they do not, the result can be garbled text, a broken page layout, and compatibility problems that end up being worked around with CSS hacks.
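The consistency rule above can be checked mechanically. The following is only a rough sketch in Python: the helper names (`declared_html_charset`, `declared_css_charset`, `encodings_agree`) and the regular expressions are assumptions made for illustration, not a full HTML or CSS parser.

```python
import re

# Finds charset=... in both the http-equiv form shown above and the
# shorter HTML5 form <meta charset="utf-8">. Illustrative only.
META_CHARSET = re.compile(rb'<meta[^>]*?charset\s*=\s*["\']?([A-Za-z0-9_-]+)',
                          re.IGNORECASE)
# An @charset rule only counts when it is the very first thing in the file.
CSS_CHARSET = re.compile(rb'^@charset "([^"]+)";')


def declared_html_charset(html_bytes):
    """Return the charset named in the first <meta> declaration, or None."""
    match = META_CHARSET.search(html_bytes)
    return match.group(1).decode("ascii").lower() if match else None


def declared_css_charset(css_bytes):
    """Return the charset named by a leading @charset rule, or None."""
    match = CSS_CHARSET.match(css_bytes)
    return match.group(1).decode("ascii").lower() if match else None


def encodings_agree(html_bytes, css_bytes):
    """True only when both files declare an encoding and the two match."""
    html_cs = declared_html_charset(html_bytes)
    return html_cs is not None and html_cs == declared_css_charset(css_bytes)


html = b'<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />'
css_ok = b'@charset "utf-8"; body { margin: 0; }'
css_bad = b'@charset "gb2312"; body { margin: 0; }'

print(declared_html_charset(html))
print(encodings_agree(html, css_ok), encodings_agree(html, css_bad))
```

A check like this only compares the declarations. It does not verify that the files were really saved in the encoding they declare.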

4. Commonly used html encoding types

The two encodings most commonly used in China are utf-8 and gb2312, and between them they cover the needs of most domestic web pages. The same two encodings are also used by the programs and databases that process web pages and store their data.
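To make the difference between the two encodings concrete, here is a small Python sketch that encodes the same text both ways using the codecs shipped with Python. The sample string is an arbitrary choice for illustration.

```python
text = "中文"  # two Chinese characters, chosen only as sample data

utf8_bytes = text.encode("utf-8")    # three bytes per character here
gb_bytes = text.encode("gb2312")     # two bytes per character

print(utf8_bytes.hex(" "))  # e4 b8 ad e6 96 87
print(gb_bytes.hex(" "))    # d6 d0 ce c4

# Both byte strings decode back to the same text when the matching
# encoding is used, and plain ASCII text is identical in both.
same_text = utf8_bytes.decode("utf-8") == gb_bytes.decode("gb2312")
ascii_same = "html".encode("utf-8") == "html".encode("gb2312")
print(same_text, ascii_same)
```

The bytes differ completely for Chinese text, which is why a file saved in one encoding and declared as the other comes out garbled.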

5. UTF-8 has the following characteristics:

  1. UCS characters U+0000 to U+007F (ASCII) are encoded as the bytes 0x00 to 0x7F (ASCII compatible). This means that files containing only 7-bit ASCII characters are the same in both ASCII and UTF-8 encodings.
  2. All UCS characters above U+007F are encoded as a sequence of several bytes, each of which has its most significant bit set. Therefore an ASCII byte (0x00-0x7F) can never appear as part of any other character.
  3. The first byte of a multi-byte sequence representing a non-ASCII character is always in the range 0xC0 to 0xFD and indicates how many bytes the character occupies. All remaining bytes of the sequence are in the range 0x80 to 0xBF. This makes resynchronization easy and makes the encoding stateless and robust against missing bytes.
  4. In its original design UTF-8 can encode all 2^31 possible UCS codes. (The current definition in RFC 3629 limits it to U+0000 through U+10FFFF.)
  5. UTF-8 encoded characters can theoretically be up to 6 bytes long, but 16-bit BMP characters are at most 3 bytes long. (Under RFC 3629 the maximum is 4 bytes.)
  6. The sorting order of big-endian UCS-4 byte strings is preserved.
  7. The bytes 0xFE and 0xFF are never used in UTF-8 encoding.
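Several of the properties listed above can be observed directly with Python's built-in UTF-8 codec. The sketch below is illustrative only. The helper names (`utf8_len`, `is_continuation`, `resync`) and the sample characters are assumptions introduced here.

```python
def utf8_len(ch):
    """Number of bytes UTF-8 uses for a single character."""
    return len(ch.encode("utf-8"))


def is_continuation(byte):
    """Continuation bytes always lie in 0x80-0xBF (bit pattern 10xxxxxx)."""
    return 0x80 <= byte <= 0xBF


def resync(data, pos):
    """Skip forward from pos to the start of the next complete character."""
    while pos < len(data) and is_continuation(data[pos]):
        pos += 1
    return pos


samples = ["A", "é", "中", "😀"]  # ASCII, two-byte, BMP, beyond the BMP
lengths = [utf8_len(ch) for ch in samples]

data = "a中b".encode("utf-8")   # b'a\xe4\xb8\xadb'
next_start = resync(data, 2)    # index 2 falls in the middle of "中"

# Sorting by encoded bytes gives the same order as sorting by code point.
words = ["z", "é", "中", "😀", "a"]
same_order = sorted(words) == sorted(words, key=lambda w: w.encode("utf-8"))

print(lengths, next_start, same_order)
```

Because continuation bytes are recognisable on their own, a reader that starts in the middle of a character only has to skip bytes in the 0x80-0xBF range to find the next character boundary.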
6. GB2312 has the following characteristics

The GB2312 standard contains a total of 6763 Chinese characters: 3755 first-level characters and 3008 second-level characters. It also includes 682 full-width characters, among them Latin letters, Greek letters, Japanese hiragana and katakana, and Russian Cyrillic letters.

GB2312 basically meets the needs of computer processing of Chinese characters. The characters it contains cover 99.75% of usage frequency. In GB2312 the collected characters are divided into zones, each holding 94 Chinese characters or symbols. A character is identified by its zone number followed by its position within the zone, and this representation is called the location code.

Zones 01-09 contain special symbols.

Zones 16-55 contain the first-level Chinese characters, sorted by pinyin.

Zones 56-87 contain the second-level Chinese characters, sorted by radical and stroke count.

Zones 10-15 and 88-94 are not assigned.

For example, the character "啊" (a) is the first Chinese character in GB2312, and its location code is 1601 (zone 16, position 01). Programs that use GB2312 usually store it in the EUC form so that it stays compatible with ASCII. Every Chinese character and symbol is represented by two bytes: the first is called the "high byte" and the second the "low byte". The high byte ranges over 0xA1-0xF7 (0xA0 plus the zone number, for zones 01-87), and the low byte over 0xA1-0xFE (0xA0 plus the position number 01-94). For example, "啊" is stored as 0xB0A1 in most programs (compare with the location code: 0xB0 = 0xA0 + 16, 0xA1 = 0xA0 + 1).
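The mapping from location code to stored bytes is simple arithmetic, as this Python sketch shows. The function names `location_to_bytes` and `bytes_to_location` are hypothetical helpers written for this example. The decoding step relies on Python's built-in gb2312 codec.

```python
def location_to_bytes(zone, position):
    """Convert a GB2312 location code (zone 1-94, position 1-94) to EUC bytes."""
    if not (1 <= zone <= 94 and 1 <= position <= 94):
        raise ValueError("zone and position must both be in 1..94")
    # Adding 0xA0 to each half pushes both bytes above the ASCII range.
    return bytes([0xA0 + zone, 0xA0 + position])


def bytes_to_location(pair):
    """Inverse mapping: two EUC bytes back to (zone, position)."""
    high, low = pair
    return high - 0xA0, low - 0xA0


encoded = location_to_bytes(16, 1)  # location code 1601
print(encoded.hex(), encoded.decode("gb2312"))
print(bytes_to_location("啊".encode("gb2312")))
```

Since both bytes are always 0xA1 or higher, they can never be confused with ASCII bytes, which is what keeps the EUC form ASCII compatible.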

So in GB2312 the high byte of a Chinese character runs from 176 to 247 in decimal (zones 16-87), and the low byte from 161 to 254. The 6763 characters actually stored are fewer than 72*94=6768 because, where the high byte is 215 (zone 55), the five low bytes 250 to 254 have no Chinese character assigned, so 6768-5=6763.
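The count can be reproduced by probing every location code in the Chinese character zones. This sketch assumes that Python's built-in gb2312 codec follows the standard table exactly. The helper `is_assigned` is a name introduced here for illustration.

```python
def is_assigned(zone, position):
    """True when Python's gb2312 codec can decode this location code."""
    try:
        bytes([0xA0 + zone, 0xA0 + position]).decode("gb2312")
        return True
    except UnicodeDecodeError:
        return False


# Zones 16-87 are the Chinese character zones: 72 zones of 94 positions.
hanzi_slots = (87 - 16 + 1) * 94
unassigned = [(zone, pos)
              for zone in range(16, 88)
              for pos in range(1, 95)
              if not is_assigned(zone, pos)]

print(hanzi_slots, unassigned, hanzi_slots - len(unassigned))
```

Under that assumption the only gaps reported are the last five positions of zone 55, which correspond to high byte 215 and low bytes 250 to 254.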

GB2312 can be thought of as the common encoding for Simplified Chinese within mainland China.

7. Recommended charset encoding

UTF-8 is recommended. Put simply, both Simplified and Traditional Chinese can use this encoding, so pages for both Taiwan and Mainland China can be written in it.
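The coverage difference is easy to see with a pair of characters. The sketch below uses Python's built-in codecs. The helper `can_encode` and the chosen characters are assumptions made for illustration.

```python
def can_encode(text, encoding):
    """True when every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


simplified = "龙"    # U+9F99, the Simplified form of "dragon"
traditional = "龍"   # U+9F8D, the Traditional form of the same word

for encoding in ("gb2312", "utf-8"):
    print(encoding, can_encode(simplified, encoding),
          can_encode(traditional, encoding))
```

GB2312 only accepts the Simplified form, while UTF-8 accepts both, which is the practical reason for recommending it on pages that may carry either script.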

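8. Web page compatibility errors caused by encoding:

If encodings are mixed, the web page becomes garbled, which is also described as an incompatibility. The risk is highest when a CSS file contains non-ASCII comment lines and its encoding differs from the page's: the misread comment can break the rules that follow it, and the breakage is then patched with CSS hacks.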
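What mixing encodings does to the text can be reproduced in a few lines of Python. This is a minimal sketch with an arbitrary sample string. It models a file saved as GB2312 but read as UTF-8, which is only one of the possible mismatches.

```python
page_text = "网页编码"  # sample text meaning "web page encoding"

stored = page_text.encode("gb2312")  # the bytes actually saved to disk

# Reading the bytes with the encoding they were written in works.
round_trip_ok = stored.decode("gb2312") == page_text

# Reading them as utf-8, as a wrong declaration would ask for, fails.
try:
    stored.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Browsers do not stop on such errors. They substitute replacement
# characters, which is what the reader sees as garbled text.
garbled = stored.decode("utf-8", errors="replace")

print(round_trip_ok, strict_ok, garbled != page_text)
```

The bytes themselves are never damaged. Only the interpretation is wrong, so fixing the declaration, or re-saving the file in the declared encoding, restores the text.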

I hope you will never forget to declare the web page encoding when making web pages in the future.
