Detailed explanation of JavaScript character set encoding and decoding (graphic tutorial)
This article explains character sets in JavaScript in detail, along with how characters are encoded and decoded. Readers who need it can use it as a reference.
1. Character Set
1) Characters and Bytes
A character is the general term for letters, digits, punctuation, and other symbols. One character corresponds to 1~n bytes depending on the encoding, one byte is 8 bits, and each bit is either 0 or 1.
2) Character Set
A character set is a collection of characters, and different character sets contain different numbers of characters. Common character sets include ASCII, GB2312, and Unicode.
3) Character Encoding
Character encoding converts symbols into binary data that a computer can process, and decoding converts that binary data back into human-readable symbols.
Most character sets correspond to a single encoding method (for example, the GBK character set uses GBK encoding), but Unicode has several encodings, including UTF-8, UTF-16, UTF-32 and UTF-7.
The encoding most commonly used on web pages is UTF-8. UTF-8 uses one to four bytes to encode each character. It is a superset of ASCII, so existing ASCII text does not need to be converted.
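The one-to-four-byte range can be checked directly. The following is a minimal sketch, assuming an environment where the standard TextEncoder global is available (modern browsers and Node.js); the helper name utf8ByteLength is made up for illustration.

```javascript
// Hypothetical helper: count how many bytes UTF-8 needs for a string.
// TextEncoder always encodes to UTF-8.
function utf8ByteLength(text) {
  return new TextEncoder().encode(text).length;
}

console.log(utf8ByteLength("A"));          // 1 (ASCII stays one byte)
console.log(utf8ByteLength("\u00e9"));     // 2 (U+00E9, e with acute accent)
console.log(utf8ByteLength("\u4ee3"));     // 3 (U+4EE3, a Chinese character)
console.log(utf8ByteLength("\u{1F600}"));  // 4 (U+1F600, an emoji)
```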
2. Number bases in the browser
1) Using decimal and hexadecimal in HTML attributes
In HTML, a character can be written as a decimal reference such as "&#56;" (which displays as "8") or as a hexadecimal reference such as "&#x5a;" (which displays as "Z"). The hexadecimal form has an extra x compared with the decimal form, and it also uses the six extra digits a~f to represent 10~15.
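To make the two HTML forms concrete, here is a small sketch that builds the decimal and hexadecimal reference for a single character. The function names toDecimalReference and toHexReference are hypothetical and chosen only for this illustration.

```javascript
// Hypothetical helpers that build HTML numeric character references.
function toDecimalReference(ch) {
  // codePointAt(0) gives the character's Unicode code point as a number.
  return "&#" + ch.codePointAt(0) + ";";
}

function toHexReference(ch) {
  // toString(16) renders the same number with hex digits 0-9 and a-f.
  return "&#x" + ch.codePointAt(0).toString(16) + ";";
}

console.log(toDecimalReference("8"));  // &#56;
console.log(toHexReference("Z"));      // &#x5a;
```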
2) Using decimal and hexadecimal in CSS attributes
CSS written inside HTML accepts the same numeric forms as HTML. In addition, CSS has its own hexadecimal escape, written with a backslash as in "\6c".
3) JavaScript encoding helpers
JavaScript can evaluate strings containing octal and hexadecimal escapes directly, for example through eval. Octal is written as "\56" and hexadecimal as "\x5c".
If the code contains Chinese characters and hexadecimal encoding is required, only the hexadecimal Unicode form can be used, written as "\u4ee3\u7801".
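The escape forms above can be tried in a few lines. This sketch deliberately leaves out the octal form "\56", because legacy octal escapes are a syntax error in strict mode and in modules.

```javascript
// Hexadecimal escape: \x followed by exactly two hex digits.
const backslash = "\x5c";     // code 0x5c is the backslash character

// Unicode escape: \u followed by exactly four hex digits.
const word = "\u4ee3\u7801";  // the two characters of "代码"

console.log(backslash === "\\");  // true
console.log(word === "代码");     // true
console.log(word.length);         // 2
```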
The book "Web Front-end Hacking Technology Revealed" wraps encoding and decoding in two helper methods.
Their core expressions are "str.charCodeAt(index).toString(base)" and "String.fromCharCode(parseInt(code, base))".
The charCodeAt() method returns an integer between 0 and 65535 representing the UTF-16 code unit at the given index.
The static String.fromCharCode() method returns a string created from the specified sequence of UTF-16 code units.
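The two core expressions can be combined into a pair of helpers. What follows is only a sketch built around those expressions, not the book's actual code; the names encodeToBase and decodeFromBase are hypothetical.

```javascript
// Hypothetical helper: turn each UTF-16 code unit into a number string in the given base.
function encodeToBase(str, base) {
  const codes = [];
  for (let i = 0; i < str.length; i++) {
    codes.push(str.charCodeAt(i).toString(base));
  }
  return codes;
}

// Hypothetical helper: parse each number string and rebuild the original string.
function decodeFromBase(codes, base) {
  return codes
    .map((code) => String.fromCharCode(parseInt(code, base)))
    .join("");
}

const hexCodes = encodeToBase("Z8", 16);
console.log(hexCodes);                      // ["5a", "38"]
console.log(decodeFromBase(hexCodes, 16));  // Z8
```

Because charCodeAt() works on UTF-16 code units, a character outside the Basic Multilingual Plane comes out as two codes (a surrogate pair), which decodeFromBase still reassembles correctly.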
You can also encode and decode online using the "MonyerJS" web page.
4) HTML automatic decoding mechanism
For example, if the page source contains "hello" written as hexadecimal character references ("&#x68;&#x65;&#x6c;&#x6c;&#x6f;"), the browser automatically decodes them and displays "hello".
The well-known non-breaking space "&nbsp;" relies on the same mechanism.
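The decoding step can be approximated in plain JavaScript. This is a simplified sketch that handles only numeric references and ignores named entities such as the non-breaking space; a real browser's HTML parser does considerably more. The function name decodeNumericReferences is hypothetical.

```javascript
// Hypothetical helper: decode &#NN; (decimal) and &#xHH; (hexadecimal) references.
function decodeNumericReferences(html) {
  return html.replace(/&#(x[0-9a-f]+|[0-9]+);/gi, (match, body) => {
    const isHex = body[0] === "x" || body[0] === "X";
    const codePoint = isHex ? parseInt(body.slice(1), 16) : parseInt(body, 10);
    // Leave the text untouched if the number is not a valid Unicode code point.
    if (codePoint > 0x10ffff) return match;
    return String.fromCodePoint(codePoint);
  });
}

console.log(decodeNumericReferences("&#x68;&#x65;&#x6c;&#x6c;&#x6f;"));  // hello
console.log(decodeNumericReferences("&#56; and &#x5a;"));                 // 8 and Z
```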
3. Browser encoding
There are three pairs of functions in JavaScript that can encode and decode strings:
escape/unescape, encodeURI/decodeURI, and encodeURIComponent/decodeURIComponent.
The main difference between them is the set of characters that each one leaves unencoded.
1) escape leaves 69 characters unencoded
*, +, -, ., /, @, _, 0~9, a~z, A~Z
For Unicode values outside 0~255, escape outputs the %uXXXX format.
2) encodeURI leaves 82 characters unencoded
!, #, $, &, ', (, ), *, +, ,, -, ., /, :, ;, =, ?, @, _, ~, 0~9, a~z, A~Z
3) encodeURIComponent leaves 71 characters unencoded
!, ', (, ), *, -, ., _, ~, 0~9, a~z, A~Z
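The differences between the three encoders are easiest to see side by side. The sketch below runs them on one sample string and then counts how many ASCII characters each leaves unchanged. The helper name countUnencoded is made up for illustration, and escape is a deprecated legacy function that is assumed to still be available as a global, as it is in current browsers and Node.js.

```javascript
const sample = "a b+c/d?e=f&g@h";
console.log(encodeURI(sample));           // a%20b+c/d?e=f&g@h
console.log(encodeURIComponent(sample));  // a%20b%2Bc%2Fd%3Fe%3Df%26g%40h
console.log(escape(sample));              // a%20b+c/d%3Fe%3Df%26g@h

// Non-ASCII input: escape uses %uXXXX, the other two use UTF-8 bytes.
console.log(escape("\u4ee3"));              // %u4EE3
console.log(encodeURIComponent("\u4ee3"));  // %E4%BB%A3

// Hypothetical helper: count ASCII characters (codes 0-127) that an encoder returns as-is.
function countUnencoded(encoder) {
  let count = 0;
  for (let code = 0; code < 128; code++) {
    const ch = String.fromCharCode(code);
    if (encoder(ch) === ch) count++;
  }
  return count;
}

console.log(countUnencoded(escape));              // 69
console.log(countUnencoded(encodeURI));           // 82
console.log(countUnencoded(encodeURIComponent));  // 71
```

As a rule of thumb, encodeURI suits a whole URL because it keeps separators such as / ? & = intact, while encodeURIComponent suits a single query value because it encodes those separators.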