Solutions to style confusion caused by UTF-8 BOM

WBOY
2016-07-25 09:05:28
UTF-8 is a Unicode character encoding widely used in web applications. Its advantage is that it is variable-length: ASCII characters are encoded in a single byte, which saves a good deal of network bandwidth when transmitting large numbers of web pages made up mostly of ASCII characters.

When web pages are written in UTF-8, unexplained blank lines or garbled characters often appear because of the BOM (Byte Order Mark). This happens because UTF-8 does not require a BOM, so different programs handle it differently when saving and reading files. For example, some browsers (Firefox) filter out every UTF-8 BOM, while others (IE) filter out only one. (Why does "only one" matter? You run into it as soon as a page includes several files, each with its own BOM.)
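The "only one BOM" case can be illustrated with a small Python sketch. It assumes that including several files amounts to concatenating their output; the two fragments and the helper `strip_leading_bom` are made up for illustration and do not model any particular browser.

```python
import codecs

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF

# Two hypothetical template fragments, each saved with a BOM by the editor.
header = BOM + b"<div id='header'>Header</div>\n"
body = BOM + b"<div id='body'>Body</div>\n"

# Including one file after another amounts to concatenating their output.
page = header + body


def strip_leading_bom(data: bytes) -> bytes:
    """Remove a single BOM at the very start, as a lenient consumer might."""
    return data[len(BOM):] if data.startswith(BOM) else data


cleaned = strip_leading_bom(page)

# The first BOM is gone, but the one from the second fragment
# is still sitting in the middle of the page.
print(cleaned.startswith(BOM))  # False
print(cleaned.count(BOM))       # 1
```

A consumer that only checks the start of the stream never sees the second BOM, which is why the problem tends to surface only once a page is assembled from several files.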

Use EditPlus or another editor to remove the BOM signature from the file, then refresh the page; the styles will display normally again.
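If no suitable editor is at hand, the same fix can be scripted. The following is a minimal sketch in Python, not a replacement for checking the file in an editor; the function name `remove_bom` and the demo file `style.css` are assumptions made for this example.

```python
import codecs
import os
import tempfile
from pathlib import Path


def remove_bom(path: str) -> bool:
    """Rewrite the file without its leading UTF-8 BOM.

    Returns True if a BOM was found and removed, False otherwise.
    """
    p = Path(path)
    data = p.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    p.write_bytes(data[len(codecs.BOM_UTF8):])
    return True


# Demo on a throwaway file (name and content are made up).
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "style.css")
    Path(demo).write_bytes(codecs.BOM_UTF8 + b"body { margin: 0; }\n")
    first = remove_bom(demo)    # BOM found and removed
    second = remove_bom(demo)   # nothing left to remove
    result = Path(demo).read_bytes()

print(first, second, result)
```

The function works on raw bytes rather than decoded text, so the rest of the file is written back untouched.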

Notes on the BOM:

UCS contains a character called "ZERO WIDTH NO-BREAK SPACE", whose code point is FEFF. FFFE, by contrast, is not a character in UCS, so it should never appear in real data. The UCS specification recommends transmitting "ZERO WIDTH NO-BREAK SPACE" before the byte stream itself. If the receiver then sees FE FF, the stream is Big-Endian; if it sees FF FE, the stream is Little-Endian. For this reason "ZERO WIDTH NO-BREAK SPACE" is also called the BOM.
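The byte-order logic described above can be checked with a short Python sketch. The helper `byte_order` is a hypothetical name used for illustration, and it only considers 16-bit encodings.

```python
import codecs

zwnbsp = "\ufeff"  # ZERO WIDTH NO-BREAK SPACE

big = zwnbsp.encode("utf-16-be")     # bytes FE FF
little = zwnbsp.encode("utf-16-le")  # bytes FF FE


def byte_order(stream: bytes) -> str:
    """Guess the byte order of a 16-bit stream from its first two bytes."""
    if stream.startswith(codecs.BOM_UTF16_BE):
        return "big-endian"
    if stream.startswith(codecs.BOM_UTF16_LE):
        return "little-endian"
    return "unknown"


print(big.hex(), byte_order(big + "A".encode("utf-16-be")))
print(little.hex(), byte_order(little + "A".encode("utf-16-le")))
print(byte_order(b"A"))
```

The same character produces FE FF or FF FE depending on the sender's byte order, and since FFFE is not a valid character, the receiver can tell the two cases apart.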

UTF-8 does not need a BOM to indicate byte order, but a BOM can be used to signal the encoding. The UTF-8 encoding of "ZERO WIDTH NO-BREAK SPACE" is EF BB BF, so a receiver that sees a byte stream starting with EF BB BF can conclude that it is UTF-8 encoded.

Windows uses the BOM to mark the encoding of text files.
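As a quick check of the three bytes mentioned above, here is a small Python sketch; `has_utf8_bom` is a hypothetical helper name, not a standard library function.

```python
import codecs

# Encoding U+FEFF as UTF-8 yields the three-byte signature.
utf8_bom = "\ufeff".encode("utf-8")
print(utf8_bom.hex())  # efbbbf


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte stream starts with EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)


print(has_utf8_bom(b"\xef\xbb\xbf<html>"))  # True
print(has_utf8_bom(b"<html>"))              # False
```

Note that the absence of a BOM says nothing about the encoding: a file without one may still be valid UTF-8.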

In a UTF-8 encoded file, the BOM occupies three bytes. If you save a text file as UTF-8 with Notepad, open it in UltraEdit (UE), and switch to hexadecimal mode, you can see EF BB BF at the beginning. This is a handy way to identify a UTF-8 file: some software relies on the BOM to decide whether a file is UTF-8, and some even requires the BOM to be present. Plenty of software, however, does not recognize the BOM at all. When I was working with Firefox, I learned that extensions for early versions could not contain a BOM, although versions from Firefox 1.5 onward support it. I have now found that PHP does not handle the BOM either.
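The difference between software that recognizes the BOM and software that does not can be sketched with Python's two UTF-8 codecs: `utf-8` leaves the BOM in the decoded text, while `utf-8-sig` removes it. The CSS content below is made up for illustration.

```python
# Bytes as an editor might save them: BOM followed by the real content.
raw = b"\xef\xbb\xbfbody { color: red; }"

naive = raw.decode("utf-8")      # BOM-unaware: U+FEFF stays as the first character
aware = raw.decode("utf-8-sig")  # BOM-aware: the signature is dropped

print(repr(naive[0]))            # '\ufeff'
print(naive.startswith("body"))  # False
print(aware.startswith("body"))  # True
```

The BOM-unaware result looks identical on screen because U+FEFF is invisible, yet the text no longer starts with `body`, which is how an unnoticed BOM ends up breaking whatever parses the file next.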

PHP was not designed with the BOM in mind, and it does not skip the three BOM bytes at the start of a UTF-8 encoded file. Only what follows an opening PHP tag is executed as PHP code, so anything in front of it, including the BOM, is sent to the output as-is.

English-language plug-ins and templates from abroad are generally plain ASCII and carry no BOM. The problem mostly comes from domestic plug-ins and templates whose authors were unaware that their editor had added one.

In addition, because the output page is UTF-8 encoded, a template file must be converted to UTF-8 if Chinese characters are added to it, or they will not display correctly. If the editor automatically adds a BOM at that point, those three bytes are output on the page. The visible effect depends on the browser; it is usually a blank line or a garbled character.
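After editing templates, it can help to check which files picked up a BOM along the way. The sketch below walks a directory tree and reports them; the function name `find_bom_files`, the list of suffixes, and the demo file names are all assumptions made for this example.

```python
import codecs
import tempfile
from pathlib import Path


def find_bom_files(root, suffixes=(".php", ".html", ".css")):
    """Return the files under root that start with a UTF-8 BOM."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            with path.open("rb") as fh:
                # Only the first three bytes matter.
                if fh.read(3) == codecs.BOM_UTF8:
                    hits.append(path)
    return hits


# Demo on a throwaway template directory (file names are made up).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "header.php").write_bytes(codecs.BOM_UTF8 + b"<?php echo 'hi'; ?>")
    (root / "footer.php").write_bytes(b"<?php echo 'bye'; ?>")
    names = [p.name for p in find_bom_files(root)]

print(names)  # ['header.php']
```

The scan only reads the first three bytes of each file and changes nothing, so it is safe to run before deciding which files to fix.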


