A few days ago, I was discussing with a colleague the relationship between the encoding attribute in XML and the file format, and I finally understood it thoroughly.
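I had previously believed that the encoding declared in an XML file must match the file format. That is, if a file's XML declaration specified a given encoding, the file had to be saved in that format, with the corresponding BOM. (I later found out that FF FE is not the UTF-8 BOM at all: the UTF-8 BOM is EF BB BF, while FF FE marks UTF-16 little-endian. So my misunderstanding lasted for quite a while...)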
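For reference, the BOM byte values can be checked directly with Python's standard `codecs` module. This is only an illustration in Python; the original discussion involved Delphi, IE, and UltraEdit.

```python
import codecs

# The UTF-8 BOM is three bytes: EF BB BF.
print(codecs.BOM_UTF8.hex(" "))      # ef bb bf
# FF FE is the UTF-16 little-endian BOM, not the UTF-8 one.
print(codecs.BOM_UTF16_LE.hex(" "))  # ff fe
# The "utf-8-sig" codec writes the UTF-8 BOM; plain "utf-8" does not.
print("<a/>".encode("utf-8-sig").startswith(codecs.BOM_UTF8))  # True
print("<a/>".encode("utf-8").startswith(codecs.BOM_UTF8))      # False
```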
Here is a brief account of how the discussion unfolded.
At the start of the discussion, I told him with certainty that the encoding value must match the file format (that is, the BOM, short for byte order mark); otherwise, parsing the XML could fail. (For example, if the document contains a certain Unicode character and the encoding specified by the encoding attribute or the BOM does not match, an error occurs; that is what I had in mind at the time.) He replied that this did not seem to be the case: the XML file he had created with Delphi had no BOM, contained Chinese text, and declared UTF-8 as its encoding, yet IE opened it normally.
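The colleague's observation can be reproduced with a short Python sketch. It assumes a hypothetical document similar to the Delphi output (Chinese text, `encoding="UTF-8"`, no BOM) and uses the standard-library `xml.etree.ElementTree` parser in place of IE.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: UTF-8 bytes with Chinese text, an encoding="UTF-8"
# declaration, and no BOM (the first bytes are "<?xml", not EF BB BF).
data = '<?xml version="1.0" encoding="UTF-8"?><note>中文内容</note>'.encode("utf-8")
assert not data.startswith(b"\xef\xbb\xbf")

# ElementTree (backed by expat) parses it without complaint.
root = ET.fromstring(data)
print(root.text)  # 中文内容
```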
When he checked whether his XML file had a BOM, we ran into something interesting: when UE (UltraEdit) opens a file containing Unicode characters, it automatically adds FF FE to the front of the file so that the file displays correctly. So if you view a file that originally has no BOM in UE's hexadecimal mode, you will see a BOM that the file did not originally have. This behavior can be turned off in UE's options; look it up if you are interested.
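Because an editor's hex view may not show the bytes exactly as stored, one way to check for a BOM is to read the first few bytes in binary mode. The sketch below is an assumption-laden illustration: the helper name `first_bytes` and the temporary demo file are hypothetical.

```python
import os
import tempfile


def first_bytes(path: str, n: int = 4) -> str:
    """Return the first n bytes of a file as hex, read in binary mode (no decoding)."""
    with open(path, "rb") as f:
        return f.read(n).hex(" ")


# Hypothetical demo file: UTF-8 with Chinese text and no BOM.
with tempfile.NamedTemporaryFile("wb", suffix=".xml", delete=False) as tmp:
    tmp.write('<?xml version="1.0" encoding="UTF-8"?><note>中文</note>'.encode("utf-8"))
    path = tmp.name

print(first_bytes(path))  # 3c 3f 78 6d  ("<?xm", so no BOM)
os.remove(path)
```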
Then I got a little confused about how this could happen. While I was still thinking it over, he suddenly sent me a message with the following content:
The W3C defines three rules for how an XML parser correctly determines the encoding of an XML file:
1. If the document has a BOM (byte order mark; generally, a file saved in a Unicode format contains a BOM, while an ANSI file does not), the BOM determines the file's encoding.
2. If there is no BOM, the encoding attribute of the XML declaration is used.
3. If neither is present, the XML document is assumed to be encoded in UTF-8.
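The three rules can be sketched as a small function. This is a simplified illustration, not any parser's actual implementation: the names `detect_xml_encoding`, `BOMS`, and `DECL_ENCODING` are hypothetical, only UTF-8 and UTF-16 BOMs are recognized, and a UTF-16 file without a BOM is not handled.

```python
import codecs
import re

# Rule 1 candidates. Only UTF-8 and UTF-16 BOMs are handled in this sketch;
# UTF-32 (whose little-endian BOM also starts with FF FE) is deliberately omitted.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]

# Rule 2: the encoding attribute of an XML declaration at the very start of the data.
DECL_ENCODING = re.compile(
    rb'<\?xml\s[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def detect_xml_encoding(data: bytes) -> str:
    """Pick an encoding name by applying the three rules in priority order."""
    # Rule 1: a BOM, if present, decides the encoding.
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    # Rule 2: no BOM, so use the encoding attribute of the XML declaration.
    match = DECL_ENCODING.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # Rule 3: neither is present, so assume UTF-8.
    return "utf-8"


print(detect_xml_encoding(b'<?xml version="1.0" encoding="GBK"?><a/>'))  # gbk
print(detect_xml_encoding(b"<a/>"))  # utf-8
# Both a UTF-8 BOM and a conflicting declaration: the BOM wins.
print(detect_xml_encoding(codecs.BOM_UTF8 + b'<?xml version="1.0" encoding="GBK"?><a/>'))  # utf-8
```

Checking the BOM before the declaration mirrors the rule order: the declaration is only consulted when no BOM is found.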
With these three rules, everything becomes much clearer.
The XML parser first decodes the file according to its BOM; if no BOM is found, it uses the encoding specified by the encoding attribute of the XML declaration; if no encoding is specified either, it parses the document as UTF-8 by default. It follows that if a file has both a BOM and an encoding attribute, the BOM takes precedence.
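Two of these cases can be checked against a real parser. The sketch below uses Python's `xml.etree.ElementTree` (backed by expat) as an example; it shows that parser's behavior only and is not a claim about every XML parser.

```python
import codecs
import xml.etree.ElementTree as ET

# Rule 3: no BOM and no XML declaration, so the parser assumes UTF-8.
plain = "<note>中文</note>".encode("utf-8")
print(ET.fromstring(plain).text)  # 中文

# Rule 1: a UTF-16 little-endian BOM (FF FE) and no declaration; the BOM decides.
utf16 = codecs.BOM_UTF16_LE + "<note>中文</note>".encode("utf-16-le")
print(ET.fromstring(utf16).text)  # 中文
```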
Ah! It suddenly struck me how great it is to have a standard to consult, even though the rules seem perfectly natural once you read them.
At this point I finally understood the relationship between the encoding attribute and the file format in XML. Although this write-up is only a few hundred words long, the discussion itself took us almost two hours.
The above is the detailed content of Detailed explanation of encoding in xml. For more information, please follow other related articles on the PHP Chinese website!

