Detailed introduction to encoding and verification issues in XML code writing
This article introduces encoding and validation in XML documents. As in HTML, the encoding of an XML file can be declared at the start of the document.
Encoding
Encoding is the process of converting Unicode characters into their equivalent binary representation. When an XML processor reads a document, it relies on the declared encoding to decode the bytes back into characters. The encoding should therefore be specified in the XML declaration.
Encoding type
There are two main types of encoding:
UTF-8
UTF-16
UTF stands for UCS Transformation Format, and UCS stands for Universal Character Set. The number 8 or 16 is the size in bits of one code unit, not the size of every character: UTF-8 uses one to four bytes per character, and UTF-16 uses two or four bytes. A document with no encoding declaration and no byte order mark is read as UTF-8 by default.
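A quick way to check how many bytes each encoding actually spends on a character is to encode a few sample characters and measure the result. The following is a minimal Python sketch; the helper name byte_lengths and the sample characters are illustrative choices, not part of the original article.

```python
# Compare how many bytes one character needs in UTF-8 and in UTF-16.
# Sample characters: Latin A, e with acute accent, a CJK ideograph, an emoji.
SAMPLES = ["A", "\u00e9", "\u4e2d", "\U0001F600"]


def byte_lengths(ch):
    """Return (utf8_bytes, utf16_bytes) for a single character."""
    # "utf-16-le" is used so the 2-byte byte order mark that the plain
    # "utf-16" codec prepends does not distort the per-character count.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


for ch in SAMPLES:
    utf8_len, utf16_len = byte_lengths(ch)
    print(f"U+{ord(ch):04X}: UTF-8 = {utf8_len} bytes, UTF-16 = {utf16_len} bytes")
```

ASCII characters take one byte in UTF-8 and two in UTF-16, while characters outside the Basic Multilingual Plane take four bytes in both.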
Syntax
Encoding information is contained in the prologue of the XML document. The UTF-8 encoding syntax is as follows:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
UTF-16 encoding syntax is as follows:
<?xml version="1.0" encoding="UTF-16" standalone="no" ?>
Example
The following example shows the encoding declaration:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<address>
   <name>Tanmay Patil</name>
   <company>TutorialsPoint</company>
   <phone>(011) 123-4567</phone>
</address>
In the example above, encoding="UTF-8" tells the parser that the document is stored as UTF-8, which uses 8-bit code units. To store the document with 16-bit code units, save it as UTF-16 and declare encoding="UTF-16".
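For text that is mostly ASCII, XML files encoded in UTF-8 are smaller than the same files in UTF-16. For text dominated by characters that need three bytes in UTF-8, such as Chinese, UTF-16 can be the smaller of the two.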
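The size difference can be measured directly. The sketch below builds the same small document in both encodings, checks that Python's standard xml.etree.ElementTree parser decodes each one back to the original text, and returns the byte size. The TEMPLATE string and the encoded_size helper are hypothetical names introduced for illustration.

```python
import xml.etree.ElementTree as ET

# A one-element document whose declaration names the encoding it is saved in.
TEMPLATE = '<?xml version="1.0" encoding="{enc}"?><name>{text}</name>'


def encoded_size(text, enc):
    """Encode a small document in `enc` and return its size in bytes."""
    data = TEMPLATE.format(enc=enc, text=text).encode(enc)
    # Round trip: the parser uses the byte order mark and the declaration
    # to decode the bytes, so it should recover the original text.
    assert ET.fromstring(data).text == text
    return len(data)


ascii_text = "Tanmay Patil"
cjk_text = "\u4e2d\u6587" * 50  # 100 Chinese characters

for label, text in [("ASCII", ascii_text), ("CJK", cjk_text)]:
    print(label, "UTF-8:", encoded_size(text, "UTF-8"),
          "UTF-16:", encoded_size(text, "UTF-16"))
```

Note that Python's "UTF-16" codec prepends a byte order mark, which is what lets the parser detect the byte order.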
Validation
Validation is the process of checking an XML document against a set of rules. A document is considered valid if its content matches the elements and attributes declared in the associated document type definition (DTD), and if it conforms to the constraints expressed in that DTD. An XML parser checks a document at two levels:
Well-formed XML document
Valid XML document
Well-formed XML document
An XML document is considered well-formed if it follows these rules:
An XML document without a DTD may only use the predefined character entities: amp (&), apos (single quote), gt (>), lt (<) and quot (double quote).
Tags must be properly nested: an inner tag must be closed before the outer tag is closed.
Each start tag must have a matching end tag, or the element must be written as a self-closing tag (<name>...</name> or <name/>).
An attribute name may appear only once in a start tag, and its value must be wrapped in quotation marks.
Entities other than amp (&), apos (single quote), gt (>), lt (<) and quot (double quote) must be declared before they are used.
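The rules above can be tried out with any XML parser, because a conforming parser must reject a document that is not well-formed. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the is_well_formed helper and the sample snippets are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


CASES = {
    "properly nested": "<a><b>text</b></a>",
    "overlapping tags": "<a><b>text</a></b>",
    "missing end tag": "<a><b>text</a>",
    "self-closing tag": "<a><b/></a>",
    "unquoted attribute": "<a id=1/>",
    "duplicate attribute": '<a id="1" id="2"/>',
    "predefined entity": "<a>Tom &amp; Jerry</a>",
    "raw ampersand": "<a>Tom & Jerry</a>",
    "undeclared entity": "<a>&nbsp;</a>",
}

for label, snippet in CASES.items():
    print(f"{label}: {is_well_formed(snippet)}")

# escape() replaces &, < and > with predefined entities, so arbitrary
# text can be placed safely inside an element.
safe = "<a>" + escape("Tom & Jerry <3") + "</a>"
print("escaped text:", is_well_formed(safe))
```

This only checks well-formedness. The standard library parser does not validate a document against a DTD.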
Example
The following is an example of a well-formed XML document:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE address
[
   <!ELEMENT address (name,company,phone)>
   <!ELEMENT name (#PCDATA)>
   <!ELEMENT company (#PCDATA)>
   <!ELEMENT phone (#PCDATA)>
]>
<address>
   <name>Tanmay Patil</name>
   <company>TutorialsPoint</company>
   <phone>(011) 123-4567</phone>
</address>
The above example is considered well-formed because:
It declares the document type. Here the internal DTD consists of element type declarations.
It contains a single root element named address.
Each of the child elements name, company and phone is enclosed in its own self-explanatory tag and is correctly closed.
The tags are in the correct order.
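These points can be confirmed by parsing the address document and inspecting the resulting tree. The sketch below again uses Python's standard xml.etree.ElementTree module as one possible tool. It reads the DOCTYPE without validating against it, so it demonstrates well-formedness only. The constant name DOC is an illustrative choice.

```python
import xml.etree.ElementTree as ET

# The address document, as bytes so the declared encoding is honoured.
DOC = b"""<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE address
[
   <!ELEMENT address (name,company,phone)>
   <!ELEMENT name (#PCDATA)>
   <!ELEMENT company (#PCDATA)>
   <!ELEMENT phone (#PCDATA)>
]>
<address>
   <name>Tanmay Patil</name>
   <company>TutorialsPoint</company>
   <phone>(011) 123-4567</phone>
</address>
"""

# fromstring() raises ParseError if the document is not well-formed.
root = ET.fromstring(DOC)

# One root element, with the three children in document order.
child_tags = [child.tag for child in root]
print("root:", root.tag)
print("children:", child_tags)
print("phone:", root.findtext("phone"))
```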