XML operation of PHP extension (2) - XML parser installation and overview



1. Overview and installation

XML (eXtensible Markup Language) is a data format for exchanging structured documents on the Web. It is a standard defined by the World Wide Web Consortium (W3C). Information about XML and related technologies can be found at http://www.php.cn/.

This PHP extension implements support for expat, the XML parser toolkit written by James Clark. The toolkit lets you parse, but not validate, XML documents. It supports three source character encodings that PHP also provides: US-ASCII, ISO-8859-1 and UTF-8. UTF-16 is not supported.

This extension creates an XML parser and defines handlers for different XML events. Each XML parser also has a handful of parameters that can be adjusted.
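As a minimal sketch of that workflow, the following creates a parser, registers start and end element handlers, and parses a small document. The helper name list_element_names() and the sample markup are assumptions made for illustration; only the xml_* functions come from the extension.

```php
<?php
// Hypothetical helper (not part of the extension): returns element names
// in the order their start tags are encountered.
function list_element_names(string $xml): array
{
    $names = [];
    $parser = xml_parser_create();

    xml_set_element_handler(
        $parser,
        // Start-tag handler: receives the parser, the element name and its attributes.
        function ($parser, $name, $attrs) use (&$names) {
            $names[] = $name;
        },
        // End-tag handler: nothing to record in this sketch.
        function ($parser, $name) {
        }
    );

    // The third argument marks this as the final (here: only) chunk of data.
    if (!xml_parse($parser, $xml, true)) {
        $message = (string) xml_error_string(xml_get_error_code($parser));
        throw new RuntimeException('XML parse error: ' . $message);
    }

    return $names;
}

print_r(list_element_names('<feed><entry id="1"/><entry id="2"/></feed>'));
```

Because case folding is on by default, the names arrive in uppercase (FEED, ENTRY, ENTRY).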

This extension requires the libxml PHP extension. This means that --enable-libxml needs to be used, although this will be done implicitly since libxml is enabled by default.

By default, this extension uses an expat compatibility layer. It can also use expat itself, which is available at http://www.php.cn/. The Makefile that ships with expat does not build a library by default; you can use the following make rule for that:


libexpat.a: $(OBJS)
    ar -rc $@ $(OBJS)
    ranlib $@

A source RPM package of expat can be found at http://www.php.cn/.

This extension is enabled by default and can be disabled through the following options when compiling:

--disable-xml

These functions are enabled by default and use the bundled expat library. You can disable XML support with --disable-xml. If you compile PHP as a module for Apache 1.3.9 or later, PHP will automatically use the expat library bundled with Apache. If you do not want to use the bundled expat library, pass --with-expat-dir=DIR to PHP's configure script, where DIR should point to the root directory of the expat installation.

The Windows version of PHP has built-in support for this extension. No additional extensions need to be loaded to use these functions.
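Whether a given PHP build actually includes the extension can be checked at runtime. The snippet below is a small sketch; has_xml_parser_support() is a hypothetical helper name, and it relies only on PHP's standard extension_loaded() and function_exists() functions.

```php
<?php
// Hypothetical helper: reports whether the XML parser functions are usable
// in the running PHP build.
function has_xml_parser_support(): bool
{
    return extension_loaded('xml') && function_exists('xml_parser_create');
}

echo has_xml_parser_support()
    ? "XML parser extension is available\n"
    : "XML parser extension is missing (PHP may have been built with --disable-xml)\n";
```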

2. Event handlers

XML event handlers are registered with the following functions:

  • xml_set_element_handler()
  • xml_set_character_data_handler()
  • xml_set_processing_instruction_handler()
  • xml_set_default_handler()
  • xml_set_unparsed_entity_decl_handler()
  • xml_set_notation_decl_handler()
  • xml_set_external_entity_ref_handler(): this handler is called when the XML parser finds a reference to an external parsed general entity, for example a reference to a file or URL. Examples can be found in the XML external entity routines.
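To illustrate two of these registration functions, here is a hedged sketch that collects character data and processing instructions. The helper collect_text_and_pis(), the render target, and the sample document are invented for this example.

```php
<?php
// Hypothetical helper: gathers text and processing instructions from a document.
function collect_text_and_pis(string $xml): array
{
    $result = ['text' => '', 'pis' => []];
    $parser = xml_parser_create();

    // Character data may be delivered in several pieces, so append each piece.
    xml_set_character_data_handler(
        $parser,
        function ($parser, $data) use (&$result) {
            $result['text'] .= $data;
        }
    );

    // Each processing instruction is reported as a target plus its data.
    xml_set_processing_instruction_handler(
        $parser,
        function ($parser, $target, $data) use (&$result) {
            $result['pis'][] = [$target, $data];
        }
    );

    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException('XML parse error');
    }

    return $result;
}

$doc = '<note><?render mode="fast"?>hello</note>';
print_r(collect_text_and_pis($doc));
```

The handlers are closures here so that the collected state stays local to the helper; named functions or methods work as callbacks too.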

3. Case folding

The element handler functions may receive element names in case-folded (uppercase) form. Case folding is defined as "a string operation that replaces non-uppercase letters with their corresponding uppercase letters." In other words, in XML, case folding simply means converting to uppercase.

By default, all element names passed to the handler functions are converted to uppercase. This behavior can be queried and controlled per XML parser with the xml_parser_get_option() and xml_parser_set_option() functions, respectively.
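The effect of this option can be sketched as follows. root_element_name() is a hypothetical helper written for this example; XML_OPTION_CASE_FOLDING is the option constant the extension defines for this setting.

```php
<?php
// Hypothetical helper: returns the root element name as the start-tag
// handler receives it, with case folding switched on or off.
function root_element_name(string $xml, bool $caseFolding): string
{
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $caseFolding ? 1 : 0);

    $root = '';
    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attrs) use (&$root) {
            if ($root === '') {
                $root = $name; // remember only the first (root) element
            }
        },
        function ($parser, $name) {
        }
    );

    xml_parse($parser, $xml, true);

    return $root;
}

echo root_element_name('<Feed/>', true), "\n";  // case folding on
echo root_element_name('<Feed/>', false), "\n"; // case folding off
```

With folding enabled the handler sees FEED; with it disabled the name is passed through as written, Feed.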

4. Error codes

The following constants are defined for XML error codes. xml_parse() only reports whether parsing succeeded; the error code itself is retrieved with xml_get_error_code():

  • XML_ERROR_NONE

  • XML_ERROR_NO_MEMORY

  • XML_ERROR_SYNTAX

  • XML_ERROR_NO_ELEMENTS

  • XML_ERROR_INVALID_TOKEN

  • XML_ERROR_UNCLOSED_TOKEN

  • XML_ERROR_PARTIAL_CHAR

  • XML_ERROR_TAG_MISMATCH

  • XML_ERROR_DUPLICATE_ATTRIBUTE

  • XML_ERROR_JUNK_AFTER_DOC_ELEMENT

  • XML_ERROR_PARAM_ENTITY_REF
  • XML_ERROR_RECURSIVE_ENTITY_REF
  • XML_ERROR_ASYNC_ENTITY
  • XML_ERROR_BAD_CHAR_REF
  • XML_ERROR_ATTRIBUTE_EXTERNAL_ENTITY_REF

  • XML_ERROR_MISPLACED_XML_PI
  • XML_ERROR_UNKNOWN_ENCODING
  • XML_ERROR_INCORRECT_ENCODING
  • XML_ERROR_UNCLOSED_CDATA_SECTION
  • XML_ERROR_EXTERNAL_ENTITY_HANDLING
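A hedged sketch of how these codes are typically inspected: parse a document, then ask the parser for its error code and a readable message. describe_parse_result() is a hypothetical helper. The exact numeric code reported for a given problem may depend on the underlying parser library, so this example only compares against XML_ERROR_NONE.

```php
<?php
// Hypothetical helper: parses $xml and summarizes the outcome.
function describe_parse_result(string $xml): array
{
    $parser = xml_parser_create();
    $ok = (bool) xml_parse($parser, $xml, true);
    $code = xml_get_error_code($parser);

    return [
        'ok'      => $ok,
        'code'    => $code,
        // xml_error_string() turns an error code into a readable message.
        'message' => $ok ? '' : (string) xml_error_string($code),
    ];
}

$good = describe_parse_result('<a><b/></a>');
$bad  = describe_parse_result('<a><b></a>'); // <b> is never closed

var_dump($good['code'] === XML_ERROR_NONE);
echo $bad['message'], "\n";
```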

5. Character encoding

PHP's XML extension supports the Unicode character set through different character encodings. There are two types of character encodings, source encoding and target encoding. PHP's internal representation of the document is always encoded in UTF-8.

Source encoding is applied when an XML document is parsed. When creating an XML parser, you can specify a source encoding (this encoding cannot be changed later in the XML parser's lifetime). The supported source encodings are ISO-8859-1, US-ASCII and UTF-8. The first two are single-byte encodings, meaning that each character is represented by a single byte. UTF-8 can encode characters made up of a variable number of bits (up to 21) in one to four bytes. The default source encoding used by PHP is ISO-8859-1.

Target encoding is applied when PHP passes data to the XML handler functions. When an XML parser is created, the target encoding is set to the same value as the source encoding, but it can be changed at any time. The target encoding affects character data, tag names, and processing instruction targets.

If the XML parser encounters a character outside the range that its source encoding can represent, it will return an error.

If PHP encounters a character in the parsed XML document that cannot be represented in the chosen target encoding, the problem character is "demoted". Currently, this means that such characters are replaced with a question mark (?).
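The demotion to a question mark can be observed with a short sketch. text_in_target_encoding() is a hypothetical helper; the sample word is written with explicit UTF-8 bytes (\xC3\xA9 is "é") so that the input encoding is unambiguous. XML_OPTION_TARGET_ENCODING is the option constant the extension defines for the target encoding.

```php
<?php
// Hypothetical helper: returns the document's character data as delivered
// to the handler under the given target encoding.
function text_in_target_encoding(string $xml, string $targetEncoding): string
{
    $parser = xml_parser_create('UTF-8');
    xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, $targetEncoding);

    $text = '';
    xml_set_character_data_handler(
        $parser,
        function ($parser, $data) use (&$text) {
            $text .= $data; // data may arrive in several pieces
        }
    );

    xml_parse($parser, $xml, true);

    return $text;
}

$xml = "<w>caf\xC3\xA9</w>"; // "café" encoded as UTF-8

echo text_in_target_encoding($xml, 'US-ASCII'), "\n";
echo bin2hex(text_in_target_encoding($xml, 'ISO-8859-1')), "\n";
```

With US-ASCII as the target the handler receives caf?, since é has no ASCII representation; with ISO-8859-1 it receives the single byte 0xE9 for that character.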

Supported XML handlers

  • xml_set_element_handler(): Element events are issued whenever the XML parser encounters a start or end tag. There are separate handlers for start tags and end tags.
  • xml_set_character_data_handler(): Character data is roughly all the non-markup content of an XML document, including whitespace between tags. Note that the XML parser does not add or remove any whitespace; it is up to the application (you) to decide whether whitespace is significant.
  • xml_set_processing_instruction_handler(): PHP programmers should already be familiar with processing instructions (PIs). <?php ?> is a processing instruction, where php is called the "PI target". All PI targets starting with "XML" are reserved; the handling of all others is application-specific.
  • xml_set_default_handler(): Whatever is not passed to another handler goes to the default handler. Things like the XML declaration and the document type declaration arrive in the default handler.
  • xml_set_unparsed_entity_decl_handler(): This handler is called for declarations of unparsed (NDATA) entities.
  • xml_set_notation_decl_handler(): This handler is called for notation declarations.