XML Operations in PHP Extensions (2): XML Parser Installation and Overview



1. Overview and installation

XML (eXtensible Markup Language) is a data format for exchanging structured documents on the Web. It is a standard defined by the World Wide Web Consortium (W3C). Information about XML and related technologies can be found at http://www.php.cn/.

This PHP extension implements support for James Clark's expat library. This toolkit can parse (but not validate) XML documents. It supports three source encodings, which are also provided by PHP: US-ASCII, ISO-8859-1 and UTF-8. UTF-16 is not supported.

This extension lets you create XML parsers and then define handlers for different XML events. Each XML parser also has a few parameters that can be adjusted.
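The create-parser, register-handlers, parse workflow can be sketched as follows. This assumes PHP 7.1 or later (for `void` return types on closures); the sample document and the `$events` / `$text` variable names are illustrative only. Element names arrive upper-cased because case folding is enabled by default.

```php
<?php
// Sketch: create a parser, register handlers, then parse a small document.
$xml = '<note><to>Tove</to><from>Jani</from></note>';

$events = [];  // start/end tag events, in document order
$text = '';    // concatenated character data

$parser = xml_parser_create();

// Element events: one callback for start tags, one for end tags.
xml_set_element_handler(
    $parser,
    function ($parser, string $name, array $attrs) use (&$events): void {
        $events[] = "start:$name";
    },
    function ($parser, string $name) use (&$events): void {
        $events[] = "end:$name";
    }
);

// Character data may arrive in several chunks, so append rather than assign.
xml_set_character_data_handler(
    $parser,
    function ($parser, string $data) use (&$text): void {
        $text .= $data;
    }
);

// The third argument tells the parser this is the last chunk of input.
$ok = xml_parse($parser, $xml, true);  // 1 on success, 0 on failure
```

Because `xml_parse()` accepts input in chunks, the same handlers also work when a large document is fed piece by piece with `$is_final` set only on the last call.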

This extension requires the libxml PHP extension. This means that --enable-libxml is also required, although this happens implicitly because libxml is enabled by default.

By default, this extension uses the expat compat layer. It can also use expat itself, which can be found at http://www.php.cn/. The Makefile that ships with expat does not build a library file by default; you can use the following build rule to build one:


libexpat.a: $(OBJS)
	ar -rc $@ $(OBJS)
	ranlib $@

A source RPM package of expat can be found at http://www.php.cn/.

This extension is enabled by default and uses the bundled expat library. It can be disabled at compile time with the following configure option:

--disable-xml

If you compile PHP as a module for Apache 1.3.9 or later, PHP will automatically use the expat library bundled with Apache. If you do not want to use the bundled expat library, pass --with-expat-dir=DIR to PHP's configure script, where DIR points to the root directory of the expat installation.

The Windows version of PHP has built-in support for this extension. No additional extensions need to be loaded to use these functions.
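To confirm at runtime that a given PHP build includes the extension (it will be missing if PHP was configured with --disable-xml), a check along these lines should be enough. It is a minimal sketch that uses only `extension_loaded()` and `function_exists()`.

```php
<?php
// Sketch: report whether this PHP build includes the XML parser extension.
$available = extension_loaded('xml') && function_exists('xml_parser_create');

echo $available
    ? "XML parser extension is available\n"
    : "XML parser extension is not available\n";
```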

2. Event handlers

The following functions register handlers for XML events:

  • xml_set_element_handler(): start and end tags
  • xml_set_character_data_handler(): character data
  • xml_set_processing_instruction_handler(): processing instructions
  • xml_set_default_handler(): anything not passed to another handler
  • xml_set_unparsed_entity_decl_handler(): unparsed (NDATA) entity declarations
  • xml_set_notation_decl_handler(): notation declarations
  • xml_set_external_entity_ref_handler(): this handler is called when the XML parser finds a reference to an external parsed general entity, for example a reference to a file or URL. See the XML external entity example for a demonstration.
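Here is a sketch of the processing instruction and default handlers working side by side. The document is illustrative. Only the PI handler and the default handler are registered, so markup that no other handler claims, such as the comment, reaches the default handler. Exactly which other chunks arrive there (for example tags when no element handler is set) may depend on whether PHP uses expat or the libxml2 compat layer, so the check below only looks for the comment.

```php
<?php
// Sketch: route processing instructions and "everything else" to handlers.
$xml = '<doc><?php echo 1;?><!-- hi --><p>text</p></doc>';

$pis = [];       // [target, data] pairs
$defaults = [];  // raw chunks delivered to the default handler

$parser = xml_parser_create();

// Called for each PI; the target here is "php", the data is "echo 1;".
xml_set_processing_instruction_handler(
    $parser,
    function ($parser, string $target, string $data) use (&$pis): void {
        $pis[] = [$target, $data];
    }
);

// The XML parser functions have no separate comment handler,
// so the comment is delivered here as raw markup.
xml_set_default_handler(
    $parser,
    function ($parser, string $data) use (&$defaults): void {
        $defaults[] = $data;
    }
);

xml_parse($parser, $xml, true);
```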

3. Case folding

Element handler functions may receive their element names case-folded. The XML standard defines case folding as "a process applied to a sequence of characters, in which those identified as non-uppercase are replaced by their uppercase equivalents." In other words, for XML, case folding simply means converting to uppercase.

By default, all element names passed to the handler functions are case-folded. This behaviour can be queried and controlled per XML parser with the xml_parser_get_option() and xml_parser_set_option() functions, respectively.
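A sketch comparing the names a start handler receives with `XML_OPTION_CASE_FOLDING` on (the default) and off. The helper `elementNames()` is hypothetical and exists only for this illustration. The `(bool)` cast is there because older PHP versions return the option as an integer.

```php
<?php
// Sketch: collect element names as delivered to the start handler.
function elementNames(string $xml, bool $caseFolding): array
{
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $caseFolding);

    $names = [];
    xml_set_element_handler(
        $parser,
        function ($parser, string $name, array $attrs) use (&$names): void {
            $names[] = $name;
        },
        function ($parser, string $name): void {
            // End tags are not needed for this comparison.
        }
    );
    xml_parse($parser, $xml, true);
    return $names;
}

// Query the default setting on a fresh parser.
$defaultFolding = (bool) xml_parser_get_option(xml_parser_create(), XML_OPTION_CASE_FOLDING);

$folded   = elementNames('<Root><child/></Root>', true);   // upper-cased names
$unfolded = elementNames('<Root><child/></Root>', false);  // names as written
```

Turning case folding off is usually the safer choice when element names are later matched against a schema or a case-sensitive lookup table, since XML itself is case-sensitive.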

4. Error codes

The following constants are XML-related error codes. xml_parse() itself returns 1 on success and 0 on failure; after a failure, the error code is retrieved with xml_get_error_code():

  • XML_ERROR_NONE
  • XML_ERROR_NO_MEMORY
  • XML_ERROR_SYNTAX
  • XML_ERROR_NO_ELEMENTS
  • XML_ERROR_INVALID_TOKEN
  • XML_ERROR_UNCLOSED_TOKEN
  • XML_ERROR_PARTIAL_CHAR
  • XML_ERROR_TAG_MISMATCH
  • XML_ERROR_DUPLICATE_ATTRIBUTE
  • XML_ERROR_JUNK_AFTER_DOC_ELEMENT
  • XML_ERROR_PARAM_ENTITY_REF
  • XML_ERROR_RECURSIVE_ENTITY_REF
  • XML_ERROR_ASYNC_ENTITY
  • XML_ERROR_BAD_CHAR_REF
  • XML_ERROR_ATTRIBUTE_EXTERNAL_ENTITY_REF
  • XML_ERROR_MISPLACED_XML_PI
  • XML_ERROR_UNKNOWN_ENCODING
  • XML_ERROR_INCORRECT_ENCODING
  • XML_ERROR_UNCLOSED_CDATA_SECTION
  • XML_ERROR_EXTERNAL_ENTITY_HANDLING
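A sketch that turns a failed `xml_parse()` call into a readable message using `xml_get_error_code()`, `xml_error_string()` and `xml_get_current_line_number()`. The helper name `parseError()` is hypothetical. The numeric code may not line up with the constants above when PHP is built against the libxml2 compat layer rather than expat itself. For that reason the sketch reports the message text instead of switching on a specific constant.

```php
<?php
// Sketch: return null when the document parses cleanly, else a message.
function parseError(string $xml): ?string
{
    $parser = xml_parser_create();
    if (xml_parse($parser, $xml, true) === 1) {
        return null;
    }
    $code = xml_get_error_code($parser);
    return sprintf(
        'XML error %d: %s at line %d',
        $code,
        xml_error_string($code),
        xml_get_current_line_number($parser)
    );
}

$good = parseError('<a><b/></a>');
$bad  = parseError("<a>\n<b></a>");  // mismatched end tag
```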
5. Character encoding

PHP's XML extension supports the Unicode character set through several character encodings. There are two types of character encoding: source encoding and target encoding. Internally, PHP always represents the document in UTF-8.

Source encoding is applied when the XML document is parsed. The source encoding can be specified when an XML parser is created and cannot be changed later during the parser's lifetime. The supported source encodings are ISO-8859-1, US-ASCII and UTF-8. The first two are single-byte encodings, meaning each character is represented by one byte. UTF-8 can encode characters composed of a variable number of bits (up to 21) in one to four bytes. The default source encoding used by PHP is ISO-8859-1.

Target encoding is applied when PHP passes data to the XML handler functions. When an XML parser is created, the target encoding is set to the same value as the source encoding, but it can be changed at any time. The target encoding affects character data as well as tag names and processing instruction targets.

If the XML parser encounters a character outside the range that its source encoding can represent, it returns an error.

If PHP encounters a character in the parsed XML document that cannot be represented in the chosen target encoding, the problem character is "demoted". Typically, this means such characters are replaced with a question mark (?).
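A sketch of demotion in practice. It parses UTF-8 input and sets the target encoding explicitly to ISO-8859-1 with `xml_parser_set_option()` and `XML_OPTION_TARGET_ENCODING`. "é" (U+00E9) fits in ISO-8859-1 and arrives as the single byte 0xE9. "€" (U+20AC) has no ISO-8859-1 representation, so it should be demoted to "?". The sample text is illustrative.

```php
<?php
// Sketch: UTF-8 input, ISO-8859-1 output for the handler functions.
$xml = "<p>caf\u{E9} \u{20AC}</p>";  // "café €" encoded as UTF-8

$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, 'ISO-8859-1');

$text = '';
xml_set_character_data_handler(
    $parser,
    function ($parser, string $data) use (&$text): void {
        $text .= $data;  // data already converted to the target encoding
    }
);
xml_parse($parser, $xml, true);

// Show the raw bytes: 0xE9 for "é", and "?" where "€" could not be represented.
echo bin2hex($text), "\n";
```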
6. Supported XML handlers

  • xml_set_element_handler(): element events are triggered whenever the XML parser encounters a start or end tag. Start tags and end tags have separate handlers.
  • xml_set_character_data_handler(): character data is roughly all the non-markup content of an XML document, including whitespace between tags. Note that the XML parser does not add or remove any whitespace; it is up to the application (you) to decide whether whitespace is significant.
  • xml_set_processing_instruction_handler(): PHP programmers should already be familiar with processing instructions (PIs). <?php ?> is a processing instruction, where php is called the "PI target". Handling of PIs is application-specific, except that all PI targets starting with "XML" are reserved.
  • xml_set_default_handler(): whatever is not passed to another handler goes to the default handler. Items such as the XML declaration and the document type declaration are delivered to the default handler.
  • xml_set_unparsed_entity_decl_handler(): this handler is called for the declaration of an unparsed (NDATA) entity.
  • xml_set_notation_decl_handler(): this handler is called for a notation declaration.