
PHP DOM: Using XPath

Core points

  • XPath is a syntax for querying XML documents. It offers a simpler, cleaner way to select and filter XML data and reduces the amount of code those queries require.
  • XPath queries can be run with either of two methods, query() or evaluate(). Both perform queries; the difference is the type of result they return: query() always returns a DOMNodeList, while evaluate() returns a typed result (such as a number or string) where possible.
  • Using XPath can make code more concise and efficient. In the author's comparison tests, the pure XPath version (using count()) was about 15% faster than the version that uses no XPath at all.
  • PHP DOM lets you extend the standard XPath functions: you can call PHP's built-in functions, as well as your own functions and static methods, from inside XPath queries, which makes more complex queries possible.

This article explores XPath in depth, including its features and how it is implemented in PHP. You will find that XPath can greatly reduce the amount of code required to write queries and filter XML data, and that it generally improves performance. I will demonstrate PHP DOM's XPath functionality using the same DTD and XML as in the previous post. As a quick review, here is what the DTD and XML look like:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE library [
  <!ELEMENT library (book*)>
  <!ELEMENT book (title, author, genre, chapter*)>
  <!ATTLIST book isbn ID #REQUIRED>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT genre (#PCDATA)>
  <!ELEMENT chapter (chaptitle,text)>
  <!ATTLIST chapter position NMTOKEN #REQUIRED>
  <!ELEMENT chaptitle (#PCDATA)>
  <!ELEMENT text (#PCDATA)>
]>
<library>
  <book isbn="isbn1234">
    <title>A Book</title>
    <author>An Author</author>
    <genre>Horror</genre>
    <chapter position="first">
      <chaptitle>chapter one</chaptitle>
      <text></text>
    </chapter>
  </book>
  <book isbn="isbn1235">
    <title>Another Book</title>
    <author>Another Author</author>
    <genre>Science Fiction</genre>
    <chapter position="first">
      <chaptitle>chapter one</chaptitle>
      <text>Sit Dolor Amet...</text>
    </chapter>
  </book>
</library>

Basic XPath query

XPath is a syntax for querying XML documents. The simplest form is to define the path to the element you want to access. Using the XML document above, the following XPath query returns a collection of all existing book elements:

//library/book

That's it. The leading double slash selects library elements anywhere in the document (here, the root element), and the single slash selects their book child elements. Very simple, isn't it? But what if you want a specific book? Suppose you want to return any book written by "An Author". The XPath would be:

//library/book/author[text() = "An Author"]/..

Inside the square brackets, text() lets you compare against the node's text content, and the trailing "/.." selects the parent element (that is, it moves one level up the tree). An XPath query can be run with either of two methods, query() or evaluate(). Both perform queries, but they differ in the type of result they return: query() always returns a DOMNodeList, whereas evaluate() returns a typed result where possible. For example, if your XPath query returns the number of books written by a particular author rather than the books themselves, query() will return an empty DOMNodeList, while evaluate() will return the number directly, so you can use it immediately without having to extract data from a node.
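To see the difference in practice, here is a small standalone sketch (not code from the original article). It loads a trimmed copy of the library document from a string, runs the two expressions shown above with query(), and then runs a count() expression through both query() and evaluate().

```php
<?php
// A trimmed copy of the article's library document (chapters omitted).
$xml = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<library>
  <book isbn="isbn1234">
    <title>A Book</title>
    <author>An Author</author>
    <genre>Horror</genre>
  </book>
  <book isbn="isbn1235">
    <title>Another Book</title>
    <author>Another Author</author>
    <genre>Science Fiction</genre>
  </book>
</library>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// query() returns a DOMNodeList of matching nodes.
$books = $xpath->query('//library/book');
echo $books->length, "\n"; // 2

// Match the author's text, then step up to the parent <book> with "..".
$byAuthor = $xpath->query('//library/book/author[text() = "An Author"]/..');
echo $byAuthor->item(0)->getAttribute('isbn'), "\n"; // isbn1234

// count() produces a number, not nodes: query() yields an empty list...
$countQuery = 'count(//library/book/author[text() = "An Author"]/..)';
echo $xpath->query($countQuery)->length, "\n"; // 0

// ...while evaluate() returns the number itself, as a PHP float.
$total = $xpath->evaluate($countQuery);
var_dump($total); // float(1)
```

Note that evaluate() hands back XPath numbers as PHP floats, so a count comes back as 1.0 rather than 1.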

XPath's code and speed advantages

Let's make a quick demonstration by returning the number of books written by a specific author. First we will look at an approach that works but does not use XPath, to show how the task is done without it and why XPath is so powerful.

<?php
public function getNumberOfBooksByAuthor($author) {
    $total = 0;
    $elements = $this->domDocument->getElementsByTagName("author");
    foreach ($elements as $element) {
        if ($element->nodeValue == $author) {
            $total++;
        }
    }
    return $total;
}
?>

The next method achieves the same result, but uses XPath to select books written only by a specific author:

<?php
public function getNumberOfBooksByAuthor($author) {
    $query = "//library/book/author[text() = '$author']/..";
    $xpath = new DOMXPath($this->domDocument);
    $result = $xpath->query($query);
    return $result->length;
}
?>

Notice that this time PHP no longer has to test the author values. We can go a step further, though, and use the XPath function count() to count the nodes this path matches:

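<?php
public function getNumberOfBooksByAuthor($author) {
    $query = "count(//library/book/author[text() = '$author']/..)";
    $xpath = new DOMXPath($this->domDocument);
    return $xpath->evaluate($query);
}
?>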

A single line of XPath now retrieves the required information, with no laborious filtering in PHP. This is an easier and more concise way to write the feature. Note that the last example uses evaluate() rather than query(): count() returns a typed (numeric) result, and query() would return a DOMNodeList that turns out to be empty. Besides being more concise, the XPath versions also have a speed advantage. I found that version 1 was on average 30% faster than version 2, but version 3 was about 10% faster than version 2 (about 15% faster than version 1). These measurements will vary with your server and query, but pure XPath often brings considerable speed benefits while also making your code easier to read and maintain.
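One caution about versions 2 and 3: they splice $author straight into the expression, so a name containing an apostrophe, such as O'Brien, produces an invalid query, and untrusted input could change the query's meaning. XPath 1.0 string literals have no escape mechanism, so a common workaround is to pick the quote character that does not occur in the value and fall back to concat() when both do. The helpers xpathLiteral() and countBooksByAuthor() below, and the extra book in the sample XML, are hypothetical and only sketch that approach.

```php
<?php
/**
 * Hypothetical helper: wraps an arbitrary PHP string as an XPath 1.0 string
 * literal. XPath 1.0 has no escape sequences, so a value containing both
 * quote characters has to be assembled with concat().
 */
function xpathLiteral(string $value): string
{
    if (strpos($value, "'") === false) {
        return "'" . $value . "'";
    }
    if (strpos($value, '"') === false) {
        return '"' . $value . '"';
    }
    // Split on apostrophes and rejoin the pieces with "'" as a separate argument.
    return "concat('" . implode("', \"'\", '", explode("'", $value)) . "')";
}

/** Hypothetical variant of version 3 that never splices raw input into the query. */
function countBooksByAuthor(DOMXPath $xpath, string $author): int
{
    $query = 'count(//library/book/author[text() = ' . xpathLiteral($author) . ']/..)';
    return (int) $xpath->evaluate($query);
}

$xml = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<library>
  <book isbn="isbn1234"><title>A Book</title><author>An Author</author></book>
  <book isbn="isbn1236"><title>Third Book</title><author>Flann O'Brien</author></book>
</library>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

echo countBooksByAuthor($xpath, "Flann O'Brien"), "\n"; // 1
echo xpathLiteral('say "hi" it\'s'), "\n"; // concat('say "hi" it', "'", 's')
```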

XPath function

XPath offers quite a lot of functions, and there are many excellent resources detailing them. If you find yourself iterating over DOMNodeLists or comparing nodeValues, there may well be an XPath function that eliminates a lot of that PHP code. You have already seen count(). Now let's use the id() function to return the titles of the books with the given ISBNs. The XPath expression you need is:

id("isbn1234 isbn1235")/title

Note that the values being searched for are enclosed in a single pair of quotes and separated by spaces; no commas are needed between them. id() works here only because the DTD declares isbn as an ID attribute.
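The function below is a minimal sketch of calling id() from PHP, not the original article's code; findTitlesByIsbns() is a hypothetical name, and the DTD is trimmed to the elements the sample needs. Setting validateOnParse makes libxml validate against the DTD while loading, which is one way to make sure the isbn attributes are registered as IDs.

```php
<?php
// id() only works when the DTD declares the attribute (here isbn) as type ID.
$xml = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE library [
  <!ELEMENT library (book*)>
  <!ELEMENT book (title, author)>
  <!ATTLIST book isbn ID #REQUIRED>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
]>
<library>
  <book isbn="isbn1234"><title>A Book</title><author>An Author</author></book>
  <book isbn="isbn1235"><title>Another Book</title><author>Another Author</author></book>
</library>
XML;

$doc = new DOMDocument();
$doc->validateOnParse = true; // validate against the DTD so ID attributes are registered
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

/** Hypothetical helper: titles of the books whose ISBNs are listed. */
function findTitlesByIsbns(DOMXPath $xpath, array $isbns): array
{
    // id() takes one string of space-separated IDs; no commas between them.
    $query = "id('" . implode(' ', $isbns) . "')/title";
    $titles = [];
    foreach ($xpath->query($query) as $title) {
        $titles[] = $title->nodeValue;
    }
    return $titles;
}

print_r(findTitlesByIsbns($xpath, ['isbn1234', 'isbn1235']));
```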


Executing complex functions in XPath is relatively simple; the trick is to be familiar with the functions available.

Using PHP functions in XPath

Sometimes you may need functionality that the standard XPath functions cannot provide. Fortunately, PHP DOM also lets you call PHP's own functions from XPath queries. Consider returning the number of words in a book's title. Without any XPath extensions, we could write the method like this:

<?php
public function getNumberOfWords($isbn) {
    $query = "//library/book[@isbn = '$isbn']/title";
    $xpath = new DOMXPath($this->domDocument);
    $result = $xpath->query($query);
    $title = $result->item(0)->nodeValue;
    return str_word_count($title);
}
?>

However, we can also integrate str_word_count() directly into the XPath query. This takes a few steps. First, we must register a namespace with the XPath object. A PHP function is called from an XPath query through php:functionString(), whose first argument (inside the parentheses) is the name of the PHP function to call, followed by the arguments to pass to it. The namespace bound to the "php" prefix must be http://php.net/xpath; any other value will cause an error. Then we need to call registerPHPFunctions(), which tells PHP that any function in the "php:" namespace should be handled by PHP. The actual syntax for calling a function is:

php:functionString('str_word_count', //library/book[@isbn = 'isbn1234']/title)

Putting all of this together gives the following reimplementation of getNumberOfWords():

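<?php
public function getNumberOfWords($isbn) {
    $xpath = new DOMXPath($this->domDocument);
    // bind the "php" prefix to PHP's XPath namespace
    $xpath->registerNamespace("php", "http://php.net/xpath");
    // allow PHP functions to be called from XPath
    $xpath->registerPHPFunctions();
    $query = "php:functionString('str_word_count', //library/book[@isbn = '$isbn']/title)";
    return (int) $xpath->evaluate($query);
}
?>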

Note that you do not need to call the XPath function text() to get the node's text: php:functionString() converts its node-set arguments to their string values automatically. However, the following is also valid:

$query = "php:functionString('str_word_count', //library/book[@isbn = '$isbn']/title/text())";
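To try the word-count query outside a class, here is a standalone sketch. It loads a trimmed copy of the library document from a string and, as a precaution not used in the method above, passes 'str_word_count' to registerPHPFunctions() so that only that one PHP function may be called from XPath. The cast to int is there because the value evaluate() returns for a PHP callback's result may be a string rather than a number.

```php
<?php
$xml = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<library>
  <book isbn="isbn1234"><title>A Book</title><author>An Author</author></book>
  <book isbn="isbn1235"><title>Another Book</title><author>Another Author</author></book>
</library>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// The "php" prefix must map to exactly this namespace URI.
$xpath->registerNamespace('php', 'http://php.net/xpath');
// Allow only str_word_count() to be called from XPath.
$xpath->registerPHPFunctions('str_word_count');

$isbn = 'isbn1235';
// functionString() passes the title's string value, so text() is optional.
$words = (int) $xpath->evaluate(
    "php:functionString('str_word_count', //library/book[@isbn = '$isbn']/title)"
);
echo $words, "\n"; // 2
```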

Registered PHP functions are not limited to PHP's built-in functions. You can define your own functions and make them available in XPath. The only difference is that you call them with "php:function" instead of "php:functionString", so they receive the nodes themselves rather than their string values. In addition, only plain functions and static methods can be called; instance methods are not supported. Let's demonstrate the basic functionality using a regular function defined outside the class. It will select only the books by "George Orwell": for each node you want included in the result, it must return true.

<?php
function compare($nodes) {
    // $nodes holds the <author> element(s) of the book being tested
    return $nodes[0]->nodeValue == "George Orwell";
}
?>

The argument passed to the function is an array of DOMElement objects. The function is responsible for examining that array and deciding whether the node being tested should be included in the resulting DOMNodeList. In this example, the node being tested is book, and we pass its author child to make that decision. Now we can create the method getGeorgeOrwellBooks():

<?php
public function getGeorgeOrwellBooks() {
    $xpath = new DOMXPath($this->domDocument);
    $xpath->registerNamespace("php", "http://php.net/xpath");
    $xpath->registerPHPFunctions();
    // keep each book for which compare() returns true
    $query = "//library/book[php:function('compare', author)]";
    return $xpath->query($query);
}
?>

If compare() is a static method, then you need to modify the XPath query to read:

<?php
// "ClassName" stands for the class that declares the static compare() method
$query = "//library/book[php:function('ClassName::compare', author)]";
?>
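A runnable sketch of the static-method variant follows. The class name BookFilter, its method isOrwell(), and the George Orwell books in the sample XML are hypothetical. registerPHPFunctions() is called without arguments here, so any function or static method may be invoked from XPath.

```php
<?php
/** Hypothetical class; XPath can call its static methods but not instance methods. */
class BookFilter
{
    /** Receives the <author> child of the book under test as an array of DOMElement. */
    public static function isOrwell(array $nodes): bool
    {
        return isset($nodes[0]) && $nodes[0]->nodeValue === 'George Orwell';
    }
}

$xml = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<library>
  <book isbn="isbn1"><title>Animal Farm</title><author>George Orwell</author></book>
  <book isbn="isbn2"><title>A Book</title><author>An Author</author></book>
  <book isbn="isbn3"><title>Nineteen Eighty-Four</title><author>George Orwell</author></book>
</library>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);
$xpath->registerNamespace('php', 'http://php.net/xpath');
$xpath->registerPHPFunctions(); // no restriction: any function or static method

// Keep each book for which the static method returns true.
$books = $xpath->query("//library/book[php:function('BookFilter::isOrwell', author)]");
foreach ($books as $book) {
    echo $book->getElementsByTagName('title')->item(0)->nodeValue, "\n";
}
```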

In fact, everything in this example could easily be written in plain XPath (//library/book[author = "George Orwell"]), but it shows how PHP functions can extend an XPath query. Remember that object methods cannot be called from XPath. If you need access to object properties or methods to complete a query, the best solution is to let XPath do the part it can, and then process the resulting DOMNodeList with whatever object methods or properties you need.
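The sketch below illustrates that split with a hypothetical GenreCatalog class. Its filter depends on instance state (a list of favourite genres) that an XPath callback could not reach, so XPath only selects the candidate books and an ordinary private method finishes the filtering. evaluate() is given each book as its context node, so the relative paths genre and title resolve against that book.

```php
<?php
$xml = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<library>
  <book isbn="isbn1234">
    <title>A Book</title>
    <author>An Author</author>
    <genre>Horror</genre>
  </book>
  <book isbn="isbn1235">
    <title>Another Book</title>
    <author>Another Author</author>
    <genre>Science Fiction</genre>
  </book>
</library>
XML;

/** Hypothetical class: XPath narrows the nodes, instance code finishes the job. */
class GenreCatalog
{
    private DOMXPath $xpath;
    private array $favouriteGenres;

    public function __construct(DOMDocument $doc, array $favouriteGenres)
    {
        $this->xpath = new DOMXPath($doc);
        $this->favouriteGenres = $favouriteGenres;
    }

    public function favouriteTitles(): array
    {
        $titles = [];
        // Step 1: XPath does what it can and selects every book with a genre.
        foreach ($this->xpath->query('//library/book[genre]') as $book) {
            // Step 2: ordinary object code finishes the filtering.
            $genre = $this->xpath->evaluate('string(genre)', $book);
            if ($this->isFavourite($genre)) {
                $titles[] = $this->xpath->evaluate('string(title)', $book);
            }
        }
        return $titles;
    }

    private function isFavourite(string $genre): bool
    {
        return in_array($genre, $this->favouriteGenres, true);
    }
}

$doc = new DOMDocument();
$doc->loadXML($xml);
$catalog = new GenreCatalog($doc, ['Horror']);
print_r($catalog->favouriteTitles());
```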

Summary

XPath is a great way to reduce the amount of code you write and to speed up code execution when processing XML data. Although it is not part of the official DOM specification, PHP DOM also lets you extend the standard XPath functions with custom functions. This is a very powerful feature, although as you become more familiar with XPath's built-in functions, you may find yourself relying on it less and less.


Frequently Asked Questions (FAQs) about PHP DOM with XPath

What is XPath and how does it work in PHP DOM?

XPath (XML Path Language) is a query language used to select nodes from an XML document. In PHP DOM, XPath is used to navigate the elements and attributes of an XML document. It lets you find and select specific parts of a document in a variety of ways, such as selecting nodes by name, by attribute value, or by their position in the document. This makes it a powerful tool for parsing and manipulating XML data in PHP.

How to create an instance of DOMXPath?

To create an instance of DOMXPath, you first need to create an instance of the DOMDocument class. Once you have obtained the DOMDocument object, you can create a new DOMXPath object by passing the DOMDocument object to the DOMXPath constructor. Here is an example:

<?php
$doc = new DOMDocument();
$doc->load("library.xml"); // hypothetical file containing the XML document
$xpath = new DOMXPath($doc);
?>

How to use XPath to select a node?

You can select nodes using the query() method of the DOMXPath object. The query() method takes the XPath expression as a parameter and returns a DOMNodeList object containing all nodes matching the expression. For example:

<?php
$titles = $xpath->query("//book/title");
foreach ($titles as $title) {
    echo $title->nodeValue, "\n";
}
?>

This selects every title element that is a child of a book element.

What is the difference between the query() and evaluate() methods in DOMXPath?

Both the query() and evaluate() methods are used to evaluate XPath expressions. The difference is the type of result they return. The query() method returns a DOMNodeList of all nodes that match the XPath expression. evaluate(), on the other hand, returns a typed result, such as a boolean, number, or string, depending on the XPath expression. If the expression result is a node set, evaluate() will return a DOMNodeList.

How to handle namespaces in XPath query?

To handle namespaces in XPath queries, register each namespace with the DOMXPath object using the registerNamespace() method. This method takes two parameters: the prefix and the namespace URI. After registering the namespace, you can use the prefix in your XPath queries. For example:

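<?php
// "lib" is any prefix you choose; the URI is a placeholder and must match the one declared in your document
$xpath->registerNamespace("lib", "http://example.com/library");
$books = $xpath->query("//lib:library/lib:book");
?>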
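A common case where registerNamespace() is required is a document with a default namespace: in XPath 1.0 an unprefixed name matches only elements in no namespace, so the plain query finds nothing. The sketch below shows this with a made-up namespace URI and the arbitrary prefix lib.

```php
<?php
// The namespace URI below is invented for the example.
$xml = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<library xmlns="http://example.com/library">
  <book isbn="isbn1234"><title>A Book</title></book>
  <book isbn="isbn1235"><title>Another Book</title></book>
</library>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Unprefixed names mean "no namespace" in XPath 1.0, so nothing matches.
$unprefixed = $xpath->query('//library/book');

// Bind a prefix of our choosing to the document's namespace URI and use it.
$xpath->registerNamespace('lib', 'http://example.com/library');
$prefixed = $xpath->query('//lib:library/lib:book');

echo $unprefixed->length, ' vs ', $prefixed->length, "\n"; // 0 vs 2
```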

How to use XPath to select attributes?

You can select attributes in XPath with the @ symbol followed by the attribute name. For example, to select the href attribute of every a element, use the XPath expression //a/@href.
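A short sketch against a trimmed copy of the library document: selecting @isbn yields DOMAttr nodes, whose value property holds the attribute text, and attributes can also be tested inside predicates.

```php
<?php
$xml = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<library>
  <book isbn="isbn1234"><title>A Book</title></book>
  <book isbn="isbn1235"><title>Another Book</title></book>
</library>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// "@isbn" selects the attribute nodes themselves (DOMAttr objects).
$isbns = [];
foreach ($xpath->query('//library/book/@isbn') as $attr) {
    $isbns[] = $attr->value;
}

// Attributes also work inside predicates: find a title by its book's ISBN.
$title = $xpath->evaluate('string(//library/book[@isbn = "isbn1235"]/title)');

print_r($isbns);
echo $title, "\n"; // Another Book
```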

How to use XPath function in PHP DOM?

XPath provides many functions that can be used in XPath expressions to work with strings, numbers, node sets, and more. To use an XPath function in PHP DOM, simply include it in the XPath expression. For example, to select all book elements whose price child has a value greater than 30, you can use the number() function: //book[number(price) > 30].
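The article's sample library document has no price element, so the sketch below adds one to each book purely for illustration. It runs the number() query from the answer above alongside two other XPath 1.0 functions, contains() and string-length().

```php
<?php
// The article's sample books, plus a <price> element assumed for this sketch.
$xml = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<library>
  <book isbn="isbn1234">
    <title>A Book</title>
    <genre>Horror</genre>
    <price>25.50</price>
  </book>
  <book isbn="isbn1235">
    <title>Another Book</title>
    <genre>Science Fiction</genre>
    <price>32.00</price>
  </book>
</library>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// number(): compare the price numerically.
$pricey = $xpath->evaluate('string(//book[number(price) > 30]/title)');

// contains(): substring test on an element's string value.
$fiction = $xpath->evaluate('count(//book[contains(genre, "Fiction")])');

// string-length(): titles longer than six characters.
$longTitles = $xpath->evaluate('count(//book[string-length(title) > 6])');

echo $pricey, ' / ', $fiction, ' / ', $longTitles, "\n"; // Another Book / 1 / 1
```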

Can I use XPath with HTML documents in PHP DOM?

Yes, you can use XPath with HTML documents in PHP DOM. However, since HTML is often not well-formed XML, loading it as XML may fail. To avoid these problems, load the document with the loadHTML() method of the DOMDocument class. This method uses libxml's forgiving HTML parser, which repairs common markup errors such as unclosed tags, so you can then use XPath on the resulting DOMDocument object.
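A minimal sketch: the markup below is made up, and libxml_use_internal_errors() is switched on only to keep any complaints from the HTML parser out of the output.

```php
<?php
// Not well-formed XML: unclosed <li> and <p> elements.
$html = '<html><body><ul><li><a href="/one">One</a><li><a href="/two">Two</a></ul>'
      . '<p>Unclosed paragraph</body></html>';

// Collect parser complaints internally; we only want the repaired tree.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$hrefs = [];
foreach ($xpath->query('//a/@href') as $attr) {
    $hrefs[] = $attr->value;
}
print_r($hrefs);
```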

How to handle errors when using XPath in PHP DOM?

When using XPath in PHP DOM, errors may occur for a number of reasons, such as a malformed XPath expression or an XML document that cannot be loaded. To handle these errors, you can enable user error handling with the libxml_use_internal_errors() function. This causes libxml errors to be stored internally instead of being emitted as warnings, so you can handle them in your code. You can then retrieve them with the libxml_get_errors() function and process them as needed.
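A minimal sketch of that pattern, using a deliberately malformed document: loadXML() returns false, each LibXMLError object reports the line and message of a problem, and the previous error-handling setting is restored afterwards.

```php
<?php
// Collect parser errors instead of letting libxml emit PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
// Deliberately malformed: <title> is never closed.
$loaded = $doc->loadXML('<library><book><title>A Book</book></library>');

$messages = [];
foreach (libxml_get_errors() as $error) {
    // Each LibXMLError carries the line number and the parser's message.
    $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
}
libxml_clear_errors();
libxml_use_internal_errors($previous); // restore the earlier setting

if ($loaded === false) {
    echo "Could not load the XML:\n", implode("\n", $messages), "\n";
}
```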

Can I modify an XML document using XPath in PHP DOM?

While XPath itself does not provide a way to modify XML documents, you can use XPath with the DOM API to modify XML documents. You can use XPath to select the node you want to modify, and then use the methods provided by the DOM API to modify. For example, you can use the removeChild() method of the DOMNode class to delete a node, or use the setAttribute() method of the DOMElement class to change the value of the attribute.
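The sketch below combines both methods on a trimmed copy of the library document: setAttribute() changes the position of each selected chapter, and removeChild() deletes the Horror book. The new attribute value "opening" is arbitrary.

```php
<?php
$xml = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<library>
  <book isbn="isbn1234">
    <title>A Book</title>
    <genre>Horror</genre>
    <chapter position="first"><chaptitle>chapter one</chaptitle></chapter>
  </book>
  <book isbn="isbn1235">
    <title>Another Book</title>
    <genre>Science Fiction</genre>
    <chapter position="first"><chaptitle>chapter one</chaptitle></chapter>
  </book>
</library>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// setAttribute(): change the position of every chapter XPath selects.
foreach ($xpath->query('//chapter[@position = "first"]') as $chapter) {
    $chapter->setAttribute('position', 'opening');
}

// removeChild(): delete every Horror book. An XPath result is a snapshot of
// the matches, so detaching nodes inside the loop does not disturb it.
foreach ($xpath->query('//library/book[genre = "Horror"]') as $book) {
    $book->parentNode->removeChild($book);
}

echo $doc->saveXML();
```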
