尊渡假赌尊渡假赌尊渡假赌

Feb 26, 2025 am 09:07 AM

PHP DOM: Using XPath

Core points

XPath is a syntax for querying XML documents. It offers a simpler, cleaner way to express queries and reduces the amount of code needed to query and filter XML data.
XPath queries can be run with two methods: query() and evaluate(). Both run the query; they differ in the type of result they return. query() returns a DOMNodeList, while evaluate() returns a typed result whenever possible.
XPath can make code more concise and efficient. In a comparison test, the pure XPath version was about 10% faster than the non-XPath version.
PHP DOM lets you extend the standard XPath functions with custom ones. You can register PHP's own functions, or functions you define yourself, and call them from inside an XPath query, which lets XPath perform more complex queries.

This article explores XPath in depth, including its features and how it is implemented in PHP. You will find that XPath can greatly reduce the amount of code required to query and filter XML data, and that it generally improves performance. I will demonstrate the PHP DOM XPath functionality using the same DTD and XML as in the previous post. For a quick review, here is what the DTD and XML look like:

<!DOCTYPE library [
  <!ELEMENT library (book*)>
  <!ELEMENT book (title, author, genre, chapter*)>
  <!ATTLIST book isbn ID #REQUIRED>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT genre (#PCDATA)>
  <!ELEMENT chapter (chaptitle,text)>
  <!ATTLIST chapter position NMTOKEN #REQUIRED>
  <!ELEMENT chaptitle (#PCDATA)>
  <!ELEMENT text (#PCDATA)>
]>

<?xml version="1.0" encoding="utf-8"?>
<library>
  <book isbn="isbn1234">
    <title>A Book</title>
    <author>An Author</author>
    <genre>Horror</genre>
    <chapter position="first">
      <chaptitle>chapter one</chaptitle>
      <text></text>
    </chapter>
  </book>
  <book isbn="isbn1235">
    <title>Another Book</title>
    <author>Another Author</author>
    <genre>Science Fiction</genre>
    <chapter position="first">
      <chaptitle>chapter one</chaptitle>
      <text>Sit Dolor Amet...</text>
    </chapter>
  </book>
</library>

Basic XPath queries

XPath is a syntax for querying XML documents. The simplest form is to define the path to the element you want to access. Using the XML document above, the following XPath query returns a collection of all existing book elements:

//library/book

That's it. The leading double slash searches the whole document for library elements (here library is the root element), and the single slash selects the book elements that are its children. Very simple, isn't it? But what if you want a specific book? Suppose you want to return any book written by "An Author". The XPath will be:

//library/book/author[text() = "An Author"]/..

Inside the square brackets, text() compares against the text of a node, and the trailing "/.." means we want the parent element (that is, move one node up).

An XPath query can be run with one of two methods: query() and evaluate(). Both run the query, but they differ in the type of result they return. query() always returns a DOMNodeList, whereas evaluate() returns a typed result whenever possible. For example, if your XPath query returns the number of books written by a particular author rather than the books themselves, query() returns an empty DOMNodeList. evaluate() returns the number directly, so you can use it immediately without having to extract data from a node.
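The difference between the two methods can be seen in a minimal sketch. It assumes a reduced copy of the sample library loaded from a string; the variable names are illustrative only.

```php
<?php
// Reduced copy of the sample library document (an assumption for this sketch).
$xml = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<library>
  <book isbn="isbn1234"><title>A Book</title><author>An Author</author></book>
  <book isbn="isbn1235"><title>Another Book</title><author>Another Author</author></book>
</library>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// A node-set expression: query() returns a DOMNodeList of the matching books.
$books = $xpath->query('//library/book/author[text() = "An Author"]/..');
$matches = $books->length;

// A numeric expression: evaluate() returns the number itself (as a float).
$count = $xpath->evaluate('count(//library/book/author[text() = "An Author"]/..)');

// The same kind of numeric expression through query() yields no nodes to work with.
$viaQuery = $xpath->query('count(//library/book)');
$viaQueryLength = ($viaQuery instanceof DOMNodeList) ? $viaQuery->length : 0;
```

Because count() produces an XPath number, evaluate() hands it back as a float; cast it with (int) if an integer is needed.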

XPath's code and speed advantages

Let's make a quick demonstration, returning the number of books written by a specific author. We will first look at a viable approach, but it does not use XPath. This is to show you how to do this without using XPath and why XPath is so powerful.

<?php
// Version 1: plain DOM traversal; PHP compares every author value itself.
public function getNumberOfBooksByAuthor($author) {
    $total = 0;
    $elements = $this->domDocument->getElementsByTagName("author");
    foreach ($elements as $element) {
        if ($element->nodeValue == $author) {
            $total++;
        }
    }
    return $total;
}
?>

The next method achieves the same result, but uses XPath to select only the books written by a specific author:

<?php
// Version 2: XPath selects the matching books; PHP only counts them.
public function getNumberOfBooksByAuthor($author) {
    $query = "//library/book/author[text() = '$author']/..";
    $xpath = new DOMXPath($this->domDocument);
    $result = $xpath->query($query);
    return $result->length;
}
?>

Please note that we eliminated the need for PHP to test author values this time. However, we can go a step further and use the XPath function count() to calculate the number of occurrences of this path.

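<?php
// Version 3: XPath's count() does the counting as well.
public function getNumberOfBooksByAuthor($author) {
    $query = "count(//library/book/author[text() = '$author']/..)";
    $xpath = new DOMXPath($this->domDocument);
    return $xpath->evaluate($query);
}
?>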

We need only one line of XPath to retrieve the required information, with no laborious filtering in PHP. This is the simplest and most concise way to write the feature. Note that evaluate() is used in the last example, because the count() function returns a typed result. query() would return a DOMNodeList, but you would find that it is an empty list.

XPath not only makes your code more concise, it also has a speed advantage. In my tests, version 1 was on average 30% faster than version 2, but version 3 was about 10% faster than version 2 (about 15% faster than version 1). While these measurements vary with your server and query, pure XPath often brings considerable speed benefits while also making your code easier to read and maintain.

XPath functions

XPath offers a large number of functions, and many excellent resources describe them in detail. If you find yourself iterating over DOMNodeLists or comparing nodeValues, there is probably an XPath function that eliminates much of that PHP code. You have already seen the count() function. Let's use the id() function to return the titles of the books with the given ISBNs. The XPath expression you need is:

id("isbn1234 isbn1235")/title

Note that the values to be searched here are enclosed in quotes and separated by spaces; no comma-separated terms are required.

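As a rough sketch of an id() lookup, assuming a hypothetical helper named findTitlesByIsbns() and a reduced copy of the sample document: id() only works when the parser knows which attribute is of type ID, so the sketch keeps the DTD in the document and sets validateOnParse before loading.

```php
<?php
// Reduced copy of the sample document; the DTD declares isbn as an ID attribute.
$xml = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE library [
  <!ELEMENT library (book*)>
  <!ELEMENT book (title, author, genre)>
  <!ATTLIST book isbn ID #REQUIRED>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT genre (#PCDATA)>
]>
<library>
  <book isbn="isbn1234"><title>A Book</title><author>An Author</author><genre>Horror</genre></book>
  <book isbn="isbn1235"><title>Another Book</title><author>Another Author</author><genre>Science Fiction</genre></book>
</library>
XML;

// Hypothetical helper: returns the titles of the books with the given ISBNs.
function findTitlesByIsbns(DOMDocument $doc, array $isbns): array
{
    $xpath = new DOMXPath($doc);
    // id() takes one string of space-separated ID values.
    $ids = implode(' ', $isbns);
    $titles = [];
    foreach ($xpath->query("id('$ids')/title") as $node) {
        $titles[] = $node->nodeValue;
    }
    return $titles;
}

$doc = new DOMDocument();
$doc->validateOnParse = true; // let the parser apply the DTD so ID attributes are known
$doc->loadXML($xml);

$titles = findTitlesByIsbns($doc, ['isbn1234', 'isbn1235']);
```

The ISBN values are interpolated directly into the expression here, so in real code they should come from a trusted source or be checked first.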

Executing complex functions in XPath is relatively simple; the trick is to be familiar with the functions available.

Using PHP functions in XPath

Sometimes you need more powerful features than the standard XPath functions can provide. Fortunately, PHP DOM also lets you integrate PHP's own functions into XPath queries. Let's consider returning the number of words in the title of a book. In its simplest form, the method selects the book with XPath, reads its title, and passes the text to PHP's str_word_count().
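A minimal sketch of that straightforward approach, assuming a standalone function rather than a class method and a one-book copy of the sample data:

```php
<?php
// One-book copy of the sample library (an assumption for this sketch).
$xml = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<library>
  <book isbn="isbn1235"><title>Another Book</title><author>Another Author</author></book>
</library>
XML;

// XPath finds the title; PHP counts the words afterwards.
function getNumberOfWords(DOMDocument $doc, string $isbn): int
{
    $xpath = new DOMXPath($doc);
    $titles = $xpath->query("//library/book[@isbn = '$isbn']/title");
    if ($titles->length === 0) {
        return 0; // no book with that ISBN
    }
    return str_word_count($titles->item(0)->nodeValue);
}

$doc = new DOMDocument();
$doc->loadXML($xml);
$words = getNumberOfWords($doc, 'isbn1235');
```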

However, we can also integrate str_word_count() directly into the XPath query. This takes a few steps. First, we must register a namespace on the XPath object under the prefix "php". The namespace URI must be http://php.net/xpath; any other value will cause an error. Then we call registerPHPFunctions(), which tells PHP that any function in the "php:" namespace should be handled by PHP. Inside the query, a PHP function is called through "php:functionString": the name of the function comes first, as a quoted string inside the parentheses, followed by its arguments. The actual syntax for calling a function is:

php:functionString("nameoffunction", arg, arg...)

Putting all of this together gives a reimplementation of getNumberOfWords() in which the XPath query itself calls str_word_count() on the book's title.
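The registration steps described above can be sketched as follows. This is one possible version, written as a standalone function over a one-book copy of the sample data; restricting registerPHPFunctions() to a single function name is a choice made for the sketch.

```php
<?php
// One-book copy of the sample library (an assumption for this sketch).
$xml = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<library>
  <book isbn="isbn1235"><title>Another Book</title><author>Another Author</author></book>
</library>
XML;

function getNumberOfWords(DOMDocument $doc, string $isbn): int
{
    $xpath = new DOMXPath($doc);
    // Bind the "php" prefix to the namespace PHP expects.
    $xpath->registerNamespace('php', 'http://php.net/xpath');
    // Allow only str_word_count() to be called from XPath.
    $xpath->registerPHPFunctions('str_word_count');

    // php:functionString hands the title's string value to str_word_count().
    $query = "php:functionString('str_word_count', //library/book[@isbn = '$isbn']/title)";

    // The result comes back as an XPath number (a float), so cast it.
    return (int) $xpath->evaluate($query);
}

$doc = new DOMDocument();
$doc->loadXML($xml);
$words = getNumberOfWords($doc, 'isbn1235');
```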

Note that you do not need to call the XPath function text() to supply the text of the node; php:functionString converts the node to its string value automatically. Selecting the text node explicitly, by appending /text() to the path, is also valid.

Registered PHP functions are not limited to the functions that ship with PHP. You can define your own functions and make them available in XPath. The only difference is that you call them with "php:function" instead of "php:functionString". In addition, only plain functions or static methods can be used; calling instance methods is not supported. Let's demonstrate the basic functionality with a regular function, compare(), defined outside any class. It will be used to return only the books by "George Orwell", so it must return true for each node you want to include in the query result.

The argument passed to the function is an array of DOMElements. The function is responsible for examining that array and deciding whether the node being tested should be included in the DOMNodeList. In this example, the node being tested is a book, and its author child determines the result. The method getGeorgeOrwellBooks() can then register the php namespace, call registerPHPFunctions(), and run a query whose predicate calls compare() through php:function.
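A sketch of how a user-defined filter function and getGeorgeOrwellBooks() might fit together. The sample data (including the "1984" entry), the function bodies, and the choice to return titles are assumptions made for illustration.

```php
<?php
// Hypothetical sample data containing one George Orwell book.
$xml = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<library>
  <book isbn="isbn1234"><title>A Book</title><author>An Author</author></book>
  <book isbn="isbn1236"><title>1984</title><author>George Orwell</author></book>
</library>
XML;

// Called from XPath via php:function; receives the node set as an array.
function compare(array $nodes): bool
{
    $authors = $nodes[0]->getElementsByTagName('author');
    return $authors->length > 0
        && $authors->item(0)->nodeValue === 'George Orwell';
}

// Returns the titles of all books for which compare() answers true.
function getGeorgeOrwellBooks(DOMDocument $doc): array
{
    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace('php', 'http://php.net/xpath');
    $xpath->registerPHPFunctions('compare');

    $titles = [];
    // "." passes the book currently being tested to compare().
    foreach ($xpath->query("//library/book[php:function('compare', .)]") as $book) {
        $titles[] = $book->getElementsByTagName('title')->item(0)->nodeValue;
    }
    return $titles;
}

$doc = new DOMDocument();
$doc->loadXML($xml);
$orwellTitles = getGeorgeOrwellBooks($doc);
```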

If compare() is a static method, refer to it in the XPath query by its class-qualified name. With ClassName standing in for the class that defines it, the query reads:

//library/book[php:function('ClassName::compare', .)]

In fact, all of this could easily be written in XPath alone, but the example shows how an XPath query can be extended to do more complex work. Instance methods cannot be called from XPath. If you need certain object properties or methods to complete an XPath query, the best solution is to do as much as you can in XPath and then process the resulting DOMNodeList with whatever object methods or properties you need.

Summary

XPath is a great way to reduce the amount of code you write and to speed up execution when processing XML data. Although it is not part of the official DOM specification, the additional functionality provided by PHP DOM lets you extend the standard XPath functions with custom ones. This is a very powerful feature, but as you become more familiar with the built-in XPath functions, you may find yourself relying on custom PHP functions less and less.


Frequently asked questions (FAQ) about PHP DOM with XPath

What is XPath and how does it work in PHP DOM?

XPath (XML Path Language) is a query language used to select nodes from an XML document. In PHP DOM, XPath is used to navigate the elements and attributes of an XML document. It lets you find and select specific parts of the document in a variety of ways, such as selecting a node by name, by the value of one of its attributes, or by its position in the document. This makes it a powerful tool for parsing and manipulating XML data in PHP.

How to create an instance of DOMXPath?

To create an instance of DOMXPath, first create an instance of the DOMDocument class and load your XML into it. Then create the DOMXPath object by passing the DOMDocument object to the DOMXPath constructor.

How to use XPath to select a node?

You can select nodes using the query() method of the DOMXPath object. query() takes an XPath expression as its parameter and returns a DOMNodeList containing all nodes that match it. For example, the expression //book/title selects all title elements that are children of a book element.
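A short sketch of both steps, creating the DOMXPath object and selecting nodes with query(), assuming the XML is loaded from a string rather than a file:

```php
<?php
// Small inline document (an assumption for this sketch).
$xml = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<library>
  <book isbn="isbn1234"><title>A Book</title></book>
  <book isbn="isbn1235"><title>Another Book</title></book>
</library>
XML;

// Step 1: load the XML into a DOMDocument.
$doc = new DOMDocument();
$doc->loadXML($xml);

// Step 2: pass the document to the DOMXPath constructor.
$xpath = new DOMXPath($doc);

// Step 3: query() returns a DOMNodeList that can be iterated directly.
$titles = [];
foreach ($xpath->query('//book/title') as $title) {
    $titles[] = $title->nodeValue;
}
```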

What is the difference between the query() and evaluate() methods of DOMXPath?

Both the query() and evaluate() methods evaluate XPath expressions. The difference is the type of result they return. The query() method returns a DOMNodeList of all nodes that match the XPath expression. evaluate(), on the other hand, returns a typed result, such as a boolean, number, or string, depending on the XPath expression. If the result of the expression is a node set, evaluate() returns a DOMNodeList.

How to handle namespaces in XPath query?

To handle namespaces in an XPath query, register the namespace with the DOMXPath object using the registerNamespace() method. This method takes two parameters: the prefix and the namespace URI. After registering the namespace, you can use the prefix in your XPath queries to match elements in that namespace.
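As an illustration of namespace registration, here is a minimal sketch. The namespace URI http://example.com/books and the prefixes are made up for the example; note that the prefix registered for the query does not have to match the prefix used in the document, only the URI does.

```php
<?php
// Hypothetical namespaced document; the URI is invented for this sketch.
$xml = '<catalog xmlns:bk="http://example.com/books">'
     . '<bk:book><bk:title>A Book</bk:title></bk:book>'
     . '</catalog>';

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Bind the prefix "b" to the same URI the document uses for "bk".
$xpath->registerNamespace('b', 'http://example.com/books');

// The registered prefix can now be used in the expression.
$nodes = $xpath->query('//b:book/b:title');
$firstTitle = $nodes->length > 0 ? $nodes->item(0)->nodeValue : null;
```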

How to use XPath to select attributes?

You can select attributes in XPath with the @ symbol followed by the attribute name. For example, to select the href attributes of all a elements, you can use the following XPath expression: //a/@href.

How to use XPath functions in PHP DOM?

XPath provides many functions that can be used in XPath expressions to work with strings, numbers, node sets, and more. To use an XPath function in PHP DOM, simply include it in the XPath expression. For example, to select all book elements that have a price child with a value greater than 30, you can use the number() function as follows: //book[number(price) > 30].

Can I use XPath with HTML documents in PHP DOM?

Yes, you can use XPath with HTML documents in PHP DOM. However, since HTML is not always well-formed XML, loading it as XML can fail. To avoid this, use the loadHTML() method of the DOMDocument class to load the HTML document. This method parses the HTML leniently and repairs many common markup errors, allowing you to use XPath with the resulting DOMDocument object.
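A minimal sketch of querying HTML, assuming a small fragment with a missing closing p tag; the parser warnings are collected internally so they do not clutter the output.

```php
<?php
// Imperfect HTML: the first <p> is never closed (an assumption for this sketch).
$html = '<html><body>'
      . '<p>First <a href="/one">One</a>'
      . '<p>Second <a href="/two">Two</a></p>'
      . '</body></html>';

// Keep parser complaints out of the output while loading.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Once loaded, the document can be queried like any other DOMDocument.
$xpath = new DOMXPath($doc);
$hrefs = [];
foreach ($xpath->query('//a/@href') as $attribute) {
    $hrefs[] = $attribute->nodeValue;
}
```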

How to handle errors when using XPath in PHP DOM?

When using XPath in PHP DOM, errors can occur for a number of reasons, such as a malformed XPath expression or an XML document that cannot be loaded. For a malformed expression, query() returns false, so check the return value before using it. For loading problems, enable user error handling with the libxml_use_internal_errors() function. libxml errors are then stored internally instead of being emitted as warnings, and you can retrieve them with libxml_get_errors() and process them as needed.
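The libxml error handling can be sketched like this, using a deliberately broken document; the variable names are illustrative.

```php
<?php
// Collect libxml errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
// Deliberately malformed: <book> is never closed.
$loaded = $doc->loadXML('<library><book></library>');

// Gather the stored error messages, then empty the error buffer.
$messages = [];
foreach (libxml_get_errors() as $error) {
    $messages[] = trim($error->message);
}
libxml_clear_errors();

if ($loaded === false) {
    // The document could not be parsed; do not run XPath queries against it.
    $status = 'load failed with ' . count($messages) . ' error(s)';
} else {
    $status = 'loaded';
}
```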

Can I modify an XML document using XPath in PHP DOM?

While XPath itself does not provide a way to modify XML documents, you can use XPath with the DOM API to modify XML documents. You can use XPath to select the node you want to modify, and then use the methods provided by the DOM API to modify. For example, you can use the removeChild() method of the DOMNode class to delete a node, or use the setAttribute() method of the DOMElement class to change the value of the attribute.
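A sketch of that combination, assuming a cut-down library document and a made-up "reviewed" attribute: XPath selects the nodes, and the DOM methods change them.

```php
<?php
// Cut-down copy of the sample library (an assumption for this sketch).
$xml = '<library>'
     . '<book isbn="isbn1234"><title>A Book</title><genre>Horror</genre></book>'
     . '<book isbn="isbn1235"><title>Another Book</title><genre>Science Fiction</genre></book>'
     . '</library>';

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Delete every horror book; copy the list to an array before removing nodes.
foreach (iterator_to_array($xpath->query('//book[genre = "Horror"]')) as $book) {
    $book->parentNode->removeChild($book);
}

// Mark the remaining books with a hypothetical "reviewed" attribute.
foreach ($xpath->query('//book') as $book) {
    $book->setAttribute('reviewed', 'yes');
}

$remaining = $xpath->query('//book')->length;
$reviewed = $xpath->query('//book[@reviewed = "yes"]')->length;
```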
