PHP DOM: Using XPath
Core points
XPath queries can be executed with either of two DOMXPath methods, query() and evaluate(). Both perform the query; the difference is the type of result they return: query() always returns a DOMNodeList, while evaluate() returns a typed result (such as a number) whenever possible.
This article takes a closer look at XPath, including its features and how it is implemented in PHP. You will find that XPath can greatly reduce the amount of code needed to query and filter XML data, and it generally improves performance as well. I will demonstrate the PHP DOM XPath functionality using the same DTD and XML as in the previous article. As a quick refresher, here is what the DTD and XML look like:
<code class="language-xml"><!DOCTYPE library [
  <!ELEMENT library (book*)>
  <!ELEMENT book (title, author, genre, chapter*)>
  <!ATTLIST book isbn ID #REQUIRED>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT genre (#PCDATA)>
  <!ELEMENT chapter (chaptitle,text)>
  <!ATTLIST chapter position NMTOKEN #REQUIRED>
  <!ELEMENT chaptitle (#PCDATA)>
  <!ELEMENT text (#PCDATA)>
]></code>
<code class="language-xml"><?xml version="1.0" encoding="utf-8"?>
<library>
  <book isbn="isbn1234">
    <title>A Book</title>
    <author>An Author</author>
    <genre>Horror</genre>
    <chapter position="first">
      <chaptitle>chapter one</chaptitle>
      <text></text>
    </chapter>
  </book>
  <book isbn="isbn1235">
    <title>Another Book</title>
    <author>Another Author</author>
    <genre>Science Fiction</genre>
    <chapter position="first">
      <chaptitle>chapter one</chaptitle>
      <text>Sit Dolor Amet...</text>
    </chapter>
  </book>
</library></code>
Basic XPath queries
XPath is a syntax for querying XML documents. In its simplest form, you define the path to the element you want. Using the XML document above, the following XPath query returns a collection of all the book elements present:
<code class="language-xpath">//library/book</code>
That's it. The leading double slash selects library elements anywhere in the document (here, library is the root element), and the single slash selects the book elements that are its direct children. Very simple, isn't it? But what if you want to pick out specific books? Suppose you want to return any book written by "An Author". The XPath would be:
<code class="language-xpath">//library/book/author[text() = "An Author"]/..</code>
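The two expressions above can be tried outside a class with a short standalone script. This is a minimal sketch, assuming the library XML is held in a string (trimmed here to the title and author elements, with the DTD left out); the variable names are illustrative only.

<code class="language-php"><?php
// Trimmed copy of the article's library XML (DTD and chapters omitted)
$xml = <<<'XML'
<library>
  <book isbn="isbn1234"><title>A Book</title><author>An Author</author></book>
  <book isbn="isbn1235"><title>Another Book</title><author>Another Author</author></book>
</library>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Every book element that is a child of library
$allBooks = $xpath->query('//library/book');

// Match the author text, then step back up to the parent book with ".."
$byAuthor = $xpath->query('//library/book/author[text() = "An Author"]/..');

echo "Books in total: ", $allBooks->length, "\n";
foreach ($byAuthor as $book) {
    echo "Written by An Author: ", $book->getAttribute('isbn'), "\n";
}
</code>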
You can use text() inside the square brackets (a predicate) to compare against the text value of a node, and the trailing "/.." selects the parent element (i.e. it moves one node back up the tree), so the query returns the book elements rather than the author elements.
XPath queries can be executed with one of two DOMXPath methods: query() and evaluate(). Both perform the query, but they differ in the type of result they return. query() always returns a DOMNodeList, whereas evaluate() returns a typed result where possible. For example, if your XPath query returns the number of books written by a particular author rather than the books themselves, query() will return an empty DOMNodeList, while evaluate() returns the number directly, so you can use it immediately without having to extract the data from a node.
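The difference between the two methods is easy to see by running the same counting expression through both. A minimal sketch on a trimmed copy of the library data; note that XPath numbers come back from evaluate() as PHP floats.

<code class="language-php"><?php
$xml = <<<'XML'
<library>
  <book><title>A Book</title><author>An Author</author></book>
  <book><title>Another Book</title><author>Another Author</author></book>
</library>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

$expr = 'count(//library/book/author[text() = "An Author"]/..)';

// query() always returns a DOMNodeList; a numeric result leaves it empty
$viaQuery = $xpath->query($expr);

// evaluate() returns the number itself (XPath numbers become PHP floats)
$viaEvaluate = $xpath->evaluate($expr);

var_dump($viaQuery->length); // int(0)
var_dump($viaEvaluate);      // float(1)
</code>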
XPath's code and speed advantages
Let's do a quick demonstration that returns the number of books written by a particular author. The first approach works, but it does not use XPath; it is included to show how the task can be done without XPath and why XPath is so powerful.
<code class="language-php"><?php
// Method of a class that keeps the loaded DOMDocument in $this->domDocument
public function getNumberOfBooksByAuthor($author) {
    $total = 0;
    $elements = $this->domDocument->getElementsByTagName("author");
    // Walk every author element and compare its text in PHP
    foreach ($elements as $element) {
        if ($element->nodeValue == $author) {
            $total++;
        }
    }
    return $total;
}
?></code>
The next method achieves the same result, but uses XPath to select only the books written by the given author:
<code class="language-php"><?php
public function getNumberOfBooksByAuthor($author) {
    // Select the parent book of every author node whose text matches
    $query = "//library/book/author[text() = '$author']/..";
    $xpath = new DOMXPath($this->domDocument);
    $result = $xpath->query($query);
    return $result->length;
}
?></code>
Note that this time PHP no longer needs to test the author values itself. We can go a step further, though, and use the XPath function count() to count the occurrences of this path:
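<code class="language-php"><?php
public function getNumberOfBooksByAuthor($author) {
    // count() makes XPath return a number, so evaluate() is used instead of query()
    $query = "count(//library/book/author[text() = '$author']/..)";
    $xpath = new DOMXPath($this->domDocument);
    return $xpath->evaluate($query);
}
?></code>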
We need only one line of XPath to retrieve the required information, without any laborious filtering in PHP. Indeed, this is a much simpler and more concise way to write this functionality! Note that evaluate() is used in the last example. This is because count() returns a typed result (a number). Using query() would return a DOMNodeList, but you would find that it is an empty list. Not only does this make your code more concise, it also brings speed benefits. Of the three methods above, I found that version 1 was on average 30% faster than version 2, but version 3 was about 10% faster than version 2 (about 15% faster than version 1). While these measurements will vary with your server and query, using pure XPath generally brings considerable speed benefits while also making your code easier to read and maintain.
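If you want to check that the three versions agree, or time them against your own data, they can be rewritten as standalone functions that take the DOMDocument as a parameter instead of reading $this->domDocument. The function names below are hypothetical, and no timing code is included. As in the methods above, $author is pasted straight into the expression, so a name containing a single quote would break the query.

<code class="language-php"><?php
// Version 1: walk the author elements and compare in PHP
function countBooksLoop(DOMDocument $doc, string $author): int
{
    $total = 0;
    foreach ($doc->getElementsByTagName('author') as $element) {
        if ($element->nodeValue === $author) {
            $total++;
        }
    }
    return $total;
}

// Version 2: XPath selects the books; PHP only reads the list length
function countBooksQuery(DOMDocument $doc, string $author): int
{
    $xpath = new DOMXPath($doc);
    return $xpath->query("//library/book/author[text() = '$author']/..")->length;
}

// Version 3: XPath does the counting too; evaluate() returns a float, so cast it
function countBooksCount(DOMDocument $doc, string $author): int
{
    $xpath = new DOMXPath($doc);
    return (int) $xpath->evaluate("count(//library/book/author[text() = '$author']/..)");
}

$xml = <<<'XML'
<library>
  <book><title>A Book</title><author>An Author</author></book>
  <book><title>Another Book</title><author>Another Author</author></book>
</library>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

foreach (['An Author', 'Another Author', 'Nobody'] as $name) {
    printf(
        "%s: %d / %d / %d\n",
        $name,
        countBooksLoop($doc, $name),
        countBooksQuery($doc, $name),
        countBooksCount($doc, $name)
    );
}
</code>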
XPath functions
XPath has quite a lot of functions, and there are many excellent resources detailing which ones are available. If you find yourself iterating over DOMNodeLists or comparing nodeValues, you will probably find an XPath function that eliminates a lot of PHP code. You have already seen how count() is used. Now let's use the id() function to return the titles of the books with the given ISBNs. The XPath expression you need is:
<code class="language-xpath">id("isbn1234 isbn1235")/title</code>
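Note that the values to be searched for are enclosed in quotes and separated by spaces; no commas are needed between the terms. id() matches attributes declared as type ID, which is why the DTD's <!ATTLIST book isbn ID #REQUIRED> declaration matters here.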
<code class="language-php"><?php
public function findBooksByISBNs(array $isbns) {
    // id() takes a single space-separated list of ID values
    $ids = join(" ", $isbns);
    $query = "id('$ids')/title";

    $xpath = new DOMXPath($this->domDocument);
    $result = $xpath->query($query);

    $books = array();
    foreach ($result as $node) {
        $book = array("title" => $node->nodeValue);
        $books[] = $book;
    }
    return $books;
}
?></code>
Executing complex functions in XPath is relatively simple; the trick is to be familiar with the functions available.
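Here is a standalone sketch of findBooksByISBNs() that takes the document as a parameter (an assumption; the article's version is a class method). It keeps only the DTD declarations this example needs; the ATTLIST line declaring isbn as type ID is what allows id() to find the books.

<code class="language-php"><?php
// Library XML with a trimmed internal DTD; isbn is declared as an ID attribute
$xml = <<<'XML'
<!DOCTYPE library [
  <!ELEMENT library (book*)>
  <!ELEMENT book (title, author)>
  <!ATTLIST book isbn ID #REQUIRED>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
]>
<library>
  <book isbn="isbn1234"><title>A Book</title><author>An Author</author></book>
  <book isbn="isbn1235"><title>Another Book</title><author>Another Author</author></book>
</library>
XML;

// Standalone variant of findBooksByISBNs() that receives the document explicitly
function findBooksByISBNs(DOMDocument $doc, array $isbns): array
{
    $ids = implode(' ', $isbns);   // id() expects one space-separated list
    $xpath = new DOMXPath($doc);
    $books = [];
    foreach ($xpath->query("id('$ids')/title") as $node) {
        $books[] = ['title' => $node->nodeValue];
    }
    return $books;
}

$doc = new DOMDocument();
$doc->loadXML($xml);
print_r(findBooksByISBNs($doc, ['isbn1234', 'isbn1235']));
</code>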
Using PHP functions in XPath
Sometimes you may need more powerful functionality than the standard XPath functions provide. Fortunately, PHP DOM also lets you integrate PHP's own functions into XPath queries. Consider returning the number of words in a book's title. In its simplest form, we could write the method like this:
<code class="language-php"><?php
public function getNumberOfWords($isbn) {
    $query = "//library/book[@isbn = '$isbn']";

    $xpath = new DOMXPath($this->domDocument);
    $result = $xpath->query($query);

    // Take the first matching book and read the text of its title element
    $title = $result->item(0)->getElementsByTagName("title")
        ->item(0)->nodeValue;

    return str_word_count($title);
}
?></code>
However, we can also integrate the function str_word_count() directly into the XPath query. This takes a few steps. First, we must register a namespace with the XPath object. PHP functions in an XPath query are called through "php:functionString", with the name of the PHP function you want to use passed inside the parentheses. The namespace URI to register is http://php.net/xpath; it must be exactly this value, and any other value will cause an error. Then we need to call registerPHPFunctions(), which tells PHP that whenever it encounters a function in the "php:" namespace, PHP should handle it. The actual syntax for calling a function is:
<code class="language-xpath">php:functionString("nameoffunction", arg, arg...)</code>
Putting all of this together gives the following reimplementation of getNumberOfWords():
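<code class="language-php"><?php
public function getNumberOfWords($isbn) {
    $xpath = new DOMXPath($this->domDocument);

    // Register the php namespace
    $xpath->registerNamespace("php", "http://php.net/xpath");

    // Allow PHP functions to be called from within XPath
    $xpath->registerPHPFunctions();

    $query = "php:functionString('str_word_count', (//library/book[@isbn = '$isbn']/title))";

    return $xpath->evaluate($query);
}
?></code>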
Note that you do not need to call the XPath function text() to supply the node's text; PHP converts the node passed to php:functionString into its string value automatically. However, the following is equally valid:
<code class="language-xpath">php:functionString('str_word_count', (//library/book[@isbn = '$isbn']/title/text()))</code>
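A standalone sketch of the same idea follows. It uses a hypothetical helper, wordsInTitle(), in place of the class method, and passes a function name to registerPhpFunctions() so that only str_word_count() can be called from XPath (calling it with no argument allows every function). The integer returned by the PHP function is handed back to XPath as a string, so the result is cast to int.

<code class="language-php"><?php
$xml = <<<'XML'
<library>
  <book isbn="isbn1234"><title>A Book</title></book>
  <book isbn="isbn1235"><title>Another Book</title></book>
</library>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// The URI must be http://php.net/xpath; "php" is the prefix used in the query below
$xpath->registerNamespace('php', 'http://php.net/xpath');

// Whitelist a single function; with no argument every PHP function would be callable
$xpath->registerPhpFunctions('str_word_count');

// Hypothetical helper standing in for the article's getNumberOfWords() method
function wordsInTitle(DOMXPath $xpath, string $isbn): int
{
    $query = "php:functionString('str_word_count', //library/book[@isbn = '$isbn']/title)";
    // str_word_count()'s integer result comes back from evaluate() as a string
    return (int) $xpath->evaluate($query);
}

echo wordsInTitle($xpath, 'isbn1234'), "\n";
echo wordsInTitle($xpath, 'isbn1235'), "\n";
</code>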
Registering PHP functions is not limited to the functions that come with PHP. You can define your own functions and make them available in XPath. For these, you call "php:function" instead of "php:functionString": php:functionString passes node-set arguments as their string values, while php:function passes them as arrays of DOM nodes, which is what a function that inspects nodes needs. In addition, only plain functions and static methods can be made available; calling instance methods is not supported. Let's demonstrate the basic functionality using a regular function defined outside the class. The function we will use returns only the books by "George Orwell"; it must return true for every node you want to include in the query result.
<code class="language-php"><?php
// $node is the array of DOM nodes that XPath passes for the "author" argument
function compare($node) {
    return isset($node[0]) && $node[0]->nodeValue == "George Orwell";
}
?></code>
The argument passed to the function is an array of DOMElements. The function is responsible for examining that array and deciding whether the node being tested should be included in the returned DOMNodeList. In this example, the node being tested is book, and we use its author child to make the decision. Now we can create the method getGeorgeOrwellBooks():
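<code class="language-php"><?php
public function getGeorgeOrwellBooks() {
    $xpath = new DOMXPath($this->domDocument);
    $xpath->registerNamespace("php", "http://php.net/xpath");
    $xpath->registerPHPFunctions();

    // compare() receives each book's author children and returns true or false
    $query = "//library/book[php:function('compare', author)]";
    $result = $xpath->query($query);

    $books = array();
    foreach ($result as $node) {
        $books[] = $node->getElementsByTagName("title")
            ->item(0)->nodeValue;
    }

    return $books;
}
?></code>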
If compare() were a static method (for example, of a class named Library), you would need to amend the XPath query so that it reads:
<code class="language-xpath">//library/book[php:function('Library::compare', author)]</code>
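The following sketch runs both variants, a plain function and a static method, against sample data. Since the sample library has no books by George Orwell, the test is adapted to "Another Author"; the Library class and the titlesMatching() helper are hypothetical.

<code class="language-php"><?php
$xml = <<<'XML'
<library>
  <book><title>A Book</title><author>An Author</author></book>
  <book><title>Another Book</title><author>Another Author</author></book>
</library>
XML;

// Plain function: receives the tested book's author children as an array of DOM nodes
function compare(array $nodes): bool
{
    return isset($nodes[0]) && $nodes[0]->nodeValue === 'Another Author';
}

// Hypothetical class offering the same test as a static method
class Library
{
    public static function compare(array $nodes): bool
    {
        return isset($nodes[0]) && $nodes[0]->nodeValue === 'Another Author';
    }
}

// Returns the titles of the books for which the named callback returns true
function titlesMatching(DOMXPath $xpath, string $callback): array
{
    $titles = [];
    foreach ($xpath->query("//library/book[php:function('$callback', author)]") as $book) {
        $titles[] = $book->getElementsByTagName('title')->item(0)->nodeValue;
    }
    return $titles;
}

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);
$xpath->registerNamespace('php', 'http://php.net/xpath');
$xpath->registerPhpFunctions(); // no restriction, so both callables are allowed

print_r(titlesMatching($xpath, 'compare'));
print_r(titlesMatching($xpath, 'Library::compare'));
</code>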
In truth, all of this functionality could easily be written in plain XPath, but the example shows how you can extend XPath queries to handle more complex cases. Calling an object (instance) method from XPath is not possible. If you find that you need some object property or method to complete an XPath query, the best solution is to do what you can with XPath and then process the resulting DOMNodeList with the object's methods or properties as needed.
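A sketch of that approach: XPath narrows the selection, and an instance method (which XPath itself cannot call) makes the final decision. ShelfFilter and its methods are hypothetical names used for illustration.

<code class="language-php"><?php
// Hypothetical class whose instance state decides which books to keep
class ShelfFilter
{
    private $allowedAuthors;

    public function __construct(array $allowedAuthors)
    {
        $this->allowedAuthors = $allowedAuthors;
    }

    // Instance method: XPath cannot call this, so it runs on the query results instead
    public function accepts(DOMElement $book): bool
    {
        $author = $book->getElementsByTagName('author')->item(0);
        return $author !== null && in_array($author->nodeValue, $this->allowedAuthors, true);
    }

    public function titles(DOMDocument $doc): array
    {
        $xpath = new DOMXPath($doc);
        $titles = [];
        // Step 1: XPath does what it can, selecting only books that have an author
        foreach ($xpath->query('//library/book[author]') as $book) {
            // Step 2: PHP applies the object's own logic to each resulting node
            if ($this->accepts($book)) {
                $titles[] = $book->getElementsByTagName('title')->item(0)->nodeValue;
            }
        }
        return $titles;
    }
}

$xml = <<<'XML'
<library>
  <book><title>A Book</title><author>An Author</author></book>
  <book><title>Another Book</title><author>Another Author</author></book>
</library>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$filter = new ShelfFilter(['An Author']);
print_r($filter->titles($doc));
</code>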
Summary
XPath is a great way to reduce the amount of code you write and to speed up execution when working with XML data. Although not part of the official DOM specification, the additional functionality that PHP DOM provides lets you extend the standard XPath functions with custom ones. This is a very powerful feature, and as you become more familiar with XPath functions, you may find yourself relying on it less and less.
FAQs about PHP DOM with XPath
XPath (XML Path Language) is a query language used to select nodes from an XML document. In PHP DOM, XPath is used to navigate the elements and attributes of an XML document. It lets you find and select specific parts of a document in a variety of ways, such as selecting nodes by name, by attribute value, or by their position in the document. This makes it a powerful tool for parsing and manipulating XML data in PHP.
To create an instance of DOMXPath, you first need to create an instance of the DOMDocument class. Once you have obtained the DOMDocument object, you can create a new DOMXPath object by passing the DOMDocument object to the DOMXPath constructor. Here is an example:
<code class="language-php"><?php
$doc = new DOMDocument();
$doc->loadXML($xml); // $xml holds an XML string, such as the library document above
$xpath = new DOMXPath($doc);
?></code>
You can select nodes using the query() method of the DOMXPath object. The query() method takes an XPath expression as its parameter and returns a DOMNodeList containing all nodes that match the expression. For example:
<code class="language-php"><?php
$titles = $xpath->query("//library/book/title");
foreach ($titles as $title) {
    echo $title->nodeValue, "\n";
}
?></code>
This selects all title elements that are children of book elements.
Both the query() and evaluate() methods of DOMXPath evaluate XPath expressions; the difference is the type of result they return. The query() method returns a DOMNodeList of all nodes that match the XPath expression. evaluate(), on the other hand, returns a typed result, such as a boolean, number, or string, depending on the expression. If the expression's result is a node set, evaluate() returns a DOMNodeList.
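A quick way to see which PHP type evaluate() produces for each kind of XPath result; a minimal sketch on a trimmed copy of the library data.

<code class="language-php"><?php
$xml = <<<'XML'
<library>
  <book isbn="isbn1234"><title>A Book</title><author>An Author</author></book>
  <book isbn="isbn1235"><title>Another Book</title><author>Another Author</author></book>
</library>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

$number  = $xpath->evaluate('count(//book)');             // XPath number  -> PHP float
$boolean = $xpath->evaluate('count(//book) > 1');         // XPath boolean -> PHP bool
$string  = $xpath->evaluate('string(//book[1]/title)');   // XPath string  -> PHP string
$nodes   = $xpath->evaluate('//book/title');              // node set      -> DOMNodeList

var_dump($number, $boolean, $string, $nodes->length);
</code>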
To handle namespaces in XPath queries, you need to register the namespace with the DOMXPath object using the registerNamespace() method. This method takes two parameters: the prefix and the namespace URI. After registering the namespace, you can use the prefix in your XPath queries. For example:
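<code class="language-php"><?php
// "http://example.com/books" is a placeholder; use the namespace URI your document declares
$xpath->registerNamespace("bk", "http://example.com/books");
$books = $xpath->query("//bk:library/bk:book");
?></code>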
You can select attributes in XPath with the @ symbol followed by the attribute name. For example, to select the href attribute of all a elements, you can use the XPath expression //a/@href.
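Applied to the library document, the same syntax selects the isbn attributes, or filters books by an attribute value. A short sketch; note that //library/book/@isbn returns the attribute nodes (DOMAttr objects), not the book elements.

<code class="language-php"><?php
$doc = new DOMDocument();
$doc->loadXML('<library><book isbn="isbn1234"/><book isbn="isbn1235"/></library>');
$xpath = new DOMXPath($doc);

// @isbn selects the attribute nodes themselves
$isbns = [];
foreach ($xpath->query('//library/book/@isbn') as $attr) {
    $isbns[] = $attr->value;
}

// A predicate on an attribute selects the elements that carry it
$book = $xpath->query('//library/book[@isbn = "isbn1235"]')->item(0);

print_r($isbns);
</code>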
XPath provides many functions that can be used in XPath expressions, for manipulating strings, numbers, node sets, and more. To use an XPath function in PHP DOM, simply include it in the XPath expression. For example, to select all book elements that have a price child element with a value greater than 30, you can use the number() function: //book[number(price) > 30].
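The library DTD has no price element, so this sketch adds hypothetical price values to a trimmed copy of the data in order to run the number() example, alongside a string function and an aggregate function.

<code class="language-php"><?php
// Hypothetical price elements added for illustration; the article's DTD has none
$xml = <<<'XML'
<library>
  <book><title>A Book</title><price>25.50</price></book>
  <book><title>Another Book</title><price>42</price></book>
</library>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// number() converts the price text to a number before comparing
$expensive = $xpath->query('//book[number(price) > 30]/title');

// String functions work the same way inside predicates
$another = $xpath->query('//book[contains(title, "Another")]/title');

// Aggregate functions produce typed results, so evaluate() is used
$total = $xpath->evaluate('sum(//book/price)');

echo $expensive->item(0)->nodeValue, "\n";
echo $total, "\n";
</code>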
Yes, you can use XPath with HTML documents in PHP DOM. However, HTML is often not well-formed XML, so loading it with the XML parser can fail. To avoid this, load the document with the loadHTML() method of the DOMDocument class. This method uses libxml's more forgiving HTML parser, which repairs many common markup errors while building the DOM tree, so you can then use XPath on the resulting DOMDocument object.
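A minimal sketch, assuming a small HTML fragment with an unclosed paragraph. loadHTML() reports markup problems as PHP warnings, so here they are collected internally and then discarded; real code might log them instead.

<code class="language-php"><?php
$html = '<p>Links: <a href="/one">One</a> <a href="/two">Two</a>';

$doc = new DOMDocument();

// Collect parser warnings instead of printing them, then restore the old setting
$previous = libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

// The repaired document can now be queried like any other DOMDocument
$xpath = new DOMXPath($doc);
$hrefs = [];
foreach ($xpath->query('//a/@href') as $attr) {
    $hrefs[] = $attr->value;
}

print_r($hrefs);
</code>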
When using XPath in PHP DOM, errors can occur for a number of reasons, such as a malformed XPath expression or an XML document that cannot be loaded. To handle these errors, you can enable user error handling with the libxml_use_internal_errors() function. It causes libxml errors to be stored internally instead of being emitted as PHP warnings, so your code can deal with them. You can then retrieve the errors with the libxml_get_errors() function and process them as needed.
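A sketch covering both failure modes: a document that is not well formed, and an invalid XPath expression. In both cases the PHP call returns false; with internal error handling enabled, the parser's messages for the broken document can be read back with libxml_get_errors().

<code class="language-php"><?php
libxml_use_internal_errors(true);

// 1. A document that is not well formed: loadXML() returns false
$doc = new DOMDocument();
$loaded = $doc->loadXML('<library><book></library>');
$loadErrors = libxml_get_errors();   // array of LibXMLError objects
foreach ($loadErrors as $error) {
    printf("Line %d: %s\n", $error->line, trim($error->message));
}
libxml_clear_errors();

// 2. A malformed XPath expression: query() returns false instead of a DOMNodeList
$doc->loadXML('<library><book/></library>');
$xpath = new DOMXPath($doc);
$result = $xpath->query('//library/book[');
if ($result === false) {
    echo "Invalid XPath expression\n";
}
libxml_clear_errors();
</code>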
While XPath itself cannot modify XML documents, you can combine XPath with the DOM API to do so: use XPath to select the nodes you want to modify, then use the methods provided by the DOM API to change them. For example, you can delete a node with the removeChild() method of the DOMNode class, or change an attribute's value with the setAttribute() method of the DOMElement class.
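A sketch of the XPath-plus-DOM pattern on a trimmed copy of the library data: it removes the Horror book and changes another book's isbn attribute (the new value is made up for illustration). The list returned by query() is a snapshot rather than a live list, so nodes can be removed while looping over it.

<code class="language-php"><?php
$xml = <<<'XML'
<library>
  <book isbn="isbn1234"><title>A Book</title><genre>Horror</genre></book>
  <book isbn="isbn1235"><title>Another Book</title><genre>Science Fiction</genre></book>
</library>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Select with XPath, then delete through the parent node with the DOM API
foreach ($xpath->query('//library/book[genre = "Horror"]') as $book) {
    $book->parentNode->removeChild($book);
}

// Select with XPath, then change an attribute with DOMElement::setAttribute()
$book = $xpath->query('//library/book[@isbn = "isbn1235"]')->item(0);
$book->setAttribute('isbn', 'isbn9999'); // hypothetical new value

echo $doc->saveXML($doc->documentElement), "\n";
</code>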