XML—XPATH syntax introduction

黄舟

Feb 24, 2017 pm 03:19 PM

Why do we need XPath?

With dom4j alone, we cannot reach an element across several levels of the tree. We have to walk down one level at a time, which is tedious.
XPath solves this: it lets us address a specific node directly with a path expression.

XPath is usually used together with dom4j. To use it, add one more package to the project: jaxen-1.1-beta-6.jar

The basic XPath syntax comes down to the following points:

1. Basic XPath syntax resembles locating files in a file system. If a path starts with a slash (/), it is an absolute path to an element.

(1) /AAA selects the root element AAA

<AAA>  <!-- selected -->
    <BBB/>
    <CCC/>
    <BBB/>
    <BBB/>
    <DDD>
        <BBB/>
    </DDD>
    <CCC/>
</AAA>

(2) /AAA/CCC selects all CCC child elements of AAA

<AAA>
    <BBB/>
    <CCC/>  <!-- selected -->
    <BBB/>
    <BBB/>
    <DDD>
        <BBB/>
    </DDD>
    <CCC/>  <!-- selected -->
</AAA>

(3) /AAA/DDD/BBB selects all BBB child elements of the DDD child elements of AAA

<AAA>
    <BBB/>
    <CCC/>
    <BBB/>
    <BBB/>
    <DDD>
        <BBB/>  <!-- selected -->
    </DDD>
    <CCC/>
</AAA>
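To try the three absolute paths above without adding any jars, here is a minimal sketch that uses the JDK's built-in javax.xml.xpath API instead of dom4j. That swap is an assumption made for this sketch, not the setup this article uses. The class name AbsolutePathDemo and the helper select are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; it uses the JDK XPath engine, not dom4j.
class AbsolutePathDemo {
    // The sample document from the examples above, written on one line.
    static final String XML =
        "<AAA><BBB/><CCC/><BBB/><BBB/><DDD><BBB/></DDD><CCC/></AAA>";

    // Returns the names of all nodes matched by the XPath expression.
    static List<String> select(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getNodeName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(XML, "/AAA"));         // [AAA]
        System.out.println(select(XML, "/AAA/CCC"));     // [CCC, CCC]
        System.out.println(select(XML, "/AAA/DDD/BBB")); // [BBB]
    }
}
```

Note that /AAA/DDD/BBB matches only the BBB inside DDD, not the three BBB elements that sit directly under AAA.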

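So how do we use XPath in dom4j? It is quite simple:

import java.io.File;
import java.util.List;
import org.dom4j.Document;
import org.dom4j.Node;
import org.dom4j.io.SAXReader;

// 1. Create the SAXReader parser
SAXReader saxReader = new SAXReader();
// 2. Tell it which file to parse (path is the location of the XML file)
Document document = saxReader.read(new File(path));
// 3. Read whatever we need with XPath:
//    document.selectNodes(args) returns multiple elements
//    document.selectSingleNode(args) returns a single element
List<Node> nodes = document.selectNodes("/AAA/BBB");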

Once dom4j has given us the Document object, we can call its selectNodes(args) method, which returns a List of the nodes matching the XPath expression we wrote. From there, the rest works like ordinary dom4j code.

It also has a selectSingleNode(args) method, which returns a single Node.
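The same "many nodes versus one node" distinction can be tried with the JDK alone. The sketch below is only an analogy under stated assumptions: it uses javax.xml.xpath rather than dom4j, where XPathConstants.NODESET plays roughly the role of selectNodes and XPathConstants.NODE roughly the role of selectSingleNode. The class SingleNodeDemo, its helpers, and the two-element sample document are invented for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; JDK XPath stands in for dom4j here.
class SingleNodeDemo {
    // Made-up sample document with two BBB elements.
    static final String XML = "<AAA><BBB id=\"b1\"/><BBB id=\"b2\"/></AAA>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    // Node-set query: returns how many nodes match the expression.
    static int countNodes(String xml, String xpath) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, parse(xml), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    // Single-node query: returns the id attribute of the one element handed back,
    // or null when nothing matches. The expression must select elements.
    static String singleId(String xml, String xpath) {
        try {
            Element element = (Element) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, parse(xml), XPathConstants.NODE);
            return element == null ? null : element.getAttribute("id");
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countNodes(XML, "/AAA/BBB")); // 2
        System.out.println(singleId(XML, "/AAA/BBB"));   // b1
    }
}
```

When several nodes match, the single-node form hands back the first one in document order, so it is safest with expressions that can only match once.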

Now for the rest of the XPath syntax:

2. If a path starts with a double slash (//), it selects every element in the document that satisfies the rule after the double slash, regardless of its level in the hierarchy.

(1) //BBB selects all BBB elements

<AAA>
    <BBB/>  <!-- selected -->
    <CCC/>
    <BBB/>  <!-- selected -->
    <DDD>
        <BBB/>  <!-- selected -->
    </DDD>
    <CCC>
        <DDD>
            <BBB/>  <!-- selected -->
            <BBB/>  <!-- selected -->
        </DDD>
    </CCC>
</AAA>

(2) //DDD/BBB selects all BBB elements whose parent element is DDD

<AAA>
    <BBB/>
    <CCC/>
    <BBB/>
    <DDD>
        <BBB/>  <!-- selected -->
    </DDD>
    <CCC>
        <DDD>
            <BBB/>  <!-- selected -->
            <BBB/>  <!-- selected -->
        </DDD>
    </CCC>
</AAA>
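The two double-slash examples above can be checked with a short sketch. It assumes the JDK's javax.xml.xpath API in place of dom4j so that it runs without extra jars, and the class DescendantDemo and helper count are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; it uses the JDK XPath engine, not dom4j.
class DescendantDemo {
    // The sample document from the two examples above, written on one line.
    static final String XML =
        "<AAA><BBB/><CCC/><BBB/><DDD><BBB/></DDD>"
        + "<CCC><DDD><BBB/><BBB/></DDD></CCC></AAA>";

    // Returns the number of nodes matched by the XPath expression.
    static int count(String xml, String xpath) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, new InputSource(new StringReader(xml)),
                            XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count(XML, "//BBB"));     // 5: every BBB at any depth
        System.out.println(count(XML, "//DDD/BBB")); // 3: only BBB directly under a DDD
        System.out.println(count(XML, "/AAA/BBB"));  // 2: only BBB directly under the root
    }
}
```

The last line is there for contrast: the absolute path sees only the two BBB children of the root, while //BBB finds all five.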

3. An asterisk (*) selects all elements located by the path that precedes it

(1) /AAA/CCC/DDD/* selects all child elements of /AAA/CCC/DDD:

<AAA>
    <XXX>
        <DDD>
            <BBB/>
            <BBB/>
            <EEE/>
            <FFF/>
        </DDD>
    </XXX>
    <CCC>
        <DDD>
            <BBB/>  <!-- selected -->
            <BBB/>  <!-- selected -->
            <EEE/>  <!-- selected -->
            <FFF/>  <!-- selected -->
        </DDD>
    </CCC>
    <CCC>
        <BBB>
            <BBB>
                <BBB/>
            </BBB>
        </BBB>
    </CCC>
</AAA>

(2) /*/*/*/BBB selects all BBB elements that have exactly 3 ancestor elements

<AAA>
    <XXX>
        <DDD>
            <BBB/>  <!-- selected -->
            <BBB/>  <!-- selected -->
            <EEE/>
            <FFF/>
        </DDD>
    </XXX>
    <CCC>
        <DDD>
            <BBB/>  <!-- selected -->
            <BBB/>  <!-- selected -->
            <EEE/>
            <FFF/>
        </DDD>
    </CCC>
    <CCC>
        <BBB>
            <BBB>  <!-- selected -->
                <BBB/>
            </BBB>
        </BBB>
    </CCC>
</AAA>

(3) //* selects all elements
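Here is a small sketch of the three wildcard expressions above, run against the same sample document. As an assumption for runnability it uses the JDK's javax.xml.xpath API rather than dom4j. WildcardDemo and select are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; it uses the JDK XPath engine, not dom4j.
class WildcardDemo {
    // The sample document from the wildcard examples above, written on one line.
    static final String XML =
        "<AAA>"
        + "<XXX><DDD><BBB/><BBB/><EEE/><FFF/></DDD></XXX>"
        + "<CCC><DDD><BBB/><BBB/><EEE/><FFF/></DDD></CCC>"
        + "<CCC><BBB><BBB><BBB/></BBB></BBB></CCC>"
        + "</AAA>";

    // Returns the names of all nodes matched by the XPath expression.
    static List<String> select(String xml, String xpath) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, new InputSource(new StringReader(xml)),
                            XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getNodeName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(XML, "/AAA/CCC/DDD/*"));    // [BBB, BBB, EEE, FFF]
        System.out.println(select(XML, "/*/*/*/BBB").size()); // 5
        System.out.println(select(XML, "//*").size());        // 17
    }
}
```

In the second query each * stands for exactly one level, so only BBB elements at the fourth level of the tree are matched.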

4. An expression in square brackets narrows the selection further. A number gives the element's position in the selection set, and the last() function selects the last element in the set. Note that positions start at 1, not 0!

(1) /AAA/BBB[1] selects the first BBB child element of AAA

<AAA>
    <BBB/>  <!-- selected -->
    <BBB/>
    <BBB/>
    <BBB/>
</AAA>

(2) /AAA/BBB[last()] selects the last BBB child element of AAA

<AAA>
    <BBB/>
    <BBB/>
    <BBB/>
    <BBB/>  <!-- selected -->
</AAA>
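To make the 1-based positions visible, the sketch below labels each BBB with an n attribute. That attribute is invented for this illustration only and is not part of the examples above. As before, the JDK's javax.xml.xpath API is assumed in place of dom4j, and PositionDemo and text are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class; it uses the JDK XPath engine, not dom4j.
class PositionDemo {
    // Four BBB children; the n attribute is a made-up label to tell them apart.
    static final String XML =
        "<AAA><BBB n=\"first\"/><BBB n=\"second\"/>"
        + "<BBB n=\"third\"/><BBB n=\"fourth\"/></AAA>";

    // Evaluates the expression and returns its result converted to a string.
    static String text(String xml, String xpath) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(text(XML, "/AAA/BBB[1]/@n"));      // first
        System.out.println(text(XML, "/AAA/BBB[last()]/@n")); // fourth
        // Position 0 does not exist, so nothing is selected.
        System.out.println(text(XML, "count(/AAA/BBB[0])"));  // 0
    }
}
```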

5. Operations on attributes

(1) //@id selects all id attributes. Note: the result is the id attribute nodes themselves, not the elements that carry an id attribute.

<AAA>
    <BBB id="b1"/>  <!-- this id attribute node is returned -->
    <BBB id="b2"/>  <!-- this id attribute node is returned too -->
    <BBB name="bbb"/>
    <BBB/>
</AAA>

(2) //BBB[@id] selects all BBB elements that have an id attribute

<AAA>
    <BBB id="b1"/>  <!-- this BBB element is returned -->
    <BBB id="b2"/>  <!-- this BBB element is returned too -->
    <BBB name="bbb"/>
    <BBB/>
</AAA>

(3) //BBB[@name] selects all BBB elements that have a name attribute

<AAA>
    <BBB id="b1"/>
    <BBB id="b2"/>
    <BBB name="bbb"/>  <!-- this BBB element is returned -->
    <BBB/>
</AAA>

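(4) //BBB[@*] selects all BBB elements that have at least one attribute

<AAA>
    <BBB id="b1"/>  <!-- this BBB element is returned -->
    <BBB id="b2"/>  <!-- this BBB element is returned -->
    <BBB name="bbb"/>  <!-- this BBB element is returned -->
    <BBB/>
</AAA>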

(5) //BBB[not(@*)] selects all BBB elements that have no attributes

<AAA>
    <BBB id="b1"/>
    <BBB id="b2"/>
    <BBB name="bbb"/>
    <BBB/>  <!-- selected -->
</AAA>
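The five attribute expressions above can be compared side by side with a short sketch. It assumes the JDK's javax.xml.xpath API instead of dom4j, and the names AttributeDemo, values, and count are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; it uses the JDK XPath engine, not dom4j.
class AttributeDemo {
    // The sample document from the attribute examples above, written on one line.
    static final String XML =
        "<AAA><BBB id=\"b1\"/><BBB id=\"b2\"/><BBB name=\"bbb\"/><BBB/></AAA>";

    static NodeList nodes(String xml, String xpath) {
        try {
            return (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, new InputSource(new StringReader(xml)),
                            XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    // For attribute nodes, getNodeValue() is the attribute's value.
    static List<String> values(String xml, String xpath) {
        NodeList matched = nodes(xml, xpath);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < matched.getLength(); i++) {
            result.add(matched.item(i).getNodeValue());
        }
        return result;
    }

    // Returns the number of nodes matched by the XPath expression.
    static int count(String xml, String xpath) {
        return nodes(xml, xpath).getLength();
    }

    public static void main(String[] args) {
        System.out.println(values(XML, "//@id"));         // [b1, b2]  (attribute nodes)
        System.out.println(count(XML, "//BBB[@id]"));     // 2  (elements with an id)
        System.out.println(count(XML, "//BBB[@name]"));   // 1
        System.out.println(count(XML, "//BBB[@*]"));      // 3
        System.out.println(count(XML, "//BBB[not(@*)]")); // 1
    }
}
```

The first two queries both match twice, but //@id returns the attributes while //BBB[@id] returns the elements that carry them.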

6. An attribute's value can be used as a selection criterion

(1) //BBB[@id='b1'] selects the BBB elements that have an id attribute whose value is 'b1'

<AAA>
    <BBB id="b1"/>  <!-- selected -->
    <BBB name="bbb"/>
    <BBB name="bbb"/>
</AAA>
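A quick way to check a value test like the one above is to count the matches. This sketch assumes the JDK's javax.xml.xpath API rather than dom4j. AttributeValueDemo and count are hypothetical names, and the second query on @name is an extra case added for comparison.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; it uses the JDK XPath engine, not dom4j.
class AttributeValueDemo {
    // The sample document from the example above, written on one line.
    static final String XML =
        "<AAA><BBB id=\"b1\"/><BBB name=\"bbb\"/><BBB name=\"bbb\"/></AAA>";

    // Returns the number of nodes matched by the XPath expression.
    static int count(String xml, String xpath) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, new InputSource(new StringReader(xml)),
                            XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count(XML, "//BBB[@id='b1']"));   // 1
        System.out.println(count(XML, "//BBB[@name='bbb']")); // 2
        System.out.println(count(XML, "//BBB[@id='b2']"));   // 0: no such value here
    }
}
```

Single quotes inside the expression keep it easy to embed in a Java string literal.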

7. The count() function counts the number of selected elements

(1) //*[count(BBB)=2] selects the elements that have exactly 2 BBB child elements

<AAA>
    <CCC>
        <BBB/>
        <BBB/>
        <BBB/>
    </CCC>
    <DDD>  <!-- this element is returned -->
        <BBB/>
        <BBB/>
    </DDD>
    <EEE>
        <CCC/>
        <DDD/>
    </EEE>
</AAA>

(2) //*[count(*)=2] selects the elements that have exactly 2 child elements

<AAA>
    <CCC>
        <BBB/>
        <BBB/>
        <BBB/>
    </CCC>
    <DDD>  <!-- this element is returned -->
        <BBB/>
        <BBB/>
    </DDD>
    <EEE>  <!-- this element is returned too -->
        <CCC/>
        <DDD/>
    </EEE>
</AAA>
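The two count() predicates above can be tried with the sketch below. It assumes the JDK's javax.xml.xpath API instead of dom4j. CountDemo and select are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; it uses the JDK XPath engine, not dom4j.
class CountDemo {
    // The sample document from the count() examples above, written on one line.
    static final String XML =
        "<AAA><CCC><BBB/><BBB/><BBB/></CCC>"
        + "<DDD><BBB/><BBB/></DDD>"
        + "<EEE><CCC/><DDD/></EEE></AAA>";

    // Returns the names of all nodes matched by the XPath expression.
    static List<String> select(String xml, String xpath) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, new InputSource(new StringReader(xml)),
                            XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getNodeName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(XML, "//*[count(BBB)=2]")); // [DDD]
        System.out.println(select(XML, "//*[count(*)=2]"));   // [DDD, EEE]
    }
}
```

The root AAA is not matched by the second query because it has three child elements, and the outer CCC is not matched by either because it has three BBB children.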

There is a good deal more syntax, including many functions, but it is rarely needed and is not covered here.

In addition, the syntax points above can be combined freely. Take the following XML document:

<AAA>
    <BBB id="b1">
        <CCC>
            <KKK>k1</KKK>
        </CCC>
        <CCC>
            <KKK>k2</KKK>  <!-- this one -->
        </CCC>
    </BBB>
    <BBB id="b2"/>
    <BBB name="bbb"/>
</AAA>

Suppose we want the KKK child element of the second CCC child element under the first BBB child element of the AAA element. The XPath path should be written like this:

/AAA/BBB[1]/CCC[2]/KKK
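As a check on the combined path, the sketch below reads the text of the matched KKK element. It assumes the JDK's javax.xml.xpath API in place of dom4j. CombinedPathDemo and text are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class; it uses the JDK XPath engine, not dom4j.
class CombinedPathDemo {
    // The sample document from the example above, written on one line.
    static final String XML =
        "<AAA><BBB id=\"b1\">"
        + "<CCC><KKK>k1</KKK></CCC>"
        + "<CCC><KKK>k2</KKK></CCC>"
        + "</BBB><BBB id=\"b2\"/><BBB name=\"bbb\"/></AAA>";

    // Evaluates the expression and returns the text of the first matched node.
    static String text(String xml, String xpath) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(text(XML, "/AAA/BBB[1]/CCC[2]/KKK")); // k2
        // An attribute test can replace the position; this is an equivalent
        // path for this particular document.
        System.out.println(text(XML, "/AAA/BBB[@id='b1']/CCC[last()]/KKK")); // k2
    }
}
```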
