Home > Article > Backend Development > XML—XPATH syntax introduction
Why do we need xpath?
When using dom4j, we cannot obtain a certain element across layers. We must obtain it layer by layer, which is very troublesome.
So in order for us to access a certain node more conveniently, we can use xpath technology, which allows us to read the specified node very conveniently.
xpath is usually used in conjunction with dom4j, And if you want to use xpath, you need to introduce a new package jaxen-1.1-beta-6.jar
The basic syntax of xpath has the following points:
1. The basic xpath syntax is similar to locating files in a file system. If the path starts with a slash /
starts, then the path represents the absolute path to an element.
(1) /AAA
, which represents the selection of the root element AAA
<AAA>这里 <BBB/> <CCC/> <BBB/> <BBB/> <DDD> <BBB/> <DDD/> <CCC/><AAA/>这里
(2) /AAA/CCC
, indicating the selection of all CCC sub-elements of AAA
<AAA> <BBB/> <CCC/>这里 <BBB/> <BBB/> <DDD> <BBB/> <DDD/> <CCC/>这里<AAA/>
(3) /AAA/DDD/BBB
, indicating the selection All BBB sub-elements of AAA's sub-elements DDD
<AAA> <BBB/> <CCC/> <BBB/> <BBB/> <DDD> <BBB/>这里 <DDD/> <CCC/><AAA/>
So how to use xpath in dom4j? It's actually very simple:
//1.得到SAXReader解析器SAXReader saxReader = new SAXReader(); //2.指定去解析哪个文件Document document = saxReader.read(new File(path)); //3.可以使用xpath随心读取// document.selectNodes(args)返回多个元素 // document.selectSingleNode(args)返回单个元素List nodes = document.selectNodes("/AAA/BBB");
After getting the document object through dom4j, you can use the document's selectNodes(args)
method. This method will return a List# based on the xpath path you wrote. ##, the remaining operations are similar to dom4j.
selectSingleNode(args) method, which is used to return a single Node.
2. If the path starts with a double slash //, it means that all the files in the document satisfy Elements of the rules after the double slash
// (regardless of hierarchical relationship)
//BBB, which means selecting all BBB elements
<AAA> <BBB/>这里 <CCC/> <BBB/>这里 <DDD> <BBB/>这里 </DDD> <CCC> <DDD> <BBB/>这里 <BBB/>这里 </DDD> </CCC></AAA>(2)
//DDD/BBB, indicating that all parent elements are BBB elements of DDD
<AAA> <BBB/> <CCC/> <BBB/> <DDD> <BBB/>这里 </DDD> <CCC> <DDD> <BBB/>这里 <BBB/>这里 </DDD> </CCC></AAA>
3. Asterisk* means selecting all elements located by the path before the asterisk
/AAA/CCC/DDD/*, which means selecting all paths attached to / Elements of AAA/CCC/DDD:
<AAA> <XXX> <DDD> <BBB/> <BBB/> <EEE/> <FFF/> </DDD> </XXX> <CCC> <DDD> <BBB/>这里 <BBB/>这里 <EEE/>这里 <FFF/>这里 </DDD> </CCC> <CCC> <BBB> <BBB> <BBB/> </BBB> </BBB> </CCC></AAA>(2)
/*/*/*/BBB, which represents all BBB elements with 3 ancestor elements
<AAA> <XXX> <DDD> <BBB/>这里 <BBB/>这里 <EEE/> <FFF/> </DDD> </XXX> <CCC> <DDD> <BBB/>这里 <BBB/>这里 <EEE/> <FFF/> </DDD> </CCC> <CCC> <BBB>这里 <BBB> <BBB/> </BBB> </BBB> </CCC></AAA>(3)
//*, which means selecting all elements
4. The expressions in square brackets can further specify the elements, where the numbers represent The position of the element in the selection set, and the last() function represents the last element in the selection set. It is important to note that the subscripts here start from 1, not 0! (1)
/AAA/BBB[1], which means selecting the first BBB sub-element of AAA
<AAA> <BBB/>这个 <BBB/> <BBB/> <BBB/></AAA>(2)
/AAA/ BBB[last()] means selecting the last BBB element of AAA
<AAA> <BBB/> <BBB/> <BBB/> <BBB/>这个</AAA>
5. Operations on attributes
(1)//@id, select all id attributes. Note: all id attributes are returned as nodes, not nodes with id attributes.
<AAA> <BBB id="b1"/>返回这里的id属性节点 <BBB id="b2"/>也返回这里的id属性节点 <BBB name="bbb"/> <BBB/></AAA>(2)
//BBB[@id], select all BBB nodes with id attributes
<AAA> <BBB id="b1"/>返回这个BBB节点 <BBB id="b2"/>也返回这个BBB节点 <BBB name="bbb"/> <BBB/></AAA>(3)
//BBB[@ name], select all BBB nodes with name attribute
<AAA> <BBB id="b1"/> <BBB id="b2"/> <BBB name="bbb"/>返回这个BBB节点 <BBB/></AAA>(4)
//BBB[@*], select all BBB nodes with attribute
<AAA> <BBB id="b1"/>返回这个BBB节点 <BBB id="b2"/>返回这个BBB节点 <BBB name="bbb"/>返回这个BBB节点 <BBB/></AAA>(5)
//BBB[not(@*)], select all BBB nodes without attributes
<AAA> <BBB id="b1"/> <BBB id="b2"/> <BBB name="bbb"/> <BBB/>这个</AAA>
6. The value of the attribute can be used As the selection criteria
(1)//BBB[@id='b1'], select the BBB element that contains the attribute id and its value is 'b1'
<AAA> <BBB id="b1"/>这个 <BBB name="bbb"/> <BBB name="bbb"/></AAA>
7.count()The function can count the number of selected elements
//* [count(BBB)=2], select the element containing 2 BBB sub-elements
<AAA> <CCC> <BBB/> <BBB/> <BBB/> </CCC> <DDD>返回这个元素 <BBB/> <BBB/> </DDD> <EEE> <CCC/> <DDD/> </EEE></AAA>(2)
//*[count(*)=2], select Elements containing 2 sub-elements
<AAA> <CCC> <BBB/> <BBB/> <BBB/> </CCC> <DDD>返回这个元素 <BBB/> <BBB/> </DDD> <EEE>也返回这个元素 <CCC/> <DDD/> </EEE></AAA>
There are many other syntaxes, including the application of many functions, which are not used much and will not be introduced here
<AAA> <BBB id="b1"> <CCC> <KKK>k1</KKK> </CCC> <CCC> <KKK>k2</KKK>这个 </CCC> </BBB> <BBB id="b2"/> <BBB name="bbb"/></AAA>If we now want to find the KKK sub-element of the second CCC sub-element under the first BBB sub-element under the AAA element element, the xpath path should be written like this:
/AAA/BBB[1]/CCC[2]/KKK
So in order for us to access a certain node more conveniently, we can use xpath technology, which allows us to read the specified node very conveniently.
The basic syntax of xpath has the following points:xpath is usually used in conjunction with dom4j, And if you want to use xpath, you need to introduce a new package jaxen-1.1-beta-6.jar
1. The basic xpath syntax is similar to locating files in a file system. If the path starts with a slash / starts, then the path represents the absolute path to an element.
/AAA, which represents the selection of the root element AAA
<AAA>这里 <BBB/> <CCC/> <BBB/> <BBB/> <DDD> <BBB/> <DDD/> <CCC/><AAA/>这里(2)
/AAA/CCC, indicating the selection of all CCC sub-elements of AAA
<AAA> <BBB/> <CCC/>这里 <BBB/> <BBB/> <DDD> <BBB/> <DDD/> <CCC/>这里<AAA/>(3)
/AAA/DDD/BBB, indicating the selection All BBB sub-elements of AAA's sub-elements DDD
<AAA> <BBB/> <CCC/> <BBB/> <BBB/> <DDD> <BBB/>这里 <DDD/> <CCC/><AAA/>
那么怎么在dom4j中运用xpath呢?其实很简单:
//1.得到SAXReader解析器SAXReader saxReader = new SAXReader(); //2.指定去解析哪个文件Document document = saxReader.read(new File(path)); //3.可以使用xpath随心读取 // document.selectNodes(args)返回多个元素 // document.selectSingleNode(args)返回单个元素List nodes = document.selectNodes("/AAA/BBB");
通过dom4j得到document对象后,可以使用document的selectNodes(args)
方法,这个方法会根据你写的xpath路径返回一个List
,余下的操作就和dom4j类似了。
同时它也有一个selectSingleNode(args)
方法,用于返回一个单个的Node。
下面继续介绍其他的xpath语法:
2.如果路径以双斜线//
开头,则表示文档中所有满足双斜线//
之后规则的元素(无论层级关系)
(1)//BBB
,它表示选择所有BBB元素
<AAA> <BBB/>这里 <CCC/> <BBB/>这里 <DDD> <BBB/>这里 </DDD> <CCC> <DDD> <BBB/>这里 <BBB/>这里 </DDD> </CCC></AAA>
(2)//DDD/BBB
,表示所有父元素是DDD的BBB元素
<AAA> <BBB/> <CCC/> <BBB/> <DDD> <BBB/>这里 </DDD> <CCC> <DDD> <BBB/>这里 <BBB/>这里 </DDD> </CCC></AAA>
3.星号*
表示选择所有由星号之前路径所定位的元素
(1)/AAA/CCC/DDD/*
,它表示选择所有路径依附于/AAA/CCC/DDD的元素:
<AAA> <XXX> <DDD> <BBB/> <BBB/> <EEE/> <FFF/> </DDD> </XXX> <CCC> <DDD> <BBB/>这里 <BBB/>这里 <EEE/>这里 <FFF/>这里 </DDD> </CCC> <CCC> <BBB> <BBB> <BBB/> </BBB> </BBB> </CCC></AAA>
(2)/*/*/*/BBB
,它表示所有的有3个祖先元素的BBB元素
<AAA> <XXX> <DDD> <BBB/>这里 <BBB/>这里 <EEE/> <FFF/> </DDD> </XXX> <CCC> <DDD> <BBB/>这里 <BBB/>这里 <EEE/> <FFF/> </DDD> </CCC> <CCC> <BBB>这里 <BBB> <BBB/> </BBB> </BBB> </CCC></AAA>
(3)//*
,它表示选择所有的元素
4.方括号里的表达式可以进一步地指定元素,其中数字表示元素在选择集里的位置,而last()函数则表示选择集中的最后一个元素。特别要注意的是这里的下标是从1开始的,而不是0!
(1)/AAA/BBB[1]
,它表示选择AAA的第一个BBB子元素
<AAA> <BBB/>这个 <BBB/> <BBB/> <BBB/></AAA>
(2)/AAA/BBB[last()]
,表示选择AAA的最后一个BBB元素
<AAA> <BBB/> <BBB/> <BBB/> <BBB/>这个</AAA>
5.对属性的操作
(1)//@id
,选择所有的id属性,注意:是把所有的id属性当做节点返回,而不是返回有id属性的节点。
<AAA> <BBB id="b1"/>返回这里的id属性节点 <BBB id="b2"/>也返回这里的id属性节点 <BBB name="bbb"/> <BBB/></AAA>
(2)//BBB[@id]
,选择所有有id属性的BBB节点
<AAA> <BBB id="b1"/>返回这个BBB节点 <BBB id="b2"/>也返回这个BBB节点 <BBB name="bbb"/> <BBB/></AAA>
(3)//BBB[@name]
,选择所有有name属性的BBB节点
<AAA> <BBB id="b1"/> <BBB id="b2"/> <BBB name="bbb"/>返回这个BBB节点 <BBB/></AAA>
(4)//BBB[@*]
,选择所有有属性的BBB节点
<AAA> <BBB id="b1"/>返回这个BBB节点 <BBB id="b2"/>返回这个BBB节点 <BBB name="bbb"/>返回这个BBB节点 <BBB/></AAA>
(5)//BBB[not(@*)]
,选择所有没有属性的BBB节点
<AAA> <BBB id="b1"/> <BBB id="b2"/> <BBB name="bbb"/> <BBB/>这个</AAA>
6.属性的值可以被用来作为选择的准则
(1)//BBB[@id='b1']
,选择含有属性id且其值为’b1’的BBB元素
<AAA> <BBB id="b1"/>这个 <BBB name="bbb"/> <BBB name="bbb"/></AAA>
7.count()
函数可以计数所选元素的个数
(1)//*[count(BBB)=2]
,选择含有2个BBB子元素的元素
<AAA> <CCC> <BBB/> <BBB/> <BBB/> </CCC> <DDD>返回这个元素 <BBB/> <BBB/> </DDD> <EEE> <CCC/> <DDD/> </EEE></AAA>
(2)//*[count(*)=2]
,选择含有2个子元素的元素
<AAA> <CCC> <BBB/> <BBB/> <BBB/> </CCC> <DDD>返回这个元素 <BBB/> <BBB/> </DDD> <EEE>也返回这个元素 <CCC/> <DDD/> </EEE></AAA>
还有很多其他的语法,包括很多函数的应用,用的不多,这里不做介绍
另外,上述介绍的几点语法可以任意组合,比如下述的xml文档:
<AAA> <BBB id="b1"> <CCC> <KKK>k1</KKK> </CCC> <CCC> <KKK>k2</KKK>这个 </CCC> </BBB> <BBB id="b2"/> <BBB name="bbb"/></AAA>
假如我们现在要找AAA元素下面的第1个BBB子元素下面的第2CCC子元素的KKK子元素,则xpath路径应该这么写: /AAA/BBB[1]/CCC[2]/KKK
以上就是XML——XPATH语法介绍 的内容,更多相关内容请关注PHP中文网(www.php.cn)!