XML study notes 6: XPath language
At the end of the previous note, we came across two elements that are used to select a specific range in an XML document. The values of these two elements are XPath expressions. So, what is XPath? Simply put, XPath is a language for finding information in XML documents. It can be used to traverse the elements and attributes of an XML document. Many XML-related technologies, such as XSLT, XQuery, and XPointer, are built on top of XPath. In this note, let's learn the XPath language.
1. Related terms
(1) Node: A well-formed XML document can be converted into a tree structure, and the nodes in XPath are the nodes of that tree. There are seven types of nodes, listed below:
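Node type | Description |
Document (root) node | The root of an XML document is called the document node or root node |
Element node | An element's start tag, end tag, and everything between them together form an element node |
Attribute node | Each attribute of an element forms an attribute node, made up of an attribute name and an attribute value; an attribute node must be attached to an element node |
Namespace node | An xmlns:prefix attribute in an XML document is called a namespace node; note the difference from attribute nodes |
Text node | Character data inside an XML element, including character data in CDATA sections |
Comment node | The comments in an XML document, enclosed in <!-- and -->, form comment nodes |
Processing instruction node | The processing instructions in an XML document form processing instruction nodes |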
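To make the seven node types concrete, here is a minimal Python sketch that walks a small document with the standard library's xml.dom.minidom and labels each node. The sample document and the classify helper are made up for illustration. Note that the DOM reports namespace declarations as ordinary attributes, so the sketch separates them by name, which is an assumption of this example rather than how an XPath engine works internally.

```python
from xml.dom import minidom, Node

# Hypothetical sample document, made up for this sketch.
SAMPLE = (
    '<?xml-stylesheet type="text/xsl" href="style.xsl"?>'
    '<bookstore xmlns:b="http://example.com/books">'
    "<!-- a comment -->"
    '<b:book lang="en"><![CDATA[XPath & friends]]></b:book>'
    "</bookstore>"
)

# Map DOM node types onto the XPath node kinds from the table.
KIND_BY_DOM_TYPE = {
    Node.DOCUMENT_NODE: "document",
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.CDATA_SECTION_NODE: "text",  # XPath counts CDATA content as text
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "processing-instruction",
}


def classify(node, found=None):
    """Collect (kind, name) pairs for a node and everything below it."""
    found = [] if found is None else found
    found.append((KIND_BY_DOM_TYPE.get(node.nodeType, "other"), node.nodeName))
    if node.nodeType == Node.ELEMENT_NODE:
        attrs = node.attributes
        for i in range(attrs.length):
            attr = attrs.item(i)
            # Assumption: xmlns / xmlns:prefix attributes stand in for namespace nodes.
            is_ns = attr.name == "xmlns" or attr.name.startswith("xmlns:")
            found.append(("namespace" if is_ns else "attribute", attr.name))
    for child in node.childNodes:
        classify(child, found)
    return found


nodes = classify(minidom.parseString(SAMPLE))
for kind, name in nodes:
    print(f"{kind:24} {name}")
```

All seven kinds from the table should show up in the printed list.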
(2) Atomic value (also called basic value): used to represent simple literal values, such as integers and strings. An atomic value can be regarded as a node with no parent node and no child nodes.
(3) Item: An item represents a basic value or a node.
(4) Node set and sequence: An XPath expression can select multiple nodes. In XPath 1.0 such a collection of nodes is called a node set. XPath 2.0 introduced the term sequence, an ordered collection of zero or more items, so a sequence can hold a single item, a set of nodes, or atomic values.
(5) Node relationship:
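Relationship | Description |
Parent | Every element or attribute has one parent node |
Children | An element node can have zero, one, or more child nodes |
Sibling | Nodes that share the same parent are called siblings |
Ancestor | A node's parent, the parent's parent, and so on up to the root node |
Descendant | A node's children, the children's children, and so on |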
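The relationships in the table can be sketched in a few lines of Python on top of xml.etree.ElementTree. ElementTree elements do not store a link to their parent, so the sketch builds a parent map first. The sample document and the helper names (ancestors, siblings, descendants) are hypothetical and only serve this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document for this sketch.
root = ET.fromstring(
    "<bookstore>"
    "<book><title>A</title><price>30</price></book>"
    "<book><title>B</title></book>"
    "</bookstore>"
)

# ElementTree has no parent pointers, so map every child to its parent.
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestors(node):
    """Parent, grandparent, ... up to the root element."""
    result = []
    while node in parent_of:
        node = parent_of[node]
        result.append(node)
    return result


def siblings(node):
    """All other children of the same parent."""
    parent = parent_of.get(node)
    return [] if parent is None else [n for n in parent if n is not node]


def descendants(node):
    """Children, grandchildren, ... in document order."""
    return [n for n in node.iter() if n is not node]


title = root.find("book/title")
print([n.tag for n in ancestors(title)])    # parent first, root last
print([n.tag for n in siblings(title)])
print([n.tag for n in descendants(root)])
```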
(6) Relative paths and absolute paths: Like paths in an operating system, XPath paths can be relative or absolute. An absolute path starts with a slash (/) and always starts matching from the root node, while a relative path does not start with a slash and starts matching from the current node.
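The difference between the two kinds of path can be sketched with a tiny evaluator. This is not a real XPath engine: the hypothetical select function below only understands paths made of plain element names, and it exists purely to show where matching starts in each case.

```python
import xml.etree.ElementTree as ET


def select(path, root, context):
    """Evaluate a path of plain child-element names (a tiny, assumed XPath subset).

    Absolute paths (leading "/") start from the document root;
    relative paths start from the context element.
    """
    if path.startswith("/"):
        steps = path.strip("/").split("/")
        # The first step of an absolute path has to name the root element itself.
        current = [root] if steps[0] == root.tag else []
        steps = steps[1:]
    else:
        steps = path.split("/")
        current = [context]
    for step in steps:
        current = [child for node in current for child in node if child.tag == step]
    return current


# Hypothetical sample document for this sketch.
root = ET.fromstring(
    "<bookstore><book><title>A</title></book><book><title>B</title></book></bookstore>"
)
first_book = root[0]

# Same context node, different starting points:
print([t.text for t in select("/bookstore/book/title", root, first_book)])  # from the root
print([t.text for t in select("title", root, first_book)])                  # from first_book
```

With the first book as the current node, the absolute path still finds both titles, while the relative path only sees the title under that book.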
2. XPath syntax
XPath uses path expressions to access nodes or node sets in XML. Each XPath expression always consists of one or more steps. Multiple steps are separated directly by slashes. In XPath, the syntax format of a step is as follows:
axis::node-test[predicate]
In other words, each step goes through three rounds of filtering: first the axis chooses a direction relative to the current node, then the node test picks certain nodes along that axis, and finally the predicate further filters the nodes that were picked.
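As a rough illustration of the three parts of a step, the following Python sketch splits a step written in the unabbreviated form into its axis, node test, and predicates. The split_step helper and its regular expression are assumptions made for this note; they do not handle abbreviations such as @ or .., nested brackets, or a ] inside a string literal.

```python
import re

# axis:: is optional (the child axis is the default), then the node test,
# then zero or more [predicate] groups.
STEP = re.compile(
    r"^(?:(?P<axis>[a-z-]+)::)?(?P<test>[^\[\]]+)(?P<preds>(?:\[[^\]]*\])*)$"
)


def split_step(step):
    """Split one XPath step into (axis, node_test, [predicates])."""
    match = STEP.match(step)
    if match is None:
        raise ValueError(f"not a step this sketch understands: {step!r}")
    axis = match.group("axis") or "child"
    predicates = re.findall(r"\[([^\]]*)\]", match.group("preds"))
    return axis, match.group("test"), predicates


for step in ("child::book[price>35.00]", "book", "descendant::text()", "book[1][@lang]"):
    print(step, "->", split_step(step))
```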
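(1) Axis: XPath provides the axes listed below:
Axis | Abbreviation | Description |
ancestor | | Selects all ancestors (parent, grandparent, etc.) of the current node |
ancestor-or-self | | Selects all ancestors (parent, grandparent, etc.) of the current node plus the current node itself |
attribute | @ | Selects all attribute nodes of the current node; if the current node is not an element node, the node set along the attribute axis is empty |
child | (omitted) | Selects all children of the current node |
descendant | | Selects all descendants (children, grandchildren, etc.) of the current node |
descendant-or-self | // (short for /descendant-or-self::node()/) | Selects all descendants (children, grandchildren, etc.) of the current node plus the current node itself |
following | | Selects all nodes in the document after the end tag of the current node; descendants of the current node and attribute nodes are not included |
following-sibling | | Selects all siblings in the document after the end tag of the current node |
namespace | | Selects all namespace nodes of the current node; if the current node is not an element node, the node set along the namespace axis is empty |
parent | .. | Selects the parent of the current node |
preceding | | Selects all nodes in the document that come before the start tag of the current node; ancestors of the current node and attribute nodes are not included |
preceding-sibling | | Selects all siblings in the document before the start tag of the current node |
self | . | Selects the current node |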
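The abbreviated axes are the ones you will type most often. As a quick sketch, Python's standard xml.etree.ElementTree understands a limited subset of XPath that includes ".", "..", "//", the implicit child axis, and "@" inside a predicate. The sample document is made up; ElementTree cannot return attribute nodes themselves, so the attribute values are read with get() instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document for this sketch.
root = ET.fromstring(
    "<bookstore>"
    '<book category="web"><title lang="en">Book A</title></book>'
    '<book><title lang="de">Book B</title></book>'
    "</bookstore>"
)

children = root.findall("book")                  # child axis: the axis name is simply omitted
all_titles = root.findall(".//title")            # "." is self, "//" reaches all descendants
parents_of_titles = root.findall(".//title/..")  # ".." steps to the parent
with_category = root.findall("book[@category]")  # "@" tests an attribute in a predicate

# ElementTree has no attribute nodes, so read the values directly.
langs = [title.get("lang") for title in all_titles]

print(len(children), [t.text for t in all_titles], langs)
print([p.tag for p in parents_of_titles], len(with_category))
```

The remaining axes (ancestor, following, preceding, and so on) are not supported by ElementTree and need a full XPath engine.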
(2) Node test: A node test picks the matching nodes out of all the nodes along the specified axis. The commonly used node tests in XPath are shown in the following table:
Node test | Description |
nodename | Selects the nodes with the given name along the specified axis |
node() | Selects nodes of every type along the specified axis |
text() | Selects all text nodes along the specified axis. For example, child::text() selects all text children of the current node, and descendant::text() selects all text descendants of the current node (text children, text grandchildren, etc.) |
comment() | Selects all comment nodes along the specified axis |
processing-instruction() | Selects all processing instruction nodes along the specified axis |
* | Wildcard in a node test: matches nodes regardless of name (elements on most axes, attributes on the attribute axis) |
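Here is a small sketch of how the node tests in the table narrow down the nodes on one axis. It uses xml.dom.minidom and only covers the child axis; the child_axis helper and the sample document are hypothetical and are not part of any XPath library.

```python
from xml.dom import minidom, Node


def child_axis(node, node_test):
    """Apply a node test to the children of `node` (child axis only)."""
    kids = list(node.childNodes)
    if node_test == "node()":
        return kids  # every node type passes
    if node_test == "text()":
        return [k for k in kids if k.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)]
    if node_test == "comment()":
        return [k for k in kids if k.nodeType == Node.COMMENT_NODE]
    if node_test == "processing-instruction()":
        return [k for k in kids if k.nodeType == Node.PROCESSING_INSTRUCTION_NODE]
    # Name tests and "*" only consider elements on the child axis.
    elements = [k for k in kids if k.nodeType == Node.ELEMENT_NODE]
    if node_test == "*":
        return elements
    return [k for k in elements if k.nodeName == node_test]


# Hypothetical sample: one comment, one element, one text node, one processing instruction.
doc = minidom.parseString(
    "<book><!-- note --><title>XPath</title>plain text<?render fast?></book>"
)
book = doc.documentElement

tests = ("node()", "text()", "comment()", "processing-instruction()", "*", "title", "price")
counts = {test: len(child_axis(book, test)) for test in tests}
print(counts)
```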
(3) Predicate: Each step can take zero or more predicates, which further filter the selected node set. A predicate is written in square brackets and usually evaluates to a Boolean value. Take a look at some examples below:
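Path expression | Result |
/bookstore/book[1] | Selects the first book element that is a child of the bookstore element. |
/bookstore/book[last()] | Selects the last book element that is a child of the bookstore element. |
/bookstore/book[last()-1] | Selects the second-to-last book element that is a child of the bookstore element. |
/bookstore/book[price>35.00] | Selects all book elements of the bookstore element whose price element has a value greater than 35.00. |
/bookstore/book[price>35.00]/title | Selects all title elements of the book elements of the bookstore element whose price element has a value greater than 35.00. |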
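The examples above can be tried out with a short Python sketch. The standard xml.etree.ElementTree module supports the positional predicates ([1], [last()], [last()-1]) but not comparisons such as price>35.00, so the sketch performs that filter in plain Python instead. The bookstore data is made up for this example, and the paths are written relative to the root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical bookstore data for this sketch.
root = ET.fromstring(
    "<bookstore>"
    "<book><title>Book A</title><price>30.00</price></book>"
    "<book><title>Book B</title><price>29.99</price></book>"
    "<book><title>Book C</title><price>49.99</price></book>"
    "<book><title>Book D</title><price>39.95</price></book>"
    "</bookstore>"
)

first = root.findall("./book[1]")              # /bookstore/book[1]
last = root.findall("./book[last()]")          # /bookstore/book[last()]
second_last = root.findall("./book[last()-1]") # /bookstore/book[last()-1]

# /bookstore/book[price>35.00]/title, with the comparison done in Python
# because ElementTree's XPath subset has no ">" operator.
expensive_titles = [
    book.findtext("title")
    for book in root.findall("./book")
    if float(book.findtext("price")) > 35.00
]

print(first[0].findtext("title"), last[0].findtext("title"), second_last[0].findtext("title"))
print(expensive_titles)
```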
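3. Operators
As the examples above show, a predicate can also use operators and expressions, and there are many built-in functions available. This section first looks at the operators that XPath supports:
(1) Arithmetic operators: addition (+), subtraction (-), multiplication (*), division (div), modulo (mod)
The arithmetic operators are very simple, but a few points deserve attention:
A. The minus sign is the same character as the hyphen, and the hyphen is a legal character in XML names, which creates an ambiguity. To avoid it, put a space on each side of the minus sign; when the minus sign follows a name, the space before it is required.
B. In XPath 1.0 every number is a 64-bit double, even when it is written as 0 or 100. XPath also has a few special numeric values: positive infinity, negative infinity, and NaN (not a number).
C. If an operand is not a number, it is converted automatically. The comparison and logical operators below also perform the corresponding automatic type conversion when necessary.
(2) Comparison operators: equal (=), not equal (!=), less than (<), less than or equal (<=), greater than (>), greater than or equal (>=)
Note that, unlike in many other programming languages, equality is written with a single equals sign.
(3) Logical operators: and (and), or (or)
(4) Set operator: union (|)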
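Because XPath 1.0 numbers are IEEE 754 doubles, div and mod behave a little differently from Python's / and %. The sketch below mimics them with two hypothetical helpers, xpath_div and xpath_mod: division by zero yields an infinity or NaN instead of raising an error, and mod keeps the sign of the left operand (a truncating remainder), which is what math.fmod does.

```python
import math


def xpath_div(a, b):
    """Sketch of XPath 1.0 `a div b`: IEEE 754 division, never raises."""
    a, b = float(a), float(b)
    if b != 0.0:
        return a / b
    if a == 0.0 or math.isnan(a):
        return math.nan  # 0 div 0 is NaN
    sign = math.copysign(1.0, a) * math.copysign(1.0, b)
    return math.copysign(math.inf, sign)


def xpath_mod(a, b):
    """Sketch of XPath 1.0 `a mod b`: remainder of a truncating division."""
    a, b = float(a), float(b)
    if math.isnan(a) or math.isnan(b) or math.isinf(a) or b == 0.0:
        return math.nan
    return math.fmod(a, b)  # result has the sign of `a`, unlike Python's %


print(xpath_div(1, 0), xpath_div(-1, 0), xpath_div(0, 0))
print(xpath_mod(5, 2), xpath_mod(5, -2), xpath_mod(-5, 2), xpath_mod(-5, -2))
print(math.nan == math.nan)  # NaN is not equal to anything, including itself
```

Compare the mod results with Python's own `-5 % 2`, which gives 1 rather than -1.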
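4. Expressions
(1) for expression: iterates over every item in a sequence, evaluates an expression once per item, and returns the results of all evaluations combined into one sequence. The syntax is as follows:
for $var in sequence return rtExpression
In practice, this for is closer to forEach in JavaScript. The following form iterates over several sequences:
for $var1 in sequence1, $var2 in sequence2 return fn($var1,$var2)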
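For readers who know Python, the for expression maps quite naturally onto list comprehensions. The following is only an analogy under that assumption, not XPath itself; the variable names and numbers are made up.

```python
prices = [30, 25, 50]

# for $p in $prices return $p * 2
doubled = [p * 2 for p in prices]

# for $x in (1, 2), $y in (10, 20) return $x + $y
# The first variable is the outer loop, the second one the inner loop.
sums = [x + y for x in (1, 2) for y in (10, 20)]

# for $x in (1, 2) return ($x, $x * 10)
# Sequences never nest, so the per-item results are flattened into one sequence.
flattened = [value for x in (1, 2) for value in (x, x * 10)]

print(doubled)
print(sums)
print(flattened)
```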
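(2) if expression: handles branching and returns different values depending on the condition. The syntax is as follows:
if (condition1) then rtVal1 [else if (condition2) then rtVal2 ...] else otherVal
(3) some expression: returns true as soon as one item in the iteration satisfies the condition, otherwise returns false. The syntax is as follows:
some $var in sequence satisfies condition
(4) every expression: returns false as soon as one item in the iteration fails the condition, otherwise returns true. The syntax is as follows:
every $var in sequence satisfies condition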
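The if, some, and every expressions also have close Python counterparts: the conditional expression, any(), and all(). The sketch below is an analogy only; the price_band function and its thresholds are invented for the example.

```python
prices = [30, 25, 50]


def price_band(price):
    # if ($p > 40) then "high" else if ($p > 28) then "medium" else "low"
    return "high" if price > 40 else "medium" if price > 28 else "low"


bands = [price_band(p) for p in prices]

# some $p in $prices satisfies $p > 35
some_expensive = any(p > 35 for p in prices)

# every $p in $prices satisfies $p > 35
every_expensive = all(p > 35 for p in prices)

# On an empty sequence, some is false and every is true, just like any() and all().
some_on_empty = any(p > 35 for p in [])
every_on_empty = all(p > 35 for p in [])

print(bands, some_expensive, every_expensive, some_on_empty, every_on_empty)
```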
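5. Built-in function library
XPath also has a large number of built-in functions that extend what it can do; for details, see the XPath function reference. I have also copied the list below for reference: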
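Category | Function | Description |
Accessor functions | fn:node-name(node) | Returns the node name of the argument node. |
fn:nilled(node) | Returns a Boolean value indicating whether the argument node is nilled. |
fn:data(item,item,...) | Takes a sequence of items and returns a sequence of atomic values. |
fn:base-uri() fn:base-uri(node) | Returns the value of the base-uri property of the current node or the specified node. |
fn:document-uri() fn:document-uri(node) | Returns the value of the document-uri property of the current node or the specified node. |
Error and trace functions | fn:error() fn:error(error) fn:error(error,description) fn:error(error,description,error-object) | Example: error(fn:QName('http://example.com/test', 'err:toohigh'), 'Error: Price is too high') Result: returns http://example.com/test#toohigh and the string "Error: Price is too high" to the external processing environment. |
fn:trace(value,label) | Used to debug queries. |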
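Numeric functions | fn:number(arg) | Returns the numeric value of the argument. The argument can be a Boolean value, a string, or a node set. Example: number('100') Result: 100 |
fn:abs(num) | Returns the absolute value of the argument. |
fn:ceiling(num) | Returns the smallest integer greater than or equal to num. |
fn:floor(num) | Returns the largest integer less than or equal to num. |
fn:round(num) | Rounds num to the nearest integer. |
fn:round-half-to-even(num) | Rounds num to the nearest integer; a value exactly halfway between two integers is rounded to the even one. Example: round-half-to-even(0.5) Result: 0 Example: round-half-to-even(1.5) Result: 2 Example: round-half-to-even(2.5) Result: 2 |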
String functions | fn:string(arg) | Returns the string value of the argument. The argument can be a number, a Boolean value, or a node set. |
fn:codepoints-to-string((int,int,...)) | Returns the string built from a sequence of Unicode code points. Example: codepoints-to-string((84, 104, 233, 114, 232, 115, 101)) Result: 'Thérèse' Note: the argument of this function is a sequence of Unicode code points, so it must be enclosed in its own parentheses. |
fn:string-to-codepoints(string) | Returns the sequence of Unicode code points that corresponds to the characters of the string. |
fn:codepoint-equal(comp1,comp2) | Compares by Unicode code points: returns true if the value of comp1 equals the value of comp2, otherwise returns false. |
fn:compare(comp1,comp2) fn:compare(comp1,comp2,collation) | Compares according to the collation rules: returns -1 if comp1 is less than comp2, 0 if comp1 equals comp2, and 1 if comp1 is greater than comp2. |
fn:concat(string,string,...) | Returns the concatenation of the strings. |
fn:string-join((string,string,...),sep) | Returns the string arguments joined together, using the sep argument as the separator. |
fn:substring(string,start,len) fn:substring(string,start) | Returns the substring of the specified length that begins at position start. The index of the first character is 1. If the len argument is omitted, returns the substring from position start to the end of the string. |
fn:string-length(string) fn:string-length() | Returns the length of the specified string. Without a string argument, returns the length of the string value of the current node. |
fn:normalize-space(string) fn:normalize-space() | Removes leading and trailing whitespace from the specified string, collapses each internal run of whitespace into a single space, and returns the result. Without an argument, processes the current node. |
fn:normalize-unicode() | Performs Unicode normalization. |
fn:upper-case(string) | Converts the string argument to uppercase. |
fn:lower-case(string) | Converts the string argument to lowercase. |
fn:translate(string1,string2,string3) | Replaces each character of string1 that occurs in string2 with the character at the same position in string3. Example: translate('12:30','30','45') Result: '12:45' Example: translate('12:30','03','54') Result: '12:45' Example: translate('12:30','0123','abcd') Result: 'bc:da' |
fn:escape-uri(stringURI,esc-res) | Example: escape-uri("http://example.com/test#car", true()) Result: "http%3A%2F%2Fexample.com%2Ftest#car" Example: escape-uri("http://example.com/test#car", false()) Result: "http://example.com/test#car" Example: escape-uri("http://example.com/~bébé", false()) Result: "http://example.com/~b%C3%A9b%C3%A9" |
fn:contains(string1,string2) | Returns true if string1 contains string2, otherwise returns false. |
fn:starts-with(string1,string2) | Returns true if string1 starts with string2, otherwise returns false. |
fn:ends-with(string1,string2) | Returns true if string1 ends with string2, otherwise returns false. |
fn:substring-before(string1,string2) | Returns the part of string1 that comes before the first occurrence of string2. |
fn:substring-after(string1,string2) | Returns the part of string1 that comes after the first occurrence of string2. |
fn:matches(string,pattern) | Returns true if the string argument matches the specified pattern, otherwise returns false. |
fn:replace(string,pattern,replace) | Replaces the parts of the string that match the specified pattern with the replace argument and returns the result. |
fn:tokenize(string,pattern) | Example: tokenize("XPath is fun", "\s+") Result: ("XPath", "is", "fun") |
anyURI functions | fn:resolve-uri(relative,base) | |
Boolean functions | fn:boolean(arg) | Returns the Boolean value of a number, string, or node set. |
fn:not(arg) | First converts the argument to a Boolean value with the boolean() function, then negates it. |
fn:true() | Returns the Boolean value true. |
fn:false() | Returns the Boolean value false. |
Date and time functions | fn:dateTime(date,time) | Converts the arguments into a dateTime value. |
fn:years-from-duration(datetimedur) | Returns the years component of the argument as an integer, in its standard lexical representation. |
fn:months-from-duration(datetimedur) | Returns the months component of the argument as an integer, in its standard lexical representation. |
fn:days-from-duration(datetimedur) | Returns the days component of the argument as an integer, in its standard lexical representation. |
fn:hours-from-duration(datetimedur) | Returns the hours component of the argument as an integer, in its standard lexical representation. |
fn:minutes-from-duration(datetimedur) | Returns the minutes component of the argument as an integer, in its standard lexical representation. |
fn:seconds-from-duration(datetimedur) | Returns the seconds component of the argument as a decimal number, in its standard lexical representation. |
fn:year-from-dateTime(datetime) | Returns the year component of the local value of the argument as an integer. |
fn:month-from-dateTime(datetime) | Returns the month component of the local value of the argument as an integer. |
fn:day-from-dateTime(datetime) | Returns the day component of the local value of the argument as an integer. |
fn:hours-from-dateTime(datetime) | Returns the hours component of the local value of the argument as an integer. |
fn:minutes-from-dateTime(datetime) | Returns the minutes component of the local value of the argument as an integer. |
fn:seconds-from-dateTime(datetime) | Returns the seconds component of the local value of the argument as a decimal number. |
fn:timezone-from-dateTime(datetime) | Returns the time zone component of the argument, if present. |
fn:year-from-date(date) | Returns the year in the local value of the argument as an integer. |
fn:month-from-date(date) | Returns the month in the local value of the argument as an integer. |
fn:day-from-date(date) | Returns the day in the local value of the argument as an integer. |
fn:timezone-from-date(date) | Returns the time zone component of the argument, if present. |
fn:hours-from-time(time) | Returns the hours component of the local value of the argument as an integer. |
fn:minutes-from-time(time) | Returns the minutes component of the local value of the argument as an integer. |
fn:seconds-from-time(time) | Returns the seconds component of the local value of the argument as a decimal number. |
fn:timezone-from-time(time) | Returns the time zone component of the argument, if present. |
fn:adjust-dateTime-to-timezone(datetime,timezone) | If the timezone argument is empty, returns a dateTime without a time zone; otherwise returns a dateTime with a time zone. |
fn:adjust-date-to-timezone(date,timezone) | If the timezone argument is empty, returns a date without a time zone; otherwise returns a date with a time zone. |
fn:adjust-time-to-timezone(time,timezone) | If the timezone argument is empty, returns a time without a time zone; otherwise returns a time with a time zone. |
QName functions | fn:QName() | |
fn:local-name-from-QName() | |
fn:namespace-uri-from-QName() | |
fn:namespace-uri-for-prefix() | |
fn:in-scope-prefixes() | |
fn:resolve-QName() | |
Node functions | fn:name() fn:name(nodeset) | Returns the name of the current node or of the first node in the specified node set. |
fn:local-name() fn:local-name(nodeset) | Returns the name of the current node or of the first node in the specified node set, without the namespace prefix. |
fn:namespace-uri() fn:namespace-uri(nodeset) | Returns the namespace URI of the current node or of the first node in the specified node set. |
fn:lang(lang) | Returns true if the language of the current node matches the specified language. Example: lang("en") is true for an element declared with xml:lang="en". Example: lang("de") is false for the same element. |
fn:root() fn:root(node) | Returns the root of the node tree to which the current node or the specified node belongs. This is usually a document node. |
Context functions | fn:position() | Returns the index position of the node currently being processed. Example: //book[position()<=3] Result: selects the first three book elements. |
fn:last() | Returns the number of items in the list of nodes being processed. Example: //book[last()] Result: selects the last book element. |
fn:current-dateTime() | Returns the current dateTime (with time zone). |
fn:current-date() | Returns the current date (with time zone). |
fn:current-time() | Returns the current time (with time zone). |
fn:implicit-timezone() | Returns the value of the implicit time zone. |
fn:default-collation() | Returns the value of the default collation. |
fn:static-base-uri() | Returns the value of base-uri. |
Sequence functions | General sequence functions | fn:index-of((item,item,...),searchitem) | Returns the positions in the item sequence whose items are equal to the searchitem argument. Example: index-of((15, 40, 25, 40, 10), 40) Result: (2, 4) |
fn:remove((item,item,...),position) | Returns a new sequence built from the item arguments, with the item at the position given by the position argument removed. |
fn:empty(item,item,...) | Returns true if the argument is an empty sequence, otherwise returns false. |
fn:exists(item,item,...) | Returns true if the argument is not an empty sequence, otherwise returns false. |
fn:distinct-values((item,item,...),collation) | Returns the distinct values only. Example: distinct-values((1, 2, 3, 1, 2)) Result: (1, 2, 3) |
fn:insert-before((item,item,...),pos,inserts) | Returns a new sequence built from the item arguments, with the value of the inserts argument inserted at the position given by the pos argument. |
fn:reverse((item,item,...)) | Returns the specified items in reverse order. |
fn:subsequence((item,item,...),start,len) | Returns the subsequence that begins at the position given by the start argument and contains the number of items given by the len argument. The position of the first item is 1. |
fn:unordered((item,item,...)) | Returns the items in an implementation-dependent order. |
Cardinality test functions | fn:zero-or-one(item,item,...) | Returns the argument if it contains zero or one item, otherwise raises an error. |
fn:one-or-more(item,item,...) | Returns the argument if it contains one or more items, otherwise raises an error. |
fn:exactly-one(item,item,...) | Returns the argument if it contains exactly one item, otherwise raises an error. |
Comparison functions | fn:deep-equal(param1,param2,collation) | Returns true if param1 and param2 are deep-equal to each other, otherwise returns false. |
Aggregate functions | fn:count((item,item,...)) | Returns the number of items. |
fn:avg((arg,arg,...)) | Returns the average of the argument values. |
fn:max((arg,arg,...)) | Returns the largest of the argument values. |
fn:min((arg,arg,...)) | Returns the smallest of the argument values. |
fn:sum((arg,arg,...)) | Returns the sum of the values of the nodes in the specified node set. |
Sequence-generating functions | fn:id((string,string,...),node) | |
fn:idref((string,string,...),node) | |
fn:data((item1,item2,...)) | Returns a sequence made up of the values of item1, item2, and so on. |
fn:doc(URI) | |
fn:doc-available(URI) | Returns true if the doc() function returns a document node, otherwise returns false. |
fn:collection() fn:collection(string) | |
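To get a feel for a few of the functions in the table, here are rough Python equivalents that reproduce the examples listed above. They are sketches under simplifying assumptions, not a faithful implementation: xpath_substring only accepts integer positions, tokenize uses Python's regular expression dialect rather than the XPath one, and all helper names are invented for this note.

```python
import re


def xpath_translate(text, from_chars, to_chars):
    """translate(): map characters one by one; unmatched extras in from_chars are deleted."""
    table = {}
    for index, char in enumerate(from_chars):
        if ord(char) not in table:  # the first occurrence of a character wins
            table[ord(char)] = to_chars[index] if index < len(to_chars) else None
    return text.translate(table)


def xpath_substring(text, start, length=None):
    """substring() for integer arguments; positions are 1-based."""
    begin = max(start, 1)
    end = len(text) + 1 if length is None else start + length  # exclusive, 1-based
    return text[begin - 1 : max(end, begin) - 1]


def normalize_space(text):
    """normalize-space(): trim, then collapse internal runs of whitespace."""
    return re.sub(r"[ \t\r\n]+", " ", text).strip(" ")


def round_half_to_even(number):
    """round-half-to-even() for whole-number precision; Python's round() already rounds ties to even."""
    return round(number)


def tokenize(text, pattern):
    """tokenize(): split on every match of the pattern (Python regex syntax assumed)."""
    return re.split(pattern, text)


def index_of(items, search_item):
    """index-of(): 1-based positions of every item equal to search_item."""
    return [position for position, item in enumerate(items, start=1) if item == search_item]


print(xpath_translate("12:30", "0123", "abcd"))
print(xpath_substring("12345", 2, 3), xpath_substring("12345", 2))
print(normalize_space("  XPath   is \n fun  "))
print(round_half_to_even(0.5), round_half_to_even(1.5), round_half_to_even(2.5))
print(tokenize("XPath is fun", r"\s+"))
print(index_of([15, 40, 25, 40, 10], 40))
```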
Obviously, these built-in functions are listed here not to be memorized but to be looked up like a dictionary when needed. (XQuery 1.0 and XPath share these built-in functions, so it is worth skimming them when you have spare time, if only to become familiar with them.)