python - How do I get the text() of a node and all of its descendant nodes with xpath in lxml?

Question

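{code...} The above is the page source. I match all li elements with xpath {code...} Output: {code...} If I change it to: {code...} Output: {code...} Obviously this is not the result I want. The result I want looks like this {code...} Could someone please advise how to achieve this?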

怪我咯 · Answer

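The correct way is to use XPath's string() function. string(.) returns the text of the current node and all of its descendants, concatenated into a single string: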

import lxml.etree as etree
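# The tags were stripped from the original post. <ul id="parameter2"> and <li>
# follow from the XPath below; the <a> child is an assumption.
html = """
<ul id="parameter2">
 <li>商品名称：养生堂天然维生素E软胶囊</li>
 <li>商品编号：720135</li>
 <li>品牌：<a>养生堂</a></li>
</ul>
"""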
tree = etree.HTML(html)
property_list_reg = '//ul[@id="parameter2"]//li'
property_lst = tree.xpath(property_list_reg)
for e in property_lst:
    print(e.xpath('string(.)'))
print(len(property_lst))
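
As a supplement to the string(.) approach above, here is a minimal sketch of two related ways to get the same concatenated text: XPath's normalize-space(.), which also trims and collapses whitespace, and lxml's itertext() on the element. The markup (in particular the <a> child) and the helper name li_texts are assumptions made for illustration, not part of the original page.

```python
import lxml.etree as etree

# Hypothetical markup: the <a> child is an assumption made for illustration.
html = """
<ul id="parameter2">
  <li>商品名称：养生堂天然维生素E软胶囊</li>
  <li>品牌：<a>养生堂</a></li>
</ul>
"""
tree = etree.HTML(html)


def li_texts(tree):
    """Return (normalize-space result, itertext result) for each li."""
    results = []
    for li in tree.xpath('//ul[@id="parameter2"]/li'):
        # XPath side: string value of li with whitespace trimmed and collapsed
        via_xpath = li.xpath('normalize-space(.)')
        # lxml side: join every text fragment found inside li
        via_itertext = "".join(li.itertext()).strip()
        results.append((via_xpath, via_itertext))
    return results


for via_xpath, via_itertext in li_texts(tree):
    print(via_xpath, "|", via_itertext)
```

Both variants return one string per li, so nested elements no longer split the result into several items.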

大家讲道理 · Answer

In XPath, "//" is shorthand for "/descendant-or-self::node()/", so it matches the current node and all of its descendants. That is why "//text()" also matches the text nodes of the child elements below, each as a separate result. A clearer way is to match only the li level with a single "/" and then process the children of each li manually.

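import lxml.etree as etree

# The tags were stripped from the original post. <ul id="parameter2"> and <li>
# follow from the XPath below; the <a> child is an assumption.
html = """
<ul id="parameter2">
 <li>商品名称：养生堂天然维生素E软胶囊</li>
 <li>商品编号：720135</li>
 <li>品牌：<a>养生堂</a></li>
</ul>
"""

# Python 3: html is already a str, so no decode("utf-8") is needed
tree = etree.HTML(html)

# A single "/" matches only the li elements directly under the ul
property_list_reg = '//ul[@id="parameter2"]/li'


def tryFindChild(element):
    # Join the li's own text with the text of its first child, if it has one
    text = element.text or ""
    children = list(element)  # getchildren() is deprecated in lxml
    if children:
        return text + " " + (children[0].text or "")
    return text


property_lst = tree.xpath(property_list_reg)
for e in property_lst:
    print(tryFindChild(e))

print(len(property_lst))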

Output
Product name: Yangshengtang Natural Vitamin E Soft Capsule
Product number: 720135
Brand: Yangshengtang
3
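
To make the point about "//" concrete, the sketch below compares text(), .//text() and string(.) on a single li. The markup, including the <a> child, is an assumption for illustration, and the variable names direct, nested and joined are hypothetical.

```python
import lxml.etree as etree

# Hypothetical markup with an assumed <a> child inside the <li>.
doc = etree.HTML('<ul id="parameter2"><li>品牌：<a>养生堂</a></li></ul>')
li = doc.xpath('//ul[@id="parameter2"]/li')[0]

direct = li.xpath('text()')     # text nodes that are direct children of li
nested = li.xpath('.//text()')  # text nodes at any depth, one list item each
joined = li.xpath('string(.)')  # all descendant text concatenated into one string

print(direct)  # ['品牌：']
print(nested)  # ['品牌：', '养生堂']
print(joined)  # 品牌：养生堂
```

text() misses the nested value and .//text() returns it as a separate item, while string(.) gives one string per li, which is the shape the question asks for.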
