XPath expression usage in Python
XPath is a language for navigating XML and HTML documents and locating nodes within them. It is widely used in data scraping, automated web testing, text extraction, and other fields. In Python, we can use the lxml library to parse XML and HTML documents and use XPath expressions to locate and extract the required data.
pip install lxml
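# Import the etree module from lxml
from lxml import etree
# Parse the HTML file with lxml's HTML parser
parser = etree.HTMLParser()
tree = etree.parse('example.html', parser)
# Select every <a> element in the document
xpath_expr = '//a'
nodes = tree.xpath(xpath_expr)
# Collect the text of each matched node (None if a node has no leading text)
texts = [node.text for node in nodes]
print(texts)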
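One detail worth knowing about the text extraction step: `node.text` holds only the text that appears before an element's first child, so it can be `None` for links whose text is wrapped in another tag. The sketch below is an illustration under assumptions, not part of the original example. It inlines a small made-up HTML string (instead of reading `example.html`) and compares `node.text` with `itertext()`, which walks all nested text.

```python
from lxml import etree

# Hypothetical sample markup, inlined so the sketch runs without example.html.
html = """
<html><body>
  <a href="/home">Home</a>
  <a href="/docs"><b>Docs</b> page</a>
  <a href="/empty"></a>
</body></html>
"""

# etree.HTML parses a string and returns the root element.
root = etree.HTML(html)
nodes = root.xpath('//a')

# node.text is only the text before the first child element, so it may be None.
direct_texts = [node.text for node in nodes]

# itertext() yields every text fragment inside the element, including nested ones.
full_texts = [''.join(node.itertext()).strip() for node in nodes]

print(direct_texts)
print(full_texts)
```

If the links in your documents may contain nested markup such as `<b>` or `<span>`, the `itertext()` form is probably the safer choice.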
The following complete example shows how to extract all links from an HTML document:
from lxml import etree
parser = etree.HTMLParser()
tree = etree.parse('example.html', parser)
xpath_expr = '//a'
nodes = tree.xpath(xpath_expr)
links = [node.get('href') for node in nodes]
print(links)
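As a variation on the link example, the attribute can be selected in the XPath expression itself. The sketch below assumes a made-up HTML string and a hypothetical `base_url`; neither comes from the original article. It shows that `node.get('href')` returns `None` for anchors without an `href`, while `//a/@href` skips them, and it uses the standard library's `urljoin` to turn relative links into absolute ones.

```python
from urllib.parse import urljoin

from lxml import etree

# Hypothetical sample markup and page address, used only for illustration.
html = """
<html><body>
  <a href="/about">About</a>
  <a name="top">No link here</a>
  <a href="https://example.com/docs">Docs</a>
</body></html>
"""
base_url = "https://example.com/"

root = etree.HTML(html)

# node.get('href') yields None for anchors that lack the attribute.
via_get = [node.get('href') for node in root.xpath('//a')]

# Selecting the attribute in the expression returns only existing values.
via_xpath = root.xpath('//a/@href')

# Resolve relative links against the (assumed) base URL.
absolute = [urljoin(base_url, href) for href in via_xpath]

print(via_get)
print(via_xpath)
print(absolute)
```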
That covers the basic usage of XPath expressions in Python. With XPath syntax and the lxml library, we can parse XML and HTML documents and extract data from them, which is useful for tasks such as data analysis and web crawling.
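Beyond `//a`, a few other XPath 1.0 constructs come up often. The following is a minimal sketch over a made-up HTML string (the element ids, class names, and URLs are assumptions chosen for illustration). It shows attribute predicates, scoping by a parent element, a string function inside a predicate, a positional predicate, and `count()`.

```python
from lxml import etree

# Hypothetical sample markup; ids, classes, and URLs are illustrative only.
html = """
<html><body>
  <div id="nav">
    <a class="internal" href="/home">Home</a>
    <a class="external" href="https://example.org">Partner</a>
  </div>
  <div id="content">
    <a class="internal" href="/post/1">First post</a>
    <a class="internal" href="/post/2">Second post</a>
  </div>
</body></html>
"""
root = etree.HTML(html)

# Attribute predicate: only links whose class is "external".
external = root.xpath('//a[@class="external"]/@href')

# Scope the search to the div with id="content".
content_links = root.xpath('//div[@id="content"]/a/@href')

# String function in a predicate: href values that start with "/post/".
posts = root.xpath('//a[starts-with(@href, "/post/")]/text()')

# Positional predicate (1-based): the first <a> child of each <div>.
first_in_each = root.xpath('//div/a[1]/text()')

# count() returns a float in lxml.
total = root.xpath('count(//a)')

print(external, content_links, posts, first_in_each, total)
```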
I hope this article can help you understand and use XPath expressions in Python. I wish you success in data processing and web development!