
How can I use XPath with BeautifulSoup?

Linda Hamilton
2024-11-08 06:26:01

Using XPath with BeautifulSoup

BeautifulSoup is a popular Python library for parsing and navigating HTML documents. It has no XPath support of its own: elements are located with methods such as find(), find_all(), and select().

Alternative: lxml

The lxml library provides full XPath 1.0 support. It also has a BeautifulSoup-compatible mode (lxml.html.soupparser) for parsing broken HTML the way BeautifulSoup does. To use XPath with lxml:

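from urllib import request

from lxml import etree

url = "http://www.example.com/servlet/av/ResultTemplate=AVResult.html"
# Fetch the page and parse the response with lxml's lenient HTML parser.
response = request.urlopen(url)
tree = etree.parse(response, etree.HTMLParser())
# xpath() returns a list of matching elements. Browsers add <tbody> to tables,
# but lxml only sees it if the page source contains it; drop that step if the list is empty.
result_list = tree.xpath("/html/body/div/table/tbody/tr[1]/td[1]")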
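
To try the same call without a network request, the XPath can be run against an in-memory string. The following is a minimal sketch, assuming a small hypothetical HTML snippet (the table contents are made up for illustration). It also shows that lxml builds the tree from the markup as written, so a tbody step matches only when the source contains that tag.

```python
from lxml import etree

# Hypothetical page source, made up for illustration.
HTML = """
<html><body><div>
  <table>
    <tr><td>Alice</td><td>42</td></tr>
    <tr><td>Bob</td><td>17</td></tr>
  </table>
</div></body></html>
"""

# etree.HTML parses a string with the lenient HTML parser and returns the root element.
root = etree.HTML(HTML)

# xpath() returns a list of matching elements.
first_cells = root.xpath("/html/body/div/table/tr[1]/td[1]")
first_cell_text = [td.text for td in first_cells]

# A path ending in text() returns strings instead of elements.
names = root.xpath("//table/tr/td[1]/text()")

# The markup above has no <tbody>, so a path that requires one matches nothing.
with_tbody = root.xpath("/html/body/div/table/tbody/tr[1]/td[1]")

print(first_cell_text, names, with_tbody)
```

If a path copied from a browser's developer tools returns an empty list, removing the tbody step is usually the first thing to check.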

Using CSS Selectors with lxml

lxml also provides CSSSelector (in lxml.cssselect, which requires the separate cssselect package), which translates CSS selectors into XPath expressions. For example, to find td elements with the class empformbody:

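from lxml.cssselect import CSSSelector

# Compile the CSS selector once; it can be reused on any parsed tree.
css_selector = CSSSelector('td.empformbody')
# Calling the selector on a parsed lxml tree returns a list of matching elements.
result_list = css_selector(tree)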

CSS Selectors in BeautifulSoup

BeautifulSoup also has its own CSS selector support through the select() method:

from bs4 import BeautifulSoup
# html is assumed to hold the page source as a string.
soup = BeautifulSoup(html, "html.parser")
result_list = soup.select('table#foobar td.empformbody')
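
When a document is already in a BeautifulSoup object and an XPath query is still needed, one common workaround is to serialize the soup and reparse it with lxml. This is a sketch of that workaround, not a BeautifulSoup feature. The markup and table ids are hypothetical, made up for illustration.

```python
from bs4 import BeautifulSoup
from lxml import etree

# Hypothetical markup, made up for illustration.
HTML = """
<table id="foobar">
  <tr><td class="empformbody">Alice</td><td>42</td></tr>
</table>
<table id="other">
  <tr><td class="empformbody">Bob</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# select() returns a list of Tag objects matching the CSS selector.
css_texts = [td.get_text() for td in soup.select("table#foobar td.empformbody")]

# BeautifulSoup has no xpath() method, so serialize the soup and reparse it with lxml.
root = etree.HTML(str(soup))

# @class="empformbody" is an exact match on the whole attribute value,
# unlike the CSS class selector, which matches one class name among several.
xpath_texts = root.xpath('//table[@id="foobar"]//td[@class="empformbody"]/text()')

print(css_texts, xpath_texts)
```

Reparsing costs a second pass over the document, so for XPath-heavy work it is usually simpler to parse with lxml from the start.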
