How to extract "href" attributes using BeautifulSoup in Python?

DDD
2024-10-28 21:42:02

Extracting HREF Attribute with BeautifulSoup

Suppose you want to extract the href value "some_url" from the following HTML:

<code class="html"><a href="some_url">next</a>
<span class="class">...</span></code>

Utilizing BeautifulSoup's find_all() Method

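To retrieve it, call find_all() with the tag name 'a' and href=True, which matches only the anchor tags that actually have an href attribute, then read the value with a['href']: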

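<code class="python">from bs4 import BeautifulSoup

html = '''<a href="some_url">next</a>
<span class="class"><a href="another_url">later</a></span>'''

# Name the parser explicitly; otherwise BeautifulSoup picks one and emits a warning.
soup = BeautifulSoup(html, 'html.parser')

# href=True matches only <a> tags that have an href attribute.
for a in soup.find_all('a', href=True):
    print("Found the URL:", a['href'])</code>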
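
If only the first link is needed, or if some anchors may lack an href, the following minimal sketch may help. It assumes the same sample HTML plus one extra anchor without an href (added here purely for illustration), and uses find() and Tag.get(); the variable names are arbitrary.

```python
from bs4 import BeautifulSoup

# Sample HTML from above, plus a hypothetical anchor that has no href.
html = '''<a href="some_url">next</a>
<span class="class"><a href="another_url">later</a></span>
<a name="top">no link here</a>'''

soup = BeautifulSoup(html, 'html.parser')

# find() returns the first matching tag, or None if nothing matches.
first_link = soup.find('a', href=True)
first_url = first_link['href'] if first_link is not None else None

# Tag.get() returns None instead of raising KeyError when the attribute is missing.
all_hrefs = [a.get('href') for a in soup.find_all('a')]

print(first_url)
print(all_hrefs)
```

Indexing with a['href'] is safe in the loop above only because href=True has already filtered out anchors without the attribute; without that filter, get() is the more forgiving choice.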

Python 2 to Python 3 Compatibility

This code runs unchanged on Python 3, and on Python 2 with Beautiful Soup 4 releases up to 4.9.3, the last version to support Python 2. In Beautiful Soup 3, the predecessor of version 4, the find_all() method was named findAll().

Retrieving All Tags with HREF Attributes

To retrieve every tag that has an href attribute, regardless of its tag name, omit the tag name argument:

<code class="python">href_tags = soup.find_all(href=True)</code>
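
As a rough illustration of what the tag-agnostic search returns, the sketch below runs it on a small made-up document that mixes a link tag with anchor tags. The HTML and the variable names are assumptions chosen for the example, not part of the original question.

```python
from bs4 import BeautifulSoup

# Hypothetical document: href appears on both <link> and <a> tags.
html = '''<link rel="stylesheet" href="style.css">
<a href="some_url">next</a>
<span class="class"><a href="another_url">later</a></span>
<a name="top">no href here</a>'''

soup = BeautifulSoup(html, 'html.parser')

# No tag name given, so any tag with an href attribute matches.
href_tags = soup.find_all(href=True)

# Pair each tag's name with its href value, in document order.
pairs = [(tag.name, tag['href']) for tag in href_tags]

for name, url in pairs:
    print(name, url)
```

The anchor without an href and the span itself are skipped, since neither carries the attribute.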
