How to extract 'href' attributes using BeautifulSoup in Python?
Extracting HREF Attribute with BeautifulSoup
In this scenario, you want to extract the "some_url" href attribute from the following HTML content:
<code class="html"><a href="some_url">next</a> <span class="class">...</span></code>
Utilizing BeautifulSoup's find_all() Method
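To retrieve this attribute, call find_all() with href=True, which matches only tags that carry an href attribute, and read each value with a['href']:
<code class="python">from bs4 import BeautifulSoup

html = '''<a href="some_url">next</a>
<span class="class"><a href="another_url">later</a></span>'''

# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(html, 'html.parser')

# Print the href of every <a> tag that has one
for a in soup.find_all('a', href=True):
    print("Found the URL:", a['href'])</code>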
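When only the first link is needed, a minimal sketch along these lines may be enough. It reuses the sample HTML from above and relies on find(), which returns the first match or None, and on Tag.get(), which returns None instead of raising KeyError when an attribute is missing. The helper name first_href is hypothetical and introduced only for illustration.

```python
from bs4 import BeautifulSoup

html = '''<a href="some_url">next</a>
<span class="class"><a href="another_url">later</a></span>'''


def first_href(markup):
    """Return the href of the first <a> tag that has one, or None."""
    soup = BeautifulSoup(markup, "html.parser")
    link = soup.find("a", href=True)  # first matching tag, or None if nothing matches
    return link.get("href") if link is not None else None


print(first_href(html))
```

Checking for None before reading the attribute keeps the helper safe on pages that contain no links at all.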
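Python and BeautifulSoup Version Compatibility
This code runs under both Python 2 and Python 3, although under Python 2 the print call displays a tuple unless you add from __future__ import print_function. In versions of BeautifulSoup before 4, the find_all() method was named findAll; BeautifulSoup 4 still accepts findAll as an alias.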
Retrieving All Tags with HREF Attributes
To retrieve every tag that has an href attribute, regardless of its tag name, omit the tag name argument:
<code class="python">href_tags = soup.find_all(href=True)</code>
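To see what this call matches, here is a small sketch. The sample markup is hypothetical: it contains a link tag and an a tag that both carry href, plus an anchor without one, so the result should include the first two and skip the third.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the example above
sample = '''<link rel="stylesheet" href="style.css">
<a href="some_url">next</a>
<a name="anchor_only">no href here</a>'''

soup = BeautifulSoup(sample, "html.parser")

# No tag name given: any tag with an href attribute matches
href_tags = soup.find_all(href=True)

# Pair each tag's name with its href value, in document order
pairs = [(tag.name, tag["href"]) for tag in href_tags]
print(pairs)
```

Because the filter is on the attribute alone, stylesheet and other non-anchor references are returned alongside ordinary links, so filter on tag.name afterwards if only some of them are wanted.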