How Can I Extract Hyperlinks and URLs from a Webpage Using Python and BeautifulSoup?

Patricia Arquette | Original: 2024-12-08 00:12:11

Retrieving Web Page Links with Python and BeautifulSoup

Question: How do I extract the hyperlinks from a webpage and obtain their URLs using Python?

Answer:

To extract the links from a webpage with Python and BeautifulSoup, parse the page with the SoupStrainer class so that only the <a> tags are processed, then read the href attribute of each tag. Here's a code snippet:

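import httplib2
from bs4 import BeautifulSoup, SoupStrainer

# Fetch the page; httplib2 returns (response headers, body bytes)
http = httplib2.Http()
response, content = http.request('http://www.nytimes.com')

# Parse only the <a> tags, then print the href of each tag that has one
soup = BeautifulSoup(content, 'html.parser', parse_only=SoupStrainer('a'))
for link in soup.find_all('a', href=True):
    print(link['href'])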

This code first fetches the HTML of the page with the httplib2 library. It then parses the HTML with BeautifulSoup, passing a SoupStrainer so that only <a> tags are parsed, which saves time and memory on large pages. Finally, it iterates over the <a> tags and prints the href attribute of each one that has it, which yields the link URLs.
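The href values on a real page are often relative paths rather than full URLs. The following is a minimal sketch of the same SoupStrainer approach applied to an inline HTML string, so it runs without a network request, with each href resolved against a base URL using urljoin from the standard library. The function name extract_links, the sample HTML, and the example.com base URL are assumptions made for illustration and are not part of the original answer.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup, SoupStrainer


def extract_links(html, base_url):
    """Return absolute URLs for every <a> tag in html that has an href."""
    # parse_only restricts the parse tree to <a> tags
    soup = BeautifulSoup(html, 'html.parser', parse_only=SoupStrainer('a'))
    links = []
    for link in soup.find_all('a', href=True):
        # urljoin leaves absolute URLs alone and resolves relative ones
        links.append(urljoin(base_url, link['href']))
    return links


# Hypothetical sample page, used here instead of a live request
sample_html = """
<html><body>
  <a href="/section/world">World</a>
  <a href="https://example.com/about">About</a>
  <a name="top">No href here</a>
  <p>Not a link</p>
</body></html>
"""

print(extract_links(sample_html, 'https://example.com'))
```

To use this with a fetched page, pass the downloaded HTML as html and the address that was requested as base_url.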

Refer to the BeautifulSoup documentation for more detailed information on various parsing scenarios:

[BeautifulSoup Documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
