
How Can I Extract Hyperlinks and URLs from a Webpage Using Python and BeautifulSoup?

Patricia Arquette
2024-12-08 00:12:11


Retrieving Web Page Links with Python and BeautifulSoup

Question: How do I extract the hyperlinks from a webpage and obtain their URLs using Python?

Answer:

To extract the links and their URLs from a webpage efficiently with Python and BeautifulSoup, you can use the SoupStrainer class, which tells BeautifulSoup to parse only the tags you care about. Here's a code snippet:

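```python
import httplib2
from bs4 import BeautifulSoup, SoupStrainer

# Fetch the page; httplib2 returns a (response headers, body bytes) pair
http = httplib2.Http()
status, response = http.request('http://www.nytimes.com')

# Parse only the <a> tags, then print the href of each one that has it
for link in BeautifulSoup(response, 'html.parser', parse_only=SoupStrainer('a')):
    if link.has_attr('href'):
        print(link['href'])
```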

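This code first fetches the page's HTML with the httplib2 library. It then parses that HTML with BeautifulSoup, passing a SoupStrainer so that only <a> tags are parsed; this saves time and memory compared with building a tree for the whole document. Finally, it loops over the <a> tags and prints the href attribute of each one that has it. Note that the href values are printed exactly as they appear in the page, so relative links (for example /section/page) are not converted into absolute URLs.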
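The loop above prints href values exactly as written, so relative links stay relative. Below is a minimal sketch, assuming bs4 is installed, that keeps the same SoupStrainer approach but resolves each href against the page's URL with the standard library's urllib.parse.urljoin. The function name extract_links, the sample HTML and the base URL https://www.example.com/section/ are hypothetical, and the HTML is an inline string so no network request is needed.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup, SoupStrainer


def extract_links(html, base_url):
    """Hypothetical helper: return absolute URLs for every <a href="..."> in html."""
    # Parse only <a> tags; all other markup is skipped during parsing.
    soup = BeautifulSoup(html, "html.parser", parse_only=SoupStrainer("a"))
    urls = []
    # href=True matches only <a> tags that actually carry an href attribute.
    for link in soup.find_all("a", href=True):
        # urljoin resolves relative paths against base_url and leaves
        # absolute URLs unchanged.
        urls.append(urljoin(base_url, link["href"]))
    return urls


# Hypothetical sample page used for illustration only.
sample = """
<html><body>
  <a href="/world">World</a>
  <a href="https://example.org/page">External</a>
  <a name="top">No href here</a>
  <p><a href="sports.html">Sports</a></p>
</body></html>
"""

for url in extract_links(sample, "https://www.example.com/section/"):
    print(url)
```

Filtering with href=True mirrors the has_attr('href') check in the original snippet. In a real script you would pass the downloaded HTML as html and the URL you requested as base_url.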

Refer to the BeautifulSoup documentation for more detailed information on various parsing scenarios:

[BeautifulSoup Documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)

