How Can I Extract Hyperlinks from a Webpage Using Python and BeautifulSoup?

Linda Hamilton
2024-12-11 11:06:10

Retrieving Links from Web Pages with Python and BeautifulSoup

This article shows how to extract the links from a web page and collect their URLs using Python and the BeautifulSoup library.

Problem:

How do you extract the URLs of links embedded in a webpage using Python?

Solution:

To achieve this, you can use the SoupStrainer class provided by BeautifulSoup, which restricts parsing to the tags you specify. The following snippet shows the process:

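import httplib2
from bs4 import BeautifulSoup, SoupStrainer

# http.request returns a pair: the response headers and the page body
http = httplib2.Http()
status, response = http.request('http://www.nytimes.com')

# Parse only the <a> tags and print the href of each tag that has one
for link in BeautifulSoup(response, 'html.parser', parse_only=SoupStrainer('a')):
    if link.has_attr('href'):
        print(link['href'])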

This code requests the specified webpage, 'http://www.nytimes.com' in the example, and passes the returned HTML to BeautifulSoup. The SoupStrainer('a') filter tells the parser to keep only 'a' tags (the tags that represent links) and to discard the rest of the page. For each 'a' tag found, the code checks that an 'href' attribute is present and prints its value, which is the link's URL. Anchors without an 'href' are skipped, and an 'href' may hold a relative path rather than a full URL.
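
Because an 'href' value may be relative, it can help to resolve each one against the page's address. The following is a minimal sketch of that idea, not part of the original solution: the function name extract_links, the sample_html string, and the example.com base URL are all hypothetical, and the markup is parsed from a string so the example runs without a network connection.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup, SoupStrainer


def extract_links(html, base_url):
    """Return an absolute URL for every <a> tag in html that has an href."""
    # Keep only <a> tags while parsing, as in the snippet above
    soup = BeautifulSoup(html, 'html.parser', parse_only=SoupStrainer('a'))
    # href=True selects only the anchors that carry an href attribute
    anchors = soup.find_all('a', href=True)
    # urljoin leaves absolute URLs alone and resolves relative ones
    return [urljoin(base_url, a['href']) for a in anchors]


# Hypothetical markup standing in for a downloaded page
sample_html = """
<html><body>
  <a href="/section/world">World</a>
  <a name="top">No href here</a>
  <p>See <a href="https://example.org/about">about</a>.</p>
</body></html>
"""

links = extract_links(sample_html, 'https://example.com')
for url in links:
    print(url)
```

To use it with a live page, pass the body returned by http.request as html and the requested address as base_url.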
