Download PDF files using Python's Requests and BeautifulSoup
Requests and BeautifulSoup are Python libraries that together can be used to download files, such as PDFs, from the web. The Requests library sends HTTP requests and receives the responses. The BeautifulSoup library parses the HTML received in a response so that the link to the downloadable PDF can be extracted. In this article, we will learn how to download a PDF using Requests and BeautifulSoup in Python.
Before using the BeautifulSoup and Requests libraries in Python, we need to install them with the pip command. To install both libraries, run the following commands in the terminal.
pip install requests
pip install beautifulsoup4
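To confirm that both packages were installed into the interpreter you are actually using, a quick check like the one below may help. This is only a sketch: note that the package installed as beautifulsoup4 is imported under the name bs4, and the version numbers printed will depend on your environment.

```python
import requests
import bs4

# Both packages expose their installed version as a __version__ string.
print('requests', requests.__version__)
print('beautifulsoup4', bs4.__version__)
```

If either import fails with ModuleNotFoundError, pip most likely installed the package for a different Python interpreter.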
To download a PDF from the internet, first use the Requests library to fetch the web page that links to the PDF file. We can then use BeautifulSoup to parse the HTML response and extract the link to the PDF file. Because the extracted link may be relative, it is combined with the URL of the page to get the full URL of the PDF file. Finally, we send another GET request with requests.get() to download the file.
In the code below, replace the placeholder "https://example.com/document.pdf" with the valid URL of the page that contains the link to the PDF file.
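Combining the page URL with the extracted link is usually safer with urljoin from the standard library's urllib.parse module than with plain string concatenation, because an href may be relative, root-relative, or already absolute. A minimal sketch, assuming a hypothetical page URL and hypothetical href values chosen only for illustration:

```python
from urllib.parse import urljoin

# Hypothetical page URL and href values, for illustration only.
page_url = 'https://example.com/docs/index.html'

# A relative href is resolved against the directory of the page.
relative = urljoin(page_url, 'files/report.pdf')

# A root-relative href replaces the whole path.
root_relative = urljoin(page_url, '/static/report.pdf')

# An absolute href is returned unchanged.
absolute = urljoin(page_url, 'https://cdn.example.com/report.pdf')

print(relative)       # https://example.com/docs/files/report.pdf
print(root_relative)  # https://example.com/static/report.pdf
print(absolute)       # https://cdn.example.com/report.pdf
```

Plain concatenation of the page URL and the href would produce a valid address only in the first of these three cases, and only when the page URL happens to end in a slash.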
import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup

# Step 1: Fetch the page that contains the PDF link
url = 'https://example.com/document.pdf'  # placeholder: replace with the page URL
response = requests.get(url)

if response.status_code == 200:
    # Step 2: Parse the HTML and take the first link that has an href
    soup = BeautifulSoup(response.text, 'html.parser')
    anchor = soup.find('a', href=True)

    if anchor is None:
        print('Error: no link found on the page')
    else:
        # Step 3: Resolve the link against the page URL and download the PDF
        pdf_url = urljoin(url, anchor['href'])
        pdf_response = requests.get(pdf_url)

        if pdf_response.status_code == 200:
            with open('document.pdf', 'wb') as f:
                f.write(pdf_response.content)
            print('PDF downloaded successfully.')
        else:
            print('Error:', pdf_response.status_code)
else:
    print('Error:', response.status_code)
PDF downloaded successfully.
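The example above takes the first link on the page, which is often a navigation link rather than the PDF. One possible refinement is to collect only the links whose href ends in .pdf. The sketch below assumes a hypothetical helper named find_pdf_links and a hand-written HTML sample, so it runs without any network access:

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup


def find_pdf_links(html, page_url):
    """Return absolute URLs for every link whose href ends in .pdf."""
    soup = BeautifulSoup(html, 'html.parser')
    links = []
    # href=True skips <a> tags that have no href attribute at all.
    for anchor in soup.find_all('a', href=True):
        href = anchor['href']
        # Compare case-insensitively so that "Report.PDF" also matches.
        if href.lower().endswith('.pdf'):
            links.append(urljoin(page_url, href))
    return links


# Hypothetical page content, for illustration only.
sample_html = '''
<html><body>
  <a href="/about">About</a>
  <a href="files/guide.pdf">Guide</a>
  <a href="https://cdn.example.com/Report.PDF">Report</a>
  <a name="top">No href</a>
</body></html>
'''

pdf_links = find_pdf_links(sample_html, 'https://example.com/docs/')
print(pdf_links)
```

This simple suffix check misses PDF links that carry a query string (for example, an href ending in ".pdf?download=1") or that do not use a .pdf extension at all, so treat it as a starting point rather than a complete filter.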
In this article, we discussed how to download PDF files from the internet using the Requests and BeautifulSoup libraries in Python. With requests.get(), we send an HTTP request for the page and check the status code of the response. Once we have a page that contains a link to a PDF file, we use BeautifulSoup to parse the page, extract the link, and then download the PDF.