
How to Scrape Dynamic JavaScript-Rendered Content in Python?

DDD | Original | 2024-12-22 09:58:04


How to Scrape Dynamic Content Generated by JavaScript in Python

Scraping dynamic content can be a challenge when you rely on static fetches such as urllib2.urlopen(request) (Python 2) or requests.get() (Python 3). These calls return only the initial HTML; content that is generated by JavaScript running in the browser after the page loads never appears in the response.

One approach is to drive a real browser engine with the Selenium framework, using PhantomJS as the WebDriver. Ensure that PhantomJS is installed and that its binary is available on your PATH. Note that PhantomJS is no longer maintained and its support was removed in Selenium 4, so this approach requires Selenium 3.x; headless Chrome or Firefox is the modern replacement, as sketched further below.

Here's an example to illustrate the difference. First, fetching the page statically:

import requests
from bs4 import BeautifulSoup

# my_url points to a page whose content is filled in by JavaScript after load
response = requests.get(my_url)
soup = BeautifulSoup(response.text, "html.parser")
soup.find(id="intro-text")  # Result: the placeholder <p>, without the JavaScript-rendered text

This code retrieves only the initial HTML and never executes any JavaScript. To scrape the rendered content, use Selenium:

from selenium import webdriver

# Requires Selenium 3.x; PhantomJS support was removed in Selenium 4
driver = webdriver.PhantomJS()
driver.get(my_url)
p_element = driver.find_element_by_id('intro-text')
print(p_element.text)  # Result: 'Yay! Supports javascript'
driver.quit()
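Because PhantomJS is no longer supported in current Selenium releases, here is a minimal sketch of the same scrape using Selenium 4 with headless Chrome instead. It assumes Chrome is installed (Selenium Manager, bundled with Selenium 4.6+, downloads a matching driver automatically) and reuses the hypothetical my_url from above:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless=new")  # run Chrome without a visible window (recent Chrome versions)

driver = webdriver.Chrome(options=options)
try:
    driver.get(my_url)  # my_url: the page with JavaScript-rendered content
    p_element = driver.find_element(By.ID, "intro-text")
    print(p_element.text)  # now contains the text inserted by JavaScript
finally:
    driver.quit()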

Alternatively, you can use Python libraries designed for scraping JavaScript-driven websites, such as dryscrape (note that dryscrape targets Linux and is no longer actively maintained):

import dryscrape
from bs4 import BeautifulSoup

session = dryscrape.Session()
session.visit(my_url)
response = session.body()  # HTML after JavaScript has run
soup = BeautifulSoup(response, "html.parser")
soup.find(id="intro-text")  # Result: the <p> containing the JavaScript-rendered text
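If dryscrape is not an option on your platform, the same render-then-parse pattern can be reproduced with Selenium by handing the rendered page_source to BeautifulSoup. A minimal sketch, assuming the headless Chrome setup shown earlier and the same hypothetical my_url:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless=new")
driver = webdriver.Chrome(options=options)
try:
    driver.get(my_url)                  # my_url: the JavaScript-heavy page
    rendered_html = driver.page_source  # HTML after the browser has executed the scripts
finally:
    driver.quit()

soup = BeautifulSoup(rendered_html, "html.parser")
print(soup.find(id="intro-text").text)  # the JavaScript-rendered text is now present

This lets you keep BeautifulSoup for parsing while delegating only the JavaScript execution to the browser.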

