
How to scrape a JavaScript website with Python?

WBOY
2024-02-10 15:40:04


Question content

I am trying to scrape a website. I've tried two methods, but neither gives me the full page source I'm looking for. I want to scrape the news headlines from the website below.

Website: "https://www.todayonline.com/"

Here are the two methods I tried, both of which failed.

Method 1: requests + Beautiful Soup

import requests
from bs4 import BeautifulSoup

tdy_url = "https://www.todayonline.com/"
page = requests.get(tdy_url).text
soup = BeautifulSoup(page, "html.parser")
soup  # returns HTML that is mostly JavaScript text
soup.find_all("h3")

### Returns an empty list: []

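Method 2: Selenium + Beautiful Soup

import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

tdy_url = "https://www.todayonline.com/"

options = Options()
options.add_argument("--headless")  # run Chrome without a visible window

# Assumes chromedriver can be found on PATH
driver = webdriver.Chrome(options=options)

driver.get(tdy_url)
time.sleep(10)  # crude fixed wait for the JavaScript to render
html = driver.page_source
driver.quit()

soup = BeautifulSoup(html, "html.parser")
soup.find_all("h3")

### Returns fewer than 1/4 of the 'h3' tags found in the original page source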

Please help. I've scraped other news sites before, and they were much easier. Thanks.


Correct answer


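You can get the data directly from the site's API instead of parsing the rendered HTML (look at the Network tab in your browser's developer tools to find the requests the page makes):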

For example,

import requests
url = "https://www.todayonline.com/api/v3/news_feed/7"
data = requests.get(url).json()
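
Once you have the decoded JSON, you still need to find the headlines inside it. The answer does not show the response layout, so the following is only a sketch: it walks whatever nested structure comes back and collects every value stored under a given key. The helper names `iter_values` and `fetch_headlines`, the key name `"title"`, and the sample data are all assumptions made for illustration; print `data` first to see which keys the feed really uses.

```python
from typing import Any, Iterator


def iter_values(node: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key`, at any depth of decoded JSON."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            # Keep descending: the same key may also appear deeper down.
            yield from iter_values(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from iter_values(item, key)


def fetch_headlines(url: str, key: str = "title") -> list:
    """Download a JSON feed and return the string values found under `key`."""
    import requests  # imported lazily so iter_values works without it

    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail on 4xx/5xx instead of parsing an error page
    return [v for v in iter_values(response.json(), key) if isinstance(v, str)]


# Made-up sample that only mimics a nested feed; the real payload may differ.
sample = {
    "nodes": [
        {"node": {"title": "First headline"}},
        {"node": {"title": "Second headline"}},
    ]
}
headlines = list(iter_values(sample, "title"))
```

If the feed does keep its headlines under a `title` key, calling `fetch_headlines("https://www.todayonline.com/api/v3/news_feed/7")` would return them as a list of strings. If it uses a different key, pass that name as the second argument.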


Statement:
This article is reproduced from stackoverflow.com.