For a fun web scraping project, I want to collect NHL data from https://www.nhl.com/stats/teams.
There is a clickable Excel export tag, and I can find it using selenium and bs4.
Unfortunately, this is where things end: I can't seem to access the data since there is no href attribute.
I got what I wanted by simulating a mouse click using pynput, but I want to know: what could I have done differently? It feels awkward.
-> The tag with the export icon looks like this:
a class="styles__ExportIcon-sc-16o6kz0-0 dIDMgQ"
-> This is my code:

    import time

    from bs4 import BeautifulSoup
    from pynput.mouse import Button, Controller
    from selenium import webdriver

    # Raw string so the backslash in the Windows path is not read as an escape
    driver = webdriver.Chrome(executable_path=r'somepath\chromedriver.exe')

    URL = 'https://www.nhl.com/stats/teams'
    driver.get(URL)
    html = driver.page_source  # DOM with JavaScript execution complete
    soup = BeautifulSoup(html, 'html.parser')
    body = soup.find('body')
    print(body.prettify())

    mouse = Controller()
    time.sleep(5)  # Sleep for 5 seconds until page is loaded
    mouse.position = (1204, 669)  # that's where the icon is on my screen
    mouse.click(Button.left, 1)  # executes download
P粉807471604 2024-04-05 00:51:05
There is no href attribute; the download is triggered through JavaScript. With selenium, find the element and call .click() on it to download the file:
driver.find_element(By.CSS_SELECTOR,'h2>a').click()
The CSS selector h2>a matches the <a> that is a direct child of an <h2>. Alternatively, select the tag directly by its class, which starts with styles__ExportIcon:
driver.find_element(By.CSS_SELECTOR,'a[class^="styles__ExportIcon"]').click()

Example:
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

    url = 'https://www.nhl.com/stats/teams'
    driver.get(url)

    # Dismiss the cookie banner first so it does not cover the export link
    driver.find_element(By.CSS_SELECTOR,'#onetrust-reject-all-handler').click()
    # Click the export link; the site's JavaScript starts the download
    driver.find_element(By.CSS_SELECTOR,'h2>a').click()
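Since .click() returns as soon as the click is dispatched, the script may need to wait for the file before reading it or closing the browser. The following is only a rough sketch of one way to do that with the standard library. The helper name wait_for_download, the "*.xlsx" pattern, and the timeout values are assumptions for illustration, not something the site or selenium provides; it relies on Chrome giving in-progress downloads a .crdownload suffix.

```python
import time
from pathlib import Path


def wait_for_download(directory, pattern="*.xlsx", timeout=30.0, poll=0.5):
    """Return the newest file matching `pattern` once no partial download remains.

    Hypothetical helper: the name, the default pattern, and the timings are
    assumptions and should be adjusted to the actual export.
    """
    directory = Path(directory)
    deadline = time.monotonic() + timeout
    while True:
        # Chrome keeps unfinished downloads under a .crdownload suffix.
        partial = list(directory.glob("*.crdownload"))
        finished = list(directory.glob(pattern))
        if finished and not partial:
            # Pick the most recently modified matching file.
            return max(finished, key=lambda p: p.stat().st_mtime)
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"no finished download matching {pattern!r} in {directory}"
            )
        time.sleep(poll)
```

It could be called right after the .click() line, for example wait_for_download(Path.home() / "Downloads"), assuming the browser saves to that folder. Older files that match the pattern would also be picked up, so an empty or dedicated download directory is the safer choice.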