Web scraping: Missing href attribute - Need to simulate mouse clicks for web scraping?

For a fun web scraping project, I want to collect NHL data from https://www.nhl.com/stats/teams.

There is a clickable Excel export link on the page, and I can find it using selenium and bs4.

Unfortunately, that is where I get stuck: the tag has no href attribute, so I can't get at the data.

I got what I wanted by simulating a mouse click using pynput, but I want to know:

What could I have done differently? The pynput approach feels clumsy.

-> The tag with the export icon looks like this:

<a class="styles__ExportIcon-sc-16o6kz0-0 dIDMgQ">

-> This is my code

`import pynput
from pynput.mouse import Button, Controller
import time

from bs4 import BeautifulSoup
from selenium import webdriver
driver = webdriver.Chrome(executable_path=r'somepath\chromedriver.exe')  # raw string keeps the backslash literal

URL = 'https://www.nhl.com/stats/teams'

driver.get(URL)
html = driver.page_source  # DOM with JavaScript execution complete
soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly to avoid the bs4 warning
body = soup.find('body')
print(body.prettify())


mouse = Controller()

time.sleep(5)  # wait 5 seconds for the page to load
mouse.position = (1204, 669)  # that's where the icon is on my screen
mouse.click(Button.left, 1)  # triggers the download`

P粉550823577, 259 days ago

Replies (1)

  • P粉807471604, 2024-04-05 00:51:05

    The link has no href attribute because the download is triggered through JavaScript. With selenium, locate the element and call .click() on it to start the download:

    driver.find_element(By.CSS_SELECTOR,'h2>a').click()

    This CSS selector matches an <a> that is a direct child of an <h2>.

    Alternatively, select the link directly by a class that starts with styles__ExportIcon:
    driver.find_element(By.CSS_SELECTOR,'a[class^="styles__ExportIcon"]').click()
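A note on the second selector: `^=` compares against the whole `class` attribute string, not against each class name separately, so it only matches when `styles__ExportIcon...` comes first in the attribute. The sketch below mimics that rule with the standard library only. It is not how Selenium evaluates selectors, and the HTML snippet is made up around the class name from the question.

```python
from html.parser import HTMLParser


class PrefixClassFinder(HTMLParser):
    """Collect the class attribute of every <a> whose class value starts with a prefix."""

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        class_value = dict(attrs).get("class") or ""
        # Like [class^="..."], compare against the whole attribute string,
        # so the prefix has to be the first thing in the class list.
        if class_value.startswith(self.prefix):
            self.matches.append(class_value)


# Hypothetical markup; only the first class value is taken from the question.
html = """
<h2>Teams <a class="styles__ExportIcon-sc-16o6kz0-0 dIDMgQ"></a></h2>
<a class="dIDMgQ styles__ExportIcon-sc-16o6kz0-0"></a>
<a class="other-link" href="/stats"></a>
"""

finder = PrefixClassFinder("styles__ExportIcon")
finder.feed(html)
print(finder.matches)  # ['styles__ExportIcon-sc-16o6kz0-0 dIDMgQ']
```

Only the first link matches; the second carries the same class name but not at the start of the attribute. If the order of class names on the real page ever changes, `a[class*="styles__ExportIcon"]` (substring match) would be the more forgiving choice.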

    Example

    You may need to dismiss the OneTrust cookie banner first, then click the export link to download the table.

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
    
    url = 'https://www.nhl.com/stats/teams'
    driver.get(url)
    driver.find_element(By.CSS_SELECTOR,'#onetrust-reject-all-handler').click()
    driver.find_element(By.CSS_SELECTOR,'h2>a').click()
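One thing the example leaves open: `.click()` returns once the click has been dispatched, so the script may move on (or exit) before Chrome has finished writing the file. A minimal sketch of one way to handle that is to poll the download folder for a new file. The helper names `snapshot` and `wait_for_new_file` are hypothetical, and the `.xlsx` suffix is an assumption based on the question calling it an Excel export. Chrome's in-progress downloads end in `.crdownload`, so the suffix check skips them.

```python
import time
from pathlib import Path


def snapshot(directory):
    """Return the set of file names currently in `directory`."""
    return {p.name for p in Path(directory).iterdir()}


def wait_for_new_file(directory, seen, suffix=".xlsx", timeout=30.0, poll=0.5):
    """Wait for a file that is not in `seen` and ends with `suffix`.

    Returns the newest such file as a Path, or raises TimeoutError.
    The `.xlsx` default is an assumption about the export format.
    """
    directory = Path(directory)
    deadline = time.monotonic() + timeout
    while True:
        new_files = [
            p for p in directory.iterdir()
            # Partial downloads (e.g. "x.xlsx.crdownload") fail the suffix check.
            if p.name not in seen and p.suffix == suffix
        ]
        if new_files:
            return max(new_files, key=lambda p: p.stat().st_mtime)
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no new {suffix} file in {directory}")
        time.sleep(poll)
```

Take the snapshot before calling `.click()` and wait afterwards, for example `seen = snapshot(downloads)` followed by the click and then `wait_for_new_file(downloads, seen)`, where `downloads` is whatever folder your Chrome profile saves to (often `Path.home() / "Downloads"`, but that depends on your setup).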
