How to Extract Hidden Information from #shadow-roots Using Selenium Python?

Patricia Arquette
2024-10-19 06:44:01

Extracting Information from a #shadow-root using Selenium Python

In web scraping, elements rendered inside a shadow DOM (shown as #shadow-root in the browser's developer tools) are a common obstacle, because ordinary Selenium locators cannot reach them. This article shows how to extract such data with Selenium in Python.

Problem:

Consider the URL https://www.tiendasjumbo.co/buscar?q=mani from an online store. To extract product labels and other fields from this site, a user attempted the following approach:

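<code class="python">from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.service import Service

# Selenium 4 takes the driver path through a Service object; the raw string
# keeps the backslashes in the Windows path from being read as escapes.
service = Service(executable_path=r"C:\Program Files (x86)\geckodriver.exe")
driver = webdriver.Firefox(service=service)
driver.implicitly_wait(10)

url = "https://www.tiendasjumbo.co/buscar?q=mani"
driver.maximize_window()
driver.get(url)
# Raises NoSuchElementException even with the implicit wait: the <h1> is inside
# a shadow root, which an XPath query on the main document does not search.
driver.find_element(By.XPATH, '//h1[@class="impulse-title"]')</code>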

However, this fails with a NoSuchElementException, and switching to iframes does not help either, because the products are not in a separate frame.

Solution:

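The key is recognizing that the products are rendered inside a #shadow-root attached to the impulse-search element. Locators evaluated against the main document, such as the XPath above, do not search inside shadow trees. Selenium itself has no shadowRoot.querySelector() method: shadowRoot and querySelector() are browser DOM APIs, so they are called through driver.execute_script(), which returns the matched DOM node to Python as a WebElement. The product label can be extracted as follows: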

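<code class="python">from selenium import webdriver
driver = webdriver.Firefox()
driver.get('https://www.tiendasjumbo.co/buscar?q=mani')
# Step into the shadow root of the <impulse-search> host and query the label there
item = driver.execute_script("return document.querySelector('impulse-search').shadowRoot.querySelector('div.group-name-brand h1.impulse-title span.formatted-text')")
print(item.text)</code>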

Running this script outputs the product label:

<code class="text">La especial mezcla de nueces, maní, almendras y marañones x 450 g</code>
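The one-liner assumes the host and its shadow content already exist when the script runs. If the results are rendered a moment after driver.get() returns, document.querySelector('impulse-search') may still be null (a JavaScript error) or the inner query may return None. The same can happen when shadow roots are nested. The following is a minimal sketch of a more defensive variant that walks a list of shadow hosts and polls until the target appears. find_in_shadow is a hypothetical helper, not part of Selenium; it assumes open shadow roots (a closed root is not reachable through shadowRoot) and a browser that supports JavaScript optional chaining (?.).

```python
import time

# JavaScript run in the page: start at `document`, step into the shadowRoot of
# each host listed in arguments[0], then query arguments[1] in the last root.
# Optional chaining (?.) turns a missing host or root into null instead of an error.
_SHADOW_QUERY_JS = """
let root = document;
for (const host of arguments[0]) {
    root = root?.querySelector(host)?.shadowRoot;
}
return root?.querySelector(arguments[1]) ?? null;
"""


def find_in_shadow(driver, host_selectors, target_selector, timeout=10.0, poll=0.5):
    """Return the element matching target_selector inside nested open shadow roots.

    host_selectors lists the CSS selectors of the shadow hosts, outermost first.
    Polls every `poll` seconds because hosts are often rendered by JavaScript
    after driver.get() returns. Raises TimeoutError if nothing matches in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        element = driver.execute_script(
            _SHADOW_QUERY_JS, list(host_selectors), target_selector
        )
        if element is not None:
            return element  # Selenium hands DOM nodes back as WebElements
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"{target_selector!r} not found inside shadow hosts {host_selectors!r}"
            )
        time.sleep(poll)
```

With a live driver on the store page, find_in_shadow(driver, ["impulse-search"], "div.group-name-brand h1.impulse-title span.formatted-text").text should print the same label as above. Passing the selectors as script arguments, rather than pasting them into the JavaScript string, avoids quoting problems when a selector contains quotes.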

References:

For further insights, refer to the following resources:

  • Unable to locate the Sign In element within #shadow-root (open) using Selenium and Python
  • How to locate the First name field within shadow-root (open) within the website https://www.virustotal.com using Selenium and Python

Note:

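Starting with version 96, Google Chrome and Microsoft Edge changed how shadow roots are returned to Selenium: return element.shadowRoot in execute_script no longer yields an ordinary WebElement, so older code that treats the result as one can break. The solution above is not affected, because its script returns the span element itself rather than the shadow root. Refer to the resources above for more on handling this change in different programming languages.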
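In Selenium 4, the same lookup can also be done without JavaScript: WebElement.shadow_root returns a ShadowRoot object whose find_element() searches inside the shadow tree. The sketch below assumes a Selenium 4 release that provides shadow_root (added around 4.1) and a browser and driver version that support the shadow-root commands, which varies. find_in_shadow_root is a hypothetical helper name, and the locator is passed as the string "css selector", the value of By.CSS_SELECTOR, so the helper does not need to import Selenium. It sticks to CSS selectors because Chromium-based drivers reportedly reject XPath inside a shadow root.

```python
# Value of selenium.webdriver.common.by.By.CSS_SELECTOR; Selenium accepts the
# plain string anywhere it takes a `by` argument.
CSS_SELECTOR = "css selector"


def find_in_shadow_root(driver, host_selectors, target_selector):
    """Find target_selector inside nested open shadow roots via Selenium 4's API.

    host_selectors lists the CSS selectors of the shadow hosts, outermost first.
    Each host's .shadow_root is a ShadowRoot object (not a WebElement), but it
    offers find_element(), so the search continues from it. Selenium raises an
    exception if a host, a shadow root, or the target is missing.
    """
    context = driver  # begin the search at the top-level document
    for selector in host_selectors:
        host = context.find_element(CSS_SELECTOR, selector)
        context = host.shadow_root  # step into this host's shadow tree
    return context.find_element(CSS_SELECTOR, target_selector)
```

For the store page above, find_in_shadow_root(driver, ["impulse-search"], "div.group-name-brand h1.impulse-title span.formatted-text").text should return the same label once the results have rendered. Unlike the JavaScript approach with polling, this helper does not wait, so it may need to be wrapped in an explicit wait on slow pages.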