Extracting Information from a #shadow-root using Selenium Python
Elements rendered inside a shadow DOM (shown as #shadow-root in the browser's developer tools) cannot be reached with ordinary Selenium locators, which makes them awkward to scrape. This article shows how to reach such elements using Selenium in Python.
Problem:
Consider the URL https://www.tiendasjumbo.co/buscar?q=mani from an online store. To extract product labels and other fields from this site, a user attempted the following approach:
<code class="python">from selenium import webdriver
import time
from random import randint

driver = webdriver.Firefox(executable_path=r"C:\Program Files (x86)\geckodriver.exe")
driver.implicitly_wait(10)
time.sleep(4)
url = "https://www.tiendasjumbo.co/buscar?q=mani"
driver.maximize_window()
driver.get(url)
driver.find_element_by_xpath('//h1[@class="impulse-title"]')</code>
However, this approach failed, and switching iframes proved equally unsuccessful.
Solution:
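The products on this page are rendered inside a #shadow-root, that is, a shadow DOM attached to the impulse-search element, so a locator evaluated against the main document cannot see them. Selenium itself has no shadowRoot.querySelector() method: shadowRoot is a DOM property and querySelector() is a DOM method, and both can be called from JavaScript through driver.execute_script(). Using this approach, the product label can be extracted as follows: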
<code class="python">driver.get('https://www.tiendasjumbo.co/buscar?q=mani')
item = driver.execute_script(
    "return document.querySelector('impulse-search')"
    ".shadowRoot.querySelector('div.group-name-brand h1.impulse-title span.formatted-text')"
)
print(item.text)</code>
Running this script outputs the product label:
<code class="text">La especial mezcla de nueces, maní, almendras y marañones x 450 g</code>
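Since the original goal was to extract product labels (plural) and other fields, the same technique can be wrapped in small helpers. The following is a minimal sketch, not part of Selenium: the helper names `find_in_shadow` and `texts_in_shadow` are hypothetical, and it assumes the shadow host is matched by a single CSS selector and that its shadow root is open (for a closed shadow root, `shadowRoot` is `null`). The selectors are passed as script arguments rather than pasted into the JavaScript source, and a missing host yields `None` or an empty list instead of a JavaScript error.

```python
# Hypothetical helpers (not part of Selenium). `driver` is any Selenium
# WebDriver; only its execute_script() method is used.

FIND_ONE_JS = """
const host = document.querySelector(arguments[0]);
if (!host || !host.shadowRoot) { return null; }
return host.shadowRoot.querySelector(arguments[1]);
"""

FIND_ALL_TEXT_JS = """
const host = document.querySelector(arguments[0]);
if (!host || !host.shadowRoot) { return []; }
return Array.from(
    host.shadowRoot.querySelectorAll(arguments[1]),
    el => el.textContent.trim()
);
"""


def find_in_shadow(driver, host_selector, inner_selector):
    """Return the first matching element inside the host's shadow root, or None."""
    return driver.execute_script(FIND_ONE_JS, host_selector, inner_selector)


def texts_in_shadow(driver, host_selector, inner_selector):
    """Return the trimmed text of every matching element inside the shadow root."""
    return driver.execute_script(FIND_ALL_TEXT_JS, host_selector, inner_selector)
```

With a live driver, a call such as `texts_in_shadow(driver, "impulse-search", "h1.impulse-title span.formatted-text")` would be expected to return one string per product label, assuming every product on the page uses the same markup as the one shown above. Note that implicit waits apply to element lookups, not to `execute_script()`, so the script may need to be retried until the page has finished rendering.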
Note:
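Starting with version 96 of Microsoft Edge and Google Chrome, the value these browsers return to Selenium for a shadow root changed. Code that returns the shadow root itself from execute_script(), rather than an element inside it, may therefore behave differently depending on the browser and Selenium versions in use, and the way to handle the change differs between language bindings.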
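One way to avoid returning a shadow root through JavaScript at all is the `shadow_root` property that newer Selenium releases expose on elements. The sketch below assumes such a release (4.1 or later, as far as I know) together with a browser that supports the corresponding WebDriver command. The function name `product_label` is hypothetical, and the literal `"css selector"` is used in place of `By.CSS_SELECTOR` (which has that string value) only to keep the snippet free of imports. Only CSS selectors are used, since XPath lookups inside a shadow root are not reliably supported.

```python
# A sketch assuming a Selenium release whose elements expose `shadow_root`.
# `product_label` is a hypothetical name, not a Selenium API.

CSS = "css selector"  # string value of selenium.webdriver.common.by.By.CSS_SELECTOR


def product_label(driver):
    """Return the text of the first product label inside the shadow root."""
    # Locate the shadow host in the main document.
    host = driver.find_element(CSS, "impulse-search")
    # Ask the driver for the host's shadow root instead of going through
    # JavaScript; this raises an exception if the host has no shadow root.
    root = host.shadow_root
    # Search inside the shadow root with the same selector as the script version.
    item = root.find_element(
        CSS, "div.group-name-brand h1.impulse-title span.formatted-text"
    )
    return item.text
```

Compared with the `execute_script()` version, this keeps the lookup in Python and lets implicit or explicit waits apply to each `find_element()` call.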