How to Take Screenshots in Python?
Headless browser automation libraries offer a wide range of configuration options that can be applied when taking screenshots. In this guide, we'll explain how to take Python screenshots with Selenium and Playwright, and then explore common browser tips and tricks for customizing web page captures. Let's get started!
We'll begin with the core Selenium and Playwright methods, including the installation required to take Python screenshots. After that, we'll explore the features both libraries share for taking customized screenshots.
Before exploring how to take a screenshot with Selenium in Python, let's install it. Use the pip command below to install Selenium along with webdriver-manager:
pip install selenium webdriver-manager
We'll use the webdriver-manager Python library to automatically download the required browser drivers:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()))
Now that the required installation is ready, let's use Selenium in Python to take screenshots:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()))

# request target web page
driver.get("https://web-scraping.dev/products")

# take screenshot and directly save it
driver.save_screenshot('products.png')

# image as bytes
image_bytes = driver.get_screenshot_as_png()

# image as base64 string
base64_string = driver.get_screenshot_as_base64()
The above Python script for taking Selenium screenshots is fairly straightforward. We use the save_screenshot method to capture the driver's full viewport and save the image to the products.png file. As an alternative to saving directly to disk, other methods return the raw image data as bytes or as a base64 string for further processing.
For more details on Selenium, refer to our dedicated guide.
The Playwright API is available in several programming languages. Since we'll be taking screenshots with Playwright in Python, install its Python package using the pip command below:
pip install playwright
Next, install the required Playwright browser binaries:
playwright install chromium # alternatively install `firefox` or `webkit`
To take a Playwright screenshot, we can use the .screenshot method:
from pathlib import Path
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context()
    page = context.new_page()

    # request target web page
    page.goto('https://web-scraping.dev/products')

    # take screenshot and directly save it
    page.screenshot(path="products.png")

    # or screenshot as bytes
    image_bytes = page.screenshot()
    Path("products.png").write_bytes(image_bytes)
Above, we start by launching a new Playwright browser instance and opening a new tab inside it. Then, we take a screenshot of the page and save it to the products.png file.
Take a look at our dedicated guide on Playwright for more details on using it for web scraping.
Images on web pages are loaded dynamically. Therefore, properly waiting for them to load is crucial to avoid corrupted website screenshots. Let's explore the different techniques for defining waits and timeouts.
Fixed timeouts are the most basic type of headless browser waiting feature. By waiting for a set amount of time before capturing screenshots, we ensure that all DOM elements are properly loaded.
Selenium
import time
# ....

driver.get("https://web-scraping.dev/products")

# wait for 10 seconds before taking screenshot
time.sleep(10)
driver.save_screenshot('products.png')
Playwright
# ....
with sync_playwright() as p:
    # ....
    page.goto('https://web-scraping.dev/products')

    # wait for 10 seconds before taking screenshot
    page.wait_for_timeout(10000)
    page.screenshot(path="products.png")
Above, we use Playwright's wait_for_timeout method to define a fixed wait condition before capturing the web page. Since Selenium doesn't provide a built-in method for fixed timeouts, we use Python's built-in time module.
Dynamic wait conditions involve waiting for specific element selectors to become visible on the page before proceeding. If the selector is found within the defined timeout, the waiting process completes.
Selenium
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
from selenium.webdriver.common.by import By
# ....

driver.get("https://web-scraping.dev/products")

_timeout = 10  # set the maximum timeout to 10 seconds
wait = WebDriverWait(driver, _timeout)
wait.until(
    expected_conditions.presence_of_element_located(
        (By.XPATH, "//div[@class='products']")  # wait for XPath selector
        # (By.CSS_SELECTOR, "div.products")  # wait for CSS selector
    )
)
driver.save_screenshot("products.png")
Playwright
# ....
with sync_playwright() as p:
    # ....
    page.goto('https://web-scraping.dev/products')

    # wait for XPath or CSS selector
    page.wait_for_selector("div.products", timeout=10000)
    page.wait_for_selector("//div[@class='products']", timeout=10000)
    page.screenshot(path="products.png")
Above, we use dynamic conditions to wait for selectors before taking Python screenshots, using Selenium's expected conditions and Playwright's wait_for_selector method.
The last available wait condition is the load state. It waits for the browser page to reach a specific state, such as domcontentloaded, load, or networkidle.
Waiting for the network idle state is particularly useful when capturing snapshots of web pages with many images to render. Since such pages consume a lot of bandwidth, it's easier to wait for all network calls to finish rather than waiting for specific selectors.
Here's how to use the wait_for_load_state method to wait before taking Playwright screenshots:
# ....
with sync_playwright() as p:
    # ....
    page.goto('https://web-scraping.dev/products')

    # wait for load state
    page.wait_for_load_state("domcontentloaded")  # DOM tree to load
    page.wait_for_load_state("networkidle")  # network to be idle
    page.screenshot(path="products.png")
Note that Selenium doesn't ship with methods for intercepting the driver's load state, but this can be implemented using custom JavaScript execution.
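For illustration, here is a minimal sketch of such a wait in Selenium, polling document.readyState through JavaScript execution. Note that this only approximates the "load" state, not network idleness:

from selenium.webdriver.support.ui import WebDriverWait
# ....

driver.get("https://web-scraping.dev/products")

# poll the page's readyState until the browser reports it has fully loaded
WebDriverWait(driver, 10).until(
    lambda d: d.execute_script("return document.readyState") == "complete"
)
driver.save_screenshot("products.png")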
Emulation allows customizing the headless browser configuration to simulate common web browser types and user preferences. These settings are reflected in the web page screenshots taken accordingly.
For instance, by emulating a specific phone browser, the website screenshot appears as if it were captured by an actual phone.
Viewport settings represent the resolution of the browser device through width and height dimensions. Here's how to change the Python screenshot viewport.
Selenium
# ....
# set the viewport dimensions (width x height)
driver.set_window_size(1920, 1080)

driver.get("https://web-scraping.dev/products")
driver.save_screenshot("products.png")
Playwright:
# ....
with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context(
        viewport={"width": 1920, "height": 1080},  # set viewport dimensions
        device_scale_factor=2,  # increase the pixel ratio
    )
    page = context.new_page()
    page.goto("https://web-scraping.dev/products")
    page.screenshot(path="products.png")
Here, we use Selenium's set_window_size method to set the browser viewport. As for Playwright, we define a browser context to set the viewport and also increase the pixel ratio through the device_scale_factor property for higher quality.
Playwright provides a wide range of device presets to emulate multiple browsers and operating systems, enabling further Playwright screenshot customization:
# ....
with sync_playwright() as p:
    iphone_14 = p.devices['iPhone 14 Pro Max']
    browser = p.webkit.launch(headless=False)
    context = browser.new_context(
        **iphone_14,
    )
    # open a browser tab with iPhone 14 Pro Max's device profile
    page = context.new_page()
    # ....
The above Python script selects a device profile to automatically define its settings, including UserAgent, screen viewport, scale factor, and browser type. For the full list of available device profiles, refer to the official device registry.
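To complete the picture, here is a hedged sketch continuing the snippet above; the target URL and output filename are placeholders for illustration:

# .... continuing the iPhone 14 Pro Max context from above
page.goto("https://web-scraping.dev/products")

# the capture reflects the emulated device's viewport, scale factor, and UserAgent
page.screenshot(path="products-iphone.png")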
Taking a screenshot on websites with localization features can make the images look different based on the locale language and timezone settings used. Hence, correctly setting these values ensures the correct behavior.
Selenium
from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options

driver_manager = ChromeService(ChromeDriverManager().install())

options = Options()
# set locale
options.add_argument("--lang=fr-FR")

driver = webdriver.Chrome(service=driver_manager, options=options)

# set timezone using devtools protocol
timezone = {'timezoneId': 'Europe/Paris'}
driver.execute_cdp_cmd('Emulation.setTimezoneOverride', timezone)

driver.get("https://webbrowsertools.com/timezone/")
# ....
Playwright
# ....
with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context(
        locale='fr-FR',
        timezone_id='Europe/Paris',
    )
    page = context.new_page()
    page.goto("https://webbrowsertools.com/timezone/")
    # ....
In this Python code, we set the browser's localization preferences through the locale and timezone settings. However, other factors can affect the localization profile used. For the full details, refer to our dedicated guide on web scraping localization.
Taking Python screenshots on websites can often be affected by automatic browser location identification. Here's how we can change it through longitude and latitude values.
Selenium
# ....
driver_manager = ChromeService(ChromeDriverManager().install())
driver = webdriver.Chrome(service=driver_manager)

geolocation = {
    "latitude": 37.17634,
    "longitude": -3.58821,
    "accuracy": 100
}
# set geolocation using devtools protocol
driver.execute_cdp_cmd("Emulation.setGeolocationOverride", geolocation)
Playwright
# ....
with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context(
        geolocation={"latitude": 37.17634, "longitude": -3.58821},
        permissions=["geolocation"]
    )
    page = context.new_page()
Taking webpage screenshots in dark mode is quite popular. To approach it, we can change the browser's default color theme preference.
Selenium
# ....
options = Options()
options.add_argument('--force-dark-mode')

driver_manager = ChromeService(ChromeDriverManager().install())
driver = webdriver.Chrome(service=driver_manager, options=options)

driver.get("https://reddit.com/")  # will open in dark mode
Playwright
# ....
with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context(
        color_scheme='dark'
    )
    page = context.new_page()
    page.goto("https://reddit.com/")  # will open in dark mode
The above code sets the default browser theme to dark mode, enabling dark-mode web screenshots accordingly. However, it has no effect on websites without native theme-modification support.
To force dark-mode screenshots across all websites, we can use Chrome feature flags. Start by retrieving the required flag argument, which corresponds to the force-dark feature listed on Chrome's chrome://flags page.
After retrieving the flag argument, add it to the browser configuration to force dark website screenshots in Python.
Selenium
# ....
driver_manager = ChromeService(ChromeDriverManager().install())

options = Options()
options.add_argument('--enable-features=WebContentsForceDark:inversion_method/cielab_based')

driver = webdriver.Chrome(service=driver_manager, options=options)

driver.get("https://web-scraping.dev/products")
driver.save_screenshot('dark_screenshot.png')
Playwright
# ....
with sync_playwright() as p:
    browser = p.chromium.launch(
        headless=False,
        args=[
            '--enable-features=WebContentsForceDark:inversion_method/cielab_based'
        ]
    )
    context = browser.new_context()
    page = context.new_page()
    page.goto("https://web-scraping.dev/products")
    page.screenshot(path="dark_screenshot.png")
Here's what the retrieved dark-mode Python screenshot looks like:
Lastly, let's explore using Python to screenshot webpages through area selection. It enables targeting specific areas of the page.
Taking full-page screenshots is an extremely popular use case, allowing snapshots to be captured at the whole page's vertical height.
Full-page screenshots are often misunderstood. Hence, it's important to differentiate between two distinct concepts: scrolling the page down so that all of its dynamic content loads, and capturing the screenshot at the page's full vertical height.
A headless browser can scroll down without the screenshot height being updated to the new page height, or vice versa. In that case, the retrieved web snapshot doesn't look as expected.
Here's how to take scrolling screenshots with Selenium and Playwright.
Selenium
# ....
def scroll(driver):
    _prev_height = -1
    _max_scrolls = 100
    _scroll_count = 0
    while _scroll_count < _max_scrolls:
        # execute JavaScript to scroll to the bottom of the page
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # wait for new content to load (change this value as needed)
        time.sleep(1)
        # check whether the scroll height changed - means more pages are there
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == _prev_height:
            break
        _prev_height = new_height
        _scroll_count += 1

driver_manager = ChromeService(ChromeDriverManager().install())

options = Options()
options.add_argument("--headless")  # ⚠️ headless mode is required

driver = webdriver.Chrome(service=driver_manager, options=options)

# request the target page and scroll down
driver.get("https://web-scraping.dev/testimonials")
scroll(driver)

# retrieve the new page height and update the viewport
new_height = driver.execute_script("return document.body.scrollHeight")
driver.set_window_size(1920, new_height)

# screenshot the main page content (body)
driver.find_element(By.TAG_NAME, "body").screenshot("full-page-screenshot.png")
Playwright
# ....
def scroll(page):
    _prev_height = -1
    _max_scrolls = 100
    _scroll_count = 0
    while _scroll_count < _max_scrolls:
        # execute JavaScript to scroll to the bottom of the page
        page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
        # wait for new content to load
        page.wait_for_timeout(1000)
        # check whether the scroll height changed - means more pages are there
        new_height = page.evaluate("document.body.scrollHeight")
        if new_height == _prev_height:
            break
        _prev_height = new_height
        _scroll_count += 1

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context(
        viewport={"width": 1920, "height": 1080}
    )
    page = context.new_page()

    # request the target page and scroll down
    page.goto("https://web-scraping.dev/testimonials")
    scroll(page)

    # automatically capture the full page
    page.screenshot(path="full-page-screenshot.png", full_page=True)
Since Selenium doesn't provide automatic full-page screenshot capturing capabilities, we use a few additional steps: scrolling to the bottom of the page, retrieving the new page height, resizing the browser window to match it, and screenshotting the page's body element. Playwright, in contrast, captures the full page automatically through the full_page=True argument.
So far, we have been taking web page screenshots of the entire screen viewport. However, headless browsers also allow targeting a specific area by screenshotting elements through their selectors:
Selenium
# ....
driver.set_window_size(1920, 1080)
driver.get("https://web-scraping.dev/product/3")

# wait for the target element to be visible
wait = WebDriverWait(driver, 10)
wait.until(expected_conditions.presence_of_element_located(
    (By.CSS_SELECTOR, "div.row.product-data")
))

element = driver.find_element(By.CSS_SELECTOR, 'div.row.product-data')
# take web page screenshot of the specific element
element.screenshot('product-data.png')
Playwright
# ....
with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context(
        viewport={"width": 1920, "height": 1080}
    )
    page = context.new_page()
    page.goto('https://web-scraping.dev/product/3')

    # wait for the target element to be visible
    page.wait_for_selector('div.row.product-data')

    # take web page screenshot of the specific element
    page.locator('div.row.product-data').screenshot(path="product-data.png")
In the above code, we start by waiting for the desired element to appear in the HTML. Then, we select it and capture it specifically. Here's what the retrieved Python screenshot looks like:
Furthermore, we can customize webpage Python screenshots using coordinate values. In other words, this crops the web page into an image using four attributes: x, y, width, and height.
Here's how to take clipped Playwright and Selenium screenshots:
Selenium
from PIL import Image  # pip install pillow
from io import BytesIO
# ....

driver.get("https://web-scraping.dev/product/3")

wait = WebDriverWait(driver, 10)
wait.until(expected_conditions.presence_of_element_located(
    (By.CSS_SELECTOR, "div.row.product-data")
))
element = driver.find_element(By.CSS_SELECTOR, 'div.row.product-data')

# automatically retrieve the coordinate values of the selected element
location = element.location
size = element.size
coordinates = {
    "x": location['x'],
    "y": location['y'],
    "width": size['width'],
    "height": size['height']
}
print(coordinates)
# {'x': 320.5, 'y': 215.59375, 'width': 1262, 'height': 501.828125}

# capture full driver screenshot
screenshot_bytes = driver.get_screenshot_as_png()

# clip the screenshot and save it
img = Image.open(BytesIO(screenshot_bytes))
clip_box = (
    coordinates['x'],
    coordinates['y'],
    coordinates['x'] + coordinates['width'],
    coordinates['y'] + coordinates['height']
)
cropped_img = img.crop(clip_box)
cropped_img.save('clipped-screenshot.png')
Playwright
# ....
with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context()
    page = context.new_page()
    page.goto('https://web-scraping.dev/product/3')
    page.wait_for_selector('div.row.product-data')

    element = page.query_selector("div.row.product-data")
    # automatically retrieve the coordinate values of the selected element
    coordinates = element.bounding_box()
    print(coordinates)
    # {'x': 320.5, 'y': 215.59375, 'width': 1262, 'height': 501.828125}

    # capture the screenshot with clipping
    page.screenshot(path="clipped-screenshot.png", clip=coordinates)
We use Playwright's built-in clip option to automatically crop the captured screenshot. As for Selenium, we use Pillow to manually clip the full web page snapshot.
Websites' pop-up banners can prevent taking clear screenshots. One example is the familiar "Accept Cookies" banner on web-scraping.dev:
The above banner is displayed based on cookies. If we click "accept", a cookie value is saved in the browser to store our preference and prevent the banner from being displayed again.
If we inspect the browser developer tools, we'll find the cookiesAccepted cookie set to true. So, to block cookie banners while taking Python screenshots, we'll set this cookie before navigating to the target web page.
Selenium
# .... driver.get("https://web-scraping.dev") # add the cookie responsible for blocking screenshot banners driver.add_cookie({'name': 'cookiesAccepted', 'value': 'true', 'domain': 'web-scraping.dev'}) driver.get("https://web-scraping.dev/login?cookies=") driver.save_screenshot('blocked-banner-screenshot.png'
Playwright
with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context()

    # add the cookie responsible for blocking screenshot banners
    context.add_cookies(
        [{'name': 'cookiesAccepted', 'value': 'true', 'domain': 'web-scraping.dev', 'path': '/'}]
    )

    page = context.new_page()
    page.goto('https://web-scraping.dev/login?cookies=')
    page.screenshot(path='blocked-banner-screenshot.png')
For further details on using cookies, refer to our dedicated guide.
So far, we have explored taking website screenshots using a basic headless browser configuration. However, many modern websites block screenshot automation using anti-bot measures. Moreover, maintaining headless web browsers can be complex and time-consuming.
ScrapFly is a screenshot API that enables taking web page captures at scale.
ScrapFly abstracts away all the required engineering efforts!
Here's how to take Python screenshots using ScrapFly's screenshot API. It's as simple as sending an API request:
from pathlib import Path
import urllib.parse
import requests

base_url = 'https://api.scrapfly.io/screenshot?'
params = {
    'key': 'Your ScrapFly API key',
    'url': 'https://web-scraping.dev/products',  # web page URL to screenshot
    'format': 'png',  # screenshot format (file extension)
    'capture': 'fullpage',  # area to capture (specific element, fullpage, viewport)
    'resolution': '1920x1080',  # screen resolution
    'country': 'us',  # proxy country
    'rendering_wait': 5000,  # time to wait in milliseconds before capturing
    'wait_for_selector': 'div.products-wrap',  # selector to wait on the web page
    'options': [
        'dark_mode',  # use the dark mode
        'block_banners',  # block pop up banners
        'print_media_format'  # emulate media printing format
    ],
    'auto_scroll': True  # automatically scroll down the page
}

# convert the list of options to a comma-separated string
params['options'] = ','.join(params['options'])

query_string = urllib.parse.urlencode(params)
full_url = base_url + query_string

response = requests.get(full_url)
image_bytes = response.content

# save to disk
Path("screenshot.png").write_bytes(image_bytes)
To wrap up this guide on taking website screenshots with Python Selenium and Playwright, let's have a look at some frequently asked questions.
Yes, screenshot APIs are great alternatives. They manage headless browsers under the hood, enabling website snapshots through simple HTTP requests. For further details, refer to our guide on the best screenshot API.
Puppeteer is a popular headless browser that allows web page captures using the page.screenshot method. For more, refer to our guide on taking screenshots with Puppeteer.
To take a full-page screenshot, scroll down the page using Selenium or Playwright if needed. Then, in Playwright, use the screenshot(path, full_page=True) method to automatically capture the page's full vertical height.
As for Selenium, manually update the browser viewport height after scrolling so that it covers the full vertical height.
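As a minimal sketch, assuming a driver that has already scrolled to the bottom of the page as shown earlier:

# ....
# read the full document height after scrolling
full_height = driver.execute_script("return document.body.scrollHeight")

# resize the window so the viewport covers the whole page, then capture it
driver.set_window_size(1920, full_height)
driver.save_screenshot("full-page-screenshot.png")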
In this guide, we explained how to take Playwright and Selenium screenshots in Python. We started by covering the installation and basic usage.
We then went through a step-by-step guide on using advanced Selenium and Playwright features to capture customized screenshots: waiting for pages to load, emulating viewports, devices, locales, geolocation, and dark mode, capturing full pages or specific elements, and blocking cookie banners.