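Headless browser automation libraries offer a wide range of configuration options that can be applied when taking screenshots. In this guide, we'll explain how to take Python screenshots with Selenium and Playwright. Then, we'll explore common browser tips and tricks for customizing web page captures. Let's get started!
We'll start by covering the core Selenium and Playwright methods, including the installation required for taking Python screenshots. Then, we'll explore common features for taking customized Selenium and Playwright screenshots.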
Before we look at how to take screenshots in Python with Selenium, let's install it. Use the pip command below to install Selenium along with webdriver-manager:
```shell
pip install selenium webdriver-manager
```
The webdriver-manager Python library automatically downloads the required browser driver:
```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from webdriver_manager.chrome import ChromeDriverManager

# download a matching ChromeDriver binary and start Chrome with it
driver = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()))
```
Now that the required installation is ready, let's take a screenshot with Selenium in Python:
```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()))

# request target web page
driver.get("https://web-scraping.dev/products")

# take screenshot and directly save it
driver.save_screenshot("products.png")

# image as bytes
image_bytes = driver.get_screenshot_as_png()

# image as base64 string
base64_string = driver.get_screenshot_as_base64()

driver.quit()
```
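The Python script above for taking Selenium screenshots is quite simple. It uses the save_screenshot method to capture the driver's current viewport and saves the image to the products.png file. Instead of saving directly to disk, other methods return the raw image data as bytes or as a base64 string for further processing.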
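The base64 string is handy for embedding in JSON or HTML, but it must be decoded before it can be written out as an image. Below is a minimal sketch that assumes the data is PNG, which is the format Selenium returns from these methods. decode_screenshot is a hypothetical helper and is not part of Selenium.

```python
import base64

# every valid PNG file starts with this 8-byte signature
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_screenshot(base64_string: str) -> bytes:
    """Decode a base64 screenshot (e.g. from get_screenshot_as_base64) into PNG bytes."""
    image_bytes = base64.b64decode(base64_string)
    # reject data that isn't a PNG image before it gets saved with a .png extension
    if not image_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("data is not a PNG image")
    return image_bytes
```

With a real driver, the call would look like decode_screenshot(driver.get_screenshot_as_base64()). It returns the same bytes as driver.get_screenshot_as_png().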
For more on Selenium, refer to our dedicated guide.
The Playwright API is available in several programming languages. Since we'll be taking screenshots with Python Playwright, install the Python package using the pip command below:
```shell
pip install playwright
```
Next, install the required Playwright browser binaries:
```shell
playwright install chromium  # alternatively, install `firefox` or `webkit`
```
To take Playwright screenshots, we can use the .screenshot method:
```python
from pathlib import Path
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    # headless=False opens a visible browser window; omit it to run headless
    browser = p.chromium.launch(headless=False)
    context = browser.new_context()
    page = context.new_page()

    # request target web page
    page.goto("https://web-scraping.dev/products")

    # take screenshot and directly save it
    page.screenshot(path="products.png")

    # or take the screenshot as bytes and save it yourself
    image_bytes = page.screenshot()
    Path("products.png").write_bytes(image_bytes)

    browser.close()
```
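Above, we start by launching a new Playwright browser instance and opening a new tab (page) in it. Then, we use the page's screenshot method to save the capture to the products.png file, or to get it as bytes and write it to disk ourselves.
For more details on using Playwright for web scraping, refer to our dedicated Playwright guide.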
Images on web pages are loaded dynamically. To avoid broken website screenshots, it's therefore important to wait correctly for them to load. Let's explore different techniques for defining waits and timeouts.
Fixed timeouts are the most basic type of headless browser wait. The browser waits a fixed amount of time before capturing the screenshot. This usually gives the DOM elements enough time to load, but it doesn't guarantee that they have.
Selenium
```python
import time
# ....
driver.get("https://web-scraping.dev/products")
# wait for 10 seconds before taking screenshot
time.sleep(10)
driver.save_screenshot("products.png")
```
Playwright
```python
# ....
with sync_playwright() as p:
    # ....
    page.goto("https://web-scraping.dev/products")
    # wait for 10 seconds before taking screenshot
    page.wait_for_timeout(10000)
    page.screenshot(path="products.png")
```
Above, we define a fixed wait condition with Playwright's wait_for_timeout method before taking the web page screenshot. Selenium doesn't provide a built-in method for fixed timeouts, so we use Python's built-in time module instead.
Dynamic wait conditions wait for a specific element's selector to appear on the page before proceeding. The wait ends as soon as the selector is found. If the selector isn't found within the defined time, the wait fails with a timeout error.
Selenium
```python
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
from selenium.webdriver.common.by import By
# ....
driver.get("https://web-scraping.dev/products")
_timeout = 10  # set the maximum timeout to 10 seconds
wait = WebDriverWait(driver, _timeout)
wait.until(expected_conditions.presence_of_element_located(
    (By.XPATH, "//div[@class='products']")  # wait for XPath selector
    # (By.CSS_SELECTOR, "div.products")  # wait for CSS selector
))
driver.save_screenshot("products.png")
```
Playwright
```python
# ....
with sync_playwright() as p:
    # ....
    page.goto("https://web-scraping.dev/products")
    # wait for a CSS selector...
    page.wait_for_selector("div.products", timeout=10000)
    # ...or an XPath selector (selectors starting with // are treated as XPath)
    page.wait_for_selector("//div[@class='products']", timeout=10000)
    page.screenshot(path="products.png")
```
Above, we use Selenium's expected_conditions and Playwright's wait_for_selector method to wait for a selector dynamically before taking the Python screenshot.
The last available wait condition is the load state, which waits for the browser page to reach a specific state. Waiting for the network to become idle is especially useful when capturing pages that have many images to render. Such pages are bandwidth-heavy, so it's easier to wait for all network calls to finish than to wait for specific selectors.
Here's how to use the wait_for_load_state method to wait before taking Playwright screenshots:
```python
# ....
with sync_playwright() as p:
    # ....
    page.goto("https://web-scraping.dev/products")
    # wait for load state
    page.wait_for_load_state("domcontentloaded")  # DOM tree to load
    page.wait_for_load_state("networkidle")  # network to be idle
    page.screenshot(path="products.png")
```
Selenium has no built-in method for waiting on a specific load state, beyond the page load strategy that driver.get() already follows. It can, however, be implemented with custom JavaScript execution.
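One way to implement this is to poll document.readyState until it reports "complete". Below is a minimal, library-agnostic polling sketch. wait_until is a hypothetical helper. With a real driver, the condition would be lambda: driver.execute_script("return document.readyState") == "complete". Selenium's own WebDriverWait(driver, 10).until(...) accepts a similar callable (it passes the driver in as the argument), so in practice you may prefer it.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float = 10.0, poll: float = 0.5) -> None:
    """Call condition() every `poll` seconds until it returns True.

    Raises TimeoutError if it is still False after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)
```

The "complete" state corresponds roughly to the page's load event. Unlike Playwright's "networkidle", it doesn't account for requests fired after the page has loaded, such as lazily loaded images.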
Emulation lets us customize the headless browser configuration to simulate common web browser types and user preferences. These settings are reflected in the web page screenshots taken. For instance, when we emulate a specific phone browser, the website screenshot looks as if it were captured on an actual phone.
Viewport settings represent the resolution of the browser device through its width and height dimensions. Here's how to change the viewport for Python screenshots.
Selenium
```python
# ....
# set the browser window dimensions (width x height)
driver.set_window_size(1920, 1080)
driver.get("https://web-scraping.dev/products")
driver.save_screenshot("products.png")
```
Playwright
```python
# ....
with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context(
        viewport={"width": 1920, "height": 1080},  # set viewport dimensions
        device_scale_factor=2,  # increase the pixel ratio
    )
    page = context.new_page()
    page.goto("https://web-scraping.dev/products")
    page.screenshot(path="products.png")
```
Here, we use Selenium's set_window_size method to resize the browser window, which determines the viewport. In headed mode, the viewport is slightly smaller than the window because of the browser's toolbars. For Playwright, we define the viewport in a browser context. We also raise the pixel ratio through the device_scale_factor property to get higher-quality images.
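Because device_scale_factor multiplies every CSS pixel, it also changes the size of the resulting image. The helper below is a hypothetical sketch of that arithmetic. It assumes Playwright's default screenshot scaling (scale="device"). Passing scale="css" to page.screenshot keeps one image pixel per CSS pixel instead.

```python
def screenshot_pixel_size(width: int, height: int, device_scale_factor: float = 1.0) -> tuple:
    """Estimate the pixel size of a viewport screenshot.

    width/height are the CSS-pixel viewport dimensions; each CSS pixel is
    rendered as device_scale_factor x device_scale_factor image pixels.
    """
    return round(width * device_scale_factor), round(height * device_scale_factor)


print(screenshot_pixel_size(1920, 1080, 2))  # (3840, 2160)
```

The Playwright example above therefore produces a 3840x2160 image. That is four times as many pixels as at a scale factor of 1, which also makes the files larger.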
Playwright provides a wide range of device presets to emulate multiple browsers and operating systems, enabling further Playwright screenshot customization:
```python
# ....
with sync_playwright() as p:
    iphone_14 = p.devices["iPhone 14 Pro Max"]
    browser = p.webkit.launch(headless=False)
    # create a browser context with the iPhone 14 Pro Max device profile
    context = browser.new_context(
        **iphone_14,
    )
    # open a browser tab using that profile
    page = context.new_page()
    # ....
```
The above Python script selects a device profile that automatically defines its settings, including the user agent, screen viewport, scale factor, and default browser type. For the full list of available device profiles, refer to the official device registry.
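Device presets are plain dictionaries of new_context keyword arguments. If you unpack one and pass one of its keys again, as in new_context(**iphone_14, viewport=...), Python raises a "got multiple values for keyword argument" TypeError. Below is a minimal sketch that overrides individual preset values by merging dictionaries first. with_overrides is a hypothetical helper, and the descriptor is made-up sample data rather than a real Playwright preset.

```python
def with_overrides(device: dict, **overrides) -> dict:
    """Return a copy of a device descriptor with selected keys replaced or added."""
    # in a dict merge, later keys win, so overrides replace the preset's values
    return {**device, **overrides}


# hypothetical descriptor shaped like an entry of p.devices
device = {
    "user_agent": "Mozilla/5.0 (sample)",
    "viewport": {"width": 430, "height": 932},
    "is_mobile": True,
}
custom = with_overrides(device, viewport={"width": 430, "height": 800}, locale="fr-FR")
print(custom["viewport"], custom["locale"])  # {'width': 430, 'height': 800} fr-FR
```

With a real preset, the call would be browser.new_context(**with_overrides(p.devices["iPhone 14 Pro Max"], viewport={...})). The original preset dictionary is left untouched, so it can be reused.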
On websites with localization features, screenshots can look different depending on the locale language and timezone settings used. Setting these values correctly therefore ensures the expected behavior.
Selenium
```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager

driver_manager = ChromeService(ChromeDriverManager().install())
options = Options()
# set locale
options.add_argument("--lang=fr-FR")
driver = webdriver.Chrome(service=driver_manager, options=options)

# set timezone using the Chrome DevTools Protocol
timezone = {"timezoneId": "Europe/Paris"}
driver.execute_cdp_cmd("Emulation.setTimezoneOverride", timezone)

driver.get("https://webbrowsertools.com/timezone/")
# ....
```
Playwright
```python
# ....
with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context(
        locale="fr-FR",
        timezone_id="Europe/Paris",
    )
    page = context.new_page()
    page.goto("https://webbrowsertools.com/timezone/")
    # ....
```
In this Python code, we set the browser's localization preferences through the locale and timezone settings. However, other factors can affect the localization profile used. For the full details, refer to our dedicated guide on web scraping localization.
Python screenshots can also be affected by websites that automatically identify the browser's location. Here's how to override the geolocation with latitude and longitude values.
Selenium
```python
# ....
driver_manager = ChromeService(ChromeDriverManager().install())
driver = webdriver.Chrome(service=driver_manager)
geolocation = {
    "latitude": 37.17634,
    "longitude": -3.58821,
    "accuracy": 100,
}
# set geolocation using the Chrome DevTools Protocol
driver.execute_cdp_cmd("Emulation.setGeolocationOverride", geolocation)
```
Playwright
```python
# ....
with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context(
        # same coordinates as the Selenium example
        geolocation={"latitude": 37.17634, "longitude": -3.58821},
        permissions=["geolocation"],  # allow pages to read the location
    )
    page = context.new_page()
```
Taking web page screenshots in dark mode is quite popular. To do so, we can change the browser's default color theme preference.
Selenium
```python
# ....
options = Options()
options.add_argument("--force-dark-mode")
driver_manager = ChromeService(ChromeDriverManager().install())
driver = webdriver.Chrome(service=driver_manager, options=options)
driver.get("https://reddit.com/")  # will open in dark mode
```
Playwright
```python
# ....
with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context(
        color_scheme="dark"
    )
    page = context.new_page()
    page.goto("https://reddit.com/")  # will open in dark mode
```
The above code sets the browser's preferred color scheme to dark, so websites that support a dark theme are captured in it. However, it has no effect on websites that don't provide their own dark theme.
To force dark-mode screenshots across all websites, we can use Chrome flags. To do this, start by retrieving the required argument using the below steps:
After retrieving the flag argument, add it to the browser's launch arguments to force dark website screenshots in Python.
Selenium
```python
# ....
driver_manager = ChromeService(ChromeDriverManager().install())
options = Options()
options.add_argument("--enable-features=WebContentsForceDark:inversion_method/cielab_based")
driver = webdriver.Chrome(service=driver_manager, options=options)
driver.get("https://web-scraping.dev/products")
driver.save_screenshot("dark_screenshot.png")
```
Playwright
```python
# ....
with sync_playwright() as p:
    browser = p.chromium.launch(
        headless=False,
        args=[
            "--enable-features=WebContentsForceDark:inversion_method/cielab_based"
        ],
    )
    context = browser.new_context()
    page = context.new_page()
    page.goto("https://web-scraping.dev/products")
    page.screenshot(path="dark_screenshot.png")
```
Here's what the retrieved dark-mode Python screenshot looks like:
Lastly, let's explore using Python to screenshot webpages through area selection. It enables targeting specific areas of the page.
Taking full-page screenshots is an extremely popular use case, where the snapshot covers the page's entire vertical height.
Full-page screenshots are often misunderstood. Hence, it's important to differentiate between two distinct concepts:
A headless browser can scroll down without its screenshot height being updated to the new page height, or vice versa. In either case, the retrieved web snapshot doesn't look as expected.
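A quick way to spot this mismatch is to compare the captured image's height with the page's document.body.scrollHeight, multiplied by the device scale factor. PNG files store their dimensions at a fixed position in the IHDR header, so they can be read with the standard library alone. png_dimensions is a hypothetical helper and is not part of Selenium or Playwright.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(png_bytes: bytes) -> tuple:
    """Return (width, height) in pixels, read from a PNG's IHDR chunk."""
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG image")
    # bytes 8-11 hold the chunk length, bytes 12-15 the chunk type (IHDR comes first)
    if png_bytes[12:16] != b"IHDR":
        raise ValueError("missing IHDR chunk")
    # bytes 16-23 hold width and height as big-endian unsigned 32-bit integers
    width, height = struct.unpack(">II", png_bytes[16:24])
    return width, height
```

For example, png_dimensions(page.screenshot(full_page=True)) or png_dimensions(driver.get_screenshot_as_png()) returns the size of the image that was actually captured.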
Here's how to take scrolling screenshots with Selenium and Playwright.
Selenium
```python
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager


def scroll(driver):
    _prev_height = -1
    _max_scrolls = 100
    _scroll_count = 0
    while _scroll_count < _max_scrolls:
        # execute JavaScript to scroll to the bottom of the page
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # wait for new content to load (change this value as needed)
        time.sleep(1)
        # check whether the scroll height changed, which means more content loaded
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == _prev_height:
            break
        _prev_height = new_height
        _scroll_count += 1


driver_manager = ChromeService(ChromeDriverManager().install())
options = Options()
# ⚠️ headless mode is required: a headed window can't grow beyond the screen size
options.add_argument("--headless")
driver = webdriver.Chrome(service=driver_manager, options=options)

# request the target page and scroll down
driver.get("https://web-scraping.dev/testimonials")
scroll(driver)

# retrieve the new page height and update the viewport
new_height = driver.execute_script("return document.body.scrollHeight")
driver.set_window_size(1920, new_height)

# screenshot the main page content (body)
driver.find_element(By.TAG_NAME, "body").screenshot("full-page-screenshot.png")
driver.quit()
```
Playwright
```python
# ....
def scroll(page):
    _prev_height = -1
    _max_scrolls = 100
    _scroll_count = 0
    while _scroll_count < _max_scrolls:
        # execute JavaScript to scroll to the bottom of the page
        page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
        # wait for new content to load
        page.wait_for_timeout(1000)
        # check whether the scroll height changed, which means more content loaded
        new_height = page.evaluate("document.body.scrollHeight")
        if new_height == _prev_height:
            break
        _prev_height = new_height
        _scroll_count += 1


with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context(
        viewport={"width": 1920, "height": 1080}
    )
    page = context.new_page()
    # request the target page and scroll down
    page.goto("https://web-scraping.dev/testimonials")
    scroll(page)
    # automatically capture the full page
    page.screenshot(path="full-page-screenshot.png", full_page=True)
```
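Selenium's Chrome driver doesn't provide automatic full-page screenshot capture, although Selenium's Firefox driver does through its save_full_page_screenshot method. With Chrome, we therefore take additional steps. First, we scroll down until the page height stops changing. Next, we read the final document.body.scrollHeight and resize the window to that height, which only works in headless mode. Finally, we take a screenshot of the body element.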
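The scroll loop is the same in both scripts. Only the calls that scroll, read the height, and pause differ. Below is a sketch that factors the loop out, assuming those three operations are passed in as callables. scroll_until_stable is a hypothetical name. Unlike the loops above, it compares against the initial height, so it can stop after a single scroll if no new content appears.

```python
from typing import Callable


def scroll_until_stable(
    scroll_to_bottom: Callable[[], None],
    get_height: Callable[[], int],
    pause: Callable[[], None],
    max_scrolls: int = 100,
) -> int:
    """Scroll until the page height stops growing or max_scrolls is reached.

    Returns the last observed page height.
    """
    height = get_height()
    for _ in range(max_scrolls):
        scroll_to_bottom()
        pause()  # give lazily loaded content time to appear
        new_height = get_height()
        if new_height == height:
            break  # nothing new was loaded
        height = new_height
    return height
```

With Playwright, for example, the call would be scroll_until_stable(lambda: page.evaluate("window.scrollTo(0, document.body.scrollHeight)"), lambda: page.evaluate("document.body.scrollHeight"), lambda: page.wait_for_timeout(1000)). In the Selenium script, the returned height is the value passed to set_window_size.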
So far, we have been taking web page screenshots of the entire viewport. However, headless browsers also allow targeting a specific area by screenshotting individual elements through their selectors:
Selenium
```python
# ....
driver.set_window_size(1920, 1080)
driver.get("https://web-scraping.dev/product/3")
# wait for the target element to be present in the DOM
wait = WebDriverWait(driver, 10)
wait.until(expected_conditions.presence_of_element_located(
    (By.CSS_SELECTOR, "div.row.product-data")
))
element = driver.find_element(By.CSS_SELECTOR, "div.row.product-data")
# take a screenshot of the specific element only
element.screenshot("product-data.png")
```
Playwright
```python
# ....
with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context(
        viewport={"width": 1920, "height": 1080}
    )
    page = context.new_page()
    page.goto("https://web-scraping.dev/product/3")
    # wait for the target element to be visible
    page.wait_for_selector("div.row.product-data")
    # take a screenshot of the specific element only
    page.locator("div.row.product-data").screenshot(path="product-data.png")
```
In the above code, we start by waiting for the desired element to appear in the HTML. Then, we select it and capture only that element. Here's what the retrieved Python screenshot looks like:
We can also customize Python web page screenshots with coordinate values, cropping the web page into an image using four attributes: x, y, width, and height.
Here's how to take clipped Playwright and Selenium screenshots:
Selenium
```python
from io import BytesIO
from PIL import Image  # pip install pillow
# ....
driver.get("https://web-scraping.dev/product/3")
wait = WebDriverWait(driver, 10)
wait.until(expected_conditions.presence_of_element_located(
    (By.CSS_SELECTOR, "div.row.product-data")
))
element = driver.find_element(By.CSS_SELECTOR, "div.row.product-data")

# automatically retrieve the coordinate values of the selected element
location = element.location
size = element.size
coordinates = {
    "x": location["x"],
    "y": location["y"],
    "width": size["width"],
    "height": size["height"],
}
print(coordinates)
# {'x': 320.5, 'y': 215.59375, 'width': 1262, 'height': 501.828125}

# capture the full driver screenshot
screenshot_bytes = driver.get_screenshot_as_png()

# clip the screenshot and save it
img = Image.open(BytesIO(screenshot_bytes))
clip_box = (
    coordinates["x"],
    coordinates["y"],
    coordinates["x"] + coordinates["width"],
    coordinates["y"] + coordinates["height"],
)
cropped_img = img.crop(clip_box)
cropped_img.save("clipped-screenshot.png")
```
Playwright
```python
# ....
with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context()
    page = context.new_page()
    page.goto("https://web-scraping.dev/product/3")
    page.wait_for_selector("div.row.product-data")
    element = page.query_selector("div.row.product-data")

    # automatically retrieve the coordinate values of the selected element
    coordinates = element.bounding_box()
    print(coordinates)
    # {'x': 320.5, 'y': 215.59375, 'width': 1262, 'height': 501.828125}

    # capture the screenshot with clipping
    page.screenshot(path="clipped-screenshot.png", clip=coordinates)
```
With Playwright, we use the clip option of the screenshot method to crop the capture automatically. With Selenium, we use Pillow to crop the full web page snapshot manually.
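One caveat applies to the Pillow approach. element.location and element.size are in CSS pixels, while get_screenshot_as_png() returns device pixels. On a HiDPI display (a devicePixelRatio of 2, for example), the crop therefore lands in the wrong place. Below is a minimal sketch that scales and rounds the box before cropping. It assumes the ratio is read with driver.execute_script("return window.devicePixelRatio") and that the page hasn't been scrolled. to_crop_box is a hypothetical helper.

```python
import math


def to_crop_box(rect: dict, device_pixel_ratio: float = 1.0) -> tuple:
    """Convert a CSS-pixel rect {x, y, width, height} into a Pillow crop box.

    Returns integer (left, upper, right, lower) image-pixel coordinates,
    rounded outward so the element is never cut off.
    """
    left = math.floor(rect["x"] * device_pixel_ratio)
    upper = math.floor(rect["y"] * device_pixel_ratio)
    right = math.ceil((rect["x"] + rect["width"]) * device_pixel_ratio)
    lower = math.ceil((rect["y"] + rect["height"]) * device_pixel_ratio)
    return left, upper, right, lower


coordinates = {"x": 320.5, "y": 215.59375, "width": 1262, "height": 501.828125}
print(to_crop_box(coordinates))       # (320, 215, 1583, 718)
print(to_crop_box(coordinates, 2.0))  # (641, 431, 3165, 1435)
```

The result can be passed straight to img.crop(...). Playwright's clip option takes CSS pixels, so no conversion is needed there.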
Pop-up banners on websites get in the way of clean screenshots. Take the famous "Accept Cookies" banner on web-scraping.dev as an example:
The above banner is controlled through cookies. If we click "accept", a cookie is saved in the browser to store our preference and prevent the banner from being displayed again.
If we inspect the browser developer tools, we'll find the cookiesAccepted cookie set to true. So, to block cookie banners while taking Python screenshots, we'll set this cookie before navigating to the target web page.
Selenium
```python
# ....
# Selenium can only add cookies for the domain of the currently loaded page
driver.get("https://web-scraping.dev")
# add the cookie responsible for blocking screenshot banners
driver.add_cookie({"name": "cookiesAccepted", "value": "true", "domain": "web-scraping.dev"})
driver.get("https://web-scraping.dev/login?cookies=")
driver.save_screenshot("blocked-banner-screenshot.png")
```
Playwright
```python
# ....
with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context()
    # add the cookie responsible for blocking screenshot banners
    context.add_cookies(
        [{"name": "cookiesAccepted", "value": "true", "domain": "web-scraping.dev", "path": "/"}]
    )
    page = context.new_page()
    page.goto("https://web-scraping.dev/login?cookies=")
    page.screenshot(path="blocked-banner-screenshot.png")
```
For further details on using cookies, refer to our dedicated guide.
So far, we have explored taking website screenshots with a basic headless browser configuration. However, many modern websites block screenshot automation with anti-bot measures. Maintaining headless web browsers can also be complex and time-consuming.
ScrapFly is a screenshot API that enables taking web page captures at scale by providing:
ScrapFly abstracts away all the required engineering efforts!
Here's how to take Python screenshots using ScrapFly's screenshot API. It's as simple as sending an API request:
```python
from pathlib import Path
import urllib.parse
import requests

base_url = "https://api.scrapfly.io/screenshot?"
params = {
    "key": "Your ScrapFly API key",
    "url": "https://web-scraping.dev/products",  # web page URL to screenshot
    "format": "png",  # screenshot format (file extension)
    "capture": "fullpage",  # area to capture (specific element, fullpage, viewport)
    "resolution": "1920x1080",  # screen resolution
    "country": "us",  # proxy country
    "rendering_wait": 5000,  # time to wait in milliseconds before capturing
    "wait_for_selector": "div.products-wrap",  # selector to wait on the web page
    "options": [
        "dark_mode",  # use the dark mode
        "block_banners",  # block pop up banners
        "print_media_format",  # emulate media printing format
    ],
    "auto_scroll": True,  # automatically scroll down the page
}

# convert the list of options to a comma-separated string
params["options"] = ",".join(params["options"])
query_string = urllib.parse.urlencode(params)
full_url = base_url + query_string

response = requests.get(full_url)
response.raise_for_status()  # don't save an error response as an image
image_bytes = response.content
# save to disk
Path("screenshot.png").write_bytes(image_bytes)
```
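The script above mutates params in place and assembles the URL by hand. Below is a slightly more reusable sketch that assumes the same parameter conventions as the example, with list values sent comma-separated. build_screenshot_url is a hypothetical helper that returns a new URL without modifying its input. requests can also do the encoding itself through requests.get(url, params=...).

```python
from urllib.parse import urlencode


def build_screenshot_url(base_url: str, params: dict) -> str:
    """Return base_url with params encoded as a query string.

    List or tuple values are joined with commas; the input dict is not modified.
    """
    query = {
        key: ",".join(value) if isinstance(value, (list, tuple)) else value
        for key, value in params.items()
    }
    return f"{base_url}?{urlencode(query)}"
```

For example, build_screenshot_url("https://api.scrapfly.io/screenshot", params) produces the same query string as the script above, while leaving the options list in params intact for reuse.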
To wrap up this guide on taking website screenshots with Python Selenium and Playwright, let's have a look at some frequently asked questions.
Yes, screenshot APIs are great alternatives. They manage headless browsers under the hood, enabling website snapshots through simple HTTP requests. For further details, refer to our guide on the best screenshot API.
Puppeteer is a popular Node.js library for headless browser automation that captures web pages with the page.screenshot method. For more, refer to our guide on taking screenshots with Puppeteer.
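To take full-page screenshots, scroll down the page with Selenium or Playwright if needed. Then, use Playwright's full-page option, screenshot(path=..., full_page=True), to capture the whole page automatically.
For Selenium, manually update the browser's viewport height after scrolling so that it covers the full vertical height.
In this guide, we explained how to take Playwright and Selenium screenshots in Python, covering everything from installation to basic usage.
We also walked through step-by-step instructions for capturing customized screenshots with advanced Selenium and Playwright features.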