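Headless browser automation libraries offer extensive configuration options for taking screenshots. In this guide, we'll explain how to take Python screenshots with Selenium and Playwright. Then, we'll explore common browser tips and tricks for customizing web page captures. Let's get started!
Basic Screenshot Functionalities
We'll start by covering the core Selenium and Playwright methods, including the installation required to take Python screenshots. Then, we'll explore common functionalities for taking customized Selenium and Playwright screenshots.
Selenium Screenshots
Before exploring how to take screenshots with Selenium in Python, let's install it. Use the below pip command to install Selenium alongside webdriver-manager: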
pip install selenium webdriver-manager
We'll use the webdriver-manager Python library to automatically download the required browser drivers:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()))

Now that the required installation is ready, let's take a screenshot with Selenium in Python:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()))

# request target web page
driver.get("https://web-scraping.dev/products")

# take screenshot and directly save it
driver.save_screenshot('products.png')

# image as bytes
image_bytes = driver.get_screenshot_as_png()

# image as base64 string
base64_string = driver.get_screenshot_as_base64()

The above Python script for taking Selenium screenshots is fairly straightforward. We use the save_screenshot method to capture the full driver viewport and save the image to the products.png file. As an alternative to saving directly to disk, other methods return the raw image data as bytes or as a base64 string for further processing.
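As a minimal sketch of that further processing, the snippet below decodes a base64 screenshot string back into bytes and checks that the result starts with the PNG file signature before writing it to disk. The helper names decode_screenshot and save_base64_png are hypothetical, and the input is assumed to be the string returned by get_screenshot_as_base64().

```python
import base64
from pathlib import Path

# the 8-byte signature that every PNG file starts with
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_screenshot(base64_string: str) -> bytes:
    """Decode a base64 screenshot string and verify that it looks like a PNG."""
    image_bytes = base64.b64decode(base64_string)
    if not image_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data is not a PNG image")
    return image_bytes


def save_base64_png(base64_string: str, path: str) -> int:
    """Write the decoded screenshot to disk and return the number of bytes written."""
    return Path(path).write_bytes(decode_screenshot(base64_string))
```

With a live driver, this could be called as save_base64_png(driver.get_screenshot_as_base64(), "products.png").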
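For further details on Selenium, refer to our dedicated guide.
Playwright Screenshots
The Playwright API is available in different programming languages. Since we'll take screenshots with Python Playwright, install its Python package using the below pip command: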
pip install playwright
Next, install the required Playwright browser binaries:
playwright install chromium # alternatively install `firefox` or `webkit`
To take a Playwright screenshot, we can use the .screenshot method:

from pathlib import Path
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context()
    page = context.new_page()

    # request target web page
    page.goto('https://web-scraping.dev/products')

    # take screenshot and directly save it
    page.screenshot(path="products.png")

    # or screenshot as bytes
    image_bytes = page.screenshot()
    Path("products.png").write_bytes(image_bytes)

Above, we start by launching a new Playwright browser instance (set headless=True to run it without a visible window) and opening a new tab in it. Then, we take a screenshot of the page and save it to the products.png file.
Refer to our dedicated Playwright guide for more details on using it for web scraping.
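Waits and Timeouts
Images on web pages are often loaded dynamically. Hence, correctly waiting for them to load is crucial to prevent broken website screenshots. Let's explore different techniques for defining waits and timeouts.
Fixed Timeouts
Fixed timeouts are the most basic type of headless browser wait functionality. By waiting for a set amount of time before capturing the screenshot, we give all DOM elements a chance to load.
Selenium

import time
# ....
driver.get("https://web-scraping.dev/products")

# wait for 10 seconds before taking screenshot
time.sleep(10)
driver.save_screenshot('products.png')

Playwright

# ....
with sync_playwright() as p:
    # ....
    page.goto('https://web-scraping.dev/products')

    # wait for 10 seconds before taking screenshot
    page.wait_for_timeout(10000)
    page.screenshot(path="products.png")

Above, we use Playwright's wait_for_timeout method to define a fixed wait before taking the web page screenshot. Since Selenium doesn't provide a built-in method for fixed timeouts, we use Python's built-in time module.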
Selectors
Dynamic wait conditions involve waiting for a specific element's selector to appear on the page before proceeding. The wait ends as soon as the selector is found within the defined timeout.
Selenium

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
from selenium.webdriver.common.by import By

# ....
driver.get("https://web-scraping.dev/products")

_timeout = 10  # set the maximum timeout to 10 seconds
wait = WebDriverWait(driver, _timeout)
wait.until(expected_conditions.presence_of_element_located(
    (By.XPATH, "//div[@class='products']")  # wait for XPath selector
    # (By.CSS_SELECTOR, "div.products")  # wait for CSS selector
))

driver.save_screenshot("products.png")

Playwright

# ....
with sync_playwright() as p:
    # ....
    page.goto('https://web-scraping.dev/products')

    # wait for XPath or CSS selector
    page.wait_for_selector("div.products", timeout=10000)
    page.wait_for_selector("//div[@class='products']", timeout=10000)

    page.screenshot(path="products.png")

Above, we use Selenium's expected_conditions and Playwright's wait_for_selector method to wait for a selector before taking the Python screenshot. Note that presence_of_element_located only checks that the element exists in the DOM, while visibility_of_element_located also requires it to be visible.
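To illustrate what a dynamic wait does under the hood, here is a library-independent sketch of a polling loop: it repeatedly evaluates a condition until it returns a truthy value or the timeout expires. The function name wait_until and its parameters are hypothetical; the 0.5 second default mirrors the default polling interval of Selenium's WebDriverWait.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], T], timeout: float = 10.0, poll: float = 0.5) -> T:
    """Call `condition` repeatedly until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # condition met: stop waiting and hand back its value
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)
```

With Selenium, the condition could be something like lambda: driver.find_elements(By.CSS_SELECTOR, "div.products"), which returns an empty (falsy) list until the element exists.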
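Load State
The last available wait condition is the load state. It waits for the browser page to reach a specific state:
- domcontentloaded: wait for the full DOM tree to load
- networkidle: wait until there have been no network connections for at least 500 ms
Waiting for the networkidle state is especially useful when capturing snapshots of web pages with many images to render. Since these pages are bandwidth-intensive, it's easier to wait for all network calls to finish than to wait for specific selectors.
Here's how to use the wait_for_load_state method to wait before taking a Playwright screenshot:

# ....
with sync_playwright() as p:
    # ....
    page.goto('https://web-scraping.dev/products')

    # wait for load state
    page.wait_for_load_state("domcontentloaded")  # DOM tree to load
    page.wait_for_load_state("networkidle")  # network to be idle

    page.screenshot(path="products.png")

Note that Selenium doesn't have a built-in method for waiting on the driver's load state, but it can be implemented using custom JavaScript execution.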
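One possible way to implement this in Selenium is to poll the standard document.readyState property through execute_script. The sketch below is an assumption about how this could be done, not an official Selenium API: wait_for_ready_state is a hypothetical helper. The "interactive" state roughly corresponds to domcontentloaded and "complete" to the page load event; readyState has no equivalent of networkidle.

```python
import time

# document.readyState values, in the order a page moves through them
READY_STATES = ("loading", "interactive", "complete")


def wait_for_ready_state(driver, state="complete", timeout=10.0, poll=0.25):
    """Poll document.readyState until the page has reached at least `state`."""
    target = READY_STATES.index(state)
    deadline = time.monotonic() + timeout
    while True:
        current = driver.execute_script("return document.readyState")
        if READY_STATES.index(current) >= target:
            return current
        if time.monotonic() >= deadline:
            raise TimeoutError(f"page still '{current}' after {timeout} seconds")
        time.sleep(poll)
```

A call such as wait_for_ready_state(driver, "complete") would then sit between driver.get(...) and driver.save_screenshot(...).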
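Emulation
Emulation enables customizing the headless browser configuration to simulate common web browser types and user preferences. These settings are reflected in the web page screenshots taken.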
For instance, by emulating a specific phone browser, the website screenshot taken appears as if it was captured by an actual phone.
Viewport
Viewport settings represent the resolution of the browser device through width and height dimensions. Here's how to change Python screenshot viewport.
Selenium
# ....
# set the viewport dimensions (width x height)
driver.set_window_size(1920, 1080)
driver.get("https://web-scraping.dev/products")
driver.save_screenshot("products.png")
Playwright:
# ....
with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context(
        viewport={"width": 1920, "height": 1080},  # set viewport dimensions
        device_scale_factor=2,  # increase the pixel ratio
    )
    page = context.new_page()
    page.goto("https://web-scraping.dev/products")
    page.screenshot(path="products.png")

Here, we use Selenium's set_window_size method to set the browser viewport. As for Playwright, we define a browser context to set the viewport and increase the pixel ratio through the device_scale_factor property for higher-quality images.
Playwright provides a wide range of device presets to emulate multiple browsers and operating systems, enabling further Playwright screenshot customization:

# ....
with sync_playwright() as p:
    iphone_14 = p.devices['iPhone 14 Pro Max']
    browser = p.webkit.launch(headless=False)
    context = browser.new_context(
        **iphone_14,
    )
    # open a browser tab with iPhone 14 Pro Max's device profile
    page = context.new_page()
    # ....
The above Python script selects a device profile to automatically define its settings, including UserAgent, screen viewport, scale factor, and browser type. For the full list of available device profiles, refer to the official device registry.
Locale and Timezone
Screenshots of websites with localization features can look different depending on the locale, language, and timezone settings used. Hence, correctly setting these values ensures the expected behavior.
Selenium

from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options

driver_manager = ChromeService(ChromeDriverManager().install())
options = Options()
# set locale
options.add_argument("--lang=fr-FR")
driver = webdriver.Chrome(service=driver_manager, options=options)

# set timezone using devtools protocol
timezone = {'timezoneId': 'Europe/Paris'}
driver.execute_cdp_cmd('Emulation.setTimezoneOverride', timezone)
driver.get("https://webbrowsertools.com/timezone/")
# ....

Playwright

# ....
with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context(
        locale='fr-FR',
        timezone_id='Europe/Paris',
    )
    page = context.new_page()
    page.goto("https://webbrowsertools.com/timezone/")
    # ....
In this Python code, we set the browser's localization preferences through the locale and timezone settings. However, other factors can affect the localization profile used. For the full details, refer to our dedicated guide on web scraping localization.
Geolocation
Taking Python screenshots on websites can often be affected by automatic browser location identification. Here's how we can change it through longitude and latitude values.
Selenium
# ....
driver_manager = ChromeService(ChromeDriverManager().install())
driver = webdriver.Chrome(service=driver_manager)

geolocation = {
    "latitude": 37.17634,
    "longitude": -3.58821,
    "accuracy": 100
}

# set geolocation using devtools protocol
driver.execute_cdp_cmd("Emulation.setGeolocationOverride", geolocation)

Playwright

# ....
with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context(
        # same coordinates as the Selenium example above
        geolocation={"latitude": 37.17634, "longitude": -3.58821},
        permissions=["geolocation"]
    )
    page = context.new_page()
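Latitude and longitude are easy to mix up when written as bare numbers. As a small sketch, the hypothetical helper below (make_geolocation is not part of Selenium or Playwright) takes both as keyword-only arguments and rejects values outside the valid coordinate ranges. The returned dictionary uses the latitude, longitude, and accuracy keys that both examples above rely on.

```python
def make_geolocation(*, latitude: float, longitude: float, accuracy: float = 100) -> dict:
    """Build a geolocation override, rejecting values that cannot be coordinates."""
    if not -90 <= latitude <= 90:
        raise ValueError(f"latitude must be within [-90, 90], got {latitude}")
    if not -180 <= longitude <= 180:
        raise ValueError(f"longitude must be within [-180, 180], got {longitude}")
    return {"latitude": latitude, "longitude": longitude, "accuracy": accuracy}
```

The result can be passed to driver.execute_cdp_cmd("Emulation.setGeolocationOverride", ...) in Selenium or as the geolocation argument of browser.new_context(...) in Playwright. The range check cannot detect two swapped values that are both in range, which is why the arguments are keyword-only.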
Dark Mode
Taking webpage screenshots in the dark mode is quite popular. To approach it, we can change the browser's default color theme preference.
Selenium
# ....
options = Options()
options.add_argument('--force-dark-mode')

driver_manager = ChromeService(ChromeDriverManager().install())
driver = webdriver.Chrome(service=driver_manager, options=options)

driver.get("https://reddit.com/")  # will open in dark mode

Playwright

# ....
with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context(
        color_scheme='dark'
    )
    page = context.new_page()
    page.goto("https://reddit.com/")  # will open in dark mode
The above code sets the default browser theme to dark mode, enabling dark-mode web screenshots accordingly. However, it has no effect on websites without native theme-modification support.
Pro Tip: Force Dark Mode
To force dark-mode screenshots across all websites, we can use Chrome flags. To do this, start by retrieving the required argument using the below steps:
- Open the available chrome flags from the address chrome://flags/
- Search for the enable-force-dark flag and enable it with selective inversion of everything
- Relaunch the browser
- Go to chrome://version/ and copy the created flag argument from the command line property
After retrieving the flag argument, add it to the browser context to force dark website screenshots in Python.
Selenium
# ....
driver_manager = ChromeService(ChromeDriverManager().install())
options = Options()
options.add_argument('--enable-features=WebContentsForceDark:inversion_method/cielab_based')
driver = webdriver.Chrome(service=driver_manager, options=options)

driver.get("https://web-scraping.dev/products")
driver.save_screenshot('dark_screenshot.png')

Playwright

# ....
with sync_playwright() as p:
    browser = p.chromium.launch(
        headless=False,
        args=[
            '--enable-features=WebContentsForceDark:inversion_method/cielab_based'
        ]
    )
    context = browser.new_context()
    page = context.new_page()
    page.goto("https://web-scraping.dev/products")
    page.screenshot(path="dark_screenshot.png")
Here's what the retrieved dark-mode Python screenshot looks like:
Selection Targeting
Lastly, let's explore using Python to screenshot webpages through area selection. It enables targeting specific areas of the page.
Full Page
Taking full-page screenshots is an extremely popular use case, allowing snapshots to be captured at the whole page's vertical height.
Full-page screenshots are often misunderstood. Hence, it's important to differentiate between two distinct concepts:
- Screenshot viewport, the image dimensions as height and width.
- Browser scrolling, whether the driver has scrolled down to load more pages.
A headless browser can scroll down while its screenshot height hasn't been updated to the new page height, or vice versa. In either case, the retrieved web snapshot doesn't look as expected.
Here's how to take scrolling screenshots with Selenium and Playwright.
Selenium
# ....
def scroll(driver):
    _prev_height = -1
    _max_scrolls = 100
    _scroll_count = 0
    while _scroll_count < _max_scrolls:
        # ....

Playwright

# ....
def scroll(page):
    _prev_height = -1
    _max_scrolls = 100
    _scroll_count = 0
    while _scroll_count < _max_scrolls:
        # ....

Since Selenium doesn't provide automatic full page screenshot capturing capabilities, we utilize additional steps:
- Get the new page height after scrolling and use it to update the viewport.
- Find the body element of the page and target it with a screenshot.
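The scrolling step can be sketched as follows. This is a minimal version under stated assumptions rather than the original listing: scroll_to_bottom, its pause parameter, and the choice of document.body.scrollHeight as the height measure are all illustrative. It keeps scrolling to the bottom until the page height stops growing or a scroll limit is reached, then returns the final height.

```python
import time


def scroll_to_bottom(driver, max_scrolls: int = 100, pause: float = 1.0) -> int:
    """Scroll until the page height stops growing and return the final height."""
    prev_height = -1
    for _ in range(max_scrolls):
        # jump to the current bottom of the page
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # give lazily loaded content time to arrive
        time.sleep(pause)
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == prev_height:
            # nothing new was loaded, so the bottom has been reached
            break
        prev_height = new_height
    return prev_height
```

Following the two steps listed above, the returned height could then be applied with driver.set_window_size(1920, height) before calling driver.find_element(By.TAG_NAME, "body").screenshot("full-page-screenshot.png"); the file name and width here are placeholders.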
Selectors
So far, we have been taking web page screenshots against the entire screen viewport. However, headless browsers allow targeting a specific area by screenshotting elements using their equivalent selectors:
Selenium

# ....
driver.set_window_size(1920, 1080)
driver.get("https://web-scraping.dev/product/3")

# wait for the target element to be present
wait = WebDriverWait(driver, 10)
wait.until(expected_conditions.presence_of_element_located(
    (By.CSS_SELECTOR, "div.row.product-data")
))

element = driver.find_element(By.CSS_SELECTOR, 'div.row.product-data')

# take web page screenshot of the specific element
element.screenshot('product-data.png')

Playwright

# ....
with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context(
        viewport={"width": 1920, "height": 1080}
    )
    page = context.new_page()
    page.goto('https://web-scraping.dev/product/3')

    # wait for the target element to be visible
    page.wait_for_selector('div.row.product-data')

    # take web page screenshot of the specific element
    page.locator('div.row.product-data').screenshot(path="product-data.png")

In the above code, we start by waiting for the desired element to appear in the HTML. Then, we select it and capture it specifically. Here's what the retrieved Python screenshot looks like:
Coordinates
Furthermore, we can customize webpage Python screenshots using coordinate values. In other words, we crop the web page into an image using four attributes:
- X-coordinate of the clip area's horizontal position (left to right)
- Y-coordinate of the clip area's vertical position (top to bottom)
- Width and height dimensions
Here's how to take clipped Playwright and Selenium screenshots:
Selenium
from PIL import Image  # pip install pillow
from io import BytesIO

# ....
driver.get("https://web-scraping.dev/product/3")

wait = WebDriverWait(driver, 10)
wait.until(expected_conditions.presence_of_element_located(
    (By.CSS_SELECTOR, "div.row.product-data")
))
element = driver.find_element(By.CSS_SELECTOR, 'div.row.product-data')

# automatically retrieve the coordinate values of selected selector
location = element.location
size = element.size
coordinates = {
    "x": location['x'],
    "y": location['y'],
    "width": size['width'],
    "height": size['height']
}
print(coordinates)
# {'x': 320.5, 'y': 215.59375, 'width': 1262, 'height': 501.828125}

# capture full driver screenshot
screenshot_bytes = driver.get_screenshot_as_png()

# clip the screenshot and save it
img = Image.open(BytesIO(screenshot_bytes))
clip_box = (
    coordinates['x'],
    coordinates['y'],
    coordinates['x'] + coordinates['width'],
    coordinates['y'] + coordinates['height'],
)
cropped_img = img.crop(clip_box)
cropped_img.save('clipped-screenshot.png')

Playwright

# ....
with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context()
    page = context.new_page()
    page.goto('https://web-scraping.dev/product/3')
    page.wait_for_selector('div.row.product-data')
    element = page.query_selector("div.row.product-data")

    # automatically retrieve the coordinate values of selected selector
    coordinates = element.bounding_box()
    print(coordinates)
    # {'x': 320.5, 'y': 215.59375, 'width': 1262, 'height': 501.828125}

    # capture the screenshot with clipping
    page.screenshot(path="clipped-screenshot.png", clip=coordinates)

We use Playwright's built-in clip parameter to automatically crop the captured screenshot. As for Selenium, we use Pillow to manually clip the full web page snapshot.
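The conversion from an x, y, width, height box to the left, top, right, bottom box that Pillow's crop expects can be isolated in a small helper. The sketch below is illustrative: to_crop_box and its scale parameter are hypothetical names. It rounds outward so the crop fully contains the element, and the optional scale covers the case where the screenshot is rendered at a higher pixel ratio than the page's CSS pixels.

```python
import math


def to_crop_box(bbox: dict, scale: float = 1.0) -> tuple:
    """Convert an {x, y, width, height} box into a (left, top, right, bottom) pixel box."""
    left = math.floor(bbox["x"] * scale)
    top = math.floor(bbox["y"] * scale)
    # round the far edges up so that no part of the element is cut off
    right = math.ceil((bbox["x"] + bbox["width"]) * scale)
    bottom = math.ceil((bbox["y"] + bbox["height"]) * scale)
    return (left, top, right, bottom)
```

For the coordinates printed above, to_crop_box(coordinates) gives (320, 215, 1583, 718), which can be passed straight to img.crop(...).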
Banner Blocking
Pop-up banners on websites prevent taking clear screenshots. One example is the "Accept Cookies" banner on web-scraping.dev:
The above banner is displayed through cookies. If we click "accept", a cookie value will be saved in the browser to store our preference and prevent the banner from being displayed again.
If we inspect the browser developer tools, we'll find the cookiesAccepted cookie set to true. So, to block cookie banners while taking Python screenshots, we'll set this cookie before navigating to the target web page.
Selenium

# ....
driver.get("https://web-scraping.dev")

# add the cookie responsible for blocking screenshot banners
driver.add_cookie({'name': 'cookiesAccepted', 'value': 'true', 'domain': 'web-scraping.dev'})

driver.get("https://web-scraping.dev/login?cookies=")
driver.save_screenshot('blocked-banner-screenshot.png')

Playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context()

    # add the cookie responsible for blocking screenshot banners
    context.add_cookies(
        [{'name': 'cookiesAccepted', 'value': 'true', 'domain': 'web-scraping.dev', 'path': '/'}]
    )

    page = context.new_page()
    page.goto('https://web-scraping.dev/login?cookies=')
    page.screenshot(path='blocked-banner-screenshot.png')
For further details on using cookies, refer to our dedicated guide.
Powering Up With ScrapFly
So far, we have explored taking website screenshots using a basic headless browser configuration. However, modern websites prevent screenshot automation using anti-bot measures. Moreover, maintaining headless web browsers can be complex and time-consuming.
ScrapFly is a screenshot API that enables taking web page captures at scale by providing:
- Antibot protection bypass - screenshot web pages on protected domains without being blocked by antibot services like Cloudflare.
- Built-in rotating proxies - prevent IP address blocking caused by rate-limit rules.
- Geolocation targeting - access location-restricted domains through an IP address pool of 175+ countries.
- JavaScript execution - take full advantage of headless browser automation through scrolling, navigating, clicking buttons, filling out forms, etc.
- Full screenshot customization - control the webpage screenshot capture behavior by setting its file type, resolution, color mode, viewport, and banner settings.
- Python and Typescript SDKs.
ScrapFly abstracts away all the required engineering efforts!
Here's how to take Python screenshots using ScrapFly's screenshot API. It's as simple as sending an API request:
from pathlib import Path
import urllib.parse
import requests

base_url = 'https://api.scrapfly.io/screenshot?'

params = {
    'key': 'Your ScrapFly API key',
    'url': 'https://web-scraping.dev/products',  # web page URL to screenshot
    'format': 'png',  # screenshot format (file extension)
    'capture': 'fullpage',  # area to capture (specific element, fullpage, viewport)
    'resolution': '1920x1080',  # screen resolution
    'country': 'us',  # proxy country
    'rendering_wait': 5000,  # time to wait in milliseconds before capturing
    'wait_for_selector': 'div.products-wrap',  # selector to wait on the web page
    'options': [
        'dark_mode',  # use the dark mode
        'block_banners',  # block pop up banners
        'print_media_format'  # emulate media printing format
    ],
    'auto_scroll': True  # automatically scroll down the page
}

# Convert the list of options to a comma-separated string
params['options'] = ','.join(params['options'])
query_string = urllib.parse.urlencode(params)
full_url = base_url + query_string

response = requests.get(full_url)
image_bytes = response.content

# save to disk
Path("screenshot.png").write_bytes(image_bytes)
FAQ
To wrap up this guide on taking website screenshots with Python Selenium and Playwright, let's have a look at some frequently asked questions.
Are there alternatives to headless browsers for taking Python screenshots?
Yes, screenshot APIs are great alternatives. They manage headless browsers under the hood, enabling website snapshots through simple HTTP requests. For further details, refer to our guide on the best screenshot API.
How to take screenshots in NodeJS?
Puppeteer is a popular headless browser that allows web page captures using the page.screenshot method. For more, refer to our guide on taking screenshots with Puppeteer.
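How to take full-page website screenshots in Python?
To take a full-page screenshot, scroll down the page with Selenium or Playwright if needed. Then, use the full_page parameter in Playwright, screenshot(path, full_page=True), to automatically capture the screenshot at the full page height.
For Selenium, manually update the browser's viewport height after scrolling so that it covers the entire vertical height.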
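For the Playwright side, a minimal sketch of this answer could look like the following. The helper name capture_full_page, the pause length, and the use of document.body.scrollHeight are assumptions made for illustration; page.evaluate, page.wait_for_timeout, and page.screenshot(full_page=True) are the Playwright calls it relies on.

```python
def capture_full_page(page, path: str, max_scrolls: int = 100, pause_ms: int = 1000) -> bytes:
    """Scroll until the page stops growing, then capture a full-page screenshot."""
    prev_height = -1
    for _ in range(max_scrolls):
        # jump to the current bottom of the page
        page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
        # give lazily loaded content time to arrive
        page.wait_for_timeout(pause_ms)
        new_height = page.evaluate("document.body.scrollHeight")
        if new_height == prev_height:
            break
        prev_height = new_height
    # full_page=True captures the whole scrollable page, not just the viewport
    return page.screenshot(path=path, full_page=True)
```

It would be called with an open page, for example capture_full_page(page, "full-page-screenshot.png"), where the file name is a placeholder.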
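Summary
In this guide, we explained how to take Playwright and Selenium screenshots in Python. We started by covering the installation and basic usage.
Then, we took a detailed look at using advanced Selenium and Playwright features to capture customized screenshots:
- Waiting with fixed timeouts, selectors, and load states
- Emulating browser preferences, viewport, geolocation, theme, locale, and timezone
- Full-page capture, selection targeting, and banner blocking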