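Headless browser automation libraries offer a wide range of configuration options for taking screenshots. In this guide, we'll explain how to take Python screenshots with Selenium and Playwright. Then, we'll explore common browser tips and tricks for customizing web page captures. Let's get started!
Basic Screenshot Functionalities
We'll start by covering the core Selenium and Playwright methods, including the installation required to take Python screenshots. Then, we'll explore common features for taking customized Selenium and Playwright screenshots.
Selenium Screenshots
Before exploring how to take screenshots with Selenium in Python, let's install it. Use the following pip command to install Selenium together with webdriver-manager: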
pip install selenium webdriver-manager
We'll use the webdriver-manager Python library to automatically download the required browser driver:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()))
Now that the required installation is ready, let's take a screenshot with Selenium in Python:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()))

# request target web page
driver.get("https://web-scraping.dev/products")

# take screenshot and directly save it
driver.save_screenshot('products.png')

# image as bytes (named image_bytes to avoid shadowing the built-in `bytes`)
image_bytes = driver.get_screenshot_as_png()

# image as base64 string
base64_string = driver.get_screenshot_as_base64()
The above script for taking Selenium Python screenshots is fairly straightforward. We use the save_screenshot method to capture the driver's current viewport and save the image to the products.png file. Besides saving directly to disk, other methods return the raw image data as bytes or as a base64 string for further processing.
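As a minimal sketch of such further processing, the snippet below decodes a base64 screenshot string back into raw bytes and checks that the result starts with the PNG file signature. The helper name `decode_screenshot` is hypothetical and not part of Selenium; in practice, its input would come from `driver.get_screenshot_as_base64()`.

```python
import base64

# Every PNG file starts with this fixed 8-byte signature
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_screenshot(base64_string: str) -> bytes:
    """Decode a base64 screenshot string and verify that it looks like a PNG.

    Hypothetical helper for illustration; it is not a Selenium API.
    """
    image_bytes = base64.b64decode(base64_string)
    if not image_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data does not start with the PNG signature")
    return image_bytes


# Usage with a live driver (assumes `driver` already exists):
# image_bytes = decode_screenshot(driver.get_screenshot_as_base64())
```

Checking the signature is a cheap way to catch an empty or truncated capture before passing the bytes on to an image library or an upload step.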
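For further details on Selenium, refer to our dedicated guide.
Playwright Screenshots
The Playwright API is available in different programming languages. Since we'll be using Python Playwright to take screenshots, install its Python package with the following pip command: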
pip install playwright
Next, install the required Playwright web driver binaries:
playwright install chromium # alternatively install `firefox` or `webkit`
To take a Playwright screenshot, we can use the .screenshot method:
from pathlib import Path
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context()
    page = context.new_page()

    # request target web page
    page.goto('https://web-scraping.dev/products')

    # take screenshot and directly save it
    page.screenshot(path="products.png")

    # or screenshot as bytes
    image_bytes = page.screenshot()
    Path("products.png").write_bytes(image_bytes)
Above, we start by launching a new Playwright browser instance and opening a new tab in it (headless=False opens a visible window; pass headless=True to run without a UI). Then, we take a screenshot of the page and save it to the products.png file.
Refer to our dedicated Playwright guide for more details on using it for web scraping.
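Waits and Timeouts
Images on web pages are loaded dynamically. Hence, correctly waiting for them to load is crucial to prevent broken website screenshots. Let's explore different techniques for defining waits and timeouts.
Fixed Timeouts
Fixed timeouts are the most basic type of headless browser wait. By waiting for a set amount of time before capturing the screenshot, we give the DOM elements time to load, although a fixed delay can't guarantee that they have.
Selenium
import time
# ....

driver.get("https://web-scraping.dev/products")

# wait for 10 seconds before taking screenshot
time.sleep(10)

driver.save_screenshot('products.png')
Playwright
# ....

with sync_playwright() as p:
    # ....
    page.goto('https://web-scraping.dev/products')

    # wait for 10 seconds before taking screenshot
    page.wait_for_timeout(10000)

    page.screenshot(path="products.png")
Above, we use Playwright's wait_for_timeout method to define a fixed wait before taking the web page screenshot. Since Selenium doesn't provide a built-in method for fixed timeouts, we use Python's built-in time module.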
Selectors
Dynamic wait conditions wait for a specific element's selector to appear on the page before proceeding. The wait ends as soon as the selector is found; if it isn't found within the defined timeout, a timeout error is raised.
Selenium
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
from selenium.webdriver.common.by import By
# ....

driver.get("https://web-scraping.dev/products")

_timeout = 10  # set the maximum timeout to 10 seconds
wait = WebDriverWait(driver, _timeout)
wait.until(expected_conditions.presence_of_element_located(
    (By.XPATH, "//div[@class='products']")  # wait for XPath selector
    # (By.CSS_SELECTOR, "div.products")  # wait for CSS selector
))

driver.save_screenshot("products.png")
Playwright
# ....

with sync_playwright() as p:
    # ....
    page.goto('https://web-scraping.dev/products')

    # wait for XPath or CSS selector
    page.wait_for_selector("div.products", timeout=10000)
    page.wait_for_selector("//div[@class='products']", timeout=10000)

    page.screenshot(path="products.png")
Above, we use Selenium's expected_conditions and Playwright's wait_for_selector method to wait for a selector with a dynamic condition before taking the Python screenshot. Note that presence_of_element_located only checks that the element exists in the DOM, while wait_for_selector waits for it to be visible by default.
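Load State
The last available wait condition is the load state. It waits for the browser page to reach a specific state:
- domcontentloaded: waits for the full DOM tree to load
- networkidle: waits until there are no network connections for at least 500 ms
Waiting for the networkidle state is especially useful when capturing snapshots of web pages with many images to render. Since these pages are bandwidth-intensive, it's easier to wait for all network calls to finish than to wait for specific selectors.
Here's how to use the wait_for_load_state method to wait before taking a Playwright screenshot:
# ....

with sync_playwright() as p:
    # ....
    page.goto('https://web-scraping.dev/products')

    # wait for load state
    page.wait_for_load_state("domcontentloaded")  # DOM tree to load
    page.wait_for_load_state("networkidle")  # network to be idle

    page.screenshot(path="products.png")
Note that Selenium has no built-in method for intercepting the driver's load state, but it can be implemented using custom JavaScript execution.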
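One way to approach this in Selenium is to poll `document.readyState` through `execute_script`. The sketch below is an illustration rather than an official Selenium feature: the function name `wait_for_ready_state` and its parameters are hypothetical. It covers only the states the browser reports (`loading`, `interactive`, `complete`); there is no `readyState` value equivalent to Playwright's networkidle.

```python
import time

# document.readyState values in the order a page passes through them
READY_STATES = ["loading", "interactive", "complete"]


def wait_for_ready_state(driver, state="complete", timeout=10.0, poll=0.25):
    """Poll document.readyState until the page has reached `state`.

    `driver` is any object exposing Selenium's `execute_script` method.
    Returns True once the state is reached (or passed), False on timeout.
    """
    target = READY_STATES.index(state)
    deadline = time.monotonic() + timeout
    while True:
        current = driver.execute_script("return document.readyState")
        # A page that is already "complete" has also passed "interactive"
        if current in READY_STATES and READY_STATES.index(current) >= target:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)


# Usage with a live driver (assumes `driver` already exists):
# if wait_for_ready_state(driver, "complete", timeout=15):
#     driver.save_screenshot("products.png")
```

With Selenium's default page load strategy, `driver.get` already blocks until the page's load event, so a check like this is mostly useful after in-page navigation such as clicking a link.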
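Emulation
Emulation customizes the headless browser configuration to simulate common web browser types and user preferences. These settings are reflected in the resulting web page screenshots.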
For instance, by emulating a specific phone browser, the website screenshot taken appears as if it was captured by an actual phone.
Viewport
Viewport settings represent the resolution of the browser device through width and height dimensions. Here's how to change the viewport for Python screenshots.
Selenium
# ....

# set the viewport dimensions (width x height)
driver.set_window_size(1920, 1080)

driver.get("https://web-scraping.dev/products")

driver.save_screenshot("products.png")
Playwright
# ....

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context(
        viewport={"width": 1920, "height": 1080},  # set viewport dimensions
        device_scale_factor=2,  # increase the pixel ratio
    )
    page = context.new_page()
    page.goto("https://web-scraping.dev/products")

    page.screenshot(path="products.png")
Here, we use Selenium's set_window_size method to set the browser window size, which determines the viewport. As for Playwright, we define a browser context to set the viewport and increase the pixel ratio through the device_scale_factor property for higher-quality images.
Playwright provides a wide range of device presets to emulate multiple browsers and operating systems, enabling further Playwright screenshot customization:
# ....

with sync_playwright() as p:
    iphone_14 = p.devices['iPhone 14 Pro Max']
    browser = p.webkit.launch(headless=False)
    context = browser.new_context(
        **iphone_14,
    )

    # open a browser tab with iPhone 14 Pro Max's device profile
    page = context.new_page()
    # ....
The above Python script selects a device profile to automatically define its settings, including the user agent, screen viewport, scale factor, and browser type. For the full list of available device profiles, refer to the official device registry.
Locale and Timezone
Screenshots of websites with localization features can look different depending on the locale language and timezone settings used. Hence, correctly setting these values ensures the expected behavior.
Selenium
from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options

driver_manager = ChromeService(ChromeDriverManager().install())
options = Options()
# set locale
options.add_argument("--lang=fr-FR")
driver = webdriver.Chrome(service=driver_manager, options=options)

# set timezone using devtools protocol
timezone = {'timezoneId': 'Europe/Paris'}
driver.execute_cdp_cmd('Emulation.setTimezoneOverride', timezone)

driver.get("https://webbrowsertools.com/timezone/")
# ....
Playwright
# ....

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context(
        locale='fr-FR',
        timezone_id='Europe/Paris',
    )
    page = context.new_page()
    page.goto("https://webbrowsertools.com/timezone/")
    # ....
In this Python code, we set the browser's localization preferences through the locale and timezone settings. However, other factors can affect the localization profile used. For the full details, refer to our dedicated guide on web scraping localization.
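To check that the emulated settings actually took effect before capturing, one option is to ask the page's own JavaScript environment what it reports. The helper below is a small sketch for Playwright's sync API; the name `read_locale_settings` is hypothetical, and it relies only on `page.evaluate` with two standard browser expressions.

```python
def read_locale_settings(page):
    """Return the language and timezone reported by the page's JavaScript.

    `page` is expected to be a Playwright sync Page; only `evaluate` is used.
    Hypothetical helper for illustration.
    """
    return {
        # language the browser exposes to scripts
        "language": page.evaluate("navigator.language"),
        # IANA timezone name used for date formatting
        "timezone": page.evaluate(
            "Intl.DateTimeFormat().resolvedOptions().timeZone"
        ),
    }


# Usage inside the `with sync_playwright() as p:` block (assumes `page` exists):
# print(read_locale_settings(page))
```

With a context created using locale='fr-FR' and timezone_id='Europe/Paris', both values should mirror those settings. With Selenium, the same expressions can be run through `driver.execute_script` by prefixing them with `return`.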
Geolocation
Taking Python screenshots on websites can often be affected by automatic browser location identification. Here's how we can change it through longitude and latitude values.
Selenium
# ....
driver_manager = ChromeService(ChromeDriverManager().install())
driver = webdriver.Chrome(service=driver_manager)

geolocation = {
    "latitude": 37.17634,
    "longitude": -3.58821,
    "accuracy": 100
}

# set geolocation using devtools protocol
driver.execute_cdp_cmd("Emulation.setGeolocationOverride", geolocation)
Playwright
# ....
with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context(
        # same coordinates as the Selenium example above
        geolocation={"latitude": 37.17634, "longitude": -3.58821},
        permissions=["geolocation"]
    )
    page = context.new_page()
Dark Mode
Taking web page screenshots in dark mode is quite popular. To do so, we can change the browser's default color theme preference.
Selenium
# ....

options = Options()
options.add_argument('--force-dark-mode')

driver_manager = ChromeService(ChromeDriverManager().install())
driver = webdriver.Chrome(service=driver_manager, options=options)

driver.get("https://reddit.com/")  # will open in dark mode
Playwright
# ....

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context(
        color_scheme='dark'
    )
    page = context.new_page()
    page.goto("https://reddit.com/")  # will open in dark mode
The above code sets the default browser theme to dark mode, enabling dark-mode web screenshots accordingly. However, it has no effect on websites without native theme-modification support.
Pro Tip: Force Dark Mode
To force dark-mode screenshots across all websites, we can use Chrome flags. To do this, start by retrieving the required argument using the below steps:
- Open the available chrome flags from the address chrome://flags/
- Search for the enable-force-dark flag and enable it with selective inversion of everything
- Relaunch the browser
- Go to chrome://version/ and copy the created flag argument from the command line property
After retrieving the flag argument, add it to the browser context to force dark website screenshots in Python.
Selenium
# ....

driver_manager = ChromeService(ChromeDriverManager().install())
options = Options()
options.add_argument('--enable-features=WebContentsForceDark:inversion_method/cielab_based')
driver = webdriver.Chrome(service=driver_manager, options=options)

driver.get("https://web-scraping.dev/products")
driver.save_screenshot('dark_screenshot.png')
Playwright
# ....

with sync_playwright() as p:
    browser = p.chromium.launch(
        headless=False,
        args=[
            '--enable-features=WebContentsForceDark:inversion_method/cielab_based'
        ]
    )
    context = browser.new_context()
    page = context.new_page()

    page.goto("https://web-scraping.dev/products")
    page.screenshot(path="dark_screenshot.png")
Here's what the retrieved dark-mode Python screenshot looks like:
Selection Targeting
Lastly, let's explore using Python to screenshot webpages through area selection. It enables targeting specific areas of the page.
Full Page
Taking full-page screenshots is an extremely popular use case, allowing snapshots to be captured at the whole page's vertical height.
Full-page screenshots are often misunderstood. Hence, it's important to differentiate between two distinct concepts:
- Screenshot viewport, the image dimensions as height and width.
- Browser scrolling, whether the driver has scrolled down to load more content.
A headless browser can scroll down without its screenshot height being updated for the new page height, or vice versa. In either case, the retrieved web snapshot doesn't look as expected.
Here's how to take scrolling screenshots with Selenium and Playwright.
Selenium
# ....

def scroll(driver):
    _prev_height = -1
    _max_scrolls = 100
    _scroll_count = 0
    while _scroll_count <
    # [the remainder of this snippet is missing from the source]
Playwright
# ....

def scroll(page):
    _prev_height = -1
    _max_scrolls = 100
    _scroll_count = 0
    while _scroll_count <
    # [the remainder of this snippet is missing from the source]
Since Selenium doesn't provide automatic full-page screenshot capturing capabilities, we utilize additional steps:
- Get the new page height after scrolling and use it to update the viewport.
- Find the body element of the page and target it with a screenshot.
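The steps above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the original snippet: the function names, the `pause` values, and the default width are hypothetical, and the functions take an existing Selenium `driver` or Playwright `page` as an argument. The Selenium version typically needs headless mode, since a visible window usually can't grow beyond the screen size.

```python
import time


def scroll_to_bottom(driver, max_scrolls=100, pause=1.0):
    """Scroll down until the page height stops growing; return the final height."""
    prev_height = -1
    for _ in range(max_scrolls):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        time.sleep(pause)  # give lazily loaded content time to render
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == prev_height:
            break  # nothing new was loaded by the last scroll
        prev_height = new_height
    return prev_height


def selenium_full_page_screenshot(driver, path, width=1920, pause=1.0):
    """Resize the window to the full page height, then screenshot the body."""
    height = scroll_to_bottom(driver, pause=pause)
    driver.set_window_size(width, height)
    # "tag name" is the string value behind Selenium's By.TAG_NAME
    body = driver.find_element("tag name", "body")
    body.screenshot(path)
    return height


def playwright_full_page_screenshot(page, path, max_scrolls=100, pause_ms=1000):
    """Scroll to trigger lazy loading, then let Playwright capture the full page."""
    prev_height = -1
    for _ in range(max_scrolls):
        page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
        page.wait_for_timeout(pause_ms)
        new_height = page.evaluate("document.body.scrollHeight")
        if new_height == prev_height:
            break
        prev_height = new_height
    page.screenshot(path=path, full_page=True)
```

The `max_scrolls` cap keeps the loop from running forever on pages with infinite scrolling, where the height never stabilizes.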
Selectors
So far, we have been taking web page screenshots of the entire screen viewport. However, headless browsers allow targeting a specific area by screenshotting elements through their selectors:
Selenium
# ....

driver.set_window_size(1920, 1080)
driver.get("https://web-scraping.dev/product/3")

# wait for the target element to be present in the DOM
wait = WebDriverWait(driver, 10)
wait.until(expected_conditions.presence_of_element_located(
    (By.CSS_SELECTOR, "div.row.product-data")
))

element = driver.find_element(By.CSS_SELECTOR, 'div.row.product-data')

# take web page screenshot of the specific element
element.screenshot('product-data.png')
Playwright
# ....

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context(
        viewport={"width": 1920, "height": 1080}
    )
    page = context.new_page()
    page.goto('https://web-scraping.dev/product/3')

    # wait for the target element to be visible
    page.wait_for_selector('div.row.product-data')

    # take web page screenshot of the specific element
    page.locator('div.row.product-data').screenshot(path="product-data.png")
In the above code, we start by waiting for the desired element to appear in the HTML. Then, we select it and capture it specifically. Here's what the retrieved Python screenshot looks like:
Coordinates
Furthermore, we can customize Python web page screenshots using coordinate values. In other words, the web page is cropped into an image using four attributes:
- X-coordinate of the clip area's horizontal position (left to right)
- Y-coordinate of the clip area's vertical position (top to bottom)
- Width and height dimensions
Here's how to take clipped Playwright and Selenium screenshots:
Selenium
from PIL import Image  # pip install pillow
from io import BytesIO
# ....

driver.get("https://web-scraping.dev/product/3")

wait = WebDriverWait(driver, 10)
wait.until(expected_conditions.presence_of_element_located(
    (By.CSS_SELECTOR, "div.row.product-data")
))

element = driver.find_element(By.CSS_SELECTOR, 'div.row.product-data')

# automatically retrieve the coordinate values of selected selector
location = element.location
size = element.size
coordinates = {
    "x": location['x'],
    "y": location['y'],
    "width": size['width'],
    "height": size['height']
}
print(coordinates)
# {'x': 320.5, 'y': 215.59375, 'width': 1262, 'height': 501.828125}

# capture full driver screenshot
screenshot_bytes = driver.get_screenshot_as_png()

# clip the screenshot and save it
img = Image.open(BytesIO(screenshot_bytes))
clip_box = (
    coordinates['x'],
    coordinates['y'],
    coordinates['x'] + coordinates['width'],
    coordinates['y'] + coordinates['height'],
)
cropped_img = img.crop(clip_box)
cropped_img.save('clipped-screenshot.png')
Playwright
# ....

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context()
    page = context.new_page()
    page.goto('https://web-scraping.dev/product/3')
    page.wait_for_selector('div.row.product-data')

    element = page.query_selector("div.row.product-data")

    # automatically retrieve the coordinate values of selected selector
    coordinates = element.bounding_box()
    print(coordinates)
    # {'x': 320.5, 'y': 215.59375, 'width': 1262, 'height': 501.828125}

    # capture the screenshot with clipping
    page.screenshot(path="clipped-screenshot.png", clip=coordinates)
We use the built-in clip parameter of Playwright's screenshot method to automatically crop the captured screenshot. As for Selenium, we use Pillow to manually clip the full web page snapshot.
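When cropping manually with Pillow, keep in mind that element coordinates are reported in CSS pixels, while the screenshot may be larger on displays with a device pixel ratio above 1. The sketch below, using a hypothetical helper named `to_crop_box`, converts an x/y/width/height box into an integer Pillow crop box and scales it by an assumed pixel ratio.

```python
import math


def to_crop_box(coordinates, pixel_ratio=1.0):
    """Convert an x/y/width/height box in CSS pixels to a Pillow crop box.

    Returns (left, upper, right, lower) as integers in image pixels. The box
    is rounded outwards so that partially covered pixels are kept.
    Hypothetical helper for illustration.
    """
    left = coordinates["x"] * pixel_ratio
    upper = coordinates["y"] * pixel_ratio
    right = (coordinates["x"] + coordinates["width"]) * pixel_ratio
    lower = (coordinates["y"] + coordinates["height"]) * pixel_ratio
    return (math.floor(left), math.floor(upper), math.ceil(right), math.ceil(lower))


box = {"x": 320.5, "y": 215.59375, "width": 1262, "height": 501.828125}
print(to_crop_box(box))                 # (320, 215, 1583, 718)
print(to_crop_box(box, pixel_ratio=2))  # (641, 431, 3165, 1435)
```

In Selenium, the ratio can be read with `driver.execute_script("return window.devicePixelRatio")`, and the resulting tuple can be passed directly to `img.crop`.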
Banner Blocking
Pop-up banners on websites get in the way of taking clear screenshots. One example is the "Accept Cookies" banner on web-scraping.dev:
The above banner is controlled through cookies. If we click "accept", a cookie value is saved in the browser to store our preference and prevent the banner from being displayed again.
If we inspect the browser developer tools, we'll find the cookiesAccepted cookie set to true. So, to block cookie banners while taking Python screenshots, we'll set this cookie before navigating to the target web page.
Selenium
# ....

# open the domain first, since a cookie can only be added for the current domain
driver.get("https://web-scraping.dev")

# add the cookie responsible for blocking screenshot banners
driver.add_cookie({'name': 'cookiesAccepted', 'value': 'true', 'domain': 'web-scraping.dev'})

driver.get("https://web-scraping.dev/login?cookies=")

driver.save_screenshot('blocked-banner-screenshot.png')
Playwright
with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context()

    # add the cookie responsible for blocking screenshot banners
    context.add_cookies(
        [{'name': 'cookiesAccepted', 'value': 'true', 'domain': 'web-scraping.dev', 'path': '/'}]
    )

    page = context.new_page()
    page.goto('https://web-scraping.dev/login?cookies=')

    page.screenshot(path='blocked-banner-screenshot.png')
For further details on using cookies, refer to our dedicated guide.
Powering Up With ScrapFly
So far, we have explored taking website screenshots using a basic headless browser configuration. However, modern websites prevent screenshot automation using anti-bot measures. Moreover, maintaining headless web browsers can be complex and time-consuming.
ScrapFly is a screenshot API that enables taking web page captures at scale by providing:
- Antibot protection bypass - screenshot web pages on protected domains without being blocked by antibot services like Cloudflare.
- Built-in rotating proxies - prevent IP address blocking caused by rate-limit rules.
- Geolocation targeting - access location-restricted domains through an IP address pool covering 175+ countries.
- JavaScript execution - take full advantage of headless browser automation through scrolling, navigating, clicking buttons, filling out forms, etc.
- Full screenshot customization - control the web page screenshot capture behavior by setting its file type, resolution, color mode, viewport, and banner settings.
- Python and Typescript SDKs.
ScrapFly abstracts away all the required engineering efforts!
Here's how to take Python screenshots using ScrapFly's screenshot API. It's as simple as sending an API request:
from pathlib import Path
import urllib.parse
import requests

base_url = 'https://api.scrapfly.io/screenshot?'

params = {
    'key': 'Your ScrapFly API key',
    'url': 'https://web-scraping.dev/products',  # web page URL to screenshot
    'format': 'png',  # screenshot format (file extension)
    'capture': 'fullpage',  # area to capture (specific element, fullpage, viewport)
    'resolution': '1920x1080',  # screen resolution
    'country': 'us',  # proxy country
    'rendering_wait': 5000,  # time to wait in milliseconds before capturing
    'wait_for_selector': 'div.products-wrap',  # selector to wait on the web page
    'options': [
        'dark_mode',  # use the dark mode
        'block_banners',  # block pop up banners
        'print_media_format'  # emulate media printing format
    ],
    'auto_scroll': True  # automatically scroll down the page
}

# Convert the list of options to a comma-separated string
params['options'] = ','.join(params['options'])
query_string = urllib.parse.urlencode(params)
full_url = base_url + query_string

response = requests.get(full_url)
image_bytes = response.content

# save to disk
Path("screenshot.png").write_bytes(image_bytes)
FAQ
To wrap up this guide on taking website screenshots with Python Selenium and Playwright, let's have a look at some frequently asked questions.
Are there alternatives to headless browsers for taking Python screenshots?
Yes, screenshot APIs are great alternatives. They manage headless browsers under the hood, enabling website snapshots through simple HTTP requests. For further details, refer to our guide on the best screenshot API.
How to take screenshots in NodeJS?
Puppeteer is a popular headless browser automation library that allows web page captures using the page.screenshot method. For more, refer to our guide on taking screenshots with Puppeteer.
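How to take full web page screenshots in Python?
To take a full-page screenshot, scroll down the page with Selenium or Playwright if needed. Then, use the full_page option in Playwright, screenshot(path, full_page=True), to automatically capture the entire page.
For Selenium, manually update the browser's viewport height after scrolling so that it covers the page's entire vertical height.
Summary
In this guide, we explained how to take Playwright and Selenium screenshots in Python. We started by covering the installation and basic usage.
Then, we took a detailed look at using advanced Selenium and Playwright features to capture customized screenshots:
- Waiting for fixed timeouts, selectors, and load states
- Emulating browser preferences, viewport, geolocation, theme, locale, and timezone
- Full-page capturing, selection targeting, and banner blocking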