Home >Backend Development >Python Tutorial >How to Extract High-Resolution Images from PDFs Using Python without Altering Dimensions?
Extracting Native Resolution Images from PDFs in Python without Resampling
Extracting images from PDFs with their native resolution and format while preserving the layout can be a challenge. However, Python's PyMuPDF module provides a straightforward solution.
Using PyMuPDF
PyMuPDF can output images as PNG files, ensuring high resolution and maintaining the original format (e.g., TIFF, JPEG). The following code demonstrates its usage:
<code class="python">import fitz doc = fitz.open("file.pdf") for i in range(len(doc)): for img in doc.getPageImageList(i): xref = img[0] pix = fitz.Pixmap(doc, xref) if pix.n < 5: # GRAY or RGB pix.writePNG("p%s-%s.png" % (i, xref)) else: # CMYK pix1 = fitz.Pixmap(fitz.csRGB, pix) pix1.writePNG("p%s-%s.png" % (i, xref)) pix1 = None pix = None</code>
Modified Version for fitz 1.19.6
For the latest version of fitz (1.19.6), the following modified code can be used:
<code class="python">import os import fitz from tqdm import tqdm workdir = "your_folder" for each_path in os.listdir(workdir): if ".pdf" in each_path: doc = fitz.Document((os.path.join(workdir, each_path))) for i in tqdm(range(len(doc)), desc="pages"): for img in tqdm(doc.get_page_images(i), desc="page_images"): xref = img[0] image = doc.extract_image(xref) pix = fitz.Pixmap(doc, xref) pix.save(os.path.join(workdir, "%s_p%s-%s.png" % (each_path[:-4], i, xref)))</code>
This modified code utilizes tqdm for progress bar display and optimizes the image extraction and saving process.
The above is the detailed content of How to Extract High-Resolution Images from PDFs Using Python without Altering Dimensions?. For more information, please follow other related articles on the PHP Chinese website!