Home  >  Article  >  Backend Development  >  How to Extract High-Resolution Images from PDFs Using Python without Altering Dimensions?

How to Extract High-Resolution Images from PDFs Using Python without Altering Dimensions?

Susan Sarandon
Susan SarandonOriginal
2024-10-22 07:53:30641browse

How to Extract High-Resolution Images from PDFs Using Python without Altering Dimensions?

Extracting Native Resolution Images from PDFs in Python without Resampling

Extracting images from PDFs with their native resolution and format while preserving the layout can be a challenge. However, Python's PyMuPDF module provides a straightforward solution.

Using PyMuPDF

PyMuPDF can output images as PNG files, ensuring high resolution and maintaining the original format (e.g., TIFF, JPEG). The following code demonstrates its usage:

<code class="python">import fitz
doc = fitz.open("file.pdf")
for i in range(len(doc)):
    for img in doc.getPageImageList(i):
        xref = img[0]
        pix = fitz.Pixmap(doc, xref)
        if pix.n < 5:       # GRAY or RGB
            pix.writePNG("p%s-%s.png" % (i, xref))
        else:               # CMYK
            pix1 = fitz.Pixmap(fitz.csRGB, pix)
            pix1.writePNG("p%s-%s.png" % (i, xref))
            pix1 = None
        pix = None</code>

Modified Version for fitz 1.19.6

For the latest version of fitz (1.19.6), the following modified code can be used:

<code class="python">import os
import fitz
from tqdm import tqdm

workdir = "your_folder"

for each_path in os.listdir(workdir):
    if ".pdf" in each_path:
        doc = fitz.Document((os.path.join(workdir, each_path)))

        for i in tqdm(range(len(doc)), desc="pages"):
            for img in tqdm(doc.get_page_images(i), desc="page_images"):
                xref = img[0]
                image = doc.extract_image(xref)
                pix = fitz.Pixmap(doc, xref)
                pix.save(os.path.join(workdir, "%s_p%s-%s.png" % (each_path[:-4], i, xref)))</code>

This modified code utilizes tqdm for progress bar display and optimizes the image extraction and saving process.

The above is the detailed content of How to Extract High-Resolution Images from PDFs Using Python without Altering Dimensions?. For more information, please follow other related articles on the PHP Chinese website!

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn