如何在Python中以本機解析度和格式從PDF文件中擷取影像？-Python教學-PHP中文網

首頁

後端開發

Python教學

如何在Python中以本機解析度和格式從PDF文件中擷取影像？

Patricia Arquette

Oct 22, 2024 am 07:55 AM

How to Extract Images from PDF Documents with Native Resolution and Format in Python?

以原始解析度和格式從PDF 文件中擷取影像

處理PDF 文件時，可以使用原始解析度和格式擷取影像至關重要的。這可確保擷取的影像保留與來源文件相同的品質和完整性。在本文中，我們提出了一種使用 Python 從 PDF 文件中提取圖像而無需重新採樣的解決方案，使您能夠獲得原始格式的高品質圖像。

用於影像擷取的 PyMuPDF

用於 PDF 操作的最流行的 Python 模組之一是 PyMuPDF。該模組提供了一種從 PDF 文件中提取圖像的強大方法，同時保留其原始解析度和格式。以下是使用 PyMuPDF 的程式碼片段：

<code class="python">import fitz

# Open the PDF document
doc = fitz.open("file.pdf")

# Iterate through pages and images
for i in range(len(doc)):
    for img in doc.getPageImageList(i):
        xref = img[0]

        # Convert picture object to PNG
        pix = fitz.Pixmap(doc, xref)
        if pix.n <p>此程式碼迭代 PDF 文件中的所有頁面和圖像，並將它們提取為 PNG 檔案。它保留了每個影像的原始解析度和格式，確保您獲得高品質的影像。 </p>
<p><strong>更新 PyMuPDF 的修改版本</strong></p>
<p>如果您使用的是較新版本PyMuPDF 版本（例如 1.19.6），您可能需要稍微修改上面的程式碼。以下程式碼片段反映了必要的變更：</p>
<pre class="brush:php;toolbar:false"><code class="python">import os
import fitz
from tqdm import tqdm

# Set working directory
workdir = "your_folder"

# Process PDF files in the directory
for each_path in os.listdir(workdir):
    if ".pdf" in each_path:
        # Open the PDF document
        doc = fitz.Document((os.path.join(workdir, each_path)))

        # Iterate through pages and images
        for i in tqdm(range(len(doc)), desc="pages"):
            for img in tqdm(doc.get_page_images(i), desc="page_images"):
                xref = img[0]

                # Extract the image and save it as PNG
                image = doc.extract_image(xref)
                pix = fitz.Pixmap(doc, xref)
                pix.save(os.path.join(workdir, "%s_p%s-%s.png" % (each_path[:-4], i, xref)))

# Print a completion message
print("Done!")</code>

此修改後的程式碼使用 get_page_images() 方法取得映像並將其儲存為指定工作目錄中的 PNG 檔案。

以上是如何在Python中以本機解析度和格式從PDF文件中擷取影像？的詳細內容。更多資訊請關注PHP中文網其他相關文章！

陳述

本文內容由網友自願投稿，版權歸原作者所有。本站不承擔相應的法律責任。如發現涉嫌抄襲或侵權的內容，請聯絡admin@php.cn

為什麼數組通常比存儲數值數據列表更高？May 05, 2025 am 12:15 AM

ArraySareAryallyMoremory-Moremory-forigationDataDatueTotheIrfixed-SizenatureAntatureAntatureAndirectMemoryAccess.1）arraysStorelelementsInAcontiguxufulock，ReducingOveringOverheadHeadefromenterSormetormetAdata.2）列表，通常

如何將Python列表轉換為Python陣列？May 05, 2025 am 12:10 AM

ToconvertaPythonlisttoanarray,usethearraymodule:1)Importthearraymodule,2)Createalist,3)Usearray(typecode,list)toconvertit,specifyingthetypecodelike'i'forintegers.Thisconversionoptimizesmemoryusageforhomogeneousdata,enhancingperformanceinnumericalcomp

您可以將不同的數據類型存儲在同一Python列表中嗎？舉一個例子。May 05, 2025 am 12:10 AM

Python列表可以存儲不同類型的數據。示例列表包含整數、字符串、浮點數、布爾值、嵌套列表和字典。列表的靈活性在數據處理和原型設計中很有價值，但需謹慎使用以確保代碼的可讀性和可維護性。

Python中的數組和列表之間有什麼區別？May 05, 2025 am 12:06 AM

Pythondoesnothavebuilt-inarrays;usethearraymoduleformemory-efficienthomogeneousdatastorage,whilelistsareversatileformixeddatatypes.Arraysareefficientforlargedatasetsofthesametype,whereaslistsofferflexibilityandareeasiertouseformixedorsmallerdatasets.

通常使用哪種模塊在Python中創建數組？May 05, 2025 am 12:02 AM

theSostCommonlyusedModuleForCreatingArraysInpyThonisnumpy.1）NumpyProvidEseffitedToolsForarrayOperations，Idealfornumericaldata.2）arraysCanbeCreatedDusingsnp.Array（）for1dand2Structures.3）

您如何將元素附加到Python列表中？May 04, 2025 am 12:17 AM

toAppendElementStoApythonList，usetheappend（）方法forsingleements，Extend（）formultiplelements，andinsert（）forspecificpositions.1）useeAppend（）foraddingoneOnelementAttheend.2）useextendTheEnd.2）useextendexendExendEnd（

您如何創建Python列表？舉一個例子。May 04, 2025 am 12:16 AM

TocreateaPythonlist,usesquarebrackets[]andseparateitemswithcommas.1)Listsaredynamicandcanholdmixeddatatypes.2)Useappend(),remove(),andslicingformanipulation.3)Listcomprehensionsareefficientforcreatinglists.4)Becautiouswithlistreferences;usecopy()orsl

討論有效存儲和數值數據的處理至關重要的實際用例。May 04, 2025 am 12:11 AM

金融、科研、医疗和AI等领域中，高效存储和处理数值数据至关重要。1)在金融中，使用内存映射文件和NumPy库可显著提升数据处理速度。2)科研领域，HDF5文件优化数据存储和检索。3)医疗中，数据库优化技术如索引和分区提高数据查询性能。4)AI中，数据分片和分布式训练加速模型训练。通过选择适当的工具和技术，并权衡存储与处理速度之间的trade-off，可以显著提升系统性能和可扩展性。

See all articles