search
HomeBackend DevelopmentPython TutorialBasic method of decompressing file formats in python

Python library that handles multiple compression package formats: patool. If you only use basic decompression, packaging and other operations, and don't want to learn more about the python libraries corresponding to various compression formats, patool should be a good choice.

Related recommendations: "python video"

Basic method of decompressing file formats in python The formats supported by the patool library include:

7z (. 7z, .cb7), ACE (.ace, .cba), ADF (.adf), ALZIP (.alz), APE (.ape), AR (.a), ARC (.arc), ARJ (.arj) , BZIP2 (.bz2), CAB (.cab), COMPRESS (.Z), CPIO (.cpio), DEB (.deb), DMS (.dms), FLAC (.flac), GZIP (.gz), ISO (.iso), LRZIP (.lrz), LZH (.lha, .lzh), LZIP (.lz), LZMA (.lzma), LZOP (.lzo), RPM (.rpm), RAR (.rar, . cbr), RZIP (.rz), SHN (.shn), TAR (.tar, .cbt), XZ (.xz), ZIP (.zip, .jar, .cbz) and ZOO (.zoo)

Basic usage of patool:

import patoolib
# 解压缩
patoolib.extract_archive("archive.zip", outdir="/tmp")
# 测试压缩包是否完整
patoolib.test_archive("dist.tar.gz", verbosity=1)
# 列出压缩包内的文件
patoolib.list_archive("package.deb")
# 创建压缩包
patoolib.create_archive("/path/to/myfiles.zip", ("file1.txt", "dir/"))
# 比较压缩包内文件的差异
patoolib.diff_archives("release1.0.tar.gz", "release2.0.zip")
# 搜索patoolib.search_archive("def urlopen", "python3.3.tar.gz")
# 修改压缩包的压缩格式
patoolib.repack_archive("linux-2.6.33.tar.gz", "linux-2.6.33.tar.bz2")

However, the normal operation of patool depends on other decompression software. For example, when I usually use patool to decompress files, it mainly calls my computer. For the two programs 7z and Rtools, if there is no software on the computer that can process the corresponding compressed files, an error will be reported:

patoolib.util.PatoolError: could not find an executable program to extract format rar; candidates are (rar,unrar,7z)

In addition, patool cannot process password-protected compressed files.
Libraries similar to patool include pyunpack and easy-extract: the pyunpack library relies on zipfile and patool, supports all compression formats supported by the two libraries, and needs to be installed in advance; the easy-extract library relies on the decompression software unrar, 7z, and par2. It needs to be installed in advance and also supports a variety of decompression formats.

Processing of common compression formats

If the corresponding compression software is not installed on the computer and you just want to use python for compression and decompression operations, you can use the other details below Introducing several common

zip formats

Python libraries that can handle zip format include python standard library zipfile, and third-party library python-archive, etc. The following are mainly introduced Let’s take a look at the basic usage of the zipfile library:
First create a ZipFile object:

# 导入ZipFile类
from zipfile import ZipFile
# ZipFile(file, mode='r', compression=ZIP_STORED, allowZip64=True, compresslevel=None)
# 默认模式是读取,该模式提供read(), readline(), readlines(), __iter__(), __next__()等方法

Decompress the file package. There are two decompression functions: extract() and extractall(). The former can decompress a single file and decompresses by default. to the current directory. The latter can decompress multiple files in batches and decompress all files by default. Both extract() and extractall() have parameter pwd and can process compressed packages with passwords.

with ZipFile('test.zip') as myzip:
    myzip.extract(member='1.txt',path='tmp')
    myzip.extractall(path='tmp',members=['1.txt','2.txt'],pwd='password')

Make compressed files: zipfile has four methods for compressing files: zipfile.ZIP_STORED (default), zipfile.ZIP_DEFLATED, zipfile.ZIP_BZIP2, zipfile.ZIP_LZMA

# 添加文件的mode有'w', 'a', 'x'
# 'w'表示覆盖或写入一个新文件;'a'表示在已有文件后追加;'x'表示新建文件并写入。
# 在三种mode下,如果未写入认识数据,则会生成空的ZIP文件。
with ZipFile('test.zip',mode='w') as myzip:    
    for file in ['1.txt', '2.txt']: # 需压缩的文件列表        
        myzip.write(file,compress_type=zipfile.ZIP_DEFLATED)

Compress the entire file Folder

# 方法一
def addToZip(zf, path, zippath):
    if os.path.isfile(path):        
        zf.write(path, zippath, zipfile.ZIP_DEFLATED)  # 以zlib压缩方法写入文件    
    elif os.path.isdir(path):        
        if zippath:            
            zf.write(path, zippath)        
        for nm in os.listdir(path):            
            addToZip(zf, os.path.join(path, nm), os.path.join(zippath, nm))
with zipfile.ZipFile('tmp4.zip', 'w') as zip_file:    
      addToZip(zip_file,'tmp','tmp')    
#方法二
class ZipFolder:    
    def toZip(self, file, zipfilename):        
        # 首先创建zipfile对象        
        with zipfile.ZipFile(zipfilename, 'w') as zip_file:            
            if os.path.isfile(file):  # 判断写入的是文件还是文件夹,是文件的话直接写入                
                zip_file.write(file)            
            else:  # 否则调用写入文件夹的函数assFolderToZip()                
                self.addFolderToZip(zip_file, file)    
    def addFolderToZip(self, zip_file, folder):        
        for file in os.listdir(folder):  # 依次遍历文件夹内的文件            
            full_path = os.path.join(folder, file)            
            if os.path.isfile(full_path): # 判断是文件还是文件夹,是文件的话直接写入                
                print('File added: ', str(full_path))                
                zip_file.write(full_path)            
            elif os.path.isdir(full_path):             
            # 如果是文件夹的话再次调用addFolderToZip函数,写入文件夹                
                print('Entering folder: ', str(full_path))                
                self.addFolderToZip(zip_file, full_path)
directory = 'tmp'   # 需压缩的文件目录
zipfilename = 'tmp1.zip'    #压缩后的文件名
ZipFolder().toZip(directory, zipfilename)

rar format

rar format does not have a corresponding python standard library and needs to rely on third-party libraries rarfile, python-unrar, pyUnRAR2, etc. The above libraries have something in common It depends on the support of RARLAB's UnRAR library. The following mainly introduces the rarfile library:

Installation and configuration
Installation command:

pip install rarfile

But the configuration is quite expensive some time. First you need to download and install UnRAR. Because my computer operating system is Windows, I just go to the RARLAB official website to download UnRarDLL and install it to the default path C:\Program Files (x86)\UnrarDLL.
Then add environment variables. First, add C:\Program Files (x86)\UnrarDLL\x64 (my system is 64-bit) to the Path variable in the system variables (right-click on computer>Properties>Advanced system settings >Advanced>Environment Variables), but the error is still reported after restarting PyCharm:

LookupError: Couldn't find path to unrar library.

Then try to create a new variable in the system variables, enter ?UNRAR_LIB_PATH for the variable name, and the variable value is ?C:\Program Files (x86) \UnrarDLL\x64\UnRAR64.dll (the variable value under 32-bit systems is C:\Program Files (x86)\UnrarDLL\UnRAR.dll). Restart PyCharm and the problem is solved.

Basic usage

The usage of rarfile library is very similar to zipfile, and also includes extract(), extractall(), namelist(), infolist(), getinfo (), open(), read(), printdir() and other functions, the main difference is that the RarFile object only supports reading mode and cannot write files.

# mode的值只能为'r'
class rarfile.RarFile(rarfile, mode='r', charset=None, info_callback=None, crc_check=True, errors='stop')

Using the rarfile library to decompress rar compressed packages is the same as using the zipfile library to decompress zip format compressed packages. Please refer to the usage of the zipfile library.

In addition, the installation, setup and use of the python-unrar library are very similar to the rarfile library, but the python-unrar library does not support the with statement. If you want to use the with statement, you can go to the python-unrar library installation directory. Add the following statement to the rarfile.py file:

def __enter__(self):
    """Open context."""    
    return self
def __exit__(self, typ, value, traceback):    
    """Exit context"""    
    self.close()
def close(self):    
    """Release open resources."""    
    pass

tar format

tar format is a common packaging file format under Unix systems and can be matched with different compression methods. Form different compressed file formats, such as: .tar.gz (.tgz), .tar.bz2 (.tbztb2), .tar.Z (.taz), .tar.lzma (.tlz), .tar.xz ( .txz) etc. The tar format corresponds to the python standard library tarfile. The supported formats include: tar, tar.gz, tar.bz2, tar.xz, .tar.lzma, etc.
Basic usage of the tarfile library:

Create tarfile objects

The tarfile library creates objects using tarfile.open() instead of tarfile.TarFile().

tarfile.open(name=None, mode='r', fileobj=None, bufsize=10240, **kwargs)

其中,mode可取的值比较多,主要包括'r', 'w', 'a', 'x'四种模式(在zipfile库的使用中简单介绍过),以及这四种模式与'gz', 'bz2', 'xz'三种压缩方法的组合模式,具体取值如下表所示:

模式                                        含义

'r'or'r:*'                 自动解压并打开文件(推荐模式)    

'r:'                         只打开文件不解压    

'r:gz'                     采用gzip格式解压并打开文件    

'r:bz2'                   采用bz2格式解压并打开文件    

'r:xz'                     采用lzma格式解压并打开文件    

'x'or'x:'                 仅创建打包文件,不压缩    

'x:gz'                    采用gzip方式压缩并打包文件    

'x:bz2'                  采用bzip2方式压缩并打包文件    

'x:xz'                     采用lzma方式压缩并打包文件    

'a'or'a:'                 打开文件,并以不压缩的方式追加内容。如果文件不存在,则新建    

'w'or'w:'                以不压缩的方式写入    

'w:gz'                    以gzip的方式压缩并写入    

'w:bz2'                  以bzip2的方式压缩并写入    

'w:xz'                    以lzma的方式压缩并写入    

但是,不支持'a'与三种压缩方法的组合模式('a:gz', 'a:bz2'、'a:xz')

基本使用方法
解压缩至指定的目录

with tarfile.open("test.tar.gz") as tar:    
    tar.extractall(path='.')

解压符合某些条件的文件

# 解压后缀名为py的文件
def py_files(members):
    for tarinfo in members:        
        if os.path.splitext(tarinfo.name)[1] == ".py":            
            yield tarinfo
with tarfile.open("sample.tar.gz") as tar:    
    tar.extractall(members=py_files(tar))

创建不压缩的打包文件

with tarfile.open("sample.tar", "w") as tar:
    for name in ["foo", "bar", "quux"]:        
        tar.add(name)

创建压缩的打包文件

with tarfile.open("sample.tar", "w:gz") as tar:
    for name in ["foo", "bar", "quux"]:        
        tar.add(name)

压缩并打包整个文件夹,较之zipfile库简单得多,可使用add()函数进行添加

tar = tarfile.open('test.tar','w:gz')
for root ,dir,files in os.walk(os.getcwd()):      
  for file in files:          
      fullpath = os.path.join(root,file)          
      tar.add(fullpath)

其他压缩格式

Python原生的数据压缩打包的标准库还包括:bz2、gzip、zlib、lzma以及建立在zipfile和tarfile库基础上的shutil库,以后有机会再详细介绍。

The above is the detailed content of Basic method of decompressing file formats in python. For more information, please follow other related articles on the PHP Chinese website!

Statement
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
Are Python lists dynamic arrays or linked lists under the hood?Are Python lists dynamic arrays or linked lists under the hood?May 07, 2025 am 12:16 AM

Pythonlistsareimplementedasdynamicarrays,notlinkedlists.1)Theyarestoredincontiguousmemoryblocks,whichmayrequirereallocationwhenappendingitems,impactingperformance.2)Linkedlistswouldofferefficientinsertions/deletionsbutslowerindexedaccess,leadingPytho

How do you remove elements from a Python list?How do you remove elements from a Python list?May 07, 2025 am 12:15 AM

Pythonoffersfourmainmethodstoremoveelementsfromalist:1)remove(value)removesthefirstoccurrenceofavalue,2)pop(index)removesandreturnsanelementataspecifiedindex,3)delstatementremoveselementsbyindexorslice,and4)clear()removesallitemsfromthelist.Eachmetho

What should you check if you get a 'Permission denied' error when trying to run a script?What should you check if you get a 'Permission denied' error when trying to run a script?May 07, 2025 am 12:12 AM

Toresolvea"Permissiondenied"errorwhenrunningascript,followthesesteps:1)Checkandadjustthescript'spermissionsusingchmod xmyscript.shtomakeitexecutable.2)Ensurethescriptislocatedinadirectorywhereyouhavewritepermissions,suchasyourhomedirectory.

How are arrays used in image processing with Python?How are arrays used in image processing with Python?May 07, 2025 am 12:04 AM

ArraysarecrucialinPythonimageprocessingastheyenableefficientmanipulationandanalysisofimagedata.1)ImagesareconvertedtoNumPyarrays,withgrayscaleimagesas2Darraysandcolorimagesas3Darrays.2)Arraysallowforvectorizedoperations,enablingfastadjustmentslikebri

For what types of operations are arrays significantly faster than lists?For what types of operations are arrays significantly faster than lists?May 07, 2025 am 12:01 AM

Arraysaresignificantlyfasterthanlistsforoperationsbenefitingfromdirectmemoryaccessandfixed-sizestructures.1)Accessingelements:Arraysprovideconstant-timeaccessduetocontiguousmemorystorage.2)Iteration:Arraysleveragecachelocalityforfasteriteration.3)Mem

Explain the performance differences in element-wise operations between lists and arrays.Explain the performance differences in element-wise operations between lists and arrays.May 06, 2025 am 12:15 AM

Arraysarebetterforelement-wiseoperationsduetofasteraccessandoptimizedimplementations.1)Arrayshavecontiguousmemoryfordirectaccess,enhancingperformance.2)Listsareflexiblebutslowerduetopotentialdynamicresizing.3)Forlargedatasets,arrays,especiallywithlib

How can you perform mathematical operations on entire NumPy arrays efficiently?How can you perform mathematical operations on entire NumPy arrays efficiently?May 06, 2025 am 12:15 AM

Mathematical operations of the entire array in NumPy can be efficiently implemented through vectorized operations. 1) Use simple operators such as addition (arr 2) to perform operations on arrays. 2) NumPy uses the underlying C language library, which improves the computing speed. 3) You can perform complex operations such as multiplication, division, and exponents. 4) Pay attention to broadcast operations to ensure that the array shape is compatible. 5) Using NumPy functions such as np.sum() can significantly improve performance.

How do you insert elements into a Python array?How do you insert elements into a Python array?May 06, 2025 am 12:14 AM

In Python, there are two main methods for inserting elements into a list: 1) Using the insert(index, value) method, you can insert elements at the specified index, but inserting at the beginning of a large list is inefficient; 2) Using the append(value) method, add elements at the end of the list, which is highly efficient. For large lists, it is recommended to use append() or consider using deque or NumPy arrays to optimize performance.

See all articles

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

mPDF

mPDF

mPDF is a PHP library that can generate PDF files from UTF-8 encoded HTML. The original author, Ian Back, wrote mPDF to output PDF files "on the fly" from his website and handle different languages. It is slower than original scripts like HTML2FPDF and produces larger files when using Unicode fonts, but supports CSS styles etc. and has a lot of enhancements. Supports almost all languages, including RTL (Arabic and Hebrew) and CJK (Chinese, Japanese and Korean). Supports nested block-level elements (such as P, DIV),

MinGW - Minimalist GNU for Windows

MinGW - Minimalist GNU for Windows

This project is in the process of being migrated to osdn.net/projects/mingw, you can continue to follow us there. MinGW: A native Windows port of the GNU Compiler Collection (GCC), freely distributable import libraries and header files for building native Windows applications; includes extensions to the MSVC runtime to support C99 functionality. All MinGW software can run on 64-bit Windows platforms.

MantisBT

MantisBT

Mantis is an easy-to-deploy web-based defect tracking tool designed to aid in product defect tracking. It requires PHP, MySQL and a web server. Check out our demo and hosting services.