
Basic method of decompressing file formats in python

爱喝马黛茶的安东尼 (Original)
2019-06-14 14:29:38

patool is a Python library that handles many archive formats. If you only need basic operations such as extracting and creating archives, and don't want to learn the separate Python library for each compression format, patool is a good choice.


The formats supported by the patool library include:

7z (.7z, .cb7), ACE (.ace, .cba), ADF (.adf), ALZIP (.alz), APE (.ape), AR (.a), ARC (.arc), ARJ (.arj), BZIP2 (.bz2), CAB (.cab), COMPRESS (.Z), CPIO (.cpio), DEB (.deb), DMS (.dms), FLAC (.flac), GZIP (.gz), ISO (.iso), LRZIP (.lrz), LZH (.lha, .lzh), LZIP (.lz), LZMA (.lzma), LZOP (.lzo), RPM (.rpm), RAR (.rar, .cbr), RZIP (.rz), SHN (.shn), TAR (.tar, .cbt), XZ (.xz), ZIP (.zip, .jar, .cbz) and ZOO (.zoo)

Basic usage of patool:

import patoolib
# Extract an archive
patoolib.extract_archive("archive.zip", outdir="/tmp")
# Test whether an archive is intact
patoolib.test_archive("dist.tar.gz", verbosity=1)
# List the files in an archive
patoolib.list_archive("package.deb")
# Create an archive
patoolib.create_archive("/path/to/myfiles.zip", ("file1.txt", "dir/"))
# Show the differences between the contents of two archives
patoolib.diff_archives("release1.0.tar.gz", "release2.0.zip")
# Search the contents of an archive for a pattern
patoolib.search_archive("def urlopen", "python3.3.tar.gz")
# Repack an archive in a different format
patoolib.repack_archive("linux-2.6.33.tar.gz", "linux-2.6.33.tar.bz2")

However, patool depends on external archive programs to do the actual work. For example, when I use patool to decompress files, it mainly calls two programs on my computer, 7z and Rtools. If no program that can process the given format is installed, an error is reported:

patoolib.util.PatoolError: could not find an executable program to extract format rar; candidates are (rar,unrar,7z)

In addition, patool cannot process password-protected archives.
Libraries similar to patool include pyunpack and easy-extract. pyunpack relies on zipfile and patool (patool needs to be installed in advance) and supports every format those two libraries support. easy-extract relies on the external programs unrar, 7z, and par2, which also need to be installed in advance, and likewise supports a variety of formats.
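
Because patool hands the work to external programs, it can be useful to check which of them are on the PATH before calling it. The following is a minimal sketch using only the standard library's shutil.which. The candidate names are taken from the error message above, and the function name find_extractor is hypothetical, not part of patool.

```python
import shutil

def find_extractor(candidates=("rar", "unrar", "7z")):
    """Return the path of the first candidate program found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None

extractor = find_extractor()
if extractor is None:
    print("None of rar, unrar, 7z was found on PATH.")
else:
    print("Found a candidate program:", extractor)
```

This only checks the PATH; patool performs its own lookup, so treat the result as a hint rather than a guarantee.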

Processing of common compression formats

If the corresponding compression software is not installed on the computer and you just want to use Python for compression and decompression, you can use the libraries introduced in detail below, which cover several common formats.

zip format

Python libraries that can handle the zip format include the standard library module zipfile and the third-party library python-archive. The following mainly introduces the basic usage of zipfile.
First create a ZipFile object:

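# Import the ZipFile class
from zipfile import ZipFile
# ZipFile(file, mode='r', compression=ZIP_STORED, allowZip64=True, compresslevel=None)
# The default mode is 'r' (read). A member opened with ZipFile.open() in this mode
# provides read(), readline(), readlines(), __iter__(), __next__() and similar methods.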

Extracting an archive. There are two extraction methods: extract() and extractall(). The former extracts a single member, by default into the current directory. The latter extracts several members at once, by default all of them. Both extract() and extractall() accept a pwd parameter (a bytes object) and can therefore process password-protected archives.

with ZipFile('test.zip') as myzip:
    myzip.extract(member='1.txt', path='tmp')
    # pwd must be a bytes object, not a str
    myzip.extractall(path='tmp', members=['1.txt', '2.txt'], pwd=b'password')
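
To try the extraction methods without an existing test.zip, the sketch below first builds a small archive in a temporary directory and then reads it back. The file names and contents are invented for illustration; only standard zipfile calls are used.

```python
import os
import tempfile
from zipfile import ZipFile

with tempfile.TemporaryDirectory() as workdir:
    archive = os.path.join(workdir, "test.zip")
    # Build a small archive so the example does not depend on existing files.
    with ZipFile(archive, mode="w") as myzip:
        myzip.writestr("1.txt", "first file\n")
        myzip.writestr("2.txt", "second file\n")

    outdir = os.path.join(workdir, "tmp")
    with ZipFile(archive) as myzip:
        names = myzip.namelist()                    # member names in the archive
        myzip.extract("1.txt", path=outdir)         # extract a single member
        myzip.extractall(path=outdir)               # extract every member
        with myzip.open("2.txt") as member:         # read a member without extracting it
            first_line = member.readline()

    extracted_files = sorted(os.listdir(outdir))

print(names, extracted_files, first_line)
```

Note that ZipFile.open() returns a binary file object, so readline() yields bytes rather than str.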

Creating archives: zipfile supports four compression methods: zipfile.ZIP_STORED (the default, no compression), zipfile.ZIP_DEFLATED, zipfile.ZIP_BZIP2, and zipfile.ZIP_LZMA.

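import zipfile
from zipfile import ZipFile
# The modes for adding files are 'w', 'a', and 'x'.
# 'w' overwrites an existing file or writes a new one; 'a' appends to an existing file; 'x' creates a new file and writes to it.
# In all three modes, if no data is written, an empty ZIP file is produced.
with ZipFile('test.zip', mode='w') as myzip:
    for file in ['1.txt', '2.txt']:  # list of files to compress
        myzip.write(file, compress_type=zipfile.ZIP_DEFLATED)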
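
To get a feel for how the four methods differ, the sketch below writes the same data into in-memory archives and compares the resulting sizes. The sample data and the helper name zipped_size are made up for illustration. Each method other than ZIP_STORED needs its backing module (zlib, bz2, or lzma), so methods that are unavailable are skipped.

```python
import io
import zipfile

data = b"python " * 10_000  # highly repetitive, so it compresses well

def zipped_size(method):
    """Return the size in bytes of an in-memory ZIP archive holding `data`."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, mode="w", compression=method) as zf:
        zf.writestr("data.txt", data)
    return len(buffer.getvalue())

sizes = {}
for name in ("ZIP_STORED", "ZIP_DEFLATED", "ZIP_BZIP2", "ZIP_LZMA"):
    try:
        sizes[name] = zipped_size(getattr(zipfile, name))
    except RuntimeError:
        # zipfile raises RuntimeError when the backing module is missing
        continue

for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```

With ZIP_STORED the archive is slightly larger than the input, since the data is stored as-is plus the ZIP headers.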

Compress an entire folder

import os
import zipfile

# Method 1
def addToZip(zf, path, zippath):
    if os.path.isfile(path):
        zf.write(path, zippath, zipfile.ZIP_DEFLATED)  # write the file using zlib (deflate) compression
    elif os.path.isdir(path):
        if zippath:
            zf.write(path, zippath)  # add an entry for the directory itself
        for nm in os.listdir(path):
            addToZip(zf, os.path.join(path, nm), os.path.join(zippath, nm))
with zipfile.ZipFile('tmp4.zip', 'w') as zip_file:
    addToZip(zip_file, 'tmp', 'tmp')
# Method 2
class ZipFolder:
    def toZip(self, file, zipfilename):
        # First create the ZipFile object
        with zipfile.ZipFile(zipfilename, 'w') as zip_file:
            if os.path.isfile(file):  # a file is written directly
                zip_file.write(file)
            else:  # otherwise call addFolderToZip() to write the folder
                self.addFolderToZip(zip_file, file)
    def addFolderToZip(self, zip_file, folder):
        for file in os.listdir(folder):  # iterate over the entries in the folder
            full_path = os.path.join(folder, file)
            if os.path.isfile(full_path):  # a file is written directly
                print('File added: ', str(full_path))
                zip_file.write(full_path)
            elif os.path.isdir(full_path):
                # a folder is handled by calling addFolderToZip() again
                print('Entering folder: ', str(full_path))
                self.addFolderToZip(zip_file, full_path)
directory = 'tmp'   # folder to compress
zipfilename = 'tmp1.zip'    # name of the resulting archive
ZipFolder().toZip(directory, zipfilename)
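
As an alternative to the two recursive versions above, the same result can be sketched with os.walk, which handles the recursion itself. This is one possible variant, not the author's method. The function name zip_folder and the sample files are hypothetical, and empty directories are not stored.

```python
import os
import tempfile
import zipfile

def zip_folder(folder, zip_path):
    """Write every file under `folder` into `zip_path`, keeping relative paths."""
    parent = os.path.dirname(os.path.abspath(folder))
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(folder):
            for name in files:
                full_path = os.path.join(root, name)
                # Store paths relative to the folder's parent, e.g. 'tmp/sub/b.txt'.
                zf.write(full_path, arcname=os.path.relpath(full_path, parent))

with tempfile.TemporaryDirectory() as workdir:
    folder = os.path.join(workdir, "tmp")
    os.makedirs(os.path.join(folder, "sub"))
    for rel in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(folder, rel), "w") as f:
            f.write("hello\n")

    zip_path = os.path.join(workdir, "tmp_walk.zip")
    zip_folder(folder, zip_path)
    with zipfile.ZipFile(zip_path) as zf:
        archived = sorted(zf.namelist())

print(archived)
```

Passing arcname keeps absolute paths from the local machine out of the archive. The archive is written outside the folder being walked so that it does not end up inside itself.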

rar format

The rar format has no corresponding module in the Python standard library and requires a third-party library such as rarfile, python-unrar, or pyUnRAR2. What these libraries have in common is that they depend on RARLAB's UnRAR library. The following mainly introduces the rarfile library:

Installation and configuration
Installation command:

pip install rarfile

The configuration, however, takes some time. First you need to download and install UnRAR. Because my operating system is Windows, I downloaded UnRarDLL from the RARLAB official website and installed it to the default path C:\Program Files (x86)\UnrarDLL.
Then add the environment variables. First, add C:\Program Files (x86)\UnrarDLL\x64 (my system is 64-bit) to the Path variable in the system variables (right-click Computer > Properties > Advanced system settings > Advanced > Environment Variables). After restarting PyCharm, however, the error is still reported:

LookupError: Couldn't find path to unrar library.

Next, create a new system variable with the name UNRAR_LIB_PATH and the value C:\Program Files (x86)\UnrarDLL\x64\UnRAR64.dll (on a 32-bit system the value is C:\Program Files (x86)\UnrarDLL\UnRAR.dll). After restarting PyCharm, the problem is solved.

Basic usage

The rarfile library is used in much the same way as zipfile and likewise provides extract(), extractall(), namelist(), infolist(), getinfo(), open(), read(), printdir() and other methods. The main difference is that a RarFile object supports only read mode and cannot write files.

# mode can only be 'r'
class rarfile.RarFile(rarfile, mode='r', charset=None, info_callback=None, crc_check=True, errors='stop')

Using the rarfile library to decompress rar compressed packages is the same as using the zipfile library to decompress zip format compressed packages. Please refer to the usage of the zipfile library.

In addition, the installation, setup, and use of the python-unrar library are very similar to those of the rarfile library, but python-unrar does not support the with statement. If you want to use the with statement, you can add the following methods to the rarfile.py file in the python-unrar installation directory:

def __enter__(self):
    """Open context."""    
    return self
def __exit__(self, typ, value, traceback):    
    """Exit context"""    
    self.close()
def close(self):    
    """Release open resources."""    
    pass

tar format

tar is a common archive format on Unix systems. It can be combined with different compression methods to form different compressed file formats, such as .tar.gz (.tgz), .tar.bz2 (.tbz, .tb2), .tar.Z (.taz), .tar.lzma (.tlz), and .tar.xz (.txz). The tar format is handled by the Python standard library module tarfile. The supported formats include tar, tar.gz, tar.bz2, tar.xz, tar.lzma, etc.
Basic usage of the tarfile library:

Create tarfile objects

The tarfile library creates objects using tarfile.open() instead of tarfile.TarFile().

tarfile.open(name=None, mode='r', fileobj=None, bufsize=10240, **kwargs)

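mode can take many values. The main ones are the four modes 'r', 'w', 'a', and 'x' (introduced briefly in the zipfile section) and their combinations with the three compression methods 'gz', 'bz2', and 'xz'. The possible values are listed in the table below:

Mode             Meaning
'r' or 'r:*'     Open for reading, detecting the compression automatically (recommended)
'r:'             Open for reading, uncompressed archives only
'r:gz'           Open for reading with gzip compression
'r:bz2'          Open for reading with bzip2 compression
'r:xz'           Open for reading with lzma compression
'x' or 'x:'      Create a new archive without compression (fails if the file already exists)
'x:gz'           Create a new archive with gzip compression
'x:bz2'          Create a new archive with bzip2 compression
'x:xz'           Create a new archive with lzma compression
'a' or 'a:'      Open for appending without compression; the file is created if it does not exist
'w' or 'w:'      Open for writing without compression
'w:gz'           Open for writing with gzip compression
'w:bz2'          Open for writing with bzip2 compression
'w:xz'           Open for writing with lzma compression

However, combining 'a' with the three compression methods ('a:gz', 'a:bz2', 'a:xz') is not supported.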
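
The behaviour of a few of these modes can be checked with a short script. The sketch below works in a temporary directory with an invented file name. It writes with 'w:gz', reads back with 'r:*', and then shows that 'a:gz' is rejected and that 'x:gz' refuses to overwrite an existing archive.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "hello.txt")
    with open(source, "w") as f:
        f.write("hello\n")

    archive = os.path.join(workdir, "sample.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:      # gzip-compressed write
        tar.add(source, arcname="hello.txt")

    with tarfile.open(archive, "r:*") as tar:       # compression detected automatically
        names = tar.getnames()

    try:
        tarfile.open(archive, "a:gz")               # append plus compression is rejected
        append_error = None
    except ValueError as exc:
        append_error = exc

    try:
        tarfile.open(archive, "x:gz")               # 'x' refuses to overwrite a file
        exists_error = None
    except FileExistsError as exc:
        exists_error = exc

print(names, repr(append_error), repr(exists_error))
```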

Basic usage
Extract to a specified directory

import tarfile
with tarfile.open("test.tar.gz") as tar:
    tar.extractall(path='.')

Extract only the files that meet certain conditions

import os
import tarfile
# Extract only files with the .py extension
def py_files(members):
    for tarinfo in members:
        if os.path.splitext(tarinfo.name)[1] == ".py":
            yield tarinfo
with tarfile.open("sample.tar.gz") as tar:
    tar.extractall(members=py_files(tar))

Create an uncompressed archive

with tarfile.open("sample.tar", "w") as tar:
    for name in ["foo", "bar", "quux"]:        
        tar.add(name)

Create a compressed archive

with tarfile.open("sample.tar.gz", "w:gz") as tar:
    for name in ["foo", "bar", "quux"]:
        tar.add(name)

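Compress and archive an entire folder. This is much simpler than with the zipfile library, because files can be added with the add() function.

with tarfile.open('test.tar.gz', 'w:gz') as tar:
    for root, dirs, files in os.walk(os.getcwd()):
        for file in files:
            fullpath = os.path.join(root, file)
            tar.add(fullpath)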
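
Since add() is recursive by default, a whole folder can also be archived with a single call. The sketch below shows this in a temporary directory. The folder name project and its files are invented for illustration.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    folder = os.path.join(workdir, "project")
    os.makedirs(os.path.join(folder, "sub"))
    for rel in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(folder, rel), "w") as f:
            f.write("hello\n")

    archive = os.path.join(workdir, "project.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        # One call adds the directory and everything below it.
        tar.add(folder, arcname="project")

    with tarfile.open(archive) as tar:
        members = sorted(tar.getnames())

print(members)
```

Unlike the os.walk version, this also stores entries for the directories themselves, and arcname keeps the absolute local path out of the archive.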

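Other compression formats

Python's standard library also includes other modules for data compression and archiving: bz2, gzip, zlib, lzma, and shutil, which builds on zipfile and tarfile. These will be covered in detail at a later opportunity.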
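
As a brief taste of two of these modules, the sketch below archives and restores a folder with shutil and round-trips some bytes through gzip. The folder and file names are invented for illustration, and this is only a small sample of what the modules offer.

```python
import gzip
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    folder = os.path.join(workdir, "data")
    os.makedirs(folder)
    with open(os.path.join(folder, "note.txt"), "w") as f:
        f.write("hello\n")

    # make_archive appends the extension for the chosen format and returns the path.
    base_name = os.path.join(workdir, "backup")
    archive = shutil.make_archive(base_name, "gztar", root_dir=folder)

    restored = os.path.join(workdir, "restored")
    shutil.unpack_archive(archive, restored)    # format inferred from the extension
    restored_files = os.listdir(restored)

archive_name = os.path.basename(archive)

# gzip also offers one-shot compress/decompress functions for bytes.
payload = b"hello " * 1000
round_trip = gzip.decompress(gzip.compress(payload))

print(archive_name, restored_files, round_trip == payload)
```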
