Python file and directory operation and compression modules (os, zipfile, tarfile, shutil)-Python Tutorial-php.cn


高洛峰

Feb 22, 2017 am 09:15 AM


Python's built-in modules and functions for working with files and directories include:

Module/function name	Description
open() function	Reading and writing files
os.path module	File path operations
os module	Basic file and directory operations
zipfile module	ZIP archiving and compression
tarfile module	tar archiving (with optional compression)
shutil module	High-level file and directory operations
fileinput module	Iterating over the lines of one or more files
tempfile module	Creating temporary files and directories

Reading and writing files with open() was covered in a previous article, so here we mainly explain the other modules.

1. File path operations (os.path module)

The os.path module operates on file paths: splitting and joining paths, converting between relative and absolute paths, getting the time attributes of the file a path refers to, determining the type of that file, checking whether two paths refer to the same file, and so on.

1. Function list

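# Return the absolute version of path
os.path.abspath(path)

# Split path into a pair (head, tail): tail is the last component of path and contains no slash (path separator), head is everything before it; if path ends with a slash, tail is an empty string, and if path contains no slash, head is an empty string
os.path.split(path)

# Split path into a pair (root, ext), where ext is the extension
os.path.splitext(path)

# Return the base name of path, i.e. the second element of the pair returned by os.path.split(path)
os.path.basename(path)

# Return the directory name of path, i.e. the first element of the pair returned by os.path.split(path)
os.path.dirname(path)

# Join one or more path components with the path separator, skipping empty components (an empty last component adds a trailing separator); if a component is an absolute path, all previous components are discarded and joining restarts from that absolute path
os.path.join(path, *paths)

# Return True if path exists, otherwise False; return False for a broken symbolic link
os.path.exists(path)

# Return the time of last access of the file as a timestamp in seconds; raise OSError if the file does not exist or is inaccessible
os.path.getatime(path)

# Return the time of last modification of the file as a timestamp in seconds; raise OSError if the file does not exist or is inaccessible
os.path.getmtime(path)

# Return the file's ctime: on some systems (such as Unix) this is the time of the last metadata change, on others (such as Windows) it is the creation time; raise OSError if the file does not exist or is inaccessible
os.path.getctime(path)

# Return the size of the file in bytes; raise OSError if the file does not exist or is inaccessible
os.path.getsize(path)

# Return path as a path relative to start
os.path.relpath(path, start=os.curdir)

# Return the canonical absolute path, resolving symbolic links (useful for finding the file a symlink points to)
os.path.realpath(path)

# Return True if path is an absolute path, otherwise False
os.path.isabs(path)

# Return True if path is an existing regular file
os.path.isfile(path)

# Return True if path is an existing directory
os.path.isdir(path)

# Return True if path is a symbolic link
os.path.islink(path)

# Return True if path is a mount point
os.path.ismount(path)

# Return True if path1 and path2 refer to the same file
os.path.samefile(path1, path2)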

Note: os.path.basename(path) differs from the Unix basename program: when path ends with a path separator (such as '/usr/local/'), basename(path) returns an empty string (''), whereas the basename program returns the name of the last directory ('local').
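If you need the Unix behaviour, one option is to strip trailing separators before calling basename. The helper below is a minimal sketch for POSIX-style paths only; the name `unix_basename` is hypothetical, and it uses `posixpath` (the module `os.path` refers to on POSIX systems) so the result does not depend on the platform the code runs on.

```python
import posixpath


def unix_basename(path: str) -> str:
    """Approximate the Unix `basename` program for POSIX-style paths (hypothetical helper)."""
    stripped = path.rstrip("/")
    if not stripped:
        # path was "/" (or "//", ...): the basename program prints "/"
        return "/" if path else ""
    return posixpath.basename(stripped)


print(repr(posixpath.basename("/usr/local/")))  # ''
print(repr(unix_basename("/usr/local/")))       # 'local'
```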

2. Example

>>> import os
>>> 
>>> os.path.abspath('test.sh')
'/root/test.sh'

>>> os.path.split('/root/test.sh')
('/root', 'test.sh')
>>> os.path.split('/usr/local')
('/usr', 'local')
>>> os.path.split('/usr/local/')
('/usr/local', '')
>>> os.path.split('test.sh')
('', 'test.sh')

>>> os.path.basename('/root/test.sh')
'test.sh'
>>> os.path.dirname('/root/test.sh')
'/root'

>>> os.path.splitext('test.sh')
('test', '.sh')
>>> os.path.splitext('/root/test.sh')
('/root/test', '.sh')
>>> os.path.splitext('/usr/local')
('/usr/local', '')

>>> os.path.join('/root')
'/root'
>>> os.path.join('/root', '1', '', '2', ' ', '3')
'/root/1/2/ /3'
>>> os.path.join('/root', '/usr/local', 'test.sh')
'/usr/local/test.sh'
>>> os.path.join('/root', '/usr/local', '1', '')
'/usr/local/1/'

>>> os.path.exists('/root/test.sh')
True
>>> os.path.exists('/root/test.txt')
False
>>> os.path.exists('/etc/rc0.d')
True

>>> os.path.getatime('/etc/my.cnf')
1483433424.62325
>>> os.path.getmtime('/etc/my.cnf')
1472825145.4308648
>>> os.path.getctime('/etc/my.cnf')
1472825145.432865

>>> os.path.relpath('/etc/my.cnf')
'../etc/my.cnf'
>>> os.path.relpath('/etc/my.cnf', start='/etc')
'my.cnf'

>>> os.path.realpath('/etc/rc0.d')
'/etc/rc.d/rc0.d'
>>> os.path.realpath('test.sh')
'/root/test.sh'

>>> os.system('ls -l /etc/my.cnf')
-rw-r--r-- 1 root root 597 Sep  2 22:05 /etc/my.cnf
>>> os.path.getsize('/etc/my.cnf')
597

>>> os.path.isabs('/etc/my.cnf')
True
>>> os.path.isabs('my.cnf')
False
>>> os.path.isfile('/etc/my.cnf')
True
>>> os.path.isdir('/etc/my.cnf')
False
>>> os.path.islink('/etc/my.cnf')
False
>>> os.path.islink('/etc/rc0.d')
True
>>> os.path.islink('/etc/rc0.d/')
False
>>> os.path.isdir('/etc/rc0.d/')
True
>>> os.path.isdir('/etc/rc0.d')
True

>>> os.system('df -Th')
Filesystem     Type      Size  Used Avail Use% Mounted on
/dev/vda1      ext4       40G  8.7G   29G  24% /
devtmpfs       devtmpfs  3.9G     0  3.9G   0% /dev
tmpfs          tmpfs     3.9G     0  3.9G   0% /dev/shm
tmpfs          tmpfs     3.9G  401M  3.5G  11% /run
tmpfs          tmpfs     3.9G     0  3.9G   0% /sys/fs/cgroup
tmpfs          tmpfs     783M     0  783M   0% /run/user/0
0
>>> os.path.ismount('/')
True
>>> os.path.ismount('/dev')
True
>>> os.path.ismount('/usr')
False

>>> os.path.samefile('/etc/rc0.d', '/etc/rc0.d')
True
>>> os.path.samefile('/etc/rc0.d', '/etc/rc0.d/')
True
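The transcript above depends on files that exist on the author's machine. A self-contained sketch of the same kind of queries, run against a temporary file instead (the file name `my.cnf` and the helper `path_report` are only illustrative), could look like this:

```python
import os
import tempfile


def path_report(path):
    """Collect a few os.path facts about path (illustrative helper)."""
    return {
        "exists": os.path.exists(path),
        "isfile": os.path.isfile(path),
        "isdir": os.path.isdir(path),
        "isabs": os.path.isabs(path),
        "size": os.path.getsize(path) if os.path.isfile(path) else None,
    }


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "my.cnf")
    with open(target, "w") as f:
        f.write("x" * 10)                      # 10 bytes of content

    file_info = path_report(target)
    dir_info = path_report(tmp)
    rel = os.path.relpath(target, start=tmp)   # 'my.cnf'
    # '.' components do not matter: both paths name the same file
    same = os.path.samefile(target, os.path.join(tmp, ".", "my.cnf"))

print(file_info, dir_info["isdir"], rel, same)
```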

2. File and directory operations (os module)

Note that the os module is a general operating system interface: it provides many OS-related functions, of which file and directory operations are only one part.

On some Unix platforms, many file or directory operation functions of this module support one or more of the following features:

Specifying a file descriptor: for some functions, the path argument can be not only a string but also an open file descriptor, in which case the function operates on the file that descriptor refers to. To check whether a particular function accepts a file descriptor on the current platform, test whether it is in os.supports_fd; if it is not, passing a descriptor raises NotImplementedError. If the function also supports the dir_fd or follow_symlinks argument, it is an error to specify either of them when path is given as a file descriptor.
Paths relative to directory descriptors: if dir_fd is not None, it should be a file descriptor referring to a directory, and the path to operate on should be relative to that directory; if path is absolute, dir_fd is ignored.
Not following symbolic links: if follow_symlinks is False and the last element of the path to operate on is a symbolic link, the function operates on the link itself instead of the file it points to.
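Because support for these features varies by platform, code can check the sets os.supports_fd, os.supports_dir_fd and os.supports_follow_symlinks before relying on them. The sketch below (file names are illustrative) stats a file relative to an open directory descriptor only when the platform supports dir_fd for os.stat(); where it does not, such as on Windows, that branch is simply skipped.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "data.txt"), "w") as f:
        f.write("hello")

    # These sets contain the functions that accept each feature on this platform
    stat_accepts_fd = os.stat in os.supports_fd
    stat_accepts_dir_fd = os.stat in os.supports_dir_fd

    size_via_dir_fd = None
    if stat_accepts_dir_fd:
        dir_fd = os.open(tmp, os.O_RDONLY)   # a descriptor for the directory itself
        try:
            # "data.txt" is resolved relative to dir_fd, not the current directory
            size_via_dir_fd = os.stat("data.txt", dir_fd=dir_fd).st_size
        finally:
            os.close(dir_fd)

print(stat_accepts_fd, stat_accepts_dir_fd, size_via_dir_fd)
```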

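1. Function list

# Test whether the current user has the given access permission(s) to path
# Python 2
os.access(path, mode)
# Python 3
os.access(path, mode, *, dir_fd=None, effective_ids=False, follow_symlinks=True)

# Change the current working directory; since Python 3.3, path may also be a file descriptor referring to a directory
os.chdir(path)

# Change the current working directory to the directory referred to by the file descriptor fd; since Python 3.3 this is equivalent to os.chdir(fd)
os.fchdir(fd)

# Change the mode (permissions) of a file or directory; dir_fd and follow_symlinks were added in Python 3.3
os.chmod(path, mode, *, dir_fd=None, follow_symlinks=True)

# Change the mode of a file or directory; if path is a symbolic link, the link itself is affected; since Python 3.3 this is equivalent to os.chmod(path, mode, follow_symlinks=False)
os.lchmod(path, mode)

# Change the owner and group of a file or directory; pass -1 for an id that should stay unchanged; dir_fd and follow_symlinks were added in Python 3.3
os.chown(path, uid, gid, *, dir_fd=None, follow_symlinks=True)

# Change the owner and group of a file or directory; pass -1 for an id that should stay unchanged; if path is a symbolic link, the link itself is affected; since Python 3.3 this is equivalent to os.chown(path, uid, gid, follow_symlinks=False)
os.lchown(path, uid, gid)

# Change the root directory of the current process to path
os.chroot(path)

# Return a string representing the current working directory
os.getcwd()

# Return a bytes object representing the current working directory; new in Python 3
os.getcwdb()

# Create a hard link; the parameters after * were added in Python 3.3
os.link(src, dst, *, src_dir_fd=None, dst_dir_fd=None, follow_symlinks=True)

# Create a symbolic link; the parameters after * were added in Python 3.3
os.symlink(src, dst, target_is_directory=False, *, dir_fd=None)

# Return a list of the names of the entries in the directory, in arbitrary order, excluding '.' and '..'; note that in Python 2 path has no default value
os.listdir(path='.')

# Return an iterator of os.DirEntry objects for the entries in the directory, in arbitrary order, excluding '.' and '..'; new in Python 3.5
os.scandir(path='.')

# Get the status of a file or file descriptor and return an os.stat_result object; dir_fd and follow_symlinks were added in Python 3.3
os.stat(path, *, dir_fd=None, follow_symlinks=True)

# Like os.stat(), but if path is a symbolic link, return the status of the link itself; since Python 3.3 this is equivalent to os.stat(path, dir_fd=dir_fd, follow_symlinks=False)
os.lstat(path, *, dir_fd=None)

# Create a directory named path with the given mode; raise FileExistsError if it already exists; dir_fd was added in Python 3.3. Note that the permissions of directories and files created by this function, os.makedirs() and os.mkfifo() are masked by the umask: the actual mode is mode & ~umask, so mode 0o777 with a umask of 0o022 gives 0o755
os.mkdir(path, mode=0o777, *, dir_fd=None)

# Create directories recursively: like mkdir(), but also creates all missing intermediate directories; exist_ok was added in Python 3.2 and controls whether an already existing target directory is accepted; if exist_ok is False (the default) and the target directory exists, OSError is raised
os.makedirs(name, mode=0o777, exist_ok=False)

# Create a FIFO (named pipe), which can be accessed like a regular file; FIFOs are typically used as a rendezvous point between "client" and "server" processes: the server opens the FIFO for reading and the client opens it for writing
os.mkfifo(path, mode=0o666, *, dir_fd=None)

# Remove (delete) the file path; raise OSError if path is a directory; remove() and unlink() are identical
os.remove(path, *, dir_fd=None)
os.unlink(path, *, dir_fd=None)

# Remove the empty directory path; raise OSError if it is not empty
os.rmdir(path, *, dir_fd=None)

# Remove directories recursively: remove the leaf directory name, then try to remove each parent directory in turn until one cannot be removed (for example because it is not empty)
os.removedirs(name)

# Rename a file or directory; raise OSError if src is a file and dst is a directory. On Unix, if dst exists and is a file, it is silently replaced as long as the user has permission; on Windows, OSError is raised if dst exists, even if it is a file
os.rename(src, dst, *, src_dir_fd=None, dst_dir_fd=None)

# Recursive rename: create any intermediate directories needed for new, and after renaming remove the now-empty parent directories of old
os.renames(old, new)

# Same as os.rename(), except that if dst exists and is a file it is silently replaced (if the user has permission) on every platform; new in Python 3.3
os.replace(src, dst, *, src_dir_fd=None, dst_dir_fd=None)

# Return the path a symbolic link points to; similar to os.path.realpath(path), but the result may be a relative path
os.readlink(path, *, dir_fd=None)

# Return a system configuration value for the file; name is the configuration name, and the valid names are listed in os.pathconf_names
os.pathconf(path, name)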
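The umask rule mentioned for os.mkdir() can be checked directly. This sketch assumes a POSIX system (on Windows the mode argument is largely ignored): it temporarily sets the process umask to 0o022, creates a directory with mode 0o777 and reads back the resulting permission bits with stat.S_IMODE(). It also shows exist_ok=True letting os.makedirs() accept an existing directory. The directory name is illustrative.

```python
import os
import stat
import tempfile

old_mask = os.umask(0o022)          # set a known umask and remember the previous one
try:
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "newdir")
        os.mkdir(target, 0o777)                 # requested mode
        actual_mode = stat.S_IMODE(os.stat(target).st_mode)
        os.makedirs(target, exist_ok=True)      # no error although it exists
finally:
    os.umask(old_mask)              # always restore the process-wide umask

print(oct(actual_mode))  # 0o755 on POSIX: 0o777 & ~0o022
```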

Notes on os.access(): by default it checks permissions using the real uid (RUID) and real gid, whereas most operations use the effective uid (EUID) or gid. In Python 3, setting effective_ids=True makes os.access() use the effective uid/gid instead. The mode argument can be os.F_OK (the file exists), os.R_OK (readable), os.W_OK (writable) or os.X_OK (executable), or several of them combined with the bitwise OR operator |.
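A minimal sketch of combining mode flags with |, using a throwaway temporary file. Keep in mind that the Python documentation warns that checking with os.access() before actually opening a file creates a race (the file can change in between); in real code, simply attempting the operation and handling OSError is usually preferable.

```python
import os
import tempfile

with tempfile.NamedTemporaryFile(delete=False) as f:   # readable and writable by its owner
    path = f.name

try:
    exists = os.access(path, os.F_OK)
    readable_and_writable = os.access(path, os.R_OK | os.W_OK)
finally:
    os.remove(path)

gone = not os.access(path, os.F_OK)   # F_OK is False once the file is removed
print(exists, readable_and_writable, gone)  # True True True
```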

2. Example

>>> import os
>>> 
>>> os.access('/bin/passwd', os.F_OK)
True
>>> os.access('/bin/passwd', os.F_OK|os.X_OK)
True
>>> os.access('/bin/passwd', os.F_OK|os.W_OK)
True

>>> os.getcwd()
'/root'
>>> os.chdir('/tmp')
>>> os.getcwd()
'/tmp'

>>> os.system('ls -l test*')
-rw-r--r-- 1 root root 0 Feb  9 09:02 test1.txt
lrwxrwxrwx 1 root root 9 Feb  9 09:02 test.txt -> test1.txt
0
>>> os.chmod('/tmp/test.txt', 0o666)
>>> os.system('ls -l test*')
-rw-rw-rw- 1 root root 0 Feb  9 09:02 test1.txt
lrwxrwxrwx 1 root root 9 Feb  9 09:02 test.txt -> test1.txt
0

>>> os.link('test.txt', 'test')
>>> os.system('ls -li test*')
271425 lrwxrwxrwx 2 root  root  9 Feb  9 09:02 test -> test1.txt
271379 -rw-rw-rw- 1 mysql mysql 0 Feb  9 09:02 test1.txt
271425 lrwxrwxrwx 2 root  root  9 Feb  9 09:02 test.txt -> test1.txt
0

>>> os.listdir('.')
['zabbix_proxy.log', 'test.txt', 'zabbix_agentd.log', '.Test-unix', 'systemd-private-14bb029ad4f340d5ac49a6fb3c2ca6c9-systemd-machined.service-gJk0Cd', 'hsperfdata_root', 'wrapper-31124-1-out', 'a', 'test1.txt', 'zabbix_proxy.log.old', 'zabbix_agentd.log.old', 'systemd-private-14bb029ad4f340d5ac49a6fb3c2ca6c9-mariadb.service-kudcMu', 'test', '.X11-unix', '.font-unix', 'wrapper-31124-1-in', '.XIM-unix', '.ICE-unix', 'Aegis-<Guid(5A2C30A2-A87D-490A-9281-6765EDAD7CBA)>']
>>> os.listdir('/tmp')
['zabbix_proxy.log', 'test.txt', 'zabbix_agentd.log', '.Test-unix', 'systemd-private-14bb029ad4f340d5ac49a6fb3c2ca6c9-systemd-machined.service-gJk0Cd', 'hsperfdata_root', 'wrapper-31124-1-out', 'a', 'test1.txt', 'zabbix_proxy.log.old', 'zabbix_agentd.log.old', 'systemd-private-14bb029ad4f340d5ac49a6fb3c2ca6c9-mariadb.service-kudcMu', 'test', '.X11-unix', '.font-unix', 'wrapper-31124-1-in', '.XIM-unix', '.ICE-unix', 'Aegis-<Guid(5A2C30A2-A87D-490A-9281-6765EDAD7CBA)>']

>>> os.mkdir('/tmp/testdir')
>>> os.system('ls -l /tmp')
lrwxrwxrwx 2 root   root         9 Feb  9 09:02 test -> test1.txt
-rw-rw-rw- 1 mysql  mysql        0 Feb  9 09:02 test1.txt
drwxr-xr-x 2 root   root      4096 Feb  9 09:47 testdir
lrwxrwxrwx 2 root   root         9 Feb  9 09:02 test.txt -> test1.txt
>>> os.mkdir('/tmp/testdir')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 17] File exists: '/tmp/testdir'
>>> os.mkdir('/tmp/a/b/c')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 2] No such file or directory: '/tmp/a/b/c'
>>> os.makedirs('/tmp/a/b/c')  # mode defaults to 0o777, but the result is 0755 because the umask (022) is applied
>>> os.makedirs('/tmp/b/c/d', 0o700)
>>> os.system('ls -l /tmp')
total 2316
drwxr-xr-x 3 root   root      4096 Feb  9 10:16 a
drwx------ 3 root   root      4096 Feb  9 10:16 b
lrwxrwxrwx 2 root   root         9 Feb  9 09:02 test -> test1.txt
-rw-rw-rw- 1 mysql  mysql        0 Feb  9 09:02 test1.txt
drwxr-xr-x 2 root   root      4096 Feb  9 09:47 testdir
lrwxrwxrwx 2 root   root         9 Feb  9 09:02 test.txt -> test1.txt

>>> os.rename('/tmp/test1.txt', '/tmp/test3.txt')
>>> os.system('ls -l /tmp')
lrwxrwxrwx 2 root   root         9 Feb  9 09:02 test -> test1.txt
prw-r--r-- 1 root   root         0 Feb  9 10:21 test1.fifo
-rw-rw-rw- 1 mysql  mysql        0 Feb  9 09:02 test3.txt
drwxr-xr-x 2 root   root      4096 Feb  9 09:47 testdir
prw-r--r-- 1 root   root         0 Feb  9 10:20 test.fifo
lrwxrwxrwx 2 root   root         9 Feb  9 09:02 test.txt -> test1.txt

>>> os.readlink('/tmp/test.txt')
'test1.txt'

>>> os.rmdir('/tmp/testdir')
>>> os.rmdir('/tmp/a/b/c')  # removes only the empty directory /tmp/a/b/c
>>> os.removedirs('/tmp/b/c/d')  # removes the empty directory /tmp/b/c/d, then the empty /tmp/b/c, and finally /tmp/b; /tmp is not empty, so it is not removed

>>> os.unlink('/tmp/test')
>>> os.unlink('/tmp/test.fifo')
>>> os.unlink('/tmp/test.txt')
>>> os.system('ls -l /tmp')
>>> os.remove('/tmp/test3.txt')
>>> os.remove('/tmp/test1.fifo')
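The transcript above relies on the state of the author's /tmp. The following self-contained sketch (directory and file names are illustrative) repeats the main steps inside a temporary directory: os.makedirs() with exist_ok, os.replace() overwriting an existing file, os.scandir() listing entries, and os.removedirs() pruning the now-empty directory chain. An extra file keeps the temporary directory itself non-empty so that removedirs() stops there.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # A file in tmp stops os.removedirs() from also deleting tmp itself
    with open(os.path.join(tmp, "keep.txt"), "w") as f:
        f.write("keep")

    nested = os.path.join(tmp, "a", "b", "c")
    os.makedirs(nested)                    # creates a, a/b and a/b/c
    os.makedirs(nested, exist_ok=True)     # no error the second time

    src = os.path.join(nested, "test1.txt")
    dst = os.path.join(nested, "test3.txt")
    for name, text in ((src, "new"), (dst, "old")):
        with open(name, "w") as f:
            f.write(text)

    os.replace(src, dst)                   # overwrites dst on every platform
    with open(dst) as f:
        content = f.read()

    with os.scandir(nested) as it:         # DirEntry objects (context manager since 3.6)
        names = sorted(entry.name for entry in it)

    os.remove(dst)                         # the leaf directory must be empty
    os.removedirs(nested)                  # removes c, then b, then a; stops at tmp
    a_removed = not os.path.exists(os.path.join(tmp, "a"))
    tmp_kept = os.path.isdir(tmp)

print(content, names, a_removed, tmp_kept)  # new ['test3.txt'] True True
```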

3. File compression (zipfile module)

1. Classes included in the zipfile module

As the name suggests, the zipfile module is used for ZIP archiving and compression. It contains the following classes:

Class name	Description
zipfile.ZipFile	Reads and writes ZIP files
zipfile.PyZipFile	Creates ZIP archives containing Python libraries
zipfile.ZipInfo	Represents information about one member of an archive

Instances of the zipfile.ZipInfo class can be obtained through the getinfo() and infolist() methods of the ZipFile object.

2. Functions and constants in the zipfile module

Function/constant name	Description
zipfile.is_zipfile(filename)	Return True if filename is a valid ZIP file, otherwise False
zipfile.ZIP_STORED	The numeric constant for an uncompressed archive member
zipfile.ZIP_DEFLATED	The numeric constant for the usual ZIP compression method; requires the zlib module
zipfile.ZIP_BZIP2	The numeric constant for the BZIP2 compression method; requires the bz2 module; added in Python 3.3
zipfile.ZIP_LZMA	The numeric constant for the LZMA compression method; requires the lzma module; added in Python 3.3

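3. The zipfile.ZipFile class

Constructor

class zipfile.ZipFile(file, mode='r', compression=ZIP_STORED, allowZip64=True)

Creates a ZipFile instance, i.e. opens a ZIP file.

Parameters:

file: either the path to a file (a string) or a file-like object;
mode: the mode in which to open the file: 'r' (read), 'w' (write), 'a' (append), or 'x' (exclusively create and write a new file; FileExistsError is raised if the file already exists);
compression: the ZIP compression method to use when writing the archive: ZIP_STORED, ZIP_DEFLATED, ZIP_BZIP2 or ZIP_LZMA; passing any other, unrecognized value raises RuntimeError, and so does choosing ZIP_DEFLATED, ZIP_BZIP2 or ZIP_LZMA when the corresponding module (zlib, bz2 or lzma) is not available;
allowZip64: if the ZIP file grows larger than 2 GiB and allowZip64 is False, an exception is raised.

Notes:

Since Python 3.2, ZipFile can be used as a context manager (with statement).
Since Python 3.3, bzip2 and lzma compression are supported.
Since Python 3.4, allowZip64 defaults to True.
Since Python 3.5, writing to unseekable streams and the 'x' mode are supported.

Instance methods

# Print a table of contents of the archive
printdir()

# Extract a member from the archive to the current working directory; member must be its full name or a ZipInfo object; path specifies a different directory to extract to; pwd is the password used for encrypted files
extract(member, path=None, pwd=None)

# Extract all members from the archive to the current working directory; path and pwd work as above; members, if given, must be a subset of the list returned by namelist()
extractall(path=None, members=None, pwd=None)

# Return a list containing a ZipInfo object for each member of the archive
infolist()

# Return a list of archive members by name
namelist()

# Return a ZipInfo object with information about the archive member name; raise KeyError if the archive does not contain name
getinfo(name)

# Access a member of the archive as a file-like object; name is the name of a file in the archive or a ZipInfo object
open(name, mode='r', pwd=None)

# Close the archive file; you must call close() before exiting your program, or essential records will not be written
close()

# Set pwd as the default password used to extract encrypted files
setpassword(pwd)

# Read all the files in the archive and check their integrity (CRCs and file headers); return the name of the first bad file, or None. Calling testzip() on a closed ZipFile raises RuntimeError
testzip()

# Return the bytes of the member name in the archive; name is a file name in the archive or a ZipInfo object. The archive must be open for read ('r') or append ('a'). If pwd is given, it overrides the default password set with setpassword(pwd). Calling read() on a closed ZipFile raises RuntimeError
read(name, pwd=None)

# Write the file named filename to the archive, optionally under the archive name arcname (note that a leading drive letter and leading path separators are removed from it); compress_type, if given, overrides the compression value passed to the constructor. The archive must be open with mode 'w', 'a' or 'x'; calling write() on a ZipFile opened with mode 'r' or on a closed ZipFile raises RuntimeError
write(filename, arcname=None, compress_type=None)

# Write bytes into the archive; zinfo_or_arcname is either the file name it will be given in the archive or a ZipInfo instance
writestr(zinfo_or_arcname, bytes[, compress_type])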
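A minimal sketch of several of these methods working together. It assumes the zlib module is available (needed for ZIP_DEFLATED) and writes to an in-memory io.BytesIO buffer instead of a file on disk; the member names are illustrative.

```python
import io
import zipfile

buf = io.BytesIO()

# Write two members; writestr() accepts str (stored as UTF-8) or bytes
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("a.txt", "hello zip\n")
    zf.writestr("logs/b.log", b"line1\nline2\n")

valid = zipfile.is_zipfile(buf)          # is_zipfile() also accepts file-like objects

# Reopen for reading (mode defaults to 'r'); closing a ZipFile does not close buf
with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()
    first_bad = zf.testzip()             # None means every member checked out
    data = zf.read("a.txt")
    info = zf.getinfo("logs/b.log")

print(valid, names, first_bad, data, info.file_size)
```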

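4. The zipfile.PyZipFile class

The PyZipFile class is used to create ZIP archives containing Python libraries.

Constructor

The PyZipFile constructor takes essentially the same parameters as the ZipFile constructor, plus one extra parameter, optimize.

class zipfile.PyZipFile(file, mode='r', compression=ZIP_STORED, allowZip64=True, optimize=-1)

Notes:

The optimize parameter was added in Python 3.2.
In Python 3.4, the default value of allowZip64 was changed to True.

Instance methods

PyZipFile instances have the same methods as ZipFile instances, plus one extra method, writepy():

# Search for *.py files and add the corresponding files to the archive
writepy(pathname, basename='', filterfunc=None)

Notes:

If the optimize parameter of the PyZipFile constructor was not given or is -1, the "corresponding file" is a *.pyc file, compiled if necessary.
If optimize is 0, 1 or 2, only files with that optimization level (see the built-in compile() function) are added to the archive, compiling if necessary.
If pathname is a file, its name must end with .py, and just the corresponding *.py[co] file is added at the top level (no path information); if pathname is a file that does not end with .py, RuntimeError is raised.
If pathname is a directory that is not a package directory, all *.py[co] files in it are added at the top level (no path information). If pathname is a package directory, all *.py[co] files are added under the package name as a file path, and any subdirectories that are package directories are added recursively.
basename is intended for internal use only.
filterfunc, if given, must be a function taking a single string argument. Each path is passed to it before being added to the archive; if filterfunc returns False, the path is not added, and if it is a directory, its contents are ignored.
The filterfunc parameter was added in Python 3.4.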

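5. The zipfile.ZipInfo class

ZipInfo instances are normally obtained from the getinfo() and infolist() methods of a ZipFile object rather than created directly. Each ZipInfo object stores information about a single member of a ZIP archive and exposes it through the following attributes.

Attribute	Description
ZipInfo.filename	Name of the file in the archive
ZipInfo.date_time	Date and time of last modification of the file, as a tuple: (year, month, day, hour, minute, second)
ZipInfo.compress_type	Compression type of the member
ZipInfo.comment	Comment for the individual member
ZipInfo.extra	Extension field data
ZipInfo.create_system	System that created the ZIP archive
ZipInfo.create_version	PKZIP version that created the ZIP archive
ZipInfo.extract_version	PKZIP version needed to extract the archive
ZipInfo.reserved	Must be zero
ZipInfo.flag_bits	ZIP flag bits
ZipInfo.volume	Volume number of the file header
ZipInfo.internal_attr	Internal attributes
ZipInfo.external_attr	External file attributes
ZipInfo.header_offset	Byte offset to the file header
ZipInfo.CRC	CRC-32 of the uncompressed file
ZipInfo.compress_size	Size of the compressed data
ZipInfo.file_size	Size of the uncompressed file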
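Although ZipInfo objects usually come from getinfo() or infolist(), the class can also be instantiated as zipfile.ZipInfo(filename, date_time) and passed to writestr() when you want to control a member's attributes yourself. A minimal in-memory sketch, assuming zlib is available; the member name and timestamp are arbitrary:

```python
import io
import zipfile

# ZIP timestamps have 2-second resolution, so an even number of seconds round-trips exactly
zinfo = zipfile.ZipInfo("notes/readme.txt", date_time=(2017, 2, 16, 11, 46, 20))
zinfo.compress_type = zipfile.ZIP_DEFLATED   # per-member compression method

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(zinfo, "some text")

with zipfile.ZipFile(buf) as zf:
    stored = zf.infolist()[0]

print(stored.filename, stored.date_time, stored.file_size)
```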

6. Examples

Example 1: archiving and extracting files

import zipfile

# Create an archive
z = zipfile.ZipFile('test.zip', 'w')
z.write('a.txt')
z.write('b.log')
z.close()

# Extract all members into the current directory
z = zipfile.ZipFile('test.zip', 'r')
z.extractall()
z.close()

# Read information about the archive
z = zipfile.ZipFile('test.zip', 'r')
z.printdir()
print(z.namelist())
print(z.infolist())
zinfo = z.getinfo('a.txt')
print(zinfo.filename)
print(zinfo.date_time)
print(zinfo.file_size)
print(zinfo.compress_size)
z.close()

Example 2: archiving Python files

Project directory structure

MYPROG
│  hello.py
│
├─account
│      login.py
│      __init__.py
│
├─test
│      test_print.py
│
└─tools
        tool.py

Code

import zipfile

pyz = zipfile.PyZipFile('myprog.zip', 'w')
pyz.writepy('MYPROG/hello.py')
pyz.writepy('MYPROG/tools')
pyz.writepy('MYPROG/test')
pyz.writepy('MYPROG/account')
pyz.printdir()  # print the table of contents before closing the archive
pyz.close()

Output:

File Name                                             Modified             Size
hello.pyc                                      2017-02-16 11:46:20          130
tool.pyc                                       2017-02-16 11:55:44          135
test_print.pyc                                 2017-02-16 11:55:48          140
account/__init__.pyc                           2017-02-16 11:55:54          118
account/login.pyc                              2017-02-16 11:55:54          138
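The filterfunc parameter (Python 3.4+) is not used above. The sketch below builds a tiny, hypothetical package named pkg in a temporary directory and archives it with writepy(), skipping any module whose file name starts with test_. The archive goes to an in-memory buffer; the .pyc files produced by compilation are typically written to the package's __pycache__ directory, which disappears with the temporary directory.

```python
import io
import os
import tempfile
import zipfile


def skip_tests(path):
    """filterfunc: return False to leave path out of the archive."""
    return not os.path.basename(path).startswith("test_")


with tempfile.TemporaryDirectory() as tmp:
    pkg = os.path.join(tmp, "pkg")
    os.mkdir(pkg)
    for name in ("__init__.py", "core.py", "test_core.py"):
        with open(os.path.join(pkg, name), "w") as f:
            f.write("VALUE = 1\n")

    buf = io.BytesIO()
    with zipfile.PyZipFile(buf, "w") as pyz:
        # pkg is a package directory, so members are stored under "pkg/"
        pyz.writepy(pkg, filterfunc=skip_tests)
        names = sorted(pyz.namelist())

print(names)  # ['pkg/__init__.pyc', 'pkg/core.pyc']
```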

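4. File archiving (tarfile module)

The tarfile module reads and writes tar archives and can compress them at the same time. Compared with the zipfile module, tarfile can archive and compress a whole directory in a single call. In addition, the API provided by tarfile is more "object-oriented".

1. The two main classes in the tarfile module

Class name	Description
TarFile	Provides an interface to a tar archive
TarInfo	A TarInfo object represents one member of a TarFile

The relationship between these two classes is similar to that between zipfile.ZipFile and zipfile.ZipInfo: a TarInfo object holds all the attributes of a file, such as file type, size, modification time, permissions and owner, but it does not contain the file's data.

2. Functions and constants in the tarfile module

Function/constant name	Description
tarfile.open(name=None, mode='r', fileobj=None, bufsize=10240, **kwargs)	Return a TarFile object for the path name name
tarfile.is_tarfile(name)	Return True if name is a tar archive that the tarfile module can read, otherwise False
tarfile.ENCODING	The default character encoding: 'utf-8' on Windows, otherwise the value returned by sys.getfilesystemencoding()
tarfile.USTAR_FORMAT	POSIX.1-1988 (ustar) format
tarfile.GNU_FORMAT	GNU tar format
tarfile.PAX_FORMAT	POSIX.1-2001 (pax) format
tarfile.DEFAULT_FORMAT	The default format for creating archives; GNU_FORMAT in the Python versions covered here (Python 3.8 changed it to PAX_FORMAT)

About the open() function:

tarfile.open(name=None, mode='r', fileobj=None, bufsize=10240, **kwargs)

This function creates and returns a TarFile object. The official Python documentation recommends using this open() function rather than calling the TarFile constructor directly. Its parameters are:

name: the name of the archive file, usually ending in .tar, .tar.gz, .tar.bz2 or .tar.xz; the suffix should match the value of mode;
mode: a string of the form 'filemode[:compression]', default 'r'. filemode can be 'r', 'w', 'a' or 'x'; compression is the compression method and can be 'gz', 'bz2' or 'xz'. Note that 'a:gz', 'a:bz2' and 'a:xz' are not allowed.

All possible values of mode:

mode	Behavior
'r:'	Open an uncompressed archive for reading (usually *.tar)
'r:gz'	Open a gzip-compressed archive for reading (usually *.tar.gz)
'r:bz2'	Open a bzip2-compressed archive for reading (usually *.tar.bz2)
'r:xz'	Open an lzma-compressed archive for reading (usually *.tar.xz)
'r' or 'r:*'	Open an archive compressed with any of the methods above for reading, detecting the compression automatically. This is the recommended mode.
'w' or 'w:'	Open an uncompressed archive for writing
'w:gz'	Open a gzip-compressed archive for writing
'w:bz2'	Open a bzip2-compressed archive for writing
'w:xz'	Open an lzma-compressed archive for writing
'x' or 'x:'	Like 'w' or 'w:', but raise FileExistsError if the archive already exists
'x:gz'	Like 'w:gz', but raise FileExistsError if the archive already exists
'x:bz2'	Like 'w:bz2', but raise FileExistsError if the archive already exists
'x:xz'	Like 'w:xz', but raise FileExistsError if the archive already exists
'a' or 'a:'	Open an uncompressed archive for appending; create it if it does not exist

For the modes 'w:gz', 'r:gz', 'w:bz2', 'r:bz2', 'x:gz' and 'x:bz2', tarfile.open() accepts the keyword argument compresslevel (default 9) to specify the compression level of the archive.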
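A minimal sketch of the mode strings in use, assuming the zlib module is available for gzip: it writes a small, illustrative MYPROG tree with 'w:gz' and compresslevel, then reopens the archive with plain 'r' and lets tarfile detect the compression. TarFile objects can be used in a with statement since Python 3.2.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    project = os.path.join(tmp, "MYPROG")
    os.makedirs(os.path.join(project, "tools"))
    with open(os.path.join(project, "tools", "tool.py"), "w") as f:
        f.write("print('tool')\n")

    archive = os.path.join(tmp, "myprog.tar.gz")
    # 'w:gz' = write + gzip; compresslevel (1-9) is accepted for gz and bz2 modes
    with tarfile.open(archive, "w:gz", compresslevel=6) as tf:
        tf.add(project, arcname="MYPROG")      # directories are added recursively

    is_tar = tarfile.is_tarfile(archive)
    # 'r' (the same as 'r:*') detects the gzip compression automatically
    with tarfile.open(archive, "r") as tf:
        names = sorted(tf.getnames())

print(is_tar, names)
```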

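3. The tarfile.TarFile class

Constructor

class tarfile.TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors='surrogateescape', pax_headers=None, debug=0, errorlevel=0)

Parameters:

All of the following parameters are optional and can also be accessed as attributes of the TarFile instance;
name: the path name of the archive; it can be omitted if fileobj is given, in which case fileobj's name attribute is used if it exists;
mode: the mode in which to open the archive: 'r' to read an existing archive, 'a' to append data to an existing file, 'w' to create a new file overwriting an existing one, or 'x' to create a new file only if it does not already exist;
fileobj: the file object to read or write; if given, mode is overridden by fileobj's mode attribute (when it can be determined), and name can be omitted;
format: controls the archive format; must be one of USTAR_FORMAT, GNU_FORMAT or PAX_FORMAT;
tarinfo: can be used to replace the default TarInfo class with a different one;
dereference: if False, symbolic and hard links are added to the archive as links; if True, the content of the target files is added instead;
ignore_zeros: only relevant when reading concatenated or damaged archives; if False, an empty block is treated as the end of the archive; if True, empty and invalid blocks are skipped and as many members as possible are read;
debug: sets the debug level from 0 (no debug messages) to 3 (all debug messages); messages are written to sys.stderr;
errorlevel: if 0, all errors are ignored when using TarFile.extract(); they still appear as error messages in the debug output when debugging is enabled. If 1, all fatal errors are raised as OSError; if 2, all non-fatal errors are raised as TarError as well;
encoding and errors: define the character encoding used to read or write the archive and how conversion errors are handled.

Class method

classmethod TarFile.open(...)

An alternative constructor; the module-level tarfile.open() function is actually a shortcut to it.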

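Instance methods

# Add the file name to the archive; name can be any type of file (directory, FIFO, symbolic link, etc.). arcname specifies an alternative name for the file in the archive; the default None keeps the original name. If recursive is True and name is a directory, its contents are added recursively. exclude, if given, must be a function that takes a file name and returns a boolean: True means the file is excluded from the archive, False means it is added. filter, if given, must be passed as a keyword argument and must be a function that takes a TarInfo object and returns the (possibly modified) TarInfo object; if it returns None, the TarInfo object is excluded from the archive. Note that since Python 3.2 the exclude parameter is deprecated in favor of the new filter parameter
add(name, arcname=None, recursive=True, exclude=None, *, filter=None)

# Add the TarInfo object tarinfo to the archive. If fileobj is given, it should be a binary file, and tarinfo.size bytes are read from it and added to the archive. TarInfo objects can be created directly with gettarinfo()
addfile(tarinfo, fileobj=None)

# Return the TarInfo object for the member name (similar to the getinfo(name) method of a zipfile.ZipFile instance); raise KeyError if name cannot be found in the archive; if a member occurs more than once in the archive, its last occurrence is assumed to be the most up-to-date version
getmember(name)

# Return the members of the archive as a list of TarInfo objects (similar to the infolist() method of a zipfile.ZipFile instance)
getmembers()

# Return the members of the archive as a list of their names (similar to the namelist() method of a zipfile.ZipFile instance)
getnames()

# Print a table of contents to sys.stdout (similar to the printdir() method of a zipfile.ZipFile instance); if verbose is False, only the names of the members are printed; if it is True, output similar to that of 'ls -l' is produced; the optional members argument, if given, must be a subset of the list returned by getmembers(); members was added in Python 3.5
list(verbose=True, *, members=None)

# When the archive is opened for reading, return the next member of the archive as a TarInfo object; return None if there are no more members
next()

# Extract all members of the archive to the current working directory or to the directory path; if members is given, it must be a subset of the list returned by getmembers(); directory information such as owner, modification time and permissions is set after all members have been extracted; if numeric_owner is True, the uid and gid numbers from the archive are used to set the owner and group of the extracted files, otherwise the owner and group names are used. numeric_owner was added in Python 3.5. Warning: never extract archives from untrusted sources without inspecting them first, because members with absolute paths or '..' components could be written outside path
extractall(path=".", members=None, *, numeric_owner=False)

# Extract one member of the archive to the current working directory or to the directory path; member can be a file name or a TarInfo object; set_attrs was added in Python 3.2 and numeric_owner in Python 3.5
extract(member, path="", set_attrs=True, *, numeric_owner=False)

# Extract a member of the archive as a file object; member can be a file name or a TarInfo object; since Python 3.3, an io.BufferedReader object is returned if member is a regular file or a link, otherwise None is returned
extractfile(member)

# Create a TarInfo object from the result of os.stat() on an existing file; the file is identified either by its name name or by the file object fileobj; its name in the archive is taken from arcname, then fileobj.name, then name, in that order of priority. You can modify some of the TarInfo's attributes before adding it with addfile()
gettarinfo(name=None, arcname=None, fileobj=None)

# Close the TarFile
close()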
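Several of these methods can be exercised without touching the disk. The sketch below, with made-up member names and contents, builds TarInfo objects by hand, adds them with addfile() to an uncompressed archive held in an io.BytesIO buffer, then reads one member back through getmember() and extractfile().

```python
import io
import tarfile

members = {
    "app/main.py": b"print('hi')\n",
    "app/debug.log": b"noise\n",
}

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tf:
    for name, data in members.items():
        info = tarfile.TarInfo(name)
        info.size = len(data)                 # addfile() reads exactly info.size bytes
        tf.addfile(info, io.BytesIO(data))

buf.seek(0)                                   # rewind before reading the archive back
with tarfile.open(fileobj=buf, mode="r") as tf:
    names = tf.getnames()
    member = tf.getmember("app/main.py")
    content = tf.extractfile(member).read()   # a file object for regular files

print(names, member.size, content)
```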

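4. The tarfile.TarInfo class

A TarInfo object represents one member of a TarFile. Besides storing all the attributes of a file (such as file type, size, modification time, permissions and owner), it also provides methods for determining the file type. Note that it does not contain the file's data. TarInfo objects are returned by the getmember(), getmembers() and gettarinfo() methods of TarFile.

Constructor

class tarfile.TarInfo(name="")

Class methods

# Create and return a TarInfo object from the string buffer buf
classmethod TarInfo.frombuf(buf, encoding, errors)

# Read the next member from the TarFile object tarfile and return it as a TarInfo object
classmethod TarInfo.fromtarfile(tarfile)

Methods and attributes

Method/attribute name	Description
name	Name of the archive member
size	Size in bytes
mtime	Time of last modification
mode	Permission bits
type	File type, usually one of the constants REGTYPE, AREGTYPE, LINKTYPE, SYMTYPE, DIRTYPE, FIFOTYPE, CONTTYPE, CHRTYPE, BLKTYPE, GNUTYPE_SPARSE. A more convenient way to determine the type of a TarInfo object is to use the is*() methods below
linkname	Name of the target file; only meaningful for TarInfo objects of type LINKTYPE or SYMTYPE
uid	User ID of the user who originally stored this member
gid	Group ID of the user who originally stored this member
uname	User name
gname	Group name
pax_headers	A dictionary containing the key-value pairs of an associated pax extended header
isfile() / isreg()	Return True if the TarInfo object is a regular file
isdir()	Return True if the TarInfo object is a directory
issym()	Return True if the TarInfo object is a symbolic link
islnk()	Return True if the TarInfo object is a hard link
ischr()	Return True if the TarInfo object is a character device
isblk()	Return True if the TarInfo object is a block device
isfifo()	Return True if the TarInfo object is a FIFO
isdev()	Return True if the TarInfo object is a character device, block device or FIFO
tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='surrogateescape')	Create a string buffer from a TarInfo object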
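A small sketch of the type attribute and the is*() methods: it writes a directory entry and a symbolic-link entry (both hypothetical, carrying no file data) into an in-memory archive and classifies them after reading them back.

```python
import io
import tarfile

dir_info = tarfile.TarInfo("MYPROG")
dir_info.type = tarfile.DIRTYPE
dir_info.mode = 0o755

link_info = tarfile.TarInfo("MYPROG/latest.py")
link_info.type = tarfile.SYMTYPE
link_info.linkname = "hello.py"            # target of the symbolic link

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tf:
    tf.addfile(dir_info)                   # no fileobj needed: these entries have no data
    tf.addfile(link_info)

buf.seek(0)
with tarfile.open(fileobj=buf) as tf:
    kinds = {}
    for m in tf.getmembers():
        kinds[m.name] = "dir" if m.isdir() else "symlink" if m.issym() else "other"
    target = tf.getmember("MYPROG/latest.py").linkname

print(kinds, target)
```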

5. Example

Project directory structure:

MYPROG
│  hello.py
│
├─account
│      login.py
│      __init__.py
│
├─test
│      test_print.py
│
└─tools
        tool.py

Archiving and extracting a Python project:

import tarfile

# Archive and compress
tf = tarfile.open('myprog.tar.gz', 'w:gz')
tf.add("MYPROG")
tf.close()

# Extract
tf = tarfile.open('myprog.tar.gz')
tf.extractall()
tf.close()

# Read the contents of the archive
tf = tarfile.open('myprog.tar.gz')
tf.list()
print(tf.getmembers())
f = tf.getmember('MYPROG/hello.py')
print(f.name)
print(f.size)
print(f.isfile())
tf.close()
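The filter parameter of add() (Python 3.2+) does not appear in the example above. This sketch, with an illustrative directory layout and a hypothetical filter function, drops *.log files and resets owner information while archiving: returning None excludes a member, and returning the (modified) TarInfo keeps it.

```python
import os
import tarfile
import tempfile


def drop_logs_and_reset_owner(tarinfo):
    """Hypothetical filter for TarFile.add()."""
    if tarinfo.name.endswith(".log"):
        return None                       # exclude this member
    tarinfo.uid = tarinfo.gid = 0
    tarinfo.uname = tarinfo.gname = "root"
    return tarinfo


with tempfile.TemporaryDirectory() as tmp:
    project = os.path.join(tmp, "MYPROG")
    os.mkdir(project)
    for name in ("hello.py", "debug.log"):
        with open(os.path.join(project, name), "w") as f:
            f.write("# " + name + "\n")

    archive = os.path.join(tmp, "myprog.tar.gz")
    with tarfile.open(archive, "w:gz") as tf:
        tf.add(project, arcname="MYPROG", filter=drop_logs_and_reset_owner)

    with tarfile.open(archive) as tf:
        names = sorted(tf.getnames())
        owners = {m.uname for m in tf.getmembers()}

print(names, owners)  # ['MYPROG', 'MYPROG/hello.py'] {'root'}
```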

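5. Advanced file and directory handling (shutil module)

Above we introduced path operations (os.path module), file and directory operations (os module), and archiving and compression (zipfile and tarfile modules). However, these modules either lack some commonly needed functionality (such as copying files or deleting non-empty directories) or are not very convenient to use. The shutil module offers high-level operations on files and collections of files that fill these gaps.

Note that even the higher-level copy functions shutil.copy() and shutil.copy2() cannot copy all file metadata; for example, on POSIX platforms the file owner, group and ACLs are lost.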
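The difference in what copy() and copy2() preserve is easy to observe. This sketch sets an old, arbitrarily chosen modification time on a temporary source file (the file names are illustrative), then compares the mtime of copies made with each function.

```python
import os
import shutil
import tempfile
import time

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "hello.py")
    with open(src, "w") as f:
        f.write("print('hello')\n")
    past = 1487410320                      # an arbitrary fixed timestamp in February 2017
    os.utime(src, (past, past))            # set atime and mtime

    plain = shutil.copy(src, os.path.join(tmp, "hello1.py"))   # contents + mode bits
    full = shutil.copy2(src, os.path.join(tmp, "hello2.py"))   # contents + mode + times

    plain_mtime = os.stat(plain).st_mtime  # roughly "now": copy() does not keep timestamps
    full_mtime = os.stat(full).st_mtime    # equal to past: copy2() also copies the stat info

print(time.ctime(plain_mtime), time.ctime(full_mtime))
```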

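1. File and directory operations

# Copy the contents (partly or fully) of one file object to another; both arguments are already opened file objects; length is an integer giving the buffer size, and a negative value means the data is copied in one go, which may cause memory problems
shutil.copyfileobj(fsrc, fdst[, length])

# Copy the entire contents of a file (without metadata/status information); both arguments are file names, and dst must be the complete target file name; if dst already exists, it is replaced; follow_symlinks was added in Python 3.3, and if it is False and src is a symbolic link, a new symbolic link is created instead of copying the file it points to
shutil.copyfile(src, dst, *, follow_symlinks=True)

# Copy only the permission bits (mode bits); the contents, owner and group are unchanged; both arguments are file names; follow_symlinks was added in Python 3.3
shutil.copymode(src, dst, *, follow_symlinks=True)

# Copy only the status information (including permission bits but not owner and group): mode bits, atime, mtime and flags; both arguments are file names; follow_symlinks was added in Python 3.3
shutil.copystat(src, dst, *, follow_symlinks=True)

# Copy the contents and permission bits and return the path of the newly created file; equivalent to copyfile + copymode; both arguments are path strings, and dst may be a directory; follow_symlinks was added in Python 3.3
shutil.copy(src, dst, *, follow_symlinks=True)

# Same as copy(), but also tries to preserve all file metadata; equivalent to copyfile + copystat, similar to 'cp -p' (excluding owner and group); follow_symlinks was added in Python 3.3
shutil.copy2(src, dst, *, follow_symlinks=True)

# This factory function takes one or more glob-style patterns and creates a function that can be passed as the ignore argument of copytree(); files whose names match one of the patterns are not copied
shutil.ignore_patterns(*patterns)

# Recursively copy the entire directory tree src to the destination directory dst, which must not already exist; similar to 'cp -pr'; directory permissions and times are copied with shutil.copystat(), and individual files are copied with shutil.copy2()
shutil.copytree(src, dst, symlinks=False, ignore=None, copy_function=copy2, ignore_dangling_symlinks=False)

# Delete recursively, similar to 'rm -r'
shutil.rmtree(path, ignore_errors=False, onerror=None)

# Recursively move a file or directory and return the destination, similar to the mv command; if dst is an existing directory, src is moved inside that directory; if dst already exists but is not a directory, it may be overwritten; copy_function is a keyword argument added in Python 3.5
shutil.move(src, dst, copy_function=copy2)

# Return disk usage statistics for the given path as a named tuple (total, used, free), in bytes; added in Python 3.3
shutil.disk_usage(path)

# Change the owner and/or group of path; user can be a system user name or a uid, and likewise for group; at least one of them must be given; added in Python 3.3
shutil.chown(path, user=None, group=None)

# Return the path to the executable cmd, similar to the which command; added in Python 3.3
shutil.which(cmd, mode=os.F_OK|os.X_OK, path=None)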
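A self-contained sketch of the directory-level functions, run in a temporary directory with an illustrative myprog tree: copytree() with ignore_patterns(), move() used as a rename, rmtree(), and disk_usage().

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "myprog")
    os.makedirs(os.path.join(src, "tools"))
    for rel in ("hello.py", "debug.log", os.path.join("tools", "tool.py")):
        with open(os.path.join(src, rel), "w") as f:
            f.write("# placeholder\n")

    # copytree(): the destination must not exist yet; *.log files are skipped
    dst = os.path.join(tmp, "myprog1")
    shutil.copytree(src, dst, ignore=shutil.ignore_patterns("*.log"))
    copied = sorted(
        os.path.relpath(os.path.join(root, name), dst)
        for root, _dirs, files in os.walk(dst)
        for name in files
    )

    moved = os.path.join(tmp, "myprog2")
    shutil.move(dst, moved)                # same filesystem: effectively a rename
    shutil.rmtree(moved)                   # delete the whole tree
    removed = not os.path.exists(moved)

    usage = shutil.disk_usage(tmp)         # named tuple (total, used, free), in bytes

print(copied, removed, usage.free <= usage.total)
```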

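2. Archive operations

shutil's archive operations are high-level utilities for creating and reading compressed and archived files. They are built on top of the zipfile and tarfile modules; the make_archive-related functions were added in Python 2.7, and the unpack_archive-related functions in Python 3.2.

# Create an archive file (zip or tar) and return its name;
# base_name is the name of the file to create, including the path but without any format-specific extension;
# format is the archive format: 'zip', 'tar', 'gztar', 'bztar' or 'xztar';
# root_dir is the root directory of the archive, i.e. the directory to change into before creating the archive;
# base_dir is the directory to archive; if not given, all files under root_dir are archived (it can be absolute or relative to root_dir, and it becomes the common prefix of all files and directories in the archive);
# root_dir and base_dir both default to the current directory;
# if dry_run is true, no archive is created, but the operations that would be executed are logged to logger, which is useful for testing;
# logger must be an object compatible with PEP 282, usually an instance of logging.Logger;
# verbose is unused and deprecated;
# support for the xztar format was added in Python 3.5
shutil.make_archive(base_name, format, root_dir=None, base_dir=None, verbose=0,
                    dry_run=0, owner=None, group=None, logger=None)

# Unpack an archive
# filename is the full path of the archive
# extract_dir is the directory to unpack into; if not given, the current working directory is used
# format is the archive format: one of 'zip', 'tar' or 'gztar', or any other format registered with register_unpack_format(); if not given, the archive's file extension is used to find a matching unpacker, and ValueError is raised if none is found
shutil.unpack_archive(filename[, extract_dir[, format]])

# Return a list of supported archive formats, each element being a tuple (name, description)
# By default shutil provides these formats:
# gztar: gzip'ed tar-file
# bztar: bzip2'ed tar-file (if the bz2 module is available)
# xztar: xz'ed tar-file (if the lzma module is available)
# tar: uncompressed tar file
# zip: ZIP file
# New archive formats can be registered, or your own archiver provided for an existing format, with register_archive_format()
shutil.get_archive_formats()

# Return a list of all registered unpacking formats, each element being a tuple (name, extensions, description)
# By default shutil provides the same formats as shutil.get_archive_formats()
# New formats can be registered, or your own unpacker provided for an existing format, with register_unpack_format()
shutil.get_unpack_formats()

# Register an archiver for a new archive format
shutil.register_archive_format(name, function[, extra_args[, description]])

# Remove an archive format from the list of supported formats
shutil.unregister_archive_format(name)

# Register an unpacker for a new unpacking format
shutil.register_unpack_format(name, extensions, function[, extra_args[, description]])

# Remove an unpacking format from the list of supported formats
shutil.unregister_unpack_format(name)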

3. 实例

import shutil

shutil.copyfile('/tmp/myprog/hello.py', '/tmp/hello.py')
# Copies only the file contents
# -rw-r--r-- 1 root root 46 Feb 21 16:22 /tmp/hello.py

shutil.copymode('/tmp/myprog/hello.py', '/tmp/hello.py')
# Copies only the permission bits
# -rwxr-xr-x 1 root root 46 Feb 21 16:46 /tmp/hello.py

shutil.copystat('/tmp/myprog/hello.py', '/tmp/hello.py')
# Copies metadata: permission bits, access and modification times (atime, mtime) and flags
# -rwxr-xr-x 1 root root 46 Feb 18 17:32 /tmp/hello.py

shutil.copy('/tmp/myprog/hello.py', '/tmp/hello1.py')
# Copies the file contents and permission bits
# -rwxr-xr-x 1 root root 46 Feb 21 16:54 /tmp/hello1.py

shutil.copy2('/tmp/myprog/hello.py', '/tmp/hello2.py')
# Copies the file contents, permission bits and timestamps together
# -rwxr-xr-x 1 root root 46 Feb 18 17:32 /tmp/hello2.py

shutil.copytree('/tmp/myprog', '/tmp/myprog1')
# Recursively copies a directory (including its subdirectories and files)

shutil.move('/tmp/myprog1', '/tmp/myprog2')
# Moves a file or directory; this can also be thought of as "renaming"

shutil.rmtree('/tmp/myprog2')
# Deletes a directory (including its subdirectories and files)

shutil.make_archive('/data/myprog', 'gztar', root_dir='/tmp/', base_dir='myprog')
# Changes into /tmp and archives the myprog directory as a gzip-compressed tar file; the archive is written to /data/myprog.tar.gz
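The examples above assume /tmp/myprog/hello.py already exists and write to fixed paths. The following sketch replays the copy, move and delete steps inside a temporary directory so it can be run anywhere; it sets an old modification time on the source (an arbitrary value chosen for the demo) to make the difference between copy() and copy2() visible.

```python
import os
import shutil
import tempfile

def replay_examples(workdir):
    """Re-run the copy/move/delete examples inside workdir instead of /tmp."""
    project = os.path.join(workdir, "myprog")
    os.makedirs(project)
    src = os.path.join(project, "hello.py")
    with open(src, "w") as f:
        f.write("print('hello')\n")
    old = 1_000_000_000  # an easily recognizable past timestamp
    os.utime(src, (old, old))

    plain = os.path.join(workdir, "hello1.py")
    full = os.path.join(workdir, "hello2.py")
    shutil.copy(src, plain)   # contents + permission bits; mtime becomes "now"
    shutil.copy2(src, full)   # contents + permission bits + timestamps

    copied = os.path.join(workdir, "myprog1")
    moved = os.path.join(workdir, "myprog2")
    # copytree() raises FileExistsError if the destination already exists,
    # unless dirs_exist_ok=True is passed (Python 3.8+).
    shutil.copytree(project, copied)
    shutil.move(copied, moved)
    move_ok = os.path.isfile(os.path.join(moved, "hello.py")) and not os.path.exists(copied)
    shutil.rmtree(moved)

    return {
        "copy_keeps_mtime": int(os.stat(plain).st_mtime) == old,
        "copy2_keeps_mtime": int(os.stat(full).st_mtime) == old,
        "move_ok": move_ok,
        "tree_removed": not os.path.exists(moved),
    }

with tempfile.TemporaryDirectory() as tmp:
    result = replay_examples(tmp)

print(result)
# {'copy_keeps_mtime': False, 'copy2_keeps_mtime': True, 'move_ok': True, 'tree_removed': True}
```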

6. Other related modules (tempfile and fileinput)

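The tempfile module creates and manages temporary files and directories; the fileinput module iterates over the lines of several files (including sys.stdin) as if they were a single stream. Both modules are fairly simple, so please refer to the official documentation for details.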
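For a quick taste of both modules, this sketch uses tempfile.TemporaryDirectory to create scratch files that are removed automatically, then reads them as one stream with fileinput, recording the current file name and the per-file line number for each line. The file names and contents are illustrative. Passing '-' as a file name makes fileinput read sys.stdin instead.

```python
import fileinput
import os
import tempfile

def read_numbered(paths):
    """Yield (file name, line number within that file, text) across several files."""
    with fileinput.input(files=paths) as stream:
        for line in stream:
            yield (os.path.basename(stream.filename()),
                   stream.filelineno(),
                   line.rstrip("\n"))

# TemporaryDirectory deletes itself and its contents when the block exits.
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name, text in [("a.txt", "alpha\nbeta\n"), ("b.txt", "gamma\n")]:
        path = os.path.join(tmp, name)
        with open(path, "w") as f:
            f.write(text)
        paths.append(path)
    records = list(read_numbered(paths))

print(records)
# [('a.txt', 1, 'alpha'), ('a.txt', 2, 'beta'), ('b.txt', 1, 'gamma')]
```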

7. Summary

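Use the os.path module for path-related operations, such as splitting and joining paths; getting a path's file size, absolute path, three time attributes, directory name (dirname) and file name (basename); and determining what type of file a path refers to;
Use the os module for basic file and directory operations, such as deleting a single file or an empty directory; setting the permissions, owner and group of a file or directory; moving and renaming files and directories; creating directories, nested directory hierarchies, FIFO pipes, hard links and symbolic links (regular files are created with the open() function); checking whether you have a given permission on a file or directory; and listing all files in a given directory;
Use the shutil module for high-level file and directory operations, such as copying files (contents, permission bits, timestamps, or everything), recursively copying directories, and recursively deleting non-empty directories;
Use the zipfile or tarfile module for archiving and compression. The unpacking functions in shutil were only added in Python 3.2, so projects developed in Python 2 cannot use every function shutil provides. Operations engineers sometimes simply run the tar command to compress and decompress files, but then cross-platform portability is not guaranteed.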


