How to use Python file processing methods, os module and glob module

PHPz

May 13, 2023 am 10:19 AM

python, os module

1. Basic file operations

1. open(): opening a file

The open() function opens a file and returns a file object. All file processing starts with this call. If the file cannot be opened, an OSError is raised.

Note: after using open(), make sure the file object is closed, that is, that its close() method is called.

The most common form of open() takes two parameters: the file name (file) and the mode (mode).

open(file, mode='r')

The complete signature is:

open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
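
The two rules above (an unopenable file raises OSError, and an opened file must be closed) can be combined in one small helper. This is only a sketch: the function name read_text and the file names demo.txt and missing_demo.txt are made up for illustration.

```python
import os
import tempfile


def read_text(path):
    """Return the file's text, or None if the file cannot be opened."""
    try:
        f = open(path, mode='r', encoding='utf8')
    except OSError as exc:  # FileNotFoundError, PermissionError, ... are subclasses
        print('could not open:', exc)
        return None
    try:
        return f.read()
    finally:
        f.close()  # runs whether read() succeeds or raises


# Hypothetical files inside a throwaway directory
demo_dir = tempfile.mkdtemp()
existing = os.path.join(demo_dir, 'demo.txt')
f = open(existing, mode='w', encoding='utf8')
f.write('hello')
f.close()

found = read_text(existing)
missing = read_text(os.path.join(demo_dir, 'missing_demo.txt'))
```

Here found holds the file content and missing is None, because the second path does not exist.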

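2. read(): reading a file

f = open(r'/Users/mac/desktop/jupyter/pythonCourseware/32.txt', mode='r')  # open the file in read mode
data = f.read()  # read the contents: the request goes to the operating system, which turns it into disk operations and loads the data from disk into memory
print(data)
# Python's garbage collector only reclaims objects whose reference count drops to 0, but an open file also holds operating system resources, so release those explicitly
# del f only removes the variable f; call close() to release the file
f.close()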

3. write(): writing a file

# open the file in write mode
f = open(r'/Users/mac/desktop/jupyter/pythonCourseware/32.txt', mode='w')
f.write("""name = 'nick'
pwd = '123'""")
f.close()

4. with open()

The with statement closes the file automatically when its block ends, which releases the operating system resources without an explicit close(). A single with statement can also open several files, separated by commas, which makes copying a file straightforward.

with open('32.txt', 'rt', encoding='utf8') as f:
    print(f.read())

with open('32.txt', 'rb') as fr, \
        open('35r.txt', 'wb') as fw:
    fw.write(fr.read())
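
For large files, reading everything with a single read() call holds the whole file in memory. A possible variation, sketched here with a made-up helper name copy_file and made-up file names, copies in fixed-size chunks inside the same kind of two-file with statement.

```python
import os
import tempfile


def copy_file(src, dst, chunk_size=64 * 1024):
    """Copy src to dst in binary mode, chunk_size bytes at a time."""
    with open(src, 'rb') as fr, open(dst, 'wb') as fw:
        while True:
            chunk = fr.read(chunk_size)
            if not chunk:  # an empty bytes object means end of file
                break
            fw.write(chunk)


# Hypothetical source and destination in a throwaway directory
demo_dir = tempfile.mkdtemp()
src = os.path.join(demo_dir, 'src.bin')
dst = os.path.join(demo_dir, 'dst.bin')
with open(src, 'wb') as f:
    f.write(b'0123456789' * 1000)

copy_file(src, dst, chunk_size=4096)
with open(dst, 'rb') as f:
    copied = f.read()
```

The standard library's shutil.copyfile covers the same job; the loop is spelled out here only to show what the with statement manages.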

2. File opening mode

The three most commonly used file modes are:

r mode: read-only (the default). The file can be read but not written, and the file pointer is placed at the beginning of the file. If the file does not exist, an error is raised.
w mode: overwrite. If the file does not exist, it is created. If it exists, its original content is deleted and writing starts from the beginning.
a mode: append. If the file does not exist, it is created. If it exists, new content is written after the existing content, at the end of the file.

File content can be read and written in two formats:

t mode: text (the default).
b mode: bytes.

Note that t and b cannot be used on their own; they must be combined with one of r/w/a.
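
A short sketch of the difference between the two formats, assuming a throwaway file named modes_demo.txt (a made-up name): the same content comes back as str in t mode and as bytes in b mode, and a mode string containing only b is rejected.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'modes_demo.txt')

with open(path, 'wt', encoding='utf8') as f:
    f.write('nick最帅吗')

with open(path, 'rt', encoding='utf8') as f:
    text = f.read()  # decoded to str
with open(path, 'rb') as f:
    raw = f.read()   # raw bytes, no decoding

print(type(text), len(text))  # length counted in characters
print(type(raw), len(raw))    # length counted in bytes

try:
    open(path, 'b')  # b without r/w/a is not a valid mode
except ValueError as exc:
    mode_error = exc
```

Each of the three Chinese characters takes three bytes in UTF-8, which is why the byte length is larger than the character count.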

1. File r opening mode

1. Reading text

# rt: read as text
# The default encoding on Chinese-language Windows is GBK, so specify utf8 explicitly
f = open('32.txt', mode='rt', encoding='utf8')
data = f.read()
print(data)  # nick最帅吗
print(type(data))  # <class 'str'>
f.close()

2. Reading bytes

# rb: read as bytes
f = open('32.txt', mode='rb')
data = f.read()
print(data)  # b'aaa\nbbb\nccc\nnick\xe6\x9c\x80\xe5\xb8\x85\xe5\x90\x97'
print(type(data))  # <class 'bytes'>
f.close()

3. Reading line by line with a for loop, which is equivalent to calling readline() repeatedly.

fname = input("Enter the name of the file to open: ")
fo = open(fname, "r")
print(type(fo))  # <class '_io.TextIOWrapper'>
for line in fo:
    print(line)
fo.close()

4. Reading methods:

read(size): Reads the entire file at once. If size is given, reads at most size characters (bytes in b mode).
readline(size): Reads one line, including the newline character '\n'. If size is given, reads at most size characters of that line. The next call continues from where the previous one stopped. If f.readline() returns an empty string, the end of the file has been reached.
readlines([sizeint]): Reads all lines and returns them as a list. If sizeint > 0 is given, returns lines totalling approximately sizeint bytes; the actual amount read may be larger than sizeint because the buffer needs to be filled.

f = open('32.txt', mode='rt', encoding='utf8')
print(f.readable())  # True: whether the file can be read
data1 = f.readline()
data2 = f.readlines()
print(data1)  # aaa
print(data2)  # ['bbb\n', 'ccc\n', 'nick最帅吗']
f.close()
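
How the three reading methods share one file position can be checked on a small file. The sketch below assumes a throwaway file named read_demo.txt (a made-up name) with three short lines.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'read_demo.txt')
with open(path, 'wt', encoding='utf8') as f:
    f.write('aaa\nbbb\nccc\n')

with open(path, 'rt', encoding='utf8') as f:
    first_two = f.read(2)        # at most 2 characters
    rest_of_line = f.readline()  # continues from the current position
    remaining = f.readlines()    # every line that is left
    at_eof = f.readline()        # empty string: nothing more to read
```

Each call picks up where the previous one stopped, so first_two and rest_of_line together make up the first line.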

2. File w opening mode

Write-only; the file cannot be read. If the file exists, it is emptied before the content is written; if it does not exist, it is created and the content is written.

1. Text mode

f = open('34w.txt', mode='wt', encoding='utf8')
print(f"f.readable(): {f.readable()}")  # False
f.write('nick 真帅呀\n')  # '\n' is the newline character
f.write('nick,nick, you drop, I drop.')
f.write('nick 帅的我五体投地')
f.flush()  # push the buffered content from memory to disk immediately
f.close()

2. Byte mode

f = open('34a.txt', mode='wb')
f.write('nick 帅的我五体投地'.encode('unicode_escape'))  # encode to bytes before writing
print(type('nick 帅的我五体投地'.encode('unicode_escape')))  # <class 'bytes'>
f.close()

Note: b mode works for every kind of file, because all files on disk are stored as binary data.

When reading or writing in b mode, do not pass the encoding parameter: binary data is neither decoded nor encoded, and open() raises a ValueError if encoding is given.

try:
    import requests

    response = requests.get('https://cache.yisu.com/upload/information/20220528/112/3002.jpg')
    data = response.content

    f = open('mv.jpg', 'wb')
    f.write(data)
    print('done...')
    f.close()
except Exception as e:
    print(e, 'Something went wrong; never mind, web scraping is covered in detail later')

3. Writing method:

write(s): Writes the string to the file and returns the number of characters written.
writelines(lines): Writes a list of strings to the file. No line breaks are added, so append a newline character to each line yourself if you need one.
flush(): Flushes the file's internal buffer, writing its data to the file immediately instead of waiting for the buffer to be written out on its own.
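
The three writing methods can be tried together on a throwaway file. In this sketch the file name write_demo.txt and the sample lines are made up; the second open() inside the block is there only to show that flushed data is already visible to another reader.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'write_demo.txt')
lines = ['first', 'second', 'third']

with open(path, 'wt', encoding='utf8') as f:
    count = f.write('header\n')                  # number of characters written
    f.writelines(line + '\n' for line in lines)  # newlines added by hand
    f.flush()                                    # push the buffer out now
    with open(path, 'rt', encoding='utf8') as reader:
        seen_after_flush = reader.read()
```

Without the '\n' added to each item, writelines would produce one long line.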

3. File a opening mode

a: append. If the file exists, content is written at the end of the file; if the file does not exist, it is created and the content is written.

# at
f = open('34a.txt', mode='at', encoding='utf8')
print(f.readable())  # False
f.write('nick 真帅呀\n')  # '\n' is the newline character
f.write('nick,nick, you drop, I drop.')
f.write('nick 帅的我五体投地')
f.close()

4. Readable and writable

r+: Readable and writable. The file pointer is placed at the beginning of the file.
rb+: Readable and writable, binary format.
w+: Writable and readable. If the file already exists, it is opened and edited from the beginning, that is, the original content is deleted. If the file does not exist, a new file is created.
wb+: Writable and readable, binary format.
a+: Appendable and readable. If the file already exists, the file pointer is placed at the end of the file and the file is opened in append mode. If the file does not exist, a new file is created for reading and writing.
ab+: Appendable and readable, binary format.

# r+t
with open('32.txt', 'r+', encoding='utf-8') as fr:
    print(fr.readable())  # True
    print(fr.writable())  # True
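
The three readable-and-writable modes differ in what happens to existing content. A small sketch, assuming a throwaway file named plus_demo.txt (a made-up name) that starts out containing abcdef:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'plus_demo.txt')
with open(path, 'w', encoding='utf8') as f:
    f.write('abcdef')

# r+ keeps the content and writes over it from the beginning
with open(path, 'r+', encoding='utf8') as f:
    f.write('XY')
with open(path, encoding='utf8') as f:
    after_r_plus = f.read()

# a+ keeps the content and always writes at the end
with open(path, 'a+', encoding='utf8') as f:
    f.write('!')
    f.seek(0)
    after_a_plus = f.read()

# w+ empties the file as soon as it is opened
with open(path, 'w+', encoding='utf8') as f:
    after_w_plus = f.read()
```

Of the three, only w+ discards the old content on open.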

3. File pointer operations

File pointer offsets are always counted in bytes.

1. seek(offset, from_what): change the position of the file pointer

from_what selects the reference point: 0 means the beginning of the file, 1 means the current position, and 2 means the end of the file. For example:

seek(x, 0): move x bytes forward from the start of the file, that is, from the first character of the first line
seek(x, 1): move x bytes forward from the current position
seek(-x, 2): move x bytes backward from the end of the file

from_what defaults to 0, the beginning of the file.

f.seek(0)  # go back to the beginning of the file

A complete example:

f = open('32.txt', 'rb+')
print(f.write(b'0123456789abcdef'))  # 16
print(f.seek(5))  # move to the sixth byte of the file # 5
print(f.read(1))  # b'5'
print(f.seek(-3, 2))  # move to the third byte from the end # 13 (when the file holds exactly these 16 bytes)
print(f.read(1))  # b'd'
f.close()
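
The numeric from_what values 0, 1 and 2 are also available as the named constants os.SEEK_SET, os.SEEK_CUR and os.SEEK_END. The sketch below uses them on a throwaway file (seek_demo.bin is a made-up name) and also shows that a text-mode file rejects a nonzero seek relative to the end, which is why the examples here open the file in b mode.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'seek_demo.bin')
with open(path, 'wb') as f:
    f.write(b'0123456789abcdef')

with open(path, 'rb') as f:
    f.seek(5, os.SEEK_SET)           # 5 bytes from the start
    from_start = f.read(1)           # pointer is now at 6
    f.seek(2, os.SEEK_CUR)           # skip 2 more bytes, pointer at 8
    from_current = f.read(1)
    pos = f.seek(-3, os.SEEK_END)    # 3 bytes before the end
    from_end = f.read(1)

with open(path, 'rt') as f:
    try:
        f.seek(-3, os.SEEK_END)      # not allowed in text mode
    except OSError as exc:           # io.UnsupportedOperation
        text_seek_error = exc
```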

2. tell(): report the current position in the file.

The position is always counted from the beginning of the file to the current pointer.

with open('32.txt', 'rb') as fr:
    fr.seek(4, 0)
    print(fr.tell())  # 4

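3. truncate([size]): truncate the file

Truncates the file to size bytes, counted from the beginning of the file; without size, the file is truncated at the current position.

Everything after the cut is deleted. On Windows a line break takes up 2 bytes.

The file must be opened in a writable mode, but not w or w+, because those empty the file on open. Try truncate() in r+, a or a+ mode instead. The size is always measured from the start of the file.

Calling truncate() with no argument while the pointer is still at the beginning of the file (for example right after opening in r+ mode) empties the file.

with open('32.txt', 'ab') as fr:
    fr.truncate(2)  # drops everything after the first 2 bytes; if a character takes 3 bytes, only 2/3 of it survives, which produces garbled text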
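
The garbled-text warning can be reproduced on a throwaway file. In this sketch (truncate_demo.txt is a made-up name) the file starts with one 3-byte UTF-8 character followed by three ASCII letters.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'truncate_demo.txt')
with open(path, 'wb') as f:
    f.write('帅abc'.encode('utf8'))  # 3 bytes + 3 bytes

with open(path, 'rb+') as f:
    f.truncate(4)                    # keeps the whole character plus 'a'
with open(path, 'rb') as f:
    kept = f.read()

with open(path, 'rb+') as f:
    f.truncate(2)                    # cuts through the 3-byte character
with open(path, 'rb') as f:
    broken = f.read()

try:
    broken.decode('utf8')
except UnicodeDecodeError as exc:    # the leftover bytes are not valid UTF-8
    decode_error = exc
```

Cutting at a byte offset that falls on a character boundary keeps the text readable; cutting inside a character does not.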

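4. Two ways to modify a file

Open the original file for reading and a new file for writing, modify the content of the original (line by line or all at once), and write the result to the new file. Then use the os module to delete the original file and rename the new file to the original name, so the result looks like an in-place edit.

Method 1: load the whole file from disk into memory, modify it there, then write it back to disk (this is what editors such as Word, vim and Notepad++ do).

import os

with open('37r.txt') as fr, open('37r_swap.txt', 'w') as fw:
    data = fr.read()  # read everything into memory; slow if the file is large
    data = data.replace('tank', 'tankSB')  # modify in memory

    fw.write(data)  # write the modified content to the new file in one go

# delete the original file
os.remove('37r.txt')
# rename the new file to the original name
os.rename('37r_swap.txt', '37r.txt')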

Method 2: read the file from disk into memory one line at a time, write each line to a new file as soon as it has been modified, and finally replace the original file with the new one.

import os

with open('37r.txt') as fr, open('37r_swap.txt', 'w') as fw:
    for line in fr:  # loop over the file and modify it line by line
        line = line.replace('jason', 'jasonSB')
        fw.write(line)  # write the modified line to the new file

os.remove('37r.txt')
os.rename('37r_swap.txt', '37r.txt')
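
The remove-then-rename pair can also be written as a single os.replace() call, which overwrites the destination if it already exists. This is a variation rather than the method described above; the helper name replace_in_file, the .swap suffix and the sample content are made up.

```python
import os
import tempfile


def replace_in_file(path, old, new):
    """Rewrite path line by line, swapping old for new."""
    swap_path = path + '.swap'  # hypothetical temporary name
    with open(path, encoding='utf8') as fr, \
            open(swap_path, 'w', encoding='utf8') as fw:
        for line in fr:
            fw.write(line.replace(old, new))
    os.replace(swap_path, path)  # overwrite the original in one call


demo_path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(demo_path, 'w', encoding='utf8') as f:
    f.write('jason\nnick\njason again\n')

replace_in_file(demo_path, 'jason', 'tank')
with open(demo_path, encoding='utf8') as f:
    result = f.read()
```

Because the original is only replaced after the new file has been fully written, a failure during the loop leaves the original file untouched.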

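5. The os module for file handling

The os module handles the interaction between a program and the operating system. It exposes low-level operating system interfaces and is used mostly for file handling.

import os

1. The os module

os.getcwd(): returns the current working directory, that is, the directory the Python script is working in
os.chdir("dirname"): changes the current working directory; equivalent to cd in a shell
os.curdir: the current directory string: ('.')
os.pardir: the parent directory string: ('..')
os.listdir('dirname'): returns a list of all files and subdirectories in the given directory, including hidden files
os.chmod(path, mode): changes permissions
os.mkdir('dirname'): creates a single directory level; equivalent to mkdir dirname in a shell
os.makedirs('dirname1/dirname2'): creates nested directories recursively
os.remove(path): deletes the file at path. If path is a directory, an OSError is raised; use rmdir() below to delete a directory.
os.removedirs('dirname1'): deletes the directory if it is empty, then moves up to the parent directory and deletes it too if it is empty, and so on
os.rmdir('dirname'): deletes a single empty directory; raises an error if the directory is not empty; equivalent to rmdir dirname in a shell
os.rename("oldname", "newname"): renames a file or directory
os.renames(old, new): renames a directory recursively; also works on files
os.stat('path/filename'): returns information about a file or directory
os.sep: the path separator used by the operating system, "\" on Windows and "/" on Linux
os.linesep: the line terminator used by the current platform, "\r\n" on Windows and "\n" on Linux
os.pathsep: the string that separates entries in a list of paths, ; on Windows and : on Linux
os.name: a string naming the current platform. Windows -> 'nt'; Linux -> 'posix'
os.system("bash command"): runs a shell command and shows its output directly
os.environ: the system environment variables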
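
Several of these functions can be exercised safely inside a temporary directory. In the sketch below every file and directory name is made up, and keep.txt exists only so that os.removedirs() stops climbing once it reaches the base directory.

```python
import os
import tempfile

base = tempfile.mkdtemp()
with open(os.path.join(base, 'keep.txt'), 'w') as f:
    f.write('keeps base non-empty')

nested = os.path.join(base, 'dirname1', 'dirname2')
os.makedirs(nested)                  # creates both levels

old_path = os.path.join(nested, 'oldname.txt')
new_path = os.path.join(nested, 'newname.txt')
with open(old_path, 'w') as f:
    f.write('x')
os.rename(old_path, new_path)

listing = os.listdir(nested)         # names only, as a list
size = os.stat(new_path).st_size     # file size in bytes

os.remove(new_path)                  # delete the file
os.removedirs(nested)                # removes dirname2, then the empty dirname1
remaining = os.listdir(base)
```

After the last call only keep.txt is left in the base directory, because both empty directories were removed in turn.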

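2. The os.path module

Mainly used to get file attributes.

Some commonly used os.path functions:

os.path.abspath(path): returns the normalized absolute path of path
os.path.split(path): splits path into a (directory, file name) tuple
os.path.splitdrive(path): mostly used on Windows; returns a tuple of the drive name and the rest of the path
os.path.splitext(path): splits the path and returns a tuple of the path without its extension and the extension
os.path.dirname(path): returns the directory part of path, i.e. the first element of os.path.split(path)
os.path.basename(path): returns the final file name in path, i.e. the second element of os.path.split(path). If path ends with a path separator (/ or, on Windows, \), an empty string is returned.
os.path.exists(path): returns True if path exists and False if it does not
os.path.isabs(path): returns True if path is an absolute path
os.path.isfile(path): returns True if path is an existing file, otherwise False
os.path.isdir(path): returns True if path is an existing directory, otherwise False
os.path.join(path1[, path2[, ...]]): joins several path components and returns the result; if a component is an absolute path, all components before it are discarded
os.path.getatime(path): returns the last access time of the file or directory that path points to
os.path.getmtime(path): returns the last modification time of the file or directory that path points to
os.path.getsize(path): returns the file size; raises an OSError if the file does not exist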
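
The splitting and joining functions only manipulate strings, so they can be tried on paths that do not exist. The sketch below uses made-up path components and builds them with os.path.join so that it behaves the same on Windows and Linux.

```python
import os.path

path = os.path.join('project', 'src', 'main.py')  # hypothetical path

directory, filename = os.path.split(path)
stem, ext = os.path.splitext(filename)

same_dir = os.path.dirname(path) == directory
same_name = os.path.basename(path) == filename

# A trailing separator leaves nothing after the last separator
trailing = os.path.basename(directory + os.sep)

# Components before an absolute path are discarded by join
absolute = os.path.abspath('data')
joined = os.path.join('ignored', absolute, 'file.txt')
```

Here joined starts at the absolute path, so the 'ignored' component does not appear in it.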

3. Examples:

1. Find the .py files in a directory and its subdirectories

import os
import os.path

"""Find the paths of .py files in a directory and its subdirectories. l stores the paths that were found; get_py searches recursively and appends each .py path to l."""
l = []


def get_py(path, l):
    file_list = os.listdir(path)  # every entry in the path directory
    for filename in file_list:
        path_tmp = os.path.join(path, filename)  # join path and filename
        if os.path.isdir(path_tmp):  # a directory
            get_py(path_tmp, l)  # search it recursively
        elif filename[-3:].upper() == '.PY':  # not a directory, so compare the extension
            l.append(path_tmp)


path = input('Enter a path: ').strip()
get_py(path, l)
print('In %s and its subdirectories, found %d py files:\n' % (path, len(l)))
for filepath in l:
    print(filepath + '\n')
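
The same search can be written without explicit recursion by using os.walk(), which visits a directory and all of its subdirectories. This is an alternative to the function above, not a replacement for it; find_py and the sample tree are made up for the demonstration.

```python
import os
import tempfile


def find_py(root):
    """Return the paths of all .py files under root."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in filenames:
            if os.path.splitext(filename)[1].lower() == '.py':
                found.append(os.path.join(dirpath, filename))
    return found


# Hypothetical tree: a.py, sub/b.PY and sub/c.txt
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, 'sub'))
for name in ('a.py', os.path.join('sub', 'b.PY'), os.path.join('sub', 'c.txt')):
    with open(os.path.join(root, name), 'w') as f:
        f.write('')

py_names = sorted(os.path.basename(p) for p in find_py(root))
```

Comparing the lowercased extension plays the same role as the upper() comparison in the recursive version.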

2. List all video files: mp4, avi, rmvb

import os

vedio_list = []

def search_file(start_dir, target):
    os.chdir(start_dir)

    for each_file in os.listdir(os.curdir):
        ext = os.path.splitext(each_file)[1]
        if ext in target:
            vedio_list.append(os.getcwd() + os.sep + each_file + os.linesep)
        if os.path.isdir(each_file):
            search_file(each_file, target)  # recursive call
            os.chdir(os.pardir)  # after the recursive call, go back up one level

start_dir = input('Enter the directory to start searching from: ')
program_dir = os.getcwd()

target = ['.mp4', '.avi', '.rmvb']

search_file(start_dir, target)

f = open(program_dir + os.sep + 'vedioList.txt', 'w')
f.writelines(vedio_list)
f.close()

3. Rename files in bulk

import os

path = input('Enter the directory path: ')

# list every entry in the directory
fileList = os.listdir(path)

for n, filename in enumerate(fileList):
    # old name: directory path + file name
    oldname = os.path.join(path, filename)

    # new name: a1.JPG, a2.JPG, ...
    newname1 = os.path.join(path, 'a' + str(n + 1) + '.JPG')

    os.rename(oldname, newname1)  # rename the file with os.rename
    print(oldname, '======>', newname1)

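6. The glob module: finding files with wildcards

It finds path names that match a given pattern, much like file search on Windows.

Only three wildcards are used: "*", "?" and "[]".

"*": matches zero or more characters;
"?": matches a single character;
"[]": matches any character in the given range, e.g. [0-9] matches a digit.

1. glob.glob: returns a list of all matching file paths.

Its main parameter is pathname, the pattern to match, which can be an absolute or a relative path.

Output: a list containing the matching file paths.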

import glob

file = glob.glob(r'C:\工具\*\*\pan*.exe')
print(type(file))  # <class 'list'>
print(file)  # ['C:\\工具\\PanDownload_v2.1.3\\PanDownload\\PanDownload.exe']

# all .py files in the parent directory
print(glob.glob(r'../*.py'))  # relative path

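2. glob.iglob: returns an iterator that yields the matching path names one at a time.

The difference from glob.glob(): glob.glob collects all matching paths at once, while glob.iglob produces one matching path at a time.

A simple example:

import glob

# .py files in the parent directory
f = glob.iglob(r'../*.py')
print(f)  # for example: <generator object iglob at 0x00B9FF80>
for py in f:
    print(py)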
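
The three wildcards can be compared side by side on a small temporary tree. All file names below are made up. The last pattern goes beyond the three wildcards described above: with recursive=True, which glob.glob accepts as a keyword argument, the special segment ** also matches subdirectories.

```python
import glob
import os
import tempfile

base = tempfile.mkdtemp()
os.mkdir(os.path.join(base, 'sub'))
for name in ('a1.py', 'a2.py', 'ab.py', 'notes.txt', os.path.join('sub', 'a3.py')):
    with open(os.path.join(base, name), 'w') as f:
        f.write('')


def names(pattern, **kwargs):
    """Sorted base names of the paths matching pattern under base."""
    matches = glob.glob(os.path.join(base, pattern), **kwargs)
    return sorted(os.path.basename(p) for p in matches)


star = names('*.py')                 # any run of characters
question = names('a?.py')            # exactly one character after 'a'
bracket = names('a[0-9].py')         # one digit after 'a'
recursive = names(os.path.join('**', '*.py'), recursive=True)

# iglob yields matches lazily, one per next() call
lazy = glob.iglob(os.path.join(base, '*.py'))
first = os.path.basename(next(lazy))
```

The order in which glob returns paths is not guaranteed, which is why the helper sorts the names before they are compared.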

Statement

This article is reproduced from 亿速云. If there is any infringement, please contact admin@php.cn to have it removed.
