
Python method to traverse all files in a directory

WBOY
2016-07-22 08:56:25

os.walk generator
os.walk(PATH) takes a folder path PATH; relative forms such as . or ../ work as well.
It returns a generator of triples, one per folder. The first triple describes PATH itself, and the following ones describe each folder beneath it.
Each triple is (the folder currently being visited, the list of subfolder names in that folder, the list of file names in that folder).
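To see the triples concretely, here is a minimal sketch. It builds a small temporary folder tree (the names sub, a.txt and b.txt are made up for illustration) and prints what os.walk yields for it:

```python
import os
import tempfile

# Build a small throwaway tree so the example does not depend on real folders.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "sub"))
for path in (os.path.join(root, "a.txt"), os.path.join(root, "sub", "b.txt")):
    open(path, "w").close()

# Each iteration yields one (folder, subfolder names, file names) triple.
triples = list(os.walk(root))
for dirpath, dirnames, filenames in triples:
    print(dirpath, dirnames, filenames)
```

The first line printed belongs to the root folder itself, and the second to its sub folder.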
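So, with d standing for one such triple, d[1] and d[2] are lists of bare names, and each name has to be joined to d[0] separately.
Get the full paths of all subfolders:

[os.path.join(d[0], name) for name in d[1]]

Get the full paths of all files:

[os.path.join(d[0], name) for name in d[2]]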
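Since the second and third elements of each triple are lists of names, os.path.join has to be applied to one name at a time. The following sketch collects the full paths over a whole tree; the temporary folder and the variable names all_dirs and all_files are assumptions made for the example:

```python
import os
import tempfile

# Throwaway tree: root/a.txt and root/sub/b.txt
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "sub"))
open(os.path.join(root, "a.txt"), "w").close()
open(os.path.join(root, "sub", "b.txt"), "w").close()

all_dirs = []
all_files = []
for d in os.walk(root):
    # d[0] is the folder path; d[1] and d[2] hold bare names inside it.
    all_dirs.extend(os.path.join(d[0], name) for name in d[1])
    all_files.extend(os.path.join(d[0], name) for name in d[2])

print(all_dirs)
print(all_files)
```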

The following example uses two nested loops inside a list comprehension to collect the paths of all .html files, then loops over the resulting list:

import os

result = [os.path.join(dp, f) for dp, dn, fs in os.walk("_pages") for f in fs if os.path.splitext(f)[1] == '.html']
for fname in result:
    pass  # do something with fname

This is equivalent to:

import os

result = []
for dp, dn, fs in os.walk("_pages"):
    for f in fs:
        # keep only the files whose extension is .html
        if os.path.splitext(f)[1] == '.html':
            result.append(os.path.join(dp, f))
for fname in result:
    pass  # do something with fname

The condition on the extension keeps only the files that end in .html. You can also use glob to do the matching:

import glob, os
result = [y for x in os.walk(PATH) for y in glob.glob(os.path.join(x[0], '*.txt'))]

You can also build the result lazily with iterators:

import glob
import os
from itertools import chain
result = chain.from_iterable(glob.iglob(os.path.join(x[0], '*.txt')) for x in os.walk('.'))
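Unlike the list versions, the chained iterator produces nothing until it is consumed, and it can only be consumed once. A small sketch of that behaviour, using a temporary folder tree as a stand-in for a real directory (the file names and the variables first and rest are illustrative):

```python
import glob
import os
import tempfile
from itertools import chain, islice

# Throwaway tree: root/a.txt, root/b.html, root/sub/c.txt
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "sub"))
for rel in ("a.txt", "b.html", os.path.join("sub", "c.txt")):
    open(os.path.join(root, rel), "w").close()

lazy = chain.from_iterable(
    glob.iglob(os.path.join(x[0], "*.txt")) for x in os.walk(root)
)

first = list(islice(lazy, 1))  # pulls only the first match
rest = list(lazy)              # consumes whatever is left
again = list(lazy)             # the iterator is now exhausted
```

This matters for large trees: a loop over the iterator can stop early without walking the remaining folders.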

Advanced
The standard file tree traversal generator os.walk is both powerful and flexible. However, it still lacks some conveniences that applications often need, such as selecting files by pattern, sorting all the files (or directories), or visiting only the top-level directory without descending into its subdirectories. For these cases it helps to wrap os.walk in a small function of your own.

import os, fnmatch

def filter_files(dirname, patterns='*', single_level=False, yield_folders=False):
    # patterns is a ';'-separated string such as '*.py;*.txt'
    patterns = patterns.split(';')
    allfiles = []
    for rootdir, subdirname, files in os.walk(dirname):
        allfiles.extend(files)
        if yield_folders:
            # also include the subfolder names in the result
            allfiles.extend(subdirname)
        if single_level:
            # stop after the top-level directory
            break
    allfiles.sort()
    for eachpattern in patterns:
        for eachfile in fnmatch.filter(allfiles, eachpattern):
            print(os.path.normpath(eachfile))
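The function above prints bare names. If the caller needs full paths to work with, one possible variant is a generator that yields them instead. This is only a sketch under that assumption: the name iter_matching and the temporary test tree are hypothetical and not part of the original code.

```python
import fnmatch
import os
import tempfile

def iter_matching(dirname, patterns="*", single_level=False, yield_folders=False):
    """Yield full paths under dirname whose base name matches any pattern."""
    pattern_list = patterns.split(";")
    for rootdir, subdirs, files in os.walk(dirname):
        names = list(files)
        if yield_folders:
            names.extend(subdirs)
        names.sort()  # sorted within each folder
        for name in names:
            if any(fnmatch.fnmatch(name, p) for p in pattern_list):
                yield os.path.join(rootdir, name)
        if single_level:
            break  # do not descend below the top-level folder

# Usage on a throwaway tree: root/a.py, root/b.txt, root/sub/c.py
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "sub"))
for rel in ("a.py", "b.txt", os.path.join("sub", "c.py")):
    open(os.path.join(root, rel), "w").close()

found = list(iter_matching(root, "*.py"))
top_only = list(iter_matching(root, "*.py;*.txt", single_level=True))
```

Testing each name against all patterns at once also avoids listing a name twice when it matches more than one pattern.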

Description:
1. The difference between extend and append
Lists are implemented as a class, so "creating" a list actually instantiates that class, and a list can be manipulated through its methods. A list can hold elements of any data type, and the elements of one list need not share a type. The append() method takes exactly one argument and adds it to the end of the list as a single element. The extend() method also takes one argument, which must be an iterable such as a list, and adds each of its elements to the end of the original list.
2. The fnmatch module
The fnmatch module matches file names against patterns, using the same syntax as Unix shells. An asterisk (*) matches zero or more characters, and a question mark (?) matches exactly one character. Square brackets specify a set or range of characters, for example [0-9] matches a single digit. All other characters match themselves.
1) fnmatch.fnmatch(name, pattern): tests whether name matches pattern and returns True or False.
2) fnmatch.filter(names, pat): filters the list names and returns a list of the names that match the pattern pat.
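Both points in the description can be checked with a short sketch. The file names used here are made up for the example:

```python
import fnmatch

names = ["a.py", "b.txt"]

# append() adds its argument as ONE element, even if that argument is a list.
names.append(["c.py", "d.md"])
appended_len = len(names)

# Undo that, then use extend(), which adds each element separately.
names.pop()
names.extend(["c.py", "d.md"])
extended_len = len(names)

# fnmatch.fnmatch tests a single name; fnmatch.filter selects from a list.
is_py = fnmatch.fnmatch("a.py", "*.py")
digit_match = fnmatch.fnmatch("file7.log", "file[0-9].log")
py_files = fnmatch.filter(names, "*.py")

print(appended_len, extended_len, is_py, digit_match, py_files)
```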
