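A rough draft:
It can already find the target string in a folder and its subfolders,
but I don't know how to extract the lines (strings) that contain the target and write them to a new file.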
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os, sys
import fnmatch

listonly = False
skipexts = ['.js']   # despite the name, only files with these extensions are searched

def visitfile(fname, searchkey):
    global fcount, vcount
    try:
        if not listonly:
            if os.path.splitext(fname)[1] in skipexts:
                if open(fname).read().find(searchkey) != -1:
                    print '%s has %s ' % (fname, searchkey)
                    fcount += 1
    except: pass
    vcount += 1   # every file counts as visited

def visitor(args, directoryName, filesInDirectory):
    for fname in filesInDirectory:
        # join the directory path and the file name
        fpath = os.path.join(directoryName, fname)
        if not os.path.isdir(fpath):
            visitfile(fpath, args)

def searcher(startdir, searchkey):
    global fcount, vcount
    fcount = vcount = 0
    os.path.walk(startdir, visitor, searchkey)

if __name__ == '__main__':
    # root = raw_input("type root directory:")
    root = '/home/jiangbin/findJS'
    key = raw_input("type key:")
    searcher(root, key)
    print 'Found in %d files,visited %d' % (fcount, vcount)
Run:
type key:JSQ
/home/jiangbin/findJS/XXX.js has JSQ
/home/jiangbin/findJS/JSQ.js has JSQ
Found in 2 files,visited 19
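os.path.walk only exists in Python 2 (Python 3 removed it in favor of os.walk). A minimal sketch of the same traversal with os.walk, which also writes every matching line to a new file, might look like the following. The function name collect_matches, its parameters, the "path:line number:text" output format and the default .js filter are illustrative assumptions, not part of the original code.

```python
import os


def collect_matches(startdir, searchkey, outname, exts=('.js',)):
    """Walk startdir and all of its subfolders, write every line that
    contains searchkey to outname, and return (files_with_match, files_visited).

    Hypothetical helper: the name, parameters and output format are
    assumptions made for illustration.
    """
    fcount = vcount = 0
    with open(outname, 'w', encoding='utf-8') as out:
        for dirpath, dirnames, filenames in os.walk(startdir):
            for fname in filenames:
                fpath = os.path.join(dirpath, fname)
                vcount += 1  # every file counts as visited, as in the original
                if os.path.splitext(fname)[1] not in exts:
                    continue
                found = False
                # errors='replace' keeps undecodable bytes from aborting the scan
                with open(fpath, encoding='utf-8', errors='replace') as f:
                    for lineno, line in enumerate(f, start=1):
                        if searchkey in line:
                            out.write('{}:{}:{}\n'.format(fpath, lineno, line.rstrip('\n')))
                            found = True
                if found:
                    fcount += 1
    return fcount, vcount
```

For example, collect_matches('/home/jiangbin/findJS', 'JSQ', 'matches.txt') returns the two counts and leaves the matching lines in matches.txt. Write the output file outside the searched folder, otherwise it is counted as a visited file itself.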
天蓬老师 2017-04-18 09:05:20
You are almost done...
I finished it for you:
https://gist.github.com/wusisu/e08ee53513c4410cf9ddd1ba5b0b80f5
But actually, the shell alone is enough:
find . -type f -name "*.js" | xargs grep word_to_be_searched
find . -type f -name "*.js" | xargs grep word_to_be_searched > out.txt
Here -type f restricts find to regular files and -name "*.js" restricts it to names ending in .js.
The resulting file names are passed to grep via xargs,
grep searches those files for the keyword,
and finally > redirects the output into out.txt.
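For comparison, each stage of that pipeline can be mirrored in Python. In this sketch, which is only a rough equivalent, pathlib's rglob descends into subfolders like find, is_file() plays the role of -type f, and a plain substring test stands in for grep (so it behaves like grep -F rather than a regex search). The helper names find_files and grep_files are hypothetical.

```python
from pathlib import Path


def find_files(root, pattern='*.js'):
    """Roughly: find ROOT -type f -name PATTERN

    rglob() matches PATTERN against names in ROOT and every subfolder;
    is_file() keeps regular files only, similar to -type f (unlike -type f,
    it also accepts symlinks that point to files).
    The result is sorted for stable output; find itself does not sort.
    """
    return sorted(p for p in Path(root).rglob(pattern) if p.is_file())


def grep_files(paths, word):
    """Roughly: xargs grep -F WORD. Yields 'path:line' for each matching
    line, the format grep uses when it searches more than one file."""
    for p in paths:
        with open(p, encoding='utf-8', errors='replace') as f:
            for line in f:
                if word in line:
                    yield '{}:{}'.format(p, line.rstrip('\n'))
```

The final > out.txt step corresponds to printing each result of grep_files(find_files('.'), 'JSQ') to an open file with print(hit, file=out).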
PHP中文网 2017-04-18 09:05:20
If you are on Linux, I suggest simply using grep:
$ ls mydir
a.js b.js c.js
$ grep JSQ mydir/*.js
mydir/a.js:abcdefg JSQ abcdefg
mydir/a.js:JSQ abcdefg abcdefg
mydir/a.js:abcdefg abcdefg JSQ
mydir/c.js:abcdefg JSQ abcdefg
mydir/c.js:JSQ abcdefg abcdefg
mydir/c.js:abcdefg abcdefg JSQ
You can also redirect the output to a file:
$ grep JSQ mydir/* > results.txt
Then you can sort through the results and compile statistics from results.txt.
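As one possible way to compile statistics from results.txt, here is a sketch that counts the matching lines per file. It assumes every line has grep's multi-file format filename:matched line, as in the example above; file names that themselves contain ':' would be split incorrectly. The function name count_matches is hypothetical.

```python
from collections import Counter


def count_matches(path='results.txt'):
    """Count matching lines per file in grep output of the form
    'filename:matched line'. Returns a collections.Counter."""
    counts = Counter()
    with open(path, encoding='utf-8', errors='replace') as f:
        for line in f:
            # split only on the first ':' so colons inside the matched text survive
            fname, sep, _ = line.partition(':')
            if sep:
                counts[fname] += 1
    return counts
```

count_matches().most_common() then lists the files ordered by how many matching lines each one has.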
If you insist on using Python, here is a version I wrote that should be somewhat better optimized; you can use it as a reference:
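import os
import glob

def search(root, key, ftype='', logname=None):
    # '*.js' when an extension is given, otherwise every file
    ftype = '*.' + ftype if ftype else '*'
    # without a log file, matching lines are written to the null device
    logname = logname or os.devnull
    # '**' with recursive=True also descends into subfolders (Python 3.5+)
    symbol = os.path.join(root, '**', ftype)
    # skip directories, which a bare '*' pattern would also match
    fnames = [f for f in glob.glob(symbol, recursive=True) if os.path.isfile(f)]
    vc = len(fnames)  # files visited
    fc = 0            # files containing key
    with open(logname, 'w') as writer:
        for fname in fnames:
            found = False
            with open(fname) as reader:
                for idx, line in enumerate(reader):
                    line = line.strip()
                    # substring match, so key is also found inside longer tokens
                    if key in line:
                        line = line.replace(key, '**' + key + '**')
                        found = True
                        print('{} -- {}: {}'.format(fname, idx, line), file=writer)
            if found:
                fc = fc + 1
                print('{} has {}'.format(fname, key))
    return vc, fc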
search(root, key, ftype='', logname=None)
looks under the path root for files with the extension ftype (if ftype is not given, all files are accepted) and checks whether each one contains the keyword key.
If a log file name logname is given, every line containing the keyword is written to it, with the keyword highlighted by surrounding '**'.
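The per-line rule described above (keep lines that contain key, wrap each occurrence in '**') can be pulled out into a small helper and checked on its own. This is a hypothetical refactoring for illustration, not part of the answer's code; it uses plain substring matching and 0-based indexes, the same numbering enumerate() produces for the log file.

```python
def highlight_matches(lines, key):
    """Return (index, line) pairs for the lines that contain key, with every
    occurrence of key wrapped in '**'. Indexes are 0-based, as from enumerate()."""
    result = []
    for idx, line in enumerate(lines):
        line = line.strip()
        if key in line:
            result.append((idx, line.replace(key, '**' + key + '**')))
    return result
```

For example, highlight_matches(['abcdefg JSQ abcdefg'], 'JSQ') returns [(0, 'abcdefg **JSQ** abcdefg')].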
In practice you can use it like this (search.py):
if __name__ == '__main__':
    root = 'mydir'
    key = input("type key: ")
    vc, fc = search(root, key, 'js', logname='results')
    print('Found in {} files, visited {}'.format(fc, vc))
Run:
$ python3 search.py
type key: JSQ
mydir/c.js has JSQ
mydir/a.js has JSQ
Found in 2 files, visited 3
Log file results:
mydir/c.js -- 0: abcdefg **JSQ** abcdefg
mydir/c.js -- 1: **JSQ** abcdefg abcdefg
mydir/c.js -- 2: abcdefg abcdefg **JSQ**
mydir/a.js -- 0: abcdefg **JSQ** abcdefg
mydir/a.js -- 1: **JSQ** abcdefg abcdefg
mydir/a.js -- 2: abcdefg abcdefg **JSQ**