The text file essentially contains a few kinds of cases:
1. Number ID Operator Action Action Target
6883 556773833 RemyMCMXI
6880 556772838 Mindmatrix restored undeleted RemyMCMXI
6882 556771715 RemyMCMXI
6881 556770863 RemyMCMXI
6880 556673938 Liua97
6879 554350969 Epicgenius
6880 554332653 Alex
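Find the line containing restored and take its number, 6880. Then read downward until the first line with the same number (the Liua97 line), and record every ID between those two lines, including the restored line itself. In this example that means recording the three IDs 556772838 556771715 556770863.
2. Number ID Operator Action Action Target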
208 1675137 Netizen restored undeleted Netizen
207 1648639 Netizen
206 1648621 142.58.181.84
205 1646546 Patrick
204 1638165 Patrick
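Find the line containing restored. Here the number of the restored line, 208, never appears again further down. In that case, read the undeleted target and look below for the adjacent lines whose operator is that target (normally the lines right below the restored line have this target as their operator). In this example, record the two IDs 1675137 1648639.
3. Number ID Operator Action Action Target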
153 1254853 Eloquence restored undeleted Eloquence
152 1254819 Eloquence
151 1254815 Eloquence
150 1254812 Eloquence
149 1254799 Eloquence
148 1254796 Eloquence
147 1254782 Eloquence
146 1254771 Eloquence
145 1254740 217.185.183.250
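Same here: the number of the restored line, 153, cannot be found further down, and the undeleted target appears as the operator on consecutive lines below. In that case, record the IDs of all those consecutive lines, i.e. 1254853 1254819 1254815 1254812 1254799 1254796 1254782 1254771.
Right now I have no idea how to approach this. Each line of the text file is a string, and if I split it, the lines that do not contain restored cause an index-out-of-range error... I don't know how to deal with that. Could anyone give me some pointers? orz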
迷茫2017-04-17 17:44:39
For the first case, you can refer to the following approach.
If you are using Python 3:
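# Raw string, so that \r in the path is not read as a carriage return.
with open(r'G:\reserve9.txt', 'r') as reader:
    flag = False          # True while we are inside a restored range
    flag_number = None    # number of the restored line that opened the range
    for line in reader:
        if not line.strip():
            continue      # skip blank lines, which would break the unpacking
        number, ID, *items = line.split()
        if not flag and 'restored' in items:
            flag = True
            flag_number = number
        elif flag and number == flag_number:
            flag = False
            flag_number = None
        if flag:
            print(ID)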
A brief explanation of the code: the file object returned by open can be used directly as an iterator, so for line in reader: is more concise than calling readlines.
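Also, number, ID, *items = line.split() is unpacking. It assigns the strings produced by line.split() to number (the first string) and ID (the second string), and finally gathers all the remaining strings into a list assigned to items (the starred variable).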
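To see what the starred assignment produces, here is a small sketch run on two lines taken from the question. The helper name split_line is made up for this illustration.

```python
def split_line(line):
    # Hypothetical helper: the first two fields are the number and the ID,
    # everything else ends up in the list `items`.
    number, ID, *items = line.split()
    return number, ID, items


print(split_line("6880 556772838 Mindmatrix restored undeleted RemyMCMXI"))
print(split_line("6882 556771715 RemyMCMXI"))
```

A line without restored simply yields a shorter items list, so nothing is indexed out of range as long as the line has at least two fields.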
However, this syntax is not available in every Python version, so if you are using Python 2.7 you can do the following instead:
with open('reserve9.txt') as reader:
    flag = False
    flag_number = None
    for line in reader:
        items = line.split()
        if not items:
            continue  # skip blank lines
        number = items[0]
        ID = items[1]
        if not flag and 'restored' in items:
            flag = True
            flag_number = number
        elif flag and number == flag_number:
            flag = False
            flag_number = None
        if flag:
            print ID
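The idea behind this approach is simple: a flag decides whether the ID of the current line should be printed or collected, and on every line number is compared with flag_number to decide whether to switch flag on or off.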
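If you would rather collect the IDs than print them, the same flag logic can be wrapped in a function. This is only a sketch: the name collect_case1 is made up, and it takes any iterable of lines instead of opening a file.

```python
def collect_case1(lines):
    """Collect the IDs from a restored line up to (not including) the next
    line that carries the same number. Hypothetical helper for case 1 only."""
    flag = False
    flag_number = None
    ids = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        number, ID, *items = line.split()
        if not flag and 'restored' in items:
            flag = True
            flag_number = number
        elif flag and number == flag_number:
            flag = False
            flag_number = None
        if flag:
            ids.append(ID)
    return ids


sample = """\
6883 556773833 RemyMCMXI
6880 556772838 Mindmatrix restored undeleted RemyMCMXI
6882 556771715 RemyMCMXI
6881 556770863 RemyMCMXI
6880 556673938 Liua97
6879 554350969 Epicgenius
6880 554332653 Alex
"""
print(collect_case1(sample.splitlines()))
# → ['556772838', '556771715', '556770863']
```

Note that when the number never shows up again (cases 2 and 3), the flag stays on and every remaining ID is collected, which is why this version only covers case 1.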
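The code below is Python 3. If needed, you can change print to the Python 2 form yourself; that should be the only difference. (Sorry, I wrote this quickly, so the code may not be very polished.)
To cover all of the cases, I first define two classes: IdCollect, which collects the IDs, and Action, which represents one operation as an object:
*IdCollect class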
class IdCollect:

    def __init__(self):
        self.dic = {}              # number -> state of the collect opened by that number
        self.outputs = []          # collects that are ready to be printed
        self.idx = 0               # running index, used to keep the input order
        self.newest_action = None  # restored action currently being collected

    def do_new_a_collect(self, action):
        # Open a new collect for a restored line whose number has no open collect.
        if not self.dic.get(action.number, {}):
            if 'restored' in action.ops:
                return True
        return False

    def do_finish_a_collect(self, action):
        # A line whose number matches an open collect closes it (case 1).
        collect = self.dic.get(action.number, {})
        if collect:
            return True
        return False

    def handle(self, action):
        print('handle...', action)
        if self.do_new_a_collect(action):
            print('--- do collect new...')
            self.collect_new(action)
        elif self.do_finish_a_collect(action):
            print('--- do collect finish...')
            self.collect_finish(action)
        else:
            print('--- do collect...')
            self.collect(action)

    def collect(self, action):
        if self.newest_action:
            current_collect = self.dic[self.newest_action.number]
        else:
            print('do nothing')
            return
        # collect undeleted: consecutive lines whose operator is the undeleted target
        if not current_collect['undeleted_finish']:
            if action.user1 == current_collect['undeleted_user']:
                print('------ collect undeleted')
                current_collect['undeleted_buffer'].append(action)
            else:
                print(action.user1, current_collect['undeleted_user'])
                print('------ finish undeleted')
                current_collect['undeleted_finish'] = True
        # collect restored: every line until the number shows up again
        print('------ collect restored')
        current_collect['restored_buffer'].append(action)

    def collect_new(self, action):
        undeleted_buffer = []
        undeleted_user = None
        restored_buffer = []
        if 'undeleted' in action.ops:
            undeleted_buffer.append(action)
            undeleted_user = action.user2
        restored_buffer.append(action)
        self.dic[action.number] = {
            'undeleted_buffer': undeleted_buffer,
            'undeleted_user': undeleted_user,
            'undeleted_finish': False,
            'restored_buffer': restored_buffer,
            'restored_finish': False,
            'idx': self.idx
        }
        self.idx += 1
        self.newest_action = action

    def collect_finish(self, action):
        collect = self.dic[action.number]
        collect['restored_finish'] = True
        self.outputs.append(collect)
        self.dic[action.number] = {}
        self.newest_action = None

    def output(self):
        # Collects that were never closed are still in self.dic (cases 2 and 3).
        for number, collect in self.dic.items():
            if collect:
                self.outputs.append(collect)
        self.outputs.sort(key=lambda collect: collect['idx'])
        for collect in self.outputs:
            if collect['restored_finish']:
                for action in collect['restored_buffer']:
                    print('r', action.ID)
            else:
                if collect['undeleted_buffer']:
                    for action in collect['undeleted_buffer']:
                        print('d', action.ID)
*Action class
class Action:

    def __init__(self, action_str):
        action_str = action_str.strip()
        items = action_str.split()
        self.number = items[0]
        self.ID = items[1]
        self.user1 = items[2]   # operator
        self.ops = items[3:]
        if len(self.ops) > 1:
            # e.g. ['restored', 'undeleted', 'RemyMCMXI']: the last item is the target
            self.ops = self.ops[:-1]
            self.user2 = items[-1]
        else:
            self.user2 = ''

    def __str__(self):
        return ' '.join([str(item) for item in [self.number, self.ID, self.user1, self.ops, self.user2]])
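One thing to be aware of: because Action splits on whitespace, a user name that contains spaces, such as Lee Daniel Crocker in the test file, is cut into pieces (user1 becomes Lee). If that matters for your data, a keyword-based parse is one option. The sketch below is only an assumption about the format: it takes restored and undeleted to be the only action words, and parse_action and KEYWORDS are names made up for this example.

```python
KEYWORDS = ('restored', 'undeleted')  # assumed to be the only action words


def parse_action(line):
    """Hypothetical alternative to Action: split a line around the action words."""
    items = line.split()
    number, ID, rest = items[0], items[1], items[2:]
    # Index of the first action word, or the end of the line if there is none.
    start = next((k for k, word in enumerate(rest) if word in KEYWORDS), len(rest))
    end = start
    while end < len(rest) and rest[end] in KEYWORDS:
        end += 1
    return {
        'number': number,
        'ID': ID,
        'user1': ' '.join(rest[:start]),  # operator, may contain spaces
        'ops': rest[start:end],           # the action words, in order
        'user2': ' '.join(rest[end:]),    # target, empty if there is no action
    }


print(parse_action("13 64041 Lee Daniel Crocker")['user1'])
# → Lee Daniel Crocker
```

This would still misread a user who is literally named restored or undeleted, so treat it as a starting point rather than a finished parser.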
Finally, this is how to use them:
with open('reserve9.txt') as reader:
    id_collect = IdCollect()
    for line in reader:
        if not line.strip():
            continue  # skip blank lines, Action expects at least three fields
        action = Action(line)
        id_collect.handle(action)
    print('-- output --')
    id_collect.output()
The following is a test file I scribbled:
6883 556773833 RemyMCMX
6880 556772838 Mindmatrix restored undeleted RemyMCMXI
6882 556771715 RemyMCMXI
6881 556770863 RemyMCMXI
6880 556673938 Liua97
6879 554350969 Epicgenius
6880 554332653 Alex
13 82239 194.205.123.10 restored undeleted 62.30.0.4
14 64090 62.30.0.4
13 64041 Lee Daniel Crocker
12 61789 JeLuF
11 55828 Conversion script
10 294279 62.82.226.xxx
9 294278 Larry_Sanger
8 294277 Larry_Sanger
7 334555726 24.112.58.xxx
5 334555725 156.62.18.xxx restored undeleted 156.62.18.xxx
6 334555724 156.62.18.xxx
5 334555723 AxelBoldt
4 334555722 The Cunctator
3 334555721 The Cunctator
1 334555720 Alan D
2 334555718 64.38.175.xxx
1 334555717 The Cunctator
5 334555725 156.62.18.xxx restored undeleted 156.62.18.xxx
6 334555724 156.62.18.xxx
6 334555724 156.62.18.xxx
6 334555724 156.62.18.xxx
6 334555724 156.62.18.xxx
6 334555724 156.62.18.xxx
1 334555720 Alan D
1 334555720 Alan D
1 334555720 Alan D
1 334555720 Alan D
1 334555720 Alan D
1 334555720 Alan D
1 334555720 Alan D
13 82239 194.205.123.10 restored undeleted 62.30.0.4
13 64041 Lee Daniel Crocker
The output looks like this:
... (part of the collection log omitted) ...
-- output --
r 556772838
r 556771715
r 556770863
r 82239
r 64090
r 334555725
r 334555724
d 334555725
d 334555724
d 334555724
d 334555724
d 334555724
d 334555724
r 82239
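If the two classes feel like more than you need, the three cases from the question can also be written as one standalone function. This is a rough sketch, not a replacement for the code above: the names parse and collect_ids are made up, user names are assumed to contain no spaces, every line is assumed to have at least three fields, and the search for a matching number is assumed to stop at the next restored line.

```python
def parse(line):
    """Split one line into (number, ID, operator, remaining fields)."""
    number, ID, operator, *rest = line.split()
    return number, ID, operator, rest


def collect_ids(lines):
    """Return one list of IDs per restored line (hypothetical helper)."""
    rows = [parse(line) for line in lines if line.strip()]
    results = []
    for i, (number, ID, operator, rest) in enumerate(rows):
        if 'restored' not in rest:
            continue
        # Rows after the restored line, up to the next restored line (assumption).
        following = []
        for row in rows[i + 1:]:
            if 'restored' in row[3]:
                break
            following.append(row)
        ids = [ID]
        match = next((k for k, row in enumerate(following) if row[0] == number), None)
        if match is not None:
            # Case 1: the number shows up again, take everything in between.
            ids += [row[1] for row in following[:match]]
        else:
            # Cases 2 and 3: take the consecutive rows operated by the target.
            target = rest[-1]
            for row in following:
                if row[2] != target:
                    break
                ids.append(row[1])
        results.append(ids)
    return results


sample = """\
208 1675137 Netizen restored undeleted Netizen
207 1648639 Netizen
206 1648621 142.58.181.84
205 1646546 Patrick
204 1638165 Patrick
"""
print(collect_ids(sample.splitlines()))
# → [['1675137', '1648639']]
```

Returning a list of lists keeps the IDs of each restored line separate, so you can decide afterwards how to print or store them.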