
How to read txt text line by line and remove duplicates under python3.4.3

不言
2018-05-02 16:17:42

This article shows several ways to read a txt file line by line and remove duplicate lines under python3.4.3. It is shared here as a reference for anyone who needs it.

Issues that should be paid attention to when reading and writing files include:

1. Character encoding

2. Close the file descriptor immediately after the operation is completed

3. Code compatibility

Several methods:

#!/usr/bin/env python3
# Start from empty lists; a [" "] seed would leak a stray " " entry into the output.
original_list1=[]
original_list2=[]
original_list3=[]
original_list4=[]
newlist1=[]
newlist2=[]
newlist3=[]
newlist4=[]
newtxt1=""
newtxt2=""
newtxt3=""
newtxt4=""
# First way: read with readline()
f = open("duplicate_txt.txt","r")    # returns a file object
line = f.readline()                  # read the first line
while line:
  original_list1.append(line)
  line = f.readline()
f.close()
# Use set() to remove duplicate strings from the list.
# Note: a set is unordered, so the original line order is lost.
newlist1 = list(set(original_list1))
#newlist1 = list(dict.fromkeys(original_list1))  # alternative; keeps first-seen order on Python 3.7+
# Rebuild the text and write it to a new file
newtxt1="".join(newlist1)
f1 = open("noduplicate1.txt","w")
f1.write(newtxt1)
f1.close()
###################################################################
# Second way: iterate over the file object directly
f2_in = open("duplicate_txt.txt","r")
for line in f2_in:
  original_list2.append(line)
f2_in.close()
newlist2 = list(set(original_list2))
newlist2.sort(key=original_list2.index)         # restore the original line order
#newlist2 = sorted(set(original_list2),key=original_list2.index)  # equivalent one-liner
newtxt2="".join(newlist2)
f2 = open("noduplicate2.txt","w")
f2.write(newtxt2)
f2.close()
###################################################################
# Third way: read with readlines()
f3 = open("duplicate_txt.txt","r")
original_list3 = f3.readlines()       # read the whole file and return it as a list of lines
f3.close()
for i in original_list3:              # walk the list and keep the first copy of each line
  if i not in newlist3:
      newlist3.append(i)
newtxt3="".join(newlist3)
f4 = open("noduplicate3.txt","w")
f4.write(newtxt3)
f4.close()
###################################################################
# Fourth way: same as the third, with try/finally so both files are always closed
f5 = open('duplicate_txt.txt',"r")
try:
  original_list4 = f5.readlines()
  for i in original_list4:
    if i not in newlist4:
      newlist4.append(i)
  newtxt4="".join(newlist4)
  f6 = open("noduplicate4.txt","w")
  try:
    f6.write(newtxt4)
  finally:
    f6.close()
finally:
  f5.close()
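
The four variants above can be condensed into a single order-preserving pass. The following is a minimal sketch, not the author's code: the helper name `dedupe_lines` and the demo file names are hypothetical, and it assumes the input is UTF-8 text. It remembers the lines already written in a set, so each membership test is a hash lookup instead of a scan through the output list.

```python
import os
import tempfile


def dedupe_lines(src_path, dst_path, encoding="utf-8"):
    """Copy src_path to dst_path, keeping only the first copy of each line.

    Hypothetical helper for illustration; returns the number of lines written.
    """
    seen = set()
    written = 0
    with open(src_path, "r", encoding=encoding) as src, \
            open(dst_path, "w", encoding=encoding) as dst:
        for line in src:
            if line not in seen:
                seen.add(line)
                dst.write(line)
                written += 1
    return written


# Small demonstration on a temporary file.
demo_dir = tempfile.mkdtemp()
demo_src = os.path.join(demo_dir, "duplicate_txt.txt")
demo_dst = os.path.join(demo_dir, "noduplicate.txt")
with open(demo_src, "w", encoding="utf-8") as f:
    f.write("b\na\nb\nc\na\n")

demo_count = dedupe_lines(demo_src, demo_dst)
with open(demo_dst, "r", encoding="utf-8") as f:
    demo_result = f.read()
```

Lines are compared together with their trailing newline, so a final line that lacks one is treated as different from an otherwise identical earlier line. The same holds for the four variants above.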

Result:

[Image: before deduplication]

[Image: after deduplication (out of order)]

[Image: after deduplication (in order)]

Summary

The program above involves file read and write operations and list operations. As for the issues mentioned at the beginning of the article: since the sample file contains no Chinese text, encoding did not matter there, but it is still worth covering:

f = open("test.txt","w")
f.write(u"你好")

The above code will report an error if run in python2

The error occurs because a unicode string cannot be saved to the file directly. It must first be encoded into a byte sequence of type str.

In python2, write() converts a unicode argument implicitly, using the ascii codec by default. ascii cannot represent Chinese characters, so a UnicodeEncodeError is raised.

The correct approach is to encode the string explicitly with utf-8 or gbk before calling write().

f = open("test.txt","wb")     # binary mode, so the encoded bytes are also accepted on python3
text=u"你好"
text=text.encode(encoding='utf-8')
f.write(text)
f.close()
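
In python3, `open()` accepts an `encoding` argument, so the manual `encode()` step can be left to the file object. A minimal sketch, assuming Python 3 and a temporary file path chosen only for the demonstration:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.txt")

# Text mode with an explicit encoding: write() takes a str and encodes it.
with open(path, "w", encoding="utf-8") as f:
    f.write("你好")

# Reading the raw bytes back shows what was stored on disk.
with open(path, "rb") as f:
    raw = f.read()

# Decoding those bytes with the same codec recovers the original text.
text = raw.decode("utf-8")
```

Passing the encoding explicitly avoids depending on the platform default, which varies between systems.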

About close(): what happens if a file is not closed? Leaving files open after the work is done wastes system resources, because the number of file descriptors a process can hold open is limited. On Linux the limit is configurable per process (see ulimit -n); a soft limit of 1024 is a common default.
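
The limit that applies to the current process can be inspected from Python. This is a minimal sketch that assumes a Unix-like system, since the standard `resource` module is not available on Windows. The values it reports depend on the machine's configuration.

```python
import resource  # standard library, Unix only

# RLIMIT_NOFILE is the maximum number of open file descriptors for this process.
soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)

print("soft limit:", soft_limit)  # the limit currently enforced
print("hard limit:", hard_limit)  # the ceiling the soft limit may be raised to
```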

Generally, calling close() is enough, but there are special cases. For example, if open() itself fails, say because of insufficient permissions, then f is never assigned and calling f.close() raises an error of its own. Another case: if write() fails because the disk is full, an exception is raised and close() never gets a chance to run. The correct approach is to use try ... except ... finally so that close() always runs:

f = open("test.txt","wb")     # binary mode, because encoded bytes are written
try:
  text=u"你好"
  text=text.encode(encoding='utf-8')
  f.write(text)
except IOError as e:
  print("oops,%s"%e.args[0])
finally:
  f.close()

A more elegant way of writing is to use with...as.

with open("test.txt","wb") as f:     # binary mode, because encoded bytes are written
  text=u"你好"
  f.write(text.encode(encoding='utf-8'))

The file object implements the context manager protocol. When the program enters the with statement, the file object is assigned to the variable f. When the program leaves the with block, the close() method is called automatically.
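
The automatic close can be observed through the file object's `closed` attribute. The sketch below assumes Python 3 and uses a temporary file. The variable names are chosen only for the demonstration. It also shows that the file is closed when the block is left because of an exception.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.txt")

with open(path, "w", encoding="utf-8") as f:
    f.write("hello\n")
    closed_inside = f.closed      # still open inside the block

closed_after = f.closed           # closed once the block has been left

# The file is also closed when the block exits through an exception.
try:
    with open(path, "r", encoding="utf-8") as g:
        raise ValueError("simulated failure")
except ValueError:
    pass
closed_after_error = g.closed
```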

About compatibility: the open() functions of python2 and python3 differ. The latter lets you specify the character encoding directly in the call.

How to solve the compatibility open() problem between python2 and python3?

Use the open() function from the io module. io.open in python2 is equivalent to the open function of python3:

from io import open
with open("test.txt","w",encoding='utf-8') as f:
  f.write(u"你好")
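
To check that the text survives a round trip, the file can be read back through the same function. This is a minimal sketch that uses a temporary path picked only for the demonstration. On python3, `io.open` is the built-in `open`, so the import changes nothing there.

```python
from io import open  # same function as the built-in open on python3
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.txt")

# Write unicode text; the file object encodes it as UTF-8.
with open(path, "w", encoding="utf-8") as f:
    f.write(u"你好\n")

# Read it back; the file object decodes the UTF-8 bytes into text.
with open(path, "r", encoding="utf-8") as f:
    lines = f.readlines()
```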
