
A comparison of four different ways to read files in Python

黄舟 (Original)
2017-05-22 23:23:20

Text processing is a task that comes up often in Python. This article compares several ways of reading files in Python and gives sample code for each, so that the differences are easy to follow.

Preface

Python offers many ways to read a file, but when the file is large, the choice of reading method has a noticeable effect on running time and memory use. The details follow.

Scenario

Read a large 2.9 GB file line by line

  • CPU i7 6820HQ

  • RAM 32G

Method

Each line that is read is split once with a string operation.

Methods 1 to 3 below open the file with the with...as statement.

The with statement is suited to accessing resources: it ensures that the necessary cleanup is performed to release the resource whether or not an exception occurs during use. Typical examples are closing a file automatically after use, and acquiring and releasing a lock automatically in threaded code.
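As a rough illustration of that guarantee, the sketch below creates a small throwaway file, raises an exception inside the with block, and then checks that the file object was closed anyway. The function name demo_with_closes_file and the simulated error are made up for this example.

```python
import os
import tempfile


def demo_with_closes_file():
    # Create a small throwaway file to read from (hypothetical sample data).
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as out:
        out.write("a|b|c\n")

    handle = None
    try:
        with open(path, "r") as fh:
            handle = fh
            # Simulate a failure in the middle of processing.
            raise ValueError("simulated failure while processing")
    except ValueError:
        pass
    finally:
        os.remove(path)

    # The with block closed the file even though an exception was raised.
    return handle.closed


print(demo_with_closes_file())  # True
```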

Method 1 The most common way to read files

with open(file, 'r') as fh:
    # readlines() loads every line into a list before the loop starts
    for line in fh.readlines():
        line.split("|")

Running result: It took 15.4346568584 seconds

The system monitor showed memory usage jumping from 4.8G to 8.4G. fh.readlines() loads every line that it reads into memory at once, so this method is only suitable for small files.
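To see why readlines() is memory-hungry, it may help to look at what it returns. In the minimal sketch below, io.StringIO stands in for a real file (an assumption made only to keep the example self-contained). The call returns an ordinary list holding every line, and each line keeps its trailing newline, which then ends up in the last field after split.

```python
import io

# io.StringIO stands in for a real file object in this sketch.
fh = io.StringIO("a|b\nc|d\ne|f\n")

lines = fh.readlines()  # the whole content is materialised as a list
print(type(lines).__name__, len(lines))  # list 3

fields = lines[0].split("|")
print(fields)  # ['a', 'b\n']
```

If the newline is unwanted, calling line.rstrip("\n") before the split removes it.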

Method 2

with open(file, 'r') as fh:
    line = fh.readline()
    while line:
        line.split("|")
        # read the next line; readline() returns '' at end of file
        line = fh.readline()

Running result: It took 22.3531990051 seconds

Memory usage barely changes, because only one line of data is held in memory at a time. The running time, however, is noticeably longer than with Method 1, which makes this approach inefficient when the data needs further processing.
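A readline() loop can also be written with the two-argument form of the built-in iter(), which keeps calling fh.readline until it returns the sentinel value (the empty string at end of file). This is only an alternative spelling of the same technique, not a faster one. The helper name count_fields and the sample data are invented for this sketch.

```python
import io


def count_fields(fh, sep="|"):
    """Count the fields in all lines, reading one line at a time."""
    total = 0
    # iter(callable, sentinel) calls fh.readline() until it returns "".
    for line in iter(fh.readline, ""):
        total += len(line.rstrip("\n").split(sep))
    return total


# io.StringIO stands in for a real file object here.
sample = io.StringIO("a|b|c\nd|e\n")
print(count_fields(sample))  # 5
```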

Method 3

with open(file) as fh:
    for line in fh:
        line.split("|")

Running result: It took 13.9956979752 seconds

Memory usage barely changes, and this method is also faster than Method 2.

for line in fh treats the file object fh as an iterable, which uses buffered I/O and manages memory automatically, so large files are not a concern. This is a very Pythonic approach!
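Because the file object yields lines on demand, it combines naturally with generators. The sketch below wraps the loop in a generator (split_lines is a hypothetical helper, and io.StringIO again stands in for a real file) and uses itertools.islice to take only the first two parsed lines without reading the rest.

```python
import io
from itertools import islice


def split_lines(fh, sep="|"):
    """Yield one parsed line at a time instead of building a full list."""
    for line in fh:
        yield line.rstrip("\n").split(sep)


# io.StringIO stands in for a real file object here.
sample = io.StringIO("a|b\nc|d\ne|f\n")

# Only the first two lines are consumed from the generator.
first_two = list(islice(split_lines(sample), 2))
print(first_two)  # [['a', 'b'], ['c', 'd']]
```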

Method 4 fileinput module

import fileinput

for line in fileinput.input(file):
    line.split("|")

Running result: It took 26.1103110313 seconds

Memory usage increased by 200 to 300 MB, and this was the slowest of the four methods.
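The fileinput module is mainly convenient when several files should be read as one stream. The sketch below is an illustration under assumed inputs: the two temporary files and the helper name read_many are invented for the example. It uses fileinput.input() as a context manager and records the file name and per-file line number for each line.

```python
import fileinput
import os
import tempfile


def read_many(paths, sep="|"):
    """Read several files as one stream, keeping track of where each line came from."""
    rows = []
    with fileinput.input(files=paths) as stream:
        for line in stream:
            name = os.path.basename(stream.filename())
            rows.append((name, stream.filelineno(), line.rstrip("\n").split(sep)))
    return rows


# Build two small sample files in a temporary directory (hypothetical data).
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name, text in (("a.txt", "1|2\n3|4\n"), ("b.txt", "5|6\n")):
        path = os.path.join(tmp, name)
        with open(path, "w") as out:
            out.write(text)
        paths.append(path)
    rows = read_many(paths)

for row in rows:
    print(row)
```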

Summary

The methods above are for reference only. Method 3 is still generally regarded as the best way to read large files. In practice, however, the result depends on the performance of the machine and the complexity of the data processing.
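Since the results depend on the machine and the workload, it can be worth repeating the comparison locally. The following is a minimal timing sketch, not the script used for the figures above: it generates a small sample file (the file contents, the line count of 10000 and all function names are assumptions for this example) and times the three with-based methods using time.perf_counter.

```python
import os
import tempfile
import time


def use_readlines(path):
    count = 0
    with open(path) as fh:
        for line in fh.readlines():  # Method 1: whole file in a list
            line.split("|")
            count += 1
    return count


def use_readline(path):
    count = 0
    with open(path) as fh:
        line = fh.readline()  # Method 2: explicit readline() loop
        while line:
            line.split("|")
            count += 1
            line = fh.readline()
    return count


def use_iteration(path):
    count = 0
    with open(path) as fh:
        for line in fh:  # Method 3: iterate over the file object
            line.split("|")
            count += 1
    return count


def make_sample(n_lines=10000):
    """Write a small pipe-delimited sample file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as out:
        for i in range(n_lines):
            out.write(f"{i}|alpha|beta|gamma\n")
    return path


def benchmark(path):
    """Return {method name: (lines read, elapsed seconds)}."""
    results = {}
    for func in (use_readlines, use_readline, use_iteration):
        start = time.perf_counter()
        count = func(path)
        results[func.__name__] = (count, time.perf_counter() - start)
    return results


sample_path = make_sample()
try:
    results = benchmark(sample_path)
finally:
    os.remove(sample_path)

for name, (count, seconds) in results.items():
    print(f"{name}: {count} lines in {seconds:.4f} s")
```

A file this small mostly measures interpreter overhead, so a much larger input is needed for the timings to resemble the 2.9 GB scenario.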

