
How to Handle Memory Issues When Reading Large CSV Files in Python?

Mary-Kate Olsen
2024-11-09

Reading Large CSV Files in Python

In Python 2.7, users often encounter memory issues when reading CSV files with millions of rows and hundreds of columns. This article addresses these challenges and offers solutions to process large CSV files effectively.

Original Code and Issues

The provided code aims to read specific rows from a CSV file based on a given criterion. However, it loads all rows into a list before processing, which leads to memory errors for files exceeding 300,000 rows.
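
While the original code is not reproduced here, the memory-hungry pattern it follows looks roughly like this minimal sketch (the column index and names are illustrative assumptions, not the poster's exact code):

import csv

def getstuff(filename, criterion):
    data = []
    with open(filename, "rb") as csvfile:  # binary mode, as the Python 2 csv module expects
        datareader = csv.reader(csvfile)
        for row in datareader:
            if row[3] == criterion:
                # every matching row is appended, so the entire
                # result set must fit in memory at once
                data.append(row)
    return data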

Solution 1: Process Rows Incrementally

To eliminate the memory issue, it is crucial to process rows incrementally instead of storing them in a list. A generator function can be used to achieve this:

import csv

def getstuff(filename, criterion):
    with open(filename, "rb") as csvfile:  # binary mode, as the Python 2 csv module expects
        datareader = csv.reader(csvfile)
        yield next(datareader)  # yield the header row
        for row in datareader:
            if row[3] == criterion:
                yield row  # produce matching rows one at a time

This function yields the header row followed by every row that matches the criterion. Because rows are produced one at a time, only the current row needs to be held in memory; note that this version still scans the whole file rather than stopping early.
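
Consuming the generator in a loop keeps memory use flat regardless of file size; a brief usage sketch (the file name and criterion are placeholder values):

for row in getstuff("somefile.csv", "some value"):
    # process one row at a time; nothing accumulates in memory
    print(row)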

Solution 2: Optimized Filtering

Alternatively, a more concise filtering method can be employed:

import csv
from itertools import dropwhile, takewhile

def getstuff(filename, criterion):
    with open(filename, "rb") as csvfile:
        datareader = csv.reader(csvfile)
        yield next(datareader)  # yield the header row
        # dropwhile skips rows until the first match; takewhile then
        # yields matching rows and stops at the first non-match
        for row in takewhile(
                lambda r: r[3] == criterion,
                dropwhile(lambda r: r[3] != criterion, datareader)):
            yield row

This version uses the dropwhile and takewhile functions from the itertools module: dropwhile discards rows until the first match, and takewhile then yields matching rows and stops reading at the first non-match. It therefore only returns all matches when the matching rows are contiguous (for example, when the file is sorted on that column). Since Python 2.7 lacks yield from (introduced in Python 3.3), a plain for loop delegates to the filtered iterator.
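
A quick illustration of the contiguity requirement, using a plain list in place of CSV rows (the values are made up for demonstration):

from itertools import dropwhile, takewhile

values = ["a", "b", "b", "b", "c", "b"]
matches = takewhile(lambda v: v == "b",
                    dropwhile(lambda v: v != "b", values))
print(list(matches))  # ['b', 'b', 'b'] -- the trailing 'b' after 'c' is never reached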

Updated Code

In the getdata function, the list-building code is replaced so that getdata is itself a generator, passing rows through one at a time:

def getdata(filename, criteria):
    for criterion in criteria:
        # delegate to getstuff, yielding rows as they arrive
        for row in getstuff(filename, criterion):
            yield row
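
Because getdata is now a generator, the caller can iterate over it directly and memory use stays flat; a brief usage sketch (the file name and criteria are placeholders):

for row in getdata("somefile.csv", ["value 1", "value 2"]):
    # each row is handled and discarded before the next is read
    print(row)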

Conclusion

By using generator functions and itertools-based filtering, it is possible to process large CSV files effectively and avoid memory errors; when matching rows are contiguous, the filtered version also stops reading early, which improves performance.
