How to Handle Memory Issues When Reading Large CSV Files in Python?

By Mary-Kate Olsen (original), 2024-11-09 05:07:02

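Reading Huge CSV Files in Python

In Python 2.7, users often run into memory problems when reading CSV files with millions of rows and hundreds of columns. This article addresses those problems and presents ways to process large CSV files efficiently.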

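Original Code and the Problem

The code in question is meant to read specific rows from a CSV file according to a given criterion. However, it loads every row into a list before processing, which causes memory errors on files with more than 300,000 rows.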

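Solution 1: Process Rows Incrementally

To eliminate the memory problem, process rows incrementally instead of storing them in a list. A generator function does this: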

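import csv

def getstuff(filename, criterion):
    # Python 2.7: the csv module expects the file opened in binary mode ("rb").
    # On Python 3, use open(filename, newline="") instead.
    with open(filename, "rb") as csvfile:
        datareader = csv.reader(csvfile)
        yield next(datareader)  # yield the header row
        for row in datareader:
            if row[3] == criterion:  # the fourth column holds the value to match
                yield row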

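This function yields the header row and then each row that matches the criterion, holding only one row in memory at a time, and stops once the file is exhausted.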
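As a rough illustration of how such a generator might be consumed without building a list, the sketch below writes a tiny CSV file and counts the matching rows. It assumes Python 3 (text mode with newline=""); the sample data and the helper names write_sample, count_matches, and demo are hypothetical and not part of the original code.

```python
import csv
import os
import tempfile


def getstuff(filename, criterion):
    # Python 3 variant of the generator: text mode with newline="".
    with open(filename, newline="") as csvfile:
        datareader = csv.reader(csvfile)
        yield next(datareader)  # header row
        for row in datareader:
            if row[3] == criterion:
                yield row


def write_sample(path):
    # Hypothetical sample data; column index 3 holds the value to match.
    rows = [
        ["id", "name", "value", "group"],
        ["1", "a", "10", "x"],
        ["2", "b", "20", "y"],
        ["3", "c", "30", "x"],
    ]
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)


def count_matches(filename, criterion):
    rows = getstuff(filename, criterion)
    next(rows)  # skip the header row
    # Rows are consumed one at a time; no list of rows is ever built.
    return sum(1 for _ in rows)


def demo():
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        write_sample(path)
        return count_matches(path, "x")
    finally:
        os.remove(path)
```

Calling demo() counts the rows whose fourth column equals "x" in the sample file. The same pattern applies to a file of any size, because only the current row is held in memory.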

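Solution 2: Optimized Filtering

Alternatively, if the matching rows form one consecutive block in the file, a more concise filter can be used: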

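import csv
from itertools import dropwhile, takewhile

def getstuff(filename, criterion):
    # `yield from` requires Python 3.3+, where csv files are opened in text mode.
    with open(filename, newline="") as csvfile:
        datareader = csv.reader(csvfile)
        yield next(datareader)  # yield the header row
        # Skip rows until the first match, then yield rows until the first non-match.
        yield from takewhile(
            lambda r: r[3] == criterion,
            dropwhile(lambda r: r[3] != criterion, datareader))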

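This approach uses the takewhile and dropwhile functions from the itertools module to filter rows. Reading stops at the first non-matching row after the matching block, so the rest of the file is never scanned.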
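The behavior of the dropwhile and takewhile combination can be seen on a small in-memory list. This is only a sketch with hypothetical sample rows and a hypothetical helper, first_block. It shows that only the first consecutive run of matching rows is returned, so later matches are missed unless the data is grouped by that column.

```python
from itertools import dropwhile, takewhile

# Hypothetical rows; column index 3 holds the value to match.
rows = [
    ["1", "a", "10", "x"],
    ["2", "b", "20", "y"],
    ["3", "c", "30", "y"],
    ["4", "d", "40", "x"],
    ["5", "e", "50", "y"],
]


def first_block(rows, criterion):
    def matches(row):
        return row[3] == criterion

    # dropwhile skips the leading non-matching rows;
    # takewhile then stops at the first row that does not match.
    return list(takewhile(matches, dropwhile(lambda r: not matches(r), rows)))


# Rows 2 and 3 form the first block of "y"; row 5 also matches but is not reached.
ids = [row[0] for row in first_block(rows, "y")]
```

If matching rows can be scattered throughout the file, the plain loop from Solution 1 is the safer choice.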

Updated Code

In the getdata function, the list comprehension is likewise replaced with a generator:

def getdata(filename, criteria):
    for criterion in criteria:
        for row in getstuff(filename, criterion):
            yield row
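One detail to keep in mind when consuming getdata is that it re-reads the file once per criterion, so the header row is yielded once per criterion as well. The sketch below, assuming Python 3 and a hypothetical sample file, sums a numeric column across all matching rows while skipping the repeated headers. The helper names total_value and demo are illustrative only.

```python
import csv
import os
import tempfile


def getstuff(filename, criterion):
    # Python 3 variant of the Solution 1 generator.
    with open(filename, newline="") as csvfile:
        datareader = csv.reader(csvfile)
        yield next(datareader)  # header row
        for row in datareader:
            if row[3] == criterion:
                yield row


def getdata(filename, criteria):
    for criterion in criteria:
        for row in getstuff(filename, criterion):
            yield row


def total_value(filename, criteria):
    header = None
    total = 0
    for row in getdata(filename, criteria):
        if header is None:
            header = row  # first row yielded is the header
            continue
        if row == header:
            continue  # header repeated for the next criterion
        total += int(row[2])  # assumes column index 2 is numeric
    return total


def demo(criteria):
    # Hypothetical sample data written to a temporary file.
    rows = [
        ["id", "name", "value", "group"],
        ["1", "a", "10", "x"],
        ["2", "b", "20", "y"],
        ["3", "c", "30", "x"],
    ]
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        with open(path, "w", newline="") as f:
            csv.writer(f).writerows(rows)
        return total_value(path, criteria)
    finally:
        os.remove(path)
```

Because the total is accumulated row by row, memory use stays flat regardless of how many rows match.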

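Conclusion

By using generator functions and optimized filtering, large CSV files can be processed efficiently, avoiding memory errors and significantly improving performance.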
