
How to Efficiently Process Large DataFrames in Pandas: Chunk It Up!

Susan Sarandon
2024-10-27


Pandas - Slicing Large Dataframes into Chunks

When processing oversized dataframes, a common obstacle is the dreaded MemoryError. One effective solution is to divide the dataframe into smaller, manageable chunks. This strategy reduces peak memory consumption and lets each piece be processed independently.

To achieve this, we can use either a list comprehension or NumPy's array_split function.

List Comprehension

<code class="python">n = 200000  # chunk row size
list_df = [df.iloc[i:i + n] for i in range(0, df.shape[0], n)]  # positional slices of n rows each</code>

NumPy array_split

<code class="python">list_df = np.array_split(df, math.ceil(len(df) / n))  # roughly equal chunks; requires numpy and math to be imported</code>
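
Both approaches produce a list of smaller dataframes. The sketch below is a minimal, self-contained demonstration on a synthetic dataframe; the column names, values, and chunk size are placeholders, not taken from the original example:

<code class="python">import math

import numpy as np
import pandas as pd

# synthetic stand-in for a large dataframe (column names are illustrative only)
df = pd.DataFrame({
    "AcctName": np.random.choice(["A", "B", "C"], size=1_000),
    "Amount": np.random.rand(1_000),
})

n = 250  # chunk row size for this small demo

# approach 1: list comprehension over positional slices
chunks_slice = [df.iloc[i:i + n] for i in range(0, df.shape[0], n)]

# approach 2: np.array_split into roughly equal pieces
chunks_split = np.array_split(df, math.ceil(len(df) / n))

print(len(chunks_slice), len(chunks_split))  # both yield 4 chunks here</code>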

Individual chunks can then be retrieved using:

<code class="python">list_df[0]
list_df[1]
...</code>

To reassemble the chunks into a single dataframe, employ pd.concat:

<code class="python"># Example: reassembling the chunks into one dataframe
rejoined_df = pd.concat(list_df)</code>
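
If the goal is to transform the data chunk by chunk rather than just store it, the same pattern extends naturally: process each piece, collect the results, and concatenate once at the end. A minimal sketch, assuming a hypothetical clean_chunk function standing in for whatever per-chunk logic you need:

<code class="python">def clean_chunk(chunk):
    # placeholder transformation; replace with real per-chunk logic
    return chunk.dropna()

# transform each chunk independently, then concatenate once at the end
processed = [clean_chunk(chunk) for chunk in list_df]
rejoined_df = pd.concat(processed, ignore_index=True)</code>

A single concat after the loop is generally cheaper than repeatedly appending to a growing dataframe inside it.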

Slicing by AcctName

To split the dataframe by the values in its AcctName column, use the groupby method:

<code class="python">list_df = []

# each group holds only the rows for a single AcctName value
for acct_name, group in df.groupby('AcctName'):
    list_df.append(group)</code>
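
If the chunks need to be looked up by account rather than by position, a dictionary keyed by AcctName may be more convenient than a list. A small sketch of that variant, using a dict comprehension over the same groupby:

<code class="python"># one dataframe per account, addressable by name instead of position
df_by_acct = {acct_name: group for acct_name, group in df.groupby('AcctName')}

# e.g. all rows for one account (the key 'A' is illustrative)
# df_by_acct['A']</code>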

