How to Efficiently Split Large Pandas DataFrames into Non-Equal Sections?

Mary-Kate Olsen
2024-10-26 22:24:29

Splitting Large Pandas DataFrames

When working with large datasets in pandas, it is often necessary to split a dataframe into smaller chunks for processing or distribution. However, np.split raises a ValueError when the number of rows is not evenly divisible by the requested number of sections.
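The failure is easy to reproduce on a small case. The sketch below assumes a hypothetical 10-row dataframe (`df_small` is an illustrative name, not part of the original example) and asks np.split for 4 sections:

```python
import numpy as np
import pandas as pd

# Hypothetical 10-row dataframe; 10 is not divisible by 4.
df_small = pd.DataFrame({'value': range(10)})

try:
    np.split(df_small, 4)
    split_error = None
except ValueError as exc:
    # np.split refuses any split that would produce unequal sections.
    split_error = str(exc)

print(split_error)
```

The error is raised before any data is divided, so nothing is partially split.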

Using np.array_split

The np.array_split function is more flexible and can also be applied to dataframes. Unlike np.split, it accepts a number of sections that does not evenly divide the axis: for an axis of length l split into n sections, it returns l % n sections of size l // n + 1, followed by the remaining sections of size l // n.
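The section sizes can be checked on a small case. This sketch splits 10 row positions into 4 sections; the array of positions is an assumption chosen for illustration:

```python
import numpy as np

# Ten row positions, split into four sections of near-equal size.
positions = np.arange(10)
parts = np.array_split(positions, 4)

# 10 % 4 == 2, so the first two sections hold one extra element.
sizes = [len(part) for part in parts]
print(sizes)  # [3, 3, 2, 2]
```

The longer sections always come first, so section sizes never differ by more than one.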

Consider the following example with a dataframe containing 423244 rows, which we wish to split into 4 groups:

<code class="python">In [1]:
import numpy as np
import pandas as pd

In [2]:
# Build a 423244-row example dataframe by cycling the label values
n_rows = 423244
df = pd.DataFrame({
    'A': np.resize(['foo', 'bar', 'foo', 'bar'], n_rows),
    'B': np.resize(['one', 'one', 'two', 'three'], n_rows),
    'C': np.random.rand(n_rows),
    'D': np.random.rand(n_rows)
})

In [3]:
print(df)</code>

To split the dataframe into 4 groups, pass it to np.array_split together with the number of sections:

<code class="python">In [4]:
import numpy as np

In [5]:
# Returns a list of 4 dataframes, split along the rows (axis 0)
sections = np.array_split(df, 4)</code>

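The sections variable now contains a list of 4 dataframes, each containing 105811 rows (423244 / 4). When the row count is not a multiple of the number of sections, the leading sections receive one extra row each.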

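When dealing with large dataframes, it is important to consider the computational cost and memory requirements of different splitting methods. The full dataframe must already be in memory before it can be split, and np.array_split returns all sections at once as a list. What it adds over np.split is that the row count no longer has to be divisible by the number of sections.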
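If the sections are processed one after another, they can also be produced lazily with plain iloc slicing. The following is a minimal sketch rather than part of the original answer: `iter_sections` and `demo` are hypothetical names, and the sizing rule is written to mirror the one np.array_split uses. On recent pandas versions, passing a dataframe directly to np.array_split may emit a FutureWarning, which slicing with iloc avoids.

```python
import pandas as pd

def iter_sections(df, n_sections):
    """Yield n_sections consecutive row slices of df, one at a time."""
    base, extra = divmod(len(df), n_sections)
    start = 0
    for i in range(n_sections):
        # The first `extra` sections take one additional row.
        stop = start + base + (1 if i < extra else 0)
        yield df.iloc[start:stop]
        start = stop

# Small hypothetical dataframe to show the resulting section lengths.
demo = pd.DataFrame({'value': range(10)})
section_lengths = [len(section) for section in iter_sections(demo, 4)]
print(section_lengths)  # [3, 3, 2, 2]
```

Because the function is a generator, each section is only created when the consuming loop asks for it.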
