How do I split a large Pandas DataFrame into equal parts when the number of rows is not divisible by the number of parts?

Mary-Kate Olsen
2024-10-28 03:29:30

Splitting Large Pandas Dataframes into Equal Parts

When working with large datasets in Pandas, it is often necessary to divide them into smaller chunks for processing or analysis. One commonly used method for splitting dataframes is np.split, which divides an array into a given number of equal-sized sub-arrays along a specified axis. However, if the number of rows is not evenly divisible by the number of parts, np.split raises a ValueError.
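To make the failure concrete, here is a minimal sketch that uses a plain NumPy array as a stand-in for the row positions of an 8-row dataframe. The variable names are illustrative only, and the exact wording of the error message may differ between NumPy versions.

```python
import numpy as np

# Stand-in for the row positions of an 8-row DataFrame (illustrative only).
rows = np.arange(8)

# 8 is divisible by 4, so np.split succeeds and returns four equal parts.
even_parts = np.split(rows, 4)
print([len(part) for part in even_parts])

# 8 is not divisible by 3, so np.split raises ValueError.
try:
    np.split(rows, 3)
    split_failed = False
except ValueError as err:
    split_failed = True
    print("np.split failed:", err)
```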

Alternative Approach Using np.array_split

To overcome this issue, consider using np.array_split instead. This function accepts any number of parts and makes them as equal in size as possible instead of raising an error. The following Python code shows the call on a small dataframe:

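<code class="python">import pandas as pd
import numpy as np

# Build a small 8-row example DataFrame; columns C and D hold random values.
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'C': np.random.randn(8), 'D': np.random.randn(8)})

print(df)

# Split row-wise into 4 parts. Unlike np.split, this also works
# when the row count is not divisible by the number of parts.
split_data = np.array_split(df, 4)

# Print each part in turn.
for part in split_data:
    print(part)</code>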

The code first prints the full dataframe and then the four parts. Because 8 rows divide evenly by 4, each part holds two rows, and each part keeps its original index labels. The values in columns C and D are random, so they differ from run to run:

     A      B         C         D
0  foo    one -0.174067 -0.608579
1  bar    one -0.860386 -1.210518
2  foo    two  0.614102  1.689837
3  bar  three -0.284792 -1.071160
4  foo    two  0.843610  0.803712
5  bar    two -1.514722  0.870861
6  foo    one  0.131529 -0.968151
7  foo  three -1.002946 -0.257468

     A    B         C         D
0  foo  one -0.174067 -0.608579
1  bar  one -0.860386 -1.210518

     A      B         C         D
2  foo    two  0.614102  1.689837
3  bar  three -0.284792 -1.071160

     A    B         C         D
4  foo  two  0.843610  0.803712
5  bar  two -1.514722  0.870861

     A      B         C         D
6  foo    one  0.131529 -0.968151
7  foo  three -1.002946 -0.257468

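np.array_split distributes the dataframe rows as evenly as possible, regardless of their total count. When the row count is not divisible by the number of parts, the leading parts each receive one extra row instead of an error being raised. For example, splitting 8 rows into 3 parts yields parts of 3, 3, and 2 rows. This provides a convenient method for splitting large datasets into manageable chunks for further processing.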
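The uneven case can be sketched as follows. This variant splits the row positions with np.array_split and then selects each group with iloc, rather than passing the dataframe itself to NumPy; some recent pandas releases may emit a deprecation warning for the direct call, so this is offered as one possible alternative, not as the only correct approach. The helper name split_frame and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd


def split_frame(df, n_parts):
    """Split df row-wise into n_parts pieces of nearly equal size.

    Hypothetical helper for illustration; not part of pandas or NumPy.
    """
    # Split the integer row positions, then select each group with iloc.
    position_groups = np.array_split(np.arange(len(df)), n_parts)
    return [df.iloc[positions] for positions in position_groups]


# Deterministic 8-row sample frame (assumed data, chosen for readability).
df = pd.DataFrame({"A": list("abcdefgh"), "C": np.arange(8)})

parts = split_frame(df, 3)
sizes = [len(part) for part in parts]
print(sizes)  # [3, 3, 2]
```

Each part keeps the index labels of the original dataframe, so concatenating the parts with pd.concat restores the rows in their original order.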
