Splitting Large Pandas Dataframes into Equal Parts
When working with large datasets in Pandas, it is often necessary to divide them into smaller chunks for processing or analysis. One commonly used method for splitting dataframes is np.split, which divides the data into a given number of equal-sized pieces along a specified axis. However, if the number of rows is not evenly divisible by the number of parts, np.split raises a ValueError.
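The failure is easy to reproduce. The following is a minimal sketch, assuming a small illustrative dataframe (the column name `x` and the row count are made up for the example) whose seven rows cannot be divided evenly into three parts:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 7 rows cannot be divided evenly into 3 parts.
df = pd.DataFrame({"x": range(7)})

try:
    np.split(df, 3)
    error_message = None
except ValueError as exc:
    # np.split refuses any split that would produce unequal pieces.
    error_message = str(exc)

print(error_message)
```

The check happens before any data is moved, so nothing is partially split when the error is raised.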
Alternative Approach Using np.array_split
To overcome this issue, consider using np.array_split instead. This function allows for unequal division of the dataframe, as demonstrated in the following Python code:
<code class="python">import pandas as pd
import numpy as np

# Sample dataframe with 8 rows; columns C and D hold random values
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'C': np.random.randn(8),
                   'D': np.random.randn(8)})
print(df)

# Split the rows into 4 parts and print each one
split_data = np.array_split(df, 4)
for part in split_data:
    print(part)</code>
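The output shows the full dataframe followed by the four parts. Because eight rows divide evenly by four, each part contains two rows, and the original index labels are preserved. The values in columns C and D are random, so they will differ from run to run: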
     A      B         C         D
0  foo    one -0.174067 -0.608579
1  bar    one -0.860386 -1.210518
2  foo    two  0.614102  1.689837
3  bar  three -0.284792 -1.071160
4  foo    two  0.843610  0.803712
5  bar    two -1.514722  0.870861
6  foo    one  0.131529 -0.968151
7  foo  three -1.002946 -0.257468
     A    B         C         D
0  foo  one -0.174067 -0.608579
1  bar  one -0.860386 -1.210518
     A      B         C         D
2  foo    two  0.614102  1.689837
3  bar  three -0.284792 -1.071160
     A    B         C         D
4  foo  two  0.843610  0.803712
5  bar  two -1.514722  0.870861
     A      B         C         D
6  foo    one  0.131529 -0.968151
7  foo  three -1.002946 -0.257468
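Using np.array_split keeps the parts as close to equal in size as possible, regardless of the total row count: when the rows do not divide evenly, the part sizes differ by at most one, with the earlier parts receiving the extra rows. This provides a convenient method for splitting large datasets into manageable chunks for further processing.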
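To see the uneven case concretely, here is a minimal sketch that splits ten rows into three parts. It is a variation on the approach above rather than the same call: the helper name `split_frame` and the sample data are hypothetical, and it splits the row positions with np.array_split and then selects rows with `iloc`. This avoids handing the dataframe itself to NumPy, which some recent pandas releases reportedly flag with a deprecation warning.

```python
import numpy as np
import pandas as pd


def split_frame(frame, n_parts):
    """Split frame row-wise into n_parts pieces whose lengths differ by at most one."""
    # Split the integer row positions (a plain NumPy array), not the dataframe.
    position_groups = np.array_split(np.arange(len(frame)), n_parts)
    # Select each group of rows by position; original index labels are kept.
    return [frame.iloc[positions] for positions in position_groups]


# Hypothetical data: 10 rows do not divide evenly into 3 parts.
df = pd.DataFrame({"x": range(10)})
parts = split_frame(df, 3)
sizes = [len(part) for part in parts]
print(sizes)  # [4, 3, 3]
```

The first part receives the one leftover row, and concatenating the parts with `pd.concat(parts)` restores the original rows in order.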