How to Efficiently Split a Million-Row DataFrame into Smaller DataFrames by Participant?
When dealing with massive datasets, it is often necessary to split them into smaller chunks for efficient processing. One way to do this is to divide the DataFrame by a unique identifier, producing one smaller DataFrame per group. In this case, the goal is to partition a 1 million-row DataFrame into 60 smaller DataFrames, one for each participant identified by the 'name' column.
Unfortunately, the originally provided Python code runs indefinitely and never completes the task. A better approach uses the slicing and boolean-indexing capabilities of Pandas. Here's the modified code:
```python
import pandas as pd

# Collect the unique participant names
unique_names = data['name'].unique()

# Build one DataFrame per participant via boolean indexing
participant_data = {}
for name in unique_names:
    participant_data[name] = data[data['name'] == name]
```
This code slices the DataFrame on the 'name' column, creating a separate DataFrame for each participant and storing them in a dictionary keyed by name, while avoiding the pitfalls of the previous code.
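An even more idiomatic option is `DataFrame.groupby`, which partitions the data in a single pass instead of scanning the full DataFrame once per participant. The sketch below uses a small hypothetical `data` DataFrame in place of the real million-row one; the column names are assumptions for illustration:

```python
import pandas as pd

# Hypothetical sample data standing in for the real 1M-row DataFrame
data = pd.DataFrame({
    "name": ["alice", "bob", "alice", "carol", "bob"],
    "score": [10, 20, 30, 40, 50],
})

# groupby yields (key, sub-DataFrame) pairs in one pass over the data,
# so each row is examined only once regardless of the number of groups
participant_data = {name: group for name, group in data.groupby("name")}

print(sorted(participant_data))        # ['alice', 'bob', 'carol']
print(len(participant_data["alice"]))  # 2
```

With 60 participants, this replaces 60 separate boolean scans with a single grouping operation, which matters at the million-row scale.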