How Can I Efficiently Split a Large Pandas DataFrame into Smaller DataFrames Based on Participant IDs?
Problem:
You have a large DataFrame with over 1 million rows of data from an experiment with 60 participants. Each participant has a unique code stored in the 'name' column. You want to split the DataFrame into 60 smaller DataFrames, one per participant.
Original Attempt:
Your initial approach, a custom function called splitframe, did not finish within an hour. The function looped through the DataFrame row by row, appending each row to a smaller DataFrame until a new participant appeared, then added that DataFrame to a list and started a new one for the next participant. Growing a DataFrame one row at a time is typically slow, since each append creates a new copy of the data accumulated so far.
Solution using Dataframe Slicing:
Instead of building each DataFrame row by row, you can select all of a participant's rows in one vectorized step by slicing the DataFrame with a boolean mask. Here's how you can do it:
import pandas as pd

# Create a list of unique participant names
unique_names = data['name'].unique()

# Initialize a dictionary to store the split dataframes
data_dict = {}

# Iterate over the unique names
for name in unique_names:
    # Create a new dataframe by slicing the original dataframe
    data_dict[name] = data[data['name'] == name]
Result:
This code creates a dictionary called data_dict. Each key is a participant code from the 'name' column, and the corresponding value is a pandas DataFrame containing all rows for that participant. To access one participant's DataFrame, index the dictionary by that code (here 'ParticipantName' is a placeholder):
participant_data = data_dict['ParticipantName']
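As an alternative to the slicing loop, the same dictionary can be built with groupby, which partitions the rows by 'name' in a single operation rather than scanning the full DataFrame once per participant. The following is a minimal sketch, not the method from the original answer. The toy DataFrame, its 'trial' and 'score' columns, and the participant codes are made up for illustration; only the 'name' column comes from the problem statement.

```python
import pandas as pd

# Toy stand-in for the experiment data. Every column except 'name',
# and all values, are hypothetical.
data = pd.DataFrame({
    "name": ["p01", "p02", "p01", "p03", "p02", "p01"],
    "trial": [1, 1, 2, 1, 2, 3],
    "score": [0.5, 0.7, 0.6, 0.9, 0.4, 0.8],
})

# Iterating over a GroupBy object yields (key, sub-DataFrame) pairs.
# sort=False keeps the groups in order of first appearance,
# matching the order returned by data['name'].unique().
data_dict = {name: group for name, group in data.groupby("name", sort=False)}

print(list(data_dict))
print(data_dict["p01"])
```

Each value keeps the original row index, so a participant's DataFrame here holds the same rows as the boolean-mask version, data[data['name'] == name].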