Home >Backend Development >Python Tutorial >How to Efficiently Split a Million-Row DataFrame into Smaller DataFrames by Participant?

How to Efficiently Split a Million-Row DataFrame into Smaller DataFrames by Participant?

Susan Sarandon
Susan SarandonOriginal
2024-12-03 01:31:11383browse

How to Efficiently Split a Million-Row DataFrame into Smaller DataFrames by Participant?

Splitting DataFrame into Multiple DataFrames

When dealing with massive datasets, it can be necessary to split them into smaller chunks for efficient processing. This can be achieved by dividing the DataFrame based on a unique identifier, resulting in multiple smaller DataFrames. In this case, the goal is to partition a 1 million-row DataFrame into 60 smaller DataFrames, one for each participant identified by the 'name' variable.

Unfortunately, the provided Python code for splitting the DataFrame fails to complete the task. Instead of running indefinitely, an alternative approach is recommended utilizing the slicing and indexing capabilities of Pandas. Here's the modified code:

import pandas as pd

# Create a list of unique participant names
unique_names = data['name'].unique()

# Create a dictionary to store the DataFrames for each participant
participant_data = {name: pd.DataFrame() for name in unique_names}

# Populate the dictionary with sliced DataFrames for each participant
for name in unique_names:
    participant_data[name] = data[data['name'] == name]

This code efficiently slices the DataFrame based on the 'name' column, creating separate DataFrames for each participant while avoiding the pitfalls of the previous code.

The above is the detailed content of How to Efficiently Split a Million-Row DataFrame into Smaller DataFrames by Participant?. For more information, please follow other related articles on the PHP Chinese website!

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn