Converting a Pandas GroupBy MultiIndex Output Back to a DataFrame
When you group a pandas DataFrame by more than one column and aggregate, the result is a DataFrame with a hierarchical index (a MultiIndex) built from the grouping columns. This is inconvenient when you want the grouping keys back as ordinary columns, with one flat row per group.
Here's a simple example:
import pandas as pd

df1 = pd.DataFrame({
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
})
g1 = df1.groupby(["Name", "City"]).count()
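In the pandas version this example was written for, g1 is a DataFrame with a hierarchical index. Recent pandas releases exclude the grouping keys from the aggregated columns, so there g1 keeps the same index but has no count columns: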
                  City  Name
Name    City
Alice   Seattle      1     1
Bob     Seattle      2     2
Mallory Portland     2     2
        Seattle      1     1
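To confirm that the grouping keys really ended up in the index, you can inspect the index directly. The following is a minimal sketch that reuses the example data; the variable names `is_multi`, `level_names`, and `group_keys` are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
})
g1 = df1.groupby(["Name", "City"]).count()

# The grouping keys become the levels of a MultiIndex rather than columns.
is_multi = isinstance(g1.index, pd.MultiIndex)
level_names = list(g1.index.names)  # names of the index levels
group_keys = g1.index.tolist()      # one (Name, City) tuple per group

print(is_multi)
print(level_names)
print(group_keys)
```

Each group is identified by a (Name, City) tuple, which is why the rows cannot be addressed by a plain column lookup until the index is reset.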
To convert this back to a flat DataFrame with one row per group, you can chain the add_suffix and reset_index methods:
g1.add_suffix("_Count").reset_index()
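add_suffix appends "_Count" to the count column labels so that they no longer clash with the index level names, and reset_index then moves the Name and City index levels back into ordinary columns: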
      Name      City  City_Count  Name_Count
0    Alice   Seattle           1           1
1      Bob   Seattle           2           2
2  Mallory  Portland           2           2
3  Mallory   Seattle           1           1
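Because recent pandas releases leave the grouping keys out of the count() result, the same pattern is easiest to see when the DataFrame has at least one non-key column. The sketch below assumes a hypothetical "Score" column that is not part of the original example; it exists only so that count() has something to count.

```python
import pandas as pd

# "Score" is a hypothetical value column added only for illustration.
df = pd.DataFrame({
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "Score": [10, 20, 30, 40, 50, 60],
})

flat = (
    df.groupby(["Name", "City"])
    .count()               # non-null count of each non-key column per group
    .add_suffix("_Count")  # rename Score -> Score_Count
    .reset_index()         # move Name and City from the index into columns
)
print(flat)
```

The result has the columns Name, City, and Score_Count and a default integer index, which is the same flat shape as the table above.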
Or, you can use the size method and reset_index to count the number of rows in each group and create a new DataFrame:
pd.DataFrame({'count': df1.groupby(["Name", "City"]).size()}).reset_index()
This produces a flat DataFrame with a default integer index and a single count column:
      Name      City  count
0    Alice   Seattle      1
1      Bob   Seattle      2
2  Mallory  Portland      2
3  Mallory   Seattle      1
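A slightly shorter variant of the size approach, shown here as a sketch rather than the article's own method, skips the intermediate DataFrame constructor. size() returns a Series, and Series.reset_index accepts a name argument that labels the resulting value column. The variable name `counts` is only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
})

# size() yields a Series indexed by (Name, City); reset_index(name=...)
# turns it into a flat DataFrame and names the value column "count".
counts = df1.groupby(["Name", "City"]).size().reset_index(name="count")
print(counts)
```

This gives the same Name, City, and count columns as the table above.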
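Which approach you use depends on what you need: count reports the number of non-null values in each column per group, while size reports the number of rows in each group regardless of missing values.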