How Can I Retrieve Group-Wise Statistics (Count, Mean, Median, Min, Max) Using Pandas GroupBy?

Patricia Arquette
2024-12-21 15:03:14

Retrieve Group-Wise Statistics Using Pandas GroupBy

Problem

Given a DataFrame df with several columns (col1, col2, col3, col4, etc.), you want to calculate group statistics such as count, mean, median, minimum, and maximum of the value columns (for example col3 and col4) for each unique combination of values in the grouping columns (for example col1 and col2).

Approach

Pandas provides the DataFrame.groupby method for group-wise analysis. It splits the rows into groups by one or more key columns and returns a GroupBy object, on which you can then call aggregation or transformation methods.

Count

To get the count of rows in each group, use the .size() method. It returns a Series containing the row counts for each unique group. For example:

df.groupby(['col1', 'col2']).size()

To convert this Series into a DataFrame, you can use .reset_index(name='counts'):

df.groupby(['col1', 'col2']).size().reset_index(name='counts')
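One detail worth checking is how missing values are counted. The sketch below uses a small made-up DataFrame (the data values are assumptions for illustration, not taken from the article) to contrast .size(), which counts every row in a group, with .count(), which counts only non-null values in a column.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; col3 has one missing value in group ('a', 'x').
df = pd.DataFrame({
    "col1": ["a", "a", "b", "b"],
    "col2": ["x", "x", "y", "y"],
    "col3": [1.0, np.nan, 3.0, 4.0],
})

gb = df.groupby(["col1", "col2"])

# size() counts all rows per group, including rows where col3 is NaN.
row_counts = gb.size().reset_index(name="counts")

# count() counts only the non-null values of the selected column.
non_null_counts = gb["col3"].count().reset_index(name="col3_non_null")

print(row_counts)
print(non_null_counts)
```

If the columns contain no missing values, the two approaches should give the same numbers.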

Multiple Statistics

To calculate multiple statistics for each group, use the .agg() method. You can pass a dictionary with column names as keys and a list of aggregation functions as values. For instance, to calculate the mean and count of col3 and the median, minimum, and count of col4:

df.groupby(['col1', 'col2']).agg({
    'col3': ['mean', 'count'],
    'col4': ['median', 'min', 'count']
})
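A dictionary of lists produces two-level column labels of the form (column, statistic). As a minimal sketch, assuming a small made-up DataFrame, the following shows how one might select a single statistic from that result and flatten the labels into plain names such as col3_mean; the naming scheme is an illustrative choice, not something pandas does for you.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "col1": ["a", "a", "b"],
    "col2": ["x", "x", "y"],
    "col3": [1.0, 3.0, 5.0],
    "col4": [10, 20, 30],
})

stats = df.groupby(["col1", "col2"]).agg({
    "col3": ["mean", "count"],
    "col4": ["median", "min", "count"],
})

# The columns are a MultiIndex, so one statistic is selected with a tuple.
col3_mean = stats[("col3", "mean")]

# Flatten ('col3', 'mean') -> 'col3_mean', then turn the group keys
# back into ordinary columns.
flat = stats.copy()
flat.columns = ["_".join(pair) for pair in stats.columns]
flat = flat.reset_index()

print(flat)
```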

Combine Statistics

To combine different aggregations into a single DataFrame, you can use the join method. By default it aligns DataFrames on their index, which here is the set of group keys shared by every aggregation. For example, to create a result combining the count, mean, median, and minimum:

# Build the GroupBy object once and reuse it for every aggregation
gb = df.groupby(['col1', 'col2'])
counts = gb.size().to_frame(name='counts')
result = (
    counts
    .join(gb.agg({'col3': 'mean'}).rename(columns={'col3': 'col3_mean'}))
    .join(gb.agg({'col4': 'median'}).rename(columns={'col4': 'col4_median'}))
    .join(gb.agg({'col4': 'min'}).rename(columns={'col4': 'col4_min'}))
    .reset_index()
)
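As an alternative to chaining joins, named aggregation (keyword arguments to .agg(), available in pandas 0.25 and later) can usually produce the same flat table in one call. The sketch below assumes a small made-up DataFrame and also adds the maximum; the output column names are illustrative choices.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "col1": ["a", "a", "b"],
    "col2": ["x", "x", "y"],
    "col3": [1.0, 3.0, 5.0],
    "col4": [10, 20, 30],
})

# Each keyword is an output column name; each value is a
# (source column, aggregation function) pair.
summary = (
    df.groupby(["col1", "col2"])
      .agg(
          counts=("col3", "size"),        # rows per group, NaN included
          col3_mean=("col3", "mean"),
          col4_median=("col4", "median"),
          col4_min=("col4", "min"),
          col4_max=("col4", "max"),
      )
      .reset_index()
)

print(summary)
```

This avoids both the intermediate DataFrames and the two-level column labels, at the cost of naming each output column explicitly.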
