How to Apply Multiple Functions to Multiple Columns in Pandas GroupBy?

Barbara Streisand
2024-12-08 05:53:10

How to Apply Multiple Functions to Multiple Grouped Columns

Groupby operations in pandas aggregate data based on one or more key columns. A common need is to compute several aggregates at once, on different columns or on combinations of columns, within each group.

Using a Dictionary for Series Group-bys

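In older pandas versions, a Series groupby object accepted a dictionary that mapped output column names to functions. That form was deprecated in pandas 0.20 and removed in 1.0, where it raises a SpecificationError. The current equivalent is named aggregation, which passes the output names as keyword arguments:

grouped['D'].agg(result1='sum', result2='mean')

This does not carry over to DataFrame groupby objects as-is: when a dictionary is passed to one, its keys are interpreted as the names of the columns to aggregate rather than as output names, and each function receives a single column at a time.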
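The difference between the two kinds of groupby object can be seen in a minimal runnable sketch. It assumes pandas 1.0 or later and uses keyword-based named aggregation on the Series groupby. The sample data and the column names A, C and D are made up for illustration and do not come from the article.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "A": ["x", "x", "y", "y"],
    "C": [1.0, 2.0, 3.0, 4.0],
    "D": [10.0, 20.0, 30.0, 40.0],
})
grouped = df.groupby("A")

# Series groupby: the keyword names become the output column names.
series_result = grouped["D"].agg(result1="sum", result2="mean")

# DataFrame groupby: the dictionary keys select columns and the values
# list the functions to apply to each selected column.
frame_result = grouped.agg({"C": ["sum", "max"], "D": ["mean"]})

print(series_result)
print(frame_result)
```

In the second result the columns form a two-level index of (column, function) pairs. Each function still works on one column at a time, so an aggregate such as the sum of C times D cannot be expressed this way.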

Custom Functions with Apply

To address this limitation, you can use the apply method, which passes each group to your function as a DataFrame. A custom function that returns a Series (optionally with a MultiIndex) can then compute several aggregates across several columns for each group; the labels of the returned Series become the columns of the result:

Returning a Series:

def f(x):
    d = {}
    d['a_sum'] = x['a'].sum()
    d['a_max'] = x['a'].max()
    d['b_mean'] = x['b'].mean()
    d['c_d_prodsum'] = (x['c'] * x['d']).sum()
    return pd.Series(d, index=['a_sum', 'a_max', 'b_mean', 'c_d_prodsum'])

df.groupby('group').apply(f)
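To see what this returns, here is a small self-contained sketch of the same idea. The data is invented for illustration. The value columns are selected explicitly before calling apply, which is a choice made here rather than something the article does, so that only a, b, c and d are passed to the function.

```python
import pandas as pd

# Hypothetical sample data with the column names used by f.
df = pd.DataFrame({
    "group": ["g1", "g1", "g2", "g2"],
    "a": [1, 2, 3, 4],
    "b": [10.0, 20.0, 30.0, 40.0],
    "c": [1, 1, 2, 2],
    "d": [5, 6, 7, 8],
})

def f(x):
    # x is the sub-DataFrame holding the rows of one group.
    return pd.Series({
        "a_sum": x["a"].sum(),
        "a_max": x["a"].max(),
        "b_mean": x["b"].mean(),
        "c_d_prodsum": (x["c"] * x["d"]).sum(),
    })

# One row per group, one column per label of the returned Series.
result = df.groupby("group")[["a", "b", "c", "d"]].apply(f)
print(result)
```

The result is a DataFrame indexed by group, with the columns a_sum, a_max, b_mean and c_d_prodsum.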

Returning a Series with MultiIndex:

def f_mi(x):
    d = []
    d.append(x['a'].sum())
    d.append(x['a'].max())
    d.append(x['b'].mean())
    d.append((x['c'] * x['d']).sum())
    return pd.Series(d, index=[['a', 'a', 'b', 'c_d'],
                               ['sum', 'max', 'mean', 'prodsum']])

df.groupby('group').apply(f_mi)
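The MultiIndex variant can be tried the same way. This sketch again uses invented data and selects the value columns explicitly, which is an assumption of the example rather than part of the article.

```python
import pandas as pd

# Hypothetical sample data with the column names used by f_mi.
df = pd.DataFrame({
    "group": ["g1", "g1", "g2", "g2"],
    "a": [1, 2, 3, 4],
    "b": [10.0, 20.0, 30.0, 40.0],
    "c": [1, 1, 2, 2],
    "d": [5, 6, 7, 8],
})

def f_mi(x):
    values = [
        x["a"].sum(),
        x["a"].max(),
        x["b"].mean(),
        (x["c"] * x["d"]).sum(),
    ]
    # Two parallel lists of labels build a two-level index:
    # the source column(s) on the outer level, the operation on the inner one.
    index = [["a", "a", "b", "c_d"], ["sum", "max", "mean", "prodsum"]]
    return pd.Series(values, index=index)

result = df.groupby("group")[["a", "b", "c", "d"]].apply(f_mi)

# Selecting an outer label returns all aggregates computed from that column.
a_aggregates = result["a"]
print(result)
print(a_aggregates)
```

Grouping the output labels by source column makes it easy to pull out every aggregate of one column with a single selection, at the cost of slightly more verbose column access elsewhere.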

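This approach is flexible: the function can combine any columns of the group, including in expressions that span several columns. The trade-off is speed, because apply calls the Python function once per group, which is usually slower than the built-in aggregations available through agg.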
