The following introduces several pandas methods for getting the row with the maximum value in each groupby group. I hope it serves as a useful reference. Let's take a look together.
pandas methods for getting the row with the maximum value in each groupby group
For example, in the following DataFrame, group by Mt and take out the row with the largest Count in each group.
import pandas as pd

df = pd.DataFrame({'Sp': ['a', 'b', 'c', 'd', 'e', 'f'],
                   'Mt': ['s1', 's1', 's2', 's2', 's2', 's3'],
                   'Value': [1, 2, 3, 4, 5, 6],
                   'Count': [3, 2, 5, 10, 10, 6]})
df

index | Count | Mt | Sp | Value |
---|---|---|---|---|
0 | 3 | s1 | a | 1 |
1 | 2 | s1 | b | 2 |
2 | 5 | s2 | c | 3 |
3 | 10 | s2 | d | 4 |
4 | 10 | s2 | e | 5 |
5 | 6 | s3 | f | 6 |
Method 1: within each group, keep the rows whose Count equals the group's maximum
df.groupby('Mt').apply(lambda t: t[t['Count'] == t['Count'].max()])
Mt (group) | index | Count | Mt | Sp | Value |
---|---|---|---|---|---|
s1 | 0 | 3 | s1 | a | 1 |
s2 | 3 | 10 | s2 | d | 4 |
s2 | 4 | 10 | s2 | e | 5 |
s3 | 5 | 6 | s3 | f | 6 |
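The result above is indexed by both Mt and the original row label, because `groupby(...).apply` prepends the group keys by default. A minimal sketch, assuming the same example data, of passing `group_keys=False` so the filtered rows keep only their original index; the variable name `top_rows` is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Sp': ['a', 'b', 'c', 'd', 'e', 'f'],
                   'Mt': ['s1', 's1', 's2', 's2', 's2', 's3'],
                   'Value': [1, 2, 3, 4, 5, 6],
                   'Count': [3, 2, 5, 10, 10, 6]})

# group_keys=False: do not add the group label (Mt) as an extra index level
top_rows = df.groupby('Mt', group_keys=False).apply(
    lambda t: t[t['Count'] == t['Count'].max()])

# Rows 0, 3, 4 and 5, indexed by their original labels only
print(top_rows)
```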
Method 2: use transform to get each group's maximum aligned with the original DataFrame's index, and then filter out the required rows
print(df.groupby(['Mt'])['Count'].agg('max'))
idx = df.groupby(['Mt'])['Count'].transform('max')
print(idx)
idx1 = idx == df['Count']
print(idx1)
df[idx1]
Mt
s1     3
s2    10
s3     6
Name: Count, dtype: int64

0     3
1     3
2    10
3    10
4    10
5     6
dtype: int64

0     True
1    False
2    False
3     True
4     True
5     True
dtype: bool
index | Count | Mt | Sp | Value |
---|---|---|---|---|
0 | 3 | s1 | a | 1 |
3 | 10 | s2 | d | 4 |
4 | 10 | s2 | e | 5 |
5 | 6 | s3 | f | 6 |
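Because `transform` returns a result aligned with the original index, the comparison can also be written inline and the aggregation swapped out. A minimal sketch, assuming the same example data; `rows_matching_group_stat` is a hypothetical helper name, and `stat` is assumed to be a reduction name that `transform` accepts, such as `'max'` or `'min'`.

```python
import pandas as pd

df = pd.DataFrame({'Sp': ['a', 'b', 'c', 'd', 'e', 'f'],
                   'Mt': ['s1', 's1', 's2', 's2', 's2', 's3'],
                   'Value': [1, 2, 3, 4, 5, 6],
                   'Count': [3, 2, 5, 10, 10, 6]})


def rows_matching_group_stat(frame, key, col, stat='max'):
    """Hypothetical helper: keep every row whose `col` equals its group's `stat`."""
    # transform broadcasts the per-group statistic back onto frame's index
    group_stat = frame.groupby(key)[col].transform(stat)
    return frame[frame[col] == group_stat]


top = rows_matching_group_stat(df, 'Mt', 'Count', 'max')
bottom = rows_matching_group_stat(df, 'Mt', 'Count', 'min')
print(top)
print(bottom)

# Ties are kept: group s2 contributes two rows to `top`
print(top['Mt'].value_counts())
```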
Both methods above have a problem with rows 3 and 4: they both hold the maximum Count of group s2, so multiple rows are returned for that group. What if only one row per group should be returned?

Method 3: idxmax (in old versions of pandas, argmax)
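idx = df.groupby('Mt')['Count'].idxmax()
print(idx)
df.loc[idx]  # idxmax returns index labels; df.iloc[idx] gives the same result here only because the index is the default 0..n-1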
Mt
s1 0
s2 3
s3 5
Name: Count, dtype: int64
index | Count | Mt | Sp | Value |
---|---|---|---|---|
0 | 3 | s1 | a | 1 |
3 | 10 | s2 | d | 4 |
5 | 6 | s3 | f | 6 |
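`idxmax` returns index labels rather than positions, and on ties it returns the first occurrence, which is why group s2 yields row 3 and not row 4. A minimal sketch, assuming the same data but with hypothetical string labels 'r0' to 'r5' as the index; here `df.loc[idx]` works, while `df.iloc[idx]` would not, because iloc only accepts integer positions.

```python
import pandas as pd

# Same data as the article, but with hypothetical string labels as the index
df = pd.DataFrame({'Sp': ['a', 'b', 'c', 'd', 'e', 'f'],
                   'Mt': ['s1', 's1', 's2', 's2', 's2', 's3'],
                   'Value': [1, 2, 3, 4, 5, 6],
                   'Count': [3, 2, 5, 10, 10, 6]},
                  index=['r0', 'r1', 'r2', 'r3', 'r4', 'r5'])

idx = df.groupby('Mt')['Count'].idxmax()
# One label per group; the tie in s2 resolves to the first row, 'r3'
print(idx)

# Select by label, since idx holds index labels, not positions
best = df.loc[idx]
print(best)
```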
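idxmax can also be used inside apply. Both functions below return the Value of the row with the largest Count in each group:

def using_apply(df):
    # For each group, look up Value at the label where Count is largest
    return df.groupby('Mt').apply(lambda subf: subf.loc[subf['Count'].idxmax(), 'Value'])

def using_idxmax_loc(df):
    # idxmax gives one index label per group; loc then selects those rows
    idx = df.groupby('Mt')['Count'].idxmax()
    return df.loc[idx, ['Mt', 'Value']]

print(using_apply(df))
using_idxmax_loc(df)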
Mt
s1    1
s2    4
s3    6
dtype: int64

index | Mt | Value |
---|---|---|
0 | s1 | 1 |
3 | s2 | 4 |
5 | s3 | 6 |
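The two functions above return the same Value for each group, only in different shapes: a Series indexed by Mt versus a DataFrame that keeps the original row labels. A minimal sketch for checking that they agree and timing them on your own data with the standard-library `timeit`. Selecting `[['Count', 'Value']]` before `apply` is a variant introduced here (it keeps the grouping column out of the applied function), and no particular speed result is implied.

```python
import timeit

import pandas as pd

df = pd.DataFrame({'Sp': ['a', 'b', 'c', 'd', 'e', 'f'],
                   'Mt': ['s1', 's1', 's2', 's2', 's2', 's3'],
                   'Value': [1, 2, 3, 4, 5, 6],
                   'Count': [3, 2, 5, 10, 10, 6]})


def using_apply(df):
    # Variant: select only the needed columns so apply never sees the Mt key
    return (df.groupby('Mt')[['Count', 'Value']]
              .apply(lambda subf: subf.loc[subf['Count'].idxmax(), 'Value']))


def using_idxmax_loc(df):
    idx = df.groupby('Mt')['Count'].idxmax()
    return df.loc[idx, ['Mt', 'Value']]


a = using_apply(df)
b = using_idxmax_loc(df).set_index('Mt')['Value']
# Same values per group once both results are indexed by Mt
print(a.tolist() == b.tolist())

for fn in (using_apply, using_idxmax_loc):
    seconds = timeit.timeit(lambda: fn(df), number=100)
    print(f'{fn.__name__}: {seconds:.4f} s for 100 runs')
```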
Method 4: sort first, then take the first row from each group
df.sort_values('Count', ascending=False).groupby('Mt', as_index=False).first()
index | Mt | Count | Sp | Value |
---|---|---|---|---|
0 | s1 | 3 | a | 1 |
1 | s2 | 10 | d | 4 |
2 | s3 | 6 | f | 6 |
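Two caveats about method 4, with a minimal sketch assuming the same data. The default sort algorithm (`kind='quicksort'`) is not guaranteed to be stable, so which of several tied rows comes first is not guaranteed; passing `kind='stable'` keeps tied rows in their original order. Also, `GroupBy.first()` takes the first non-null value of each column separately, so with missing values it can combine values from different rows. Using `drop_duplicates` on the group column instead keeps whole rows and their original index labels:

```python
import pandas as pd

df = pd.DataFrame({'Sp': ['a', 'b', 'c', 'd', 'e', 'f'],
                   'Mt': ['s1', 's1', 's2', 's2', 's2', 's3'],
                   'Value': [1, 2, 3, 4, 5, 6],
                   'Count': [3, 2, 5, 10, 10, 6]})

# Stable descending sort: tied rows (3 and 4 in group s2) keep their original order
ordered = df.sort_values('Count', ascending=False, kind='stable')

# Keep the first (largest-Count) row of each Mt group as a whole row,
# then restore the original row order
best = ordered.drop_duplicates(subset='Mt', keep='first').sort_index()
print(best)
```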
Then another question comes up: what if you don't want the row with the maximum value but, for example, the row with the middle value? The idea is similar, but the code needs some changes: methods 1 and 2 need to replace the max aggregation, and method 3 needs a function that returns the index label of the wanted row. In any case, after groupby each group is a DataFrame, so the same per-group logic applies.
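As one way to pick a "middle" row without `apply`, here is a minimal sketch assuming the same data: sort by group and value, number the rows within each group with `cumcount`, and keep the row at position `(size - 1) // 2`, i.e. the lower median for groups of even size. The helper name `middle_rows` and that choice of lower median are assumptions for illustration; when several rows share the middle value (as in group s2), which of them is returned depends on the sort order.

```python
import pandas as pd

df = pd.DataFrame({'Sp': ['a', 'b', 'c', 'd', 'e', 'f'],
                   'Mt': ['s1', 's1', 's2', 's2', 's2', 's3'],
                   'Value': [1, 2, 3, 4, 5, 6],
                   'Count': [3, 2, 5, 10, 10, 6]})


def middle_rows(frame, key, col):
    """Hypothetical helper: the row at the (lower) median position of `col` in each group."""
    ordered = frame.sort_values([key, col])
    # 0-based position of each row inside its group, in sorted order
    pos = ordered.groupby(key).cumcount()
    # Group size broadcast to every row of that group
    size = ordered[key].map(ordered.groupby(key).size())
    return ordered[pos == (size - 1) // 2].sort_index()


mid = middle_rows(df, 'Mt', 'Count')
print(mid)
```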
The above is the detailed content of pandas method to get the row with the maximum value in the groupby group. For more information, please follow other related articles on the PHP Chinese website!