
pandas method to get the row with the maximum value in the groupby group

不言 (original)
2018-04-20 13:38:46

This article shows several ways to get the row with the maximum value within each pandas groupby group.


For example, given the following DataFrame, group by Mt and take the row with the largest Count in each group.

import pandas as pd
# Columns are listed as Count, Mt, Sp, Value to match the displayed output.
df = pd.DataFrame({'Count': [3, 2, 5, 10, 10, 6], 'Mt': ['s1', 's1', 's2', 's2', 's2', 's3'],
                   'Sp': ['a', 'b', 'c', 'd', 'e', 'f'], 'Value': [1, 2, 3, 4, 5, 6]})

df


   Count  Mt Sp  Value
0      3  s1  a      1
1      2  s1  b      2
2      5  s2  c      3
3     10  s2  d      4
4     10  s2  e      5
5      6  s3  f      6

Method 1: Select the rows with the largest Count in each group

df.groupby('Mt').apply(lambda t: t[t.Count==t.Count.max()])


      Count  Mt Sp  Value
Mt
s1 0      3  s1  a      1
s2 3     10  s2  d      4
   4     10  s2  e      5
s3 5      6  s3  f      6

Method 2: Use transform to broadcast each group's maximum onto the original DataFrame's index, then filter for the matching rows

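# Maximum Count per group, indexed by Mt
print(df.groupby(['Mt'])['Count'].agg('max'))

# transform returns the group maximum aligned with the original index
idx = df.groupby(['Mt'])['Count'].transform('max')
print(idx)
# Boolean mask: True where a row's Count equals its group's maximum
idx1 = idx == df['Count']
print(idx1)

df[idx1]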

Mt
s1 3
s2 10
s3 6
Name: Count, dtype: int64
0 3
1 3
2 10
3 10
4 10
5 6
dtype: int64
0 True
1 False
2 False
3 True
4 True
5 True
dtype: bool


   Count  Mt Sp  Value
0      3  s1  a      1
3     10  s2  d      4
4     10  s2  e      5
5      6  s3  f      6

The methods above have one problem: rows 3 and 4 both hold the maximum value for group s2, so multiple rows are returned. What if only one row per group should be returned?

Method 3: idxmax (argmax in older versions of pandas)

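# idxmax returns the index label of the first maximum in each group
idx = df.groupby('Mt')['Count'].idxmax()
print(idx)

# idxmax yields labels, so select with label-based .loc
df.loc[idx]

Mt
s1    0
s2    3
s3    5
Name: Count, dtype: int64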

   Count  Mt Sp  Value
0      3  s1  a      1
3     10  s2  d      4
5      6  s3  f      6

df.loc[df.groupby(['Mt']).apply(lambda x: x['Count'].idxmax())]

   Count  Mt Sp  Value
0      3  s1  a      1
3     10  s2  d      4
5      6  s3  f      6

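One detail worth noting: idxmax returns index labels, not integer positions. On the example DataFrame the two coincide because the index is the default 0 to 5 range, so label-based .loc and position-based .iloc happen to select the same rows. The sketch below uses a hypothetical relabeled copy of the data (the labels 10 to 15 are made up for illustration) to show where they differ.

```python
import pandas as pd

df = pd.DataFrame({
    'Count': [3, 2, 5, 10, 10, 6],
    'Mt': ['s1', 's1', 's2', 's2', 's2', 's3'],
    'Sp': ['a', 'b', 'c', 'd', 'e', 'f'],
    'Value': [1, 2, 3, 4, 5, 6],
})

# Hypothetical relabeling: same rows, but index labels 10..15 instead of 0..5.
relabeled = df.copy()
relabeled.index = [10, 11, 12, 13, 14, 15]

# idxmax returns the label of the first row holding each group's maximum.
idx = relabeled.groupby('Mt')['Count'].idxmax()
print(idx.tolist())  # [10, 13, 15]

# .loc looks rows up by label, which matches what idxmax returns.
picked = relabeled.loc[idx]
print(picked)

# relabeled.iloc[idx] would raise IndexError here: 10, 13 and 15 are not
# valid positions in a 6-row frame.
```

For the tied rows in group s2 (labels 13 and 14), idxmax keeps only the first one, which is why a single row per group comes back.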
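def using_apply(df):
    # For each group, return the Value at the row where Count is largest
    return df.groupby('Mt').apply(lambda subf: subf['Value'][subf['Count'].idxmax()])

def using_idxmax_loc(df):
    # Look up the labels of the per-group maxima, then select by label
    idx = df.groupby('Mt')['Count'].idxmax()
    return df.loc[idx, ['Mt', 'Value']]

print(using_apply(df))

using_idxmax_loc(df)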
Mt
s1 1
s2 4
s3 6
dtype: int64

   Mt  Value
0  s1      1
3  s2      4
5  s3      6

Method 4: Sort first, then take the first row from each group
df.sort_values('Count', ascending=False).groupby('Mt', as_index=False).first()

   Mt  Count Sp  Value
0  s1      3  a      1
1  s2     10  d      4
2  s3      6  f      6
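A minimal sketch of the sort-then-take-first idea for current pandas versions, where DataFrame.sort has been replaced by sort_values. Two choices here are assumptions and not part of the original: kind='mergesort' requests a stable sort so that tied rows keep their original relative order, and head(1) is used in place of first() so that whole rows are returned with their original index labels.

```python
import pandas as pd

df = pd.DataFrame({
    'Count': [3, 2, 5, 10, 10, 6],
    'Mt': ['s1', 's1', 's2', 's2', 's2', 's3'],
    'Sp': ['a', 'b', 'c', 'd', 'e', 'f'],
    'Value': [1, 2, 3, 4, 5, 6],
})

# Sort by Count, largest first; a stable sort keeps tied rows (3 and 4)
# in their original order.
ordered = df.sort_values('Count', ascending=False, kind='mergesort')

# Keep the first row of each group, then restore the original row order.
top = ordered.groupby('Mt').head(1).sort_index()
print(top)
```

Unlike first(), which takes the first non-null value column by column, head(1) returns each selected row unchanged, which may matter when the data contains missing values.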

This raises a follow-up question: what if you want some other row instead of the maximum, for example the row with the middle value? The idea is similar, but the code needs some changes. Methods 1 and 2 need the max computation replaced, and Method 3 needs a function that returns the index of the desired row. In every case, each group produced by groupby is itself a DataFrame.
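As one possible sketch of the middle-value case, following the Method 3 idea of returning one index label per group: the hypothetical helper middle_label below sorts a group's Count values and returns the label at the middle position. Taking the lower middle for even-sized groups is an arbitrary choice made here, not something the original specifies.

```python
import pandas as pd

df = pd.DataFrame({
    'Count': [3, 2, 5, 10, 10, 6],
    'Mt': ['s1', 's1', 's2', 's2', 's2', 's3'],
    'Sp': ['a', 'b', 'c', 'd', 'e', 'f'],
    'Value': [1, 2, 3, 4, 5, 6],
})

def middle_label(counts):
    """Hypothetical helper: index label of the row holding the middle Count."""
    # Stable sort so tied values keep their original order.
    ordered = counts.sort_values(kind='mergesort')
    # Lower middle position for even-sized groups (an assumption).
    return ordered.index[(len(ordered) - 1) // 2]

# One label per group, playing the role idxmax plays in Method 3.
idx = df.groupby('Mt')['Count'].apply(middle_label)
middle_rows = df.loc[idx]
print(middle_rows)
```

The selection step stays the same as in Method 3; only the function that picks a label per group changes.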

