Finding Rows with Maximum Column B Values for Duplicate Column A Values
In data analysis, it is often necessary to remove duplicate records while retaining unique data. A common scenario involves a dataset with duplicate values in a particular column (column A), where the goal is to keep the row with the highest value in another column (column B).
A first approach uses the drop_duplicates() function with the keep="last" parameter. This drops duplicate rows based on column A and keeps the last row seen for each value of A, regardless of the value in column B.
On its own, drop_duplicates() is therefore not suitable when the objective is to keep the row with the maximum value in column B, because the last row for a given value of A is not necessarily the one with the largest B. Instead, groupby() can be combined with apply(). This approach groups the rows by column A, applies a function to each group, and uses idxmax() to select the row with the maximum value in column B within each group.
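To make the behavior of keep="last" concrete, here is a minimal sketch. The sample data is hypothetical and deliberately ordered so that, for A equal to 1, the largest B value is not in the last row.

```python
import pandas as pd

# Hypothetical sample data: for A == 1 the largest B (20) is NOT the last row.
df = pd.DataFrame(
    [[1, 20], [1, 10], [2, 30], [2, 40], [3, 10]],
    columns=["A", "B"],
)

# Keep the last-seen row for each value of A; column B is ignored entirely.
last_rows = df.drop_duplicates(subset="A", keep="last")
print(last_rows)
```

For A equal to 1 the surviving row has B equal to 10 rather than 20, which shows that the result depends on row order and not on the values in column B.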
Implementation:
import pandas as pd

# Create data frame with duplicate values in column A
df = pd.DataFrame([[1, 10], [1, 20], [2, 30], [2, 40], [3, 10]], columns=['A', 'B'])

# Keep row with maximum value in column B for each duplicate in column A
max_b_rows = df.groupby('A', group_keys=False).apply(lambda x: x.loc[x.B.idxmax()])

# Display resulting data frame
print(max_b_rows)
Output:
   A   B
A
1  1  20
2  2  40
3  3  10
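As a possible alternative that avoids apply(), the same rows can be selected in two other ways. This is a sketch rather than the method described above: the variable names by_idxmax and by_sort are illustrative, and the data frame is the same one used in the implementation.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 10], [1, 20], [2, 30], [2, 40], [3, 10]],
    columns=["A", "B"],
)

# Option 1: get the index label of the maximum B within each group of A,
# then select those rows from the original data frame.
by_idxmax = df.loc[df.groupby("A")["B"].idxmax()]

# Option 2: sort by B so that the largest B for each A comes last,
# keep the last row per A, then restore the original row order.
by_sort = (
    df.sort_values("B")
      .drop_duplicates(subset="A", keep="last")
      .sort_index()
)

print(by_idxmax)
print(by_sort)
```

Both options keep the original row index instead of using column A as the index. When several rows share the same maximum B within a group, idxmax() returns the first of them, so the two options are only guaranteed to agree when the maximum in each group is unique, as it is here.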