How to Impute Missing Values in Pandas Using Group Means?

Susan Sarandon
2024-12-05 16:29:10

NaN Imputation with Group Mean in Pandas

Filling missing values using the mean within each group is a common task when working with tabular data. Consider the following DataFrame with missing values:

import numpy as np
import pandas as pd
df = pd.DataFrame({'value': [1, np.nan, np.nan, 2, 3, 1, 3, np.nan, 3],
                   'name': ['A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C']})

Our goal is to impute the missing values with the mean of each group based on the 'name' column.

To achieve this, we can combine the groupby() and transform() methods in pandas:

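# Per-group means, shown for reference only; the imputation below does not use them
grouped = df.groupby('name').mean()
# Fill each group's NaNs in 'value' with that group's own mean
df["value"] = df.groupby("name")["value"].transform(lambda x: x.fillna(x.mean()))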

groupby() splits the rows by the 'name' column. transform() then runs the lambda once per group and returns a result aligned with the original index. Inside the lambda, x.mean() computes the group's mean (NaN values are skipped by default) and fillna() substitutes that mean for the group's missing values.

The resulting DataFrame:

print(df)

   value name
0    1.0    A
1    1.0    A
2    2.0    B
3    2.0    B
4    3.0    B
5    1.0    B
6    3.0    C
7    3.0    C
8    3.0    C

Explanation:

  • df.groupby('name').mean() returns a new DataFrame holding one mean per group. It is handy for checking the fill values, but the imputation does not depend on it: the lambda computes each group's mean itself with x.mean().
  • transform() applies the lambda to each group and returns a result with the same index as the original DataFrame, so every missing value is replaced by the mean of its own group.
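
The difference between aggregating and transforming can be seen by comparing the shapes of the two results. The following is a minimal sketch on the same data; the names group_means, row_means and filled are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'value': [1, np.nan, np.nan, 2, 3, 1, 3, np.nan, 3],
                   'name': ['A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C']})

# Aggregation: one mean per group, indexed by the group key.
group_means = df.groupby('name')['value'].mean()

# Transformation: the group's mean repeated for every row,
# indexed like the original DataFrame.
row_means = df.groupby('name')['value'].transform('mean')

# Because row_means lines up with df row by row, it can serve as the fill value.
filled = df['value'].fillna(row_means)

print(group_means)         # three entries: A, B, C
print(row_means.tolist())  # nine entries, one per original row
print(filled.tolist())
```

The aggregated result has three entries while the transformed one has nine, which is why only the latter can be passed directly to fillna() or assigned back to a column.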

Alternative Solution:

Another approach to group-based missing value imputation is:

impute_cols = ['value']
df[impute_cols] = df[impute_cols].fillna(df.groupby('name')[impute_cols].transform('mean'))

Both methods produce the same result. The second scales more easily to several columns, since impute_cols can list any number of numeric columns, and the built-in 'mean' avoids calling a Python lambda for each group.
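
To illustrate the multi-column case, here is a small sketch. The 'height' and 'weight' columns and their values are hypothetical and not part of the original example; only the imputation pattern is the same.

```python
import numpy as np
import pandas as pd

# Hypothetical data with two numeric columns that contain gaps.
df = pd.DataFrame({
    'name':   ['A', 'A', 'A', 'B', 'B', 'B'],
    'height': [1.0, np.nan, 3.0, 10.0, np.nan, 20.0],
    'weight': [np.nan, 4.0, 6.0, 5.0, 5.0, np.nan],
})

impute_cols = ['height', 'weight']

# transform('mean') returns a DataFrame with the same index and the same
# columns as df[impute_cols], so fillna() can align it cell by cell.
group_means = df.groupby('name')[impute_cols].transform('mean')
df[impute_cols] = df[impute_cols].fillna(group_means)

print(df)
```

Each gap is filled from its own column and its own group: the missing 'height' in group A becomes 2.0 (the mean of 1.0 and 3.0), while the missing 'height' in group B becomes 15.0.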
