How to Melt a Pandas DataFrame and When to Use This Technique?

Barbara Streisand
2024-12-29 00:52:11

Melting Pandas DataFrames

What is Melt?

Melting a pandas DataFrame reshapes it from wide format, where each measured variable has its own column, to long format, where each row holds one observation: the identifier columns, the name of the variable, and its value.

How to Melt a DataFrame

To melt a DataFrame, use the pd.melt() function (or the equivalent df.melt() method). All of the following arguments are optional:

  • id_vars: Columns to keep as identifiers. Their values are repeated on every melted row. The row index is not an identifier and is discarded unless ignore_index=False is passed.
  • value_vars: Columns to be melted (converted to rows). If not specified, all columns not in id_vars are melted.
  • var_name: Name of the column that will contain the original column names. Defaults to 'variable' when the columns have no name.
  • value_name: Name of the column that will contain the original column values. Defaults to 'value'.

For example, to melt the following DataFrame:

import pandas as pd

df = pd.DataFrame({'Name': ['Bob', 'John', 'Foo', 'Bar', 'Alex', 'Tom'],
                   'Math': ['A+', 'B', 'A', 'F', 'D', 'C'],
                   'English': ['C', 'B', 'B', 'A+', 'F', 'A']})

we can use:

df_melted = pd.melt(df, id_vars=['Name'], value_vars=['Math', 'English'])

Printing df_melted shows the melted DataFrame:

    Name variable value
0    Bob     Math    A+
1   John     Math     B
2    Foo     Math     A
3    Bar     Math     F
4   Alex     Math     D
5    Tom     Math     C
6    Bob  English     C
7   John  English     B
8    Foo  English     B
9    Bar  English    A+
10  Alex  English     F
11   Tom  English     A
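
A quick way to sanity-check a melt is to compare shapes: every original row should appear once per melted column. The sketch below rebuilds the same data and checks that relationship. The names wide, long_df, and value_cols are illustrative only.

```python
import pandas as pd

wide = pd.DataFrame({'Name': ['Bob', 'John', 'Foo', 'Bar', 'Alex', 'Tom'],
                     'Math': ['A+', 'B', 'A', 'F', 'D', 'C'],
                     'English': ['C', 'B', 'B', 'A+', 'F', 'A']})

value_cols = ['Math', 'English']
long_df = pd.melt(wide, id_vars=['Name'], value_vars=value_cols)

# Each of the 6 rows appears once per melted column: 6 * 2 = 12 rows.
print(wide.shape)     # (6, 3)
print(long_df.shape)  # (12, 3)

# Default column names are used when var_name / value_name are not given.
print(list(long_df.columns))  # ['Name', 'variable', 'value']
```

If the row count is not len(wide) * len(value_cols), a column was probably left out of value_vars or placed in id_vars by mistake.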

When to Use Melt

Melting is useful when you need to:

  • Transform wide data into a format suitable for plotting or visualization.
  • Prepare data for tools that expect one observation per row, such as many statistical modeling libraries.
  • Group observations by their unique identifiers and perform aggregations or transformations on the melted data.
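
As a rough illustration of the aggregation case, the sketch below melts a small table and computes one mean per subject with a single groupby. The numeric scores are hypothetical and are not the letter grades used elsewhere in this article.

```python
import pandas as pd

# Hypothetical numeric scores, invented for this example.
scores = pd.DataFrame({'Name': ['Bob', 'John', 'Foo'],
                       'Math': [90, 70, 80],
                       'English': [60, 75, 85]})

long_scores = scores.melt(id_vars='Name', var_name='Subject', value_name='Score')

# One groupby covers every subject, however many columns the wide table had.
mean_by_subject = long_scores.groupby('Subject')['Score'].mean()
print(mean_by_subject)
```

In wide format the same result would need a column-by-column calculation; in long format adding a new subject requires no change to the aggregation code.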

Example Scenarios

Problem 1: Convert the DataFrame below into a melted format, with columns Name, Age, Subject, and Grade.

df = pd.DataFrame({'Name': ['Bob', 'John', 'Foo', 'Bar', 'Alex', 'Tom'],
                   'Math': ['A+', 'B', 'A', 'F', 'D', 'C'],
                   'English': ['C', 'B', 'B', 'A+', 'F', 'A'],
                   'Age': [13, 16, 16, 15, 17, 12]})  # Age must exist to be used in id_vars
# value_vars is omitted, so every column not in id_vars (Math, English) is melted
df_melted = pd.melt(df, id_vars=['Name', 'Age'], var_name='Subject', value_name='Grade')

print(df_melted)

Output:

    Name  Age  Subject Grade
0    Bob   13     Math    A+
1   John   16     Math     B
2    Foo   16     Math     A
3    Bar   15     Math     F
4   Alex   17     Math     D
5    Tom   12     Math     C
6    Bob   13  English     C
7   John   16  English     B
8    Foo   16  English     B
9    Bar   15  English    A+
10  Alex   17  English     F
11   Tom   12  English     A

Problem 2: Melt only the Math column, keeping Name and Age as identifiers, so that the result contains only the Math rows of Problem 1.

df_melted_math = pd.melt(df, id_vars=['Name', 'Age'], value_vars=['Math'], var_name='Subject', value_name='Grade')

print(df_melted_math)

Output:

   Name  Age Subject Grade
0   Bob   13    Math    A+
1  John   16    Math     B
2   Foo   16    Math     A
3   Bar   15    Math     F
4  Alex   17    Math     D
5   Tom   12    Math     C
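
An alternative to restricting value_vars is to melt everything first and then filter the Subject column. The sketch below compares the two routes; it assumes df includes the Age column with the ages shown in the output tables, and the names melted, math_only, and direct are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Bob', 'John', 'Foo', 'Bar', 'Alex', 'Tom'],
                   'Math': ['A+', 'B', 'A', 'F', 'D', 'C'],
                   'English': ['C', 'B', 'B', 'A+', 'F', 'A'],
                   'Age': [13, 16, 16, 15, 17, 12]})

melted = pd.melt(df, id_vars=['Name', 'Age'],
                 var_name='Subject', value_name='Grade')

# Route 1: boolean mask on the melted frame, then renumber the rows from 0.
math_only = melted[melted['Subject'] == 'Math'].reset_index(drop=True)

# Route 2: melt only the Math column in the first place.
direct = pd.melt(df, id_vars=['Name', 'Age'], value_vars=['Math'],
                 var_name='Subject', value_name='Grade')

print(math_only.equals(direct))
```

Restricting value_vars avoids building rows that are thrown away immediately, while filtering afterwards is convenient when the full melted frame is needed anyway.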

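Problem 3: Group the melted DataFrame by Grade and list, as comma-separated strings, the names and subjects that received each grade. Repeats are kept, so a student with the same grade in two subjects appears twice.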

df_melted_grouped = df_melted.groupby(['Grade']).agg({'Name': ', '.join, 'Subject': ', '.join}).reset_index()

print(df_melted_grouped)

Output:

  Grade             Name                 Subject
0     A         Foo, Tom           Math, English
1    A+         Bob, Bar           Math, English
2     B  John, John, Foo  Math, English, English
3     C         Tom, Bob           Math, English
4     D             Alex                    Math
5     F        Bar, Alex           Math, English
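
The plain ', '.join keeps repeats, which is why John is listed twice under grade B. If each name and subject should appear only once per grade, one possible variation is to deduplicate before joining. In this sketch the helper join_unique is hypothetical (it is not a pandas function), and df is assumed to include the Age column.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Bob', 'John', 'Foo', 'Bar', 'Alex', 'Tom'],
                   'Math': ['A+', 'B', 'A', 'F', 'D', 'C'],
                   'English': ['C', 'B', 'B', 'A+', 'F', 'A'],
                   'Age': [13, 16, 16, 15, 17, 12]})

melted = pd.melt(df, id_vars=['Name', 'Age'],
                 var_name='Subject', value_name='Grade')


def join_unique(values):
    """Join values with commas, dropping repeats but keeping first-seen order."""
    return ', '.join(dict.fromkeys(values))


grouped_unique = (melted.groupby('Grade')
                        .agg({'Name': join_unique, 'Subject': join_unique})
                        .reset_index())

print(grouped_unique)
```

With this change the B row reads "John, Foo" and "Math, English" instead of repeating John and English.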

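Problem 4: Unmelt the melted DataFrame from Problem 1 back to wide format. The result has the same data as the original, but the rows are sorted by Name and the subject columns are in alphabetical order.

# rename_axis(columns=None) drops the leftover 'Subject' label from the column axis
df_unmelted = df_melted.pivot_table(index=['Name', 'Age'], columns='Subject', values='Grade', aggfunc='first').reset_index().rename_axis(columns=None)

print(df_unmelted)

Output:

   Name  Age English Math
0  Alex   17       F    D
1   Bar   15      A+    F
2   Bob   13       C   A+
3   Foo   16       B    A
4  John   16       B    B
5   Tom   12       A    C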
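
To check that melting followed by pivoting loses no data, the round trip can be compared against the original. This is a minimal sketch that uses DataFrame.pivot instead of pivot_table, which works here because each (Name, Age, Subject) combination occurs exactly once. It assumes pandas 1.1 or later (needed for a list-valued index in pivot) and a df that includes Age; the names restored and expected are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Bob', 'John', 'Foo', 'Bar', 'Alex', 'Tom'],
                   'Math': ['A+', 'B', 'A', 'F', 'D', 'C'],
                   'English': ['C', 'B', 'B', 'A+', 'F', 'A'],
                   'Age': [13, 16, 16, 15, 17, 12]})

melted = df.melt(id_vars=['Name', 'Age'], var_name='Subject', value_name='Grade')

# pivot raises an error on duplicate index/column pairs instead of aggregating.
restored = (melted.pivot(index=['Name', 'Age'], columns='Subject', values='Grade')
                  .reset_index()
                  .rename_axis(columns=None))

# Pivoting sorts rows and columns, so align both frames before comparing.
expected = df.sort_values(['Name', 'Age']).reset_index(drop=True)
restored = restored[expected.columns]

print(restored.equals(expected))
```

If the round trip is lossless, this prints True. Using pivot rather than pivot_table also acts as a safeguard, since unexpected duplicates raise an error instead of being silently reduced by aggfunc.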

Problem 5: Group the melted DataFrame from Problem 1 by Name and separate the subjects and grades by commas.

df_melted_by_name = df_melted.groupby('Name').agg({'Subject': ', '.join, 'Grade': ', '.join}).reset_index()

print(df_melted_by_name)

Output:

   Name        Subject  Grade
0  Alex  Math, English   D, F
1   Bar  Math, English  F, A+
2   Bob  Math, English  A+, C
3   Foo  Math, English   A, B
4  John  Math, English   B, B
5   Tom  Math, English   C, A

Problem 6: Melt the entire DataFrame into a single column of values, with another column containing the original column names.

df_melted_full = df.melt(ignore_index=False)

print(df_melted_full)

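Output (df here holds the columns Name, Math, English, and Age; because of ignore_index=False the original row labels 0 to 5 repeat for each melted column):

  variable value
0     Name   Bob
1     Name  John
2     Name   Foo
3     Name   Bar
4     Name  Alex
5     Name   Tom
0     Math    A+
1     Math     B
2     Math     A
3     Math     F
4     Math     D
5     Math     C
0  English     C
1  English     B
2  English     B
3  English    A+
4  English     F
5  English     A
0      Age    13
1      Age    16
2      Age    16
3      Age    15
4      Age    17
5      Age    12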
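
The effect of ignore_index is easier to see with non-default row labels. The sketch below uses a small hypothetical frame whose index labels 'r1' and 'r2' are made up for illustration; ignore_index requires pandas 1.1 or later.

```python
import pandas as pd

# Hypothetical two-row frame with custom row labels.
small = pd.DataFrame({'Name': ['Bob', 'John'], 'Math': ['A+', 'B']},
                     index=['r1', 'r2'])

default_melt = small.melt()                 # index is reset to 0..n-1
kept_melt = small.melt(ignore_index=False)  # original labels repeat per column

print(list(default_melt.index))  # [0, 1, 2, 3]
print(list(kept_melt.index))     # ['r1', 'r2', 'r1', 'r2']
```

Keeping the index makes it possible to trace each melted value back to its source row, at the cost of a non-unique index.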
