Home  >  Article  >  Backend Development  >  How to melt pandas dataframe?

How to melt pandas dataframe?

WBOY
WBOYforward
2024-02-09 23:30:14392browse

如何融化 pandas 数据框?

Question content

On the pandas tag, I often see users asking questions about melting dataframes in pandas. I'll try to do a canonical Q&A (self-answer) on this topic.

I want to clarify:

  1. What is melting?

  2. How to use melt?

  3. When to use melt?

I saw some popular questions about melting, such as:

  • Convert columns to rows using pandas: This one might actually be good, but more explanation would be nice.

  • pandas melt function: A good question with a good answer, but a bit too vague without much explanation.

  • Melting pandas dataframe: Also a great answer! But that's just for a specific case, it's simple, just pd.melt(df)

  • pandas dataframe using columns as rows (melted) : Very neat! But the problem is, it only addresses the specific question asked by the OP, which also requires using pivot_table.

So I will try to do a canonical Q&A on this topic.

data set:

I will find all the answers in this dataset of random grades for random people at random ages (easier to explain answer :d):

import pandas as pd
df = pd.dataframe({'name': ['bob', 'john', 'foo', 'bar', 'alex', 'tom'],
                   'math': ['a+', 'b', 'a', 'f', 'd', 'c'],
                   'english': ['c', 'b', 'b', 'a+', 'f', 'a'],
                   'age': [13, 16, 16, 15, 15, 13]})
>>> df
   name math english  age
0   bob   a+       c   13
1  john    b       b   16
2   foo    a       b   16
3   bar    f      a+   15
4  alex    d       f   15
5   tom    c       a   13

question:

Question 1:

How to melt the data frame so that the original data frame becomes the following?

    name  age  subject grade
0    bob   13  english     c
1   john   16  english     b
2    foo   16  english     b
3    bar   15  english    a+
4   alex   17  english     f
5    tom   12  english     a
6    bob   13     math    a+
7   john   16     math     b
8    foo   16     math     a
9    bar   15     math     f
10  alex   17     math     d
11   tom   12     math     c

I want to transpose it so that one column is for each subject and the other columns are the repeated names of students along with their ages and scores.

Question 2:

This is similar to question 1, but this time I want question 1 to output the subject column with only math, and I want to filter out the english column:

   name  age subject grades
0   bob   13    math     a+
1  john   16    math      b
2   foo   16    math      a
3   bar   15    math      f
4  alex   15    math      d
5   tom   13    math      c

I want the output to look like above.

Question 3:

If I were to group the melts and sort them by students' scores, how would I do it to get the desired output like this:

  value             name                subjects
0     a         foo, tom           math, english
1    a+         bob, bar           math, english
2     b  john, john, foo  math, english, english
3     c         tom, bob           math, english
4     d             alex                    math
5     f        bar, alex           math, english

I need to sort it with names separated by commas and subjects separated by commas in the same order respectively.

Question 4:

How do I unfreeze a molten dataframe? Let's say I've melted this dataframe:

df = df.melt(id_vars=['name', 'age'], var_name='subject', value_name='grades')

become:

    name  age  subject grades
0    bob   13     math     a+
1   john   16     math      b
2    foo   16     math      a
3    bar   15     math      f
4   alex   15     math      d
5    tom   13     math      c
6    bob   13  english      c
7   john   16  english      b
8    foo   16  english      b
9    bar   15  english     a+
10  alex   15  english      f
11   tom   13  english      a

So how do I convert it back to the original data frame like below?

   name math english  age
0   bob   a+       c   13
1  john    b       b   16
2   foo    a       b   16
3   bar    f      a+   15
4  alex    d       f   15
5   tom    c       a   13

Question 5:

What would I do if I wanted to group by student name and separate subjects and grades with commas?

   name        subject grades
0  alex  math, english   d, f
1   bar  math, english  f, a+
2   bob  math, english  a+, c
3   foo  math, english   a, b
4  john  math, english   b, b
5   tom  math, english   c, a

I want a dataframe like the one above.

Question 6:

If I were to completely melt my dataframe, with all columns as values, what would I do?

     Column Value
0      Name   Bob
1      Name  John
2      Name   Foo
3      Name   Bar
4      Name  Alex
5      Name   Tom
6      Math    A+
7      Math     B
8      Math     A
9      Math     F
10     Math     D
11     Math     C
12  English     C
13  English     B
14  English     B
15  English    A+
16  English     F
17  English     A
18      Age    13
19      Age    16
20      Age    16
21      Age    15
22      Age    15
23      Age    13

I want a dataframe like the one above. All columns as values.


Correct answer


##Notes for pandas version < 0.20.0: I will use df.melt(... ) as my example, but you need to use pd.melt(df, .. .) instead.

Document reference:

Most of the solutions here will work with

melt, so to know how to melt, please refer to the documentation illustrate.

Melting logic:

melting merges multiple columns, converts the dataframe from wide to long, and solves problem 1 (see below). The steps are:

  1. First we get the original data frame.

  2. Then, melt first merges the

    math and english columns and makes the dataframe copy (longer).

  3. Finally it adds the

    subject column, which is the subject of the grades column value:

This is the simple logic of the

melt function.

solution:

Question 1:

Problem 1 can be solved using

pd.dataframe.melt Use the following code:

print(df.melt(id_vars=['name', 'age'], var_name='subject', value_name='grades'))

This code passes the

id_vars parameter to ['name', 'age'] and then automatically sets the value_vars to the other columns ( ['math', 'english']), which is the transposed conversion to this format.

You can also use

stack like this: 解决问题 1 >

print(
    df.set_index(["name", "age"])
    .stack()
    .reset_index(name="grade")
    .rename(columns={"level_2": "subject"})
    .sort_values("subject")
    .reset_index(drop=true)
)

This code sets the

name and age columns as indexes, stacks the remaining columns math and english, and resets the index and specify grade as the column name, then rename the other columns to level_2phpcnendcphp cn to subject<code> and then press the subject column and finally reset the index again.

Output of these two solutions:

    name  age  subject grade
0    bob   13  english     c
1   john   16  english     b
2    foo   16  english     b
3    bar   15  english    a+
4   alex   17  english     f
5    tom   12  english     a
6    bob   13     math    a+
7   john   16     math     b
8    foo   16     math     a
9    bar   15     math     f
10  alex   17     math     d
11   tom   12     math     c

问题 2:

这和我的第一个问题类似,但是这个我只在 math 列中进行过滤,这时候 value_vars 参数就可以派上用场了,如下所示:

print(
    df.melt(
        id_vars=["name", "age"],
        value_vars="math",
        var_name="subject",
        value_name="grades",
    )
)

或者我们也可以使用 stack 与列规格:

print(
    df.set_index(["name", "age"])[["math"]]
    .stack()
    .reset_index(name="grade")
    .rename(columns={"level_2": "subject"})
    .sort_values("subject")
    .reset_index(drop=true)
)

这两种解决方案都给出:

   name  age subject grade
0   bob   13    math    a+
1  john   16    math     b
2   foo   16    math     a
3   bar   15    math     f
4  alex   15    math     d
5   tom   13    math     c

问题 3:

问题3可以通过melt解决和 groupby,使用 agg 函数和 ' , '.join,如下所示:

print(
    df.melt(id_vars=["name", "age"])
    .groupby("value", as_index=false)
    .agg(", ".join)
)

它会融合数据框,然后按等级进行分组,聚合它们并用逗号将它们连接起来。

stack也可以用来解决这个问题,与 stackgroupby 如下所示:

print(
    df.set_index(["name", "age"])
    .stack()
    .reset_index()
    .rename(columns={"level_2": "subjects", 0: "grade"})
    .groupby("grade", as_index=false)
    .agg(", ".join)
)

这个 stack 函数只是转置数据帧以相当于 melt 的方式,然后重置索引,重命名列、组和聚合。

两种解决方案输出:

  grade             name                subjects
0     a         foo, tom           math, english
1    a+         bob, bar           math, english
2     b  john, john, foo  math, english, english
3     c         bob, tom           english, math
4     d             alex                    math
5     f        bar, alex           math, english

问题 4:

这可以通过 pivot_table 来解决。我们必须指定参数 valuesindexcolumns 以及 aggfunc

我们可以用下面的代码来解决这个问题:

print(
    df.pivot_table("grades", ["name", "age"], "subject", aggfunc="first")
    .reset_index()
    .rename_axis(columns=none)
)

输出:

   name  age english math
0  alex   15       f    d
1   bar   15      a+    f
2   bob   13       c   a+
3   foo   16       b    a
4  john   16       b    b
5   tom   13       a    c

融化的数据帧被转换回与原始数据帧完全相同的格式。

我们首先旋转融化的数据框,然后重置索引并删除列轴名称。

问题 5:

问题5可以通过melt解决和 groupby 如下所示:

print(
    df.melt(id_vars=["name", "age"], var_name="subject", value_name="grades")
    .groupby("name", as_index=false)
    .agg(", ".join)
)

融化并按 name 分组。

或者您可以stack: p>

print(
    df.set_index(["name", "age"])
    .stack()
    .reset_index()
    .groupby("name", as_index=false)
    .agg(", ".join)
    .rename({"level_2": "subjects", 0: "grades"}, axis=1)
)

两个代码输出:

   name       subjects grades
0  alex  math, english   d, f
1   bar  math, english  f, a+
2   bob  math, english  a+, c
3   foo  math, english   a, b
4  john  math, english   b, b
5   tom  math, english   c, a

问题 6:

问题6可以通过melt解决并且不需要指定列,只需指定预期的列名称:

print(df.melt(var_name='column', value_name='value'))

这会融化整个数据框。

或者您可以stack: p>

print(
    df.stack()
    .reset_index(level=1)
    .sort_values("level_1")
    .reset_index(drop=true)
    .set_axis(["column", "value"], axis=1)
)

两个代码输出:

     Column Value
0       Age    16
1       Age    15
2       Age    15
3       Age    16
4       Age    13
5       Age    13
6   English    A+
7   English     B
8   English     B
9   English     A
10  English     F
11  English     C
12     Math     C
13     Math    A+
14     Math     D
15     Math     B
16     Math     F
17     Math     A
18     Name  Alex
19     Name   Bar
20     Name   Tom
21     Name   Foo
22     Name  John
23     Name   Bob

The above is the detailed content of How to melt pandas dataframe?. For more information, please follow other related articles on the PHP Chinese website!

Statement:
This article is reproduced at:stackoverflow.com. If there is any infringement, please contact admin@php.cn delete