
Remove duplicates in DF and convert to JSON obj in python

王林
2024-02-22 13:20:03


Question content

I have a df similar to the one below:

name         series
=============================
a             a1
b             b1
a             a2
a             a1
b             b2

I need to collect the series values into a list for each name, as a dictionary or JSON object like the one below:

{
   "a": ["a1", "a2"],
   "b": ["b1", "b2"]
}

So far I have tried using groupby, but it just groups everything into a single dictionary:

test = df.groupby("series")[["name"]].apply(lambda x: x)

The above code gives a df-like output:

        Series
Name
A    0      A1
     2      A2
     3      A1
B    1      B1
     4      B2

Any help is greatly appreciated

Thank you


Correct answer


First call drop_duplicates to make sure the rows are unique, then use groupby.agg to collect each group as a list:

out = df.drop_duplicates().groupby('name')['series'].agg(list).to_dict()

Or call unique within each group:

out = df.groupby('name')['series'].agg(lambda x: x.unique().tolist()).to_dict()

Output: {'a': ['a1', 'a2'], 'b': ['b1', 'b2']}
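
To make the two variants above easy to try, here is a minimal self-contained sketch. It rebuilds the question's sample data by hand (the variable names out_dedup, out_unique and json_text are chosen here for illustration) and, since the question asks for a JSON object, also serializes the resulting dictionary with the standard json module. That last step is an assumption about what "JSON obj" means in the question.

```python
import json

import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "name": ["a", "b", "a", "a", "b"],
    "series": ["a1", "b1", "a2", "a1", "b2"],
})

# Variant 1: drop duplicate rows first, then collect each group into a list
out_dedup = df.drop_duplicates().groupby("name")["series"].agg(list).to_dict()

# Variant 2: deduplicate inside each group with Series.unique()
out_unique = (
    df.groupby("name")["series"]
      .agg(lambda x: x.unique().tolist())
      .to_dict()
)

# Illustrative extra step: turn the dictionary into JSON text
json_text = json.dumps(out_dedup)

print(out_dedup)
print(json_text)
```

Both variants keep values in order of first appearance within each name, so they should give the same dictionary for this data.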

If you have additional columns, make sure to keep only the columns of interest:

out = (df[['name', 'series']].drop_duplicates()
       .groupby('name')['series'].agg(list).to_dict()
      )
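
A short sketch of why the column selection matters, assuming a hypothetical extra column called run that is not part of the question's data. Because drop_duplicates compares whole rows by default, rows that repeat in name and series but differ in run are not treated as duplicates.

```python
import pandas as pd

# "run" is a hypothetical extra column added only for this illustration
df = pd.DataFrame({
    "name": ["a", "b", "a", "a", "b"],
    "series": ["a1", "b1", "a2", "a1", "b2"],
    "run": [1, 2, 3, 4, 5],
})

# Deduplicating on all columns: every row is unique because "run" differs,
# so the repeated ("a", "a1") pair survives
with_extra = df.drop_duplicates().groupby("name")["series"].agg(list).to_dict()

# Selecting only the columns of interest first removes the repeat
only_needed = (
    df[["name", "series"]].drop_duplicates()
      .groupby("name")["series"].agg(list).to_dict()
)

print(with_extra)
print(only_needed)
```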

To sort each list as well:

out = (df.groupby('name')['series']
         .agg(lambda x: sorted(x.unique().tolist())).to_dict()
      )

Example:

# input
  Name Series
0    A     Z1
1    B     B1
2    A     A2
3    A     Z1
4    B     B2

# output
{'A': ['A2', 'Z1'], 'B': ['B1', 'B2']}
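
The example above can be reproduced with the following sketch, which builds the input by hand using the capitalized column names shown in the example. The unsorted variant is included only for comparison, and the names unsorted_out and sorted_out are chosen here for illustration.

```python
import pandas as pd

# Input from the example above
df = pd.DataFrame({
    "Name": ["A", "B", "A", "A", "B"],
    "Series": ["Z1", "B1", "A2", "Z1", "B2"],
})

# Without sorting, values keep their order of first appearance
unsorted_out = (
    df.groupby("Name")["Series"]
      .agg(lambda x: x.unique().tolist())
      .to_dict()
)

# With sorted(), each list is ordered alphabetically
sorted_out = (
    df.groupby("Name")["Series"]
      .agg(lambda x: sorted(x.unique().tolist()))
      .to_dict()
)

print(unsorted_out)
print(sorted_out)
```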


Statement:
This article is reproduced from stackoverflow.com. If there is any infringement, please contact admin@php.cn to have it deleted.