
How to Unnest a Pandas DataFrame Column (or Multiple Columns) into Multiple Rows?

DDD (original), 2024-12-29 00:39:11


How to Unnest a Column in a Pandas DataFrame into Multiple Rows

One of the challenges in data manipulation with Pandas is dealing with columns containing lists. When these list-type columns need to be split into separate rows, the process is known as "unnesting" or "exploding."

Pandas Unnesting Methods

Method 1: pandas.DataFrame.explode

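When only one column needs to be unnested, pandas.DataFrame.explode (available since pandas 0.25) is the simplest option. It takes the column label as an argument and returns a new DataFrame with one row per list element. The original index labels are repeated, and empty lists become NaN.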

df.explode('B')  # dataframe with column 'B' containing lists
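The following is a small, self-contained sketch of explode on a made-up frame (the column names and values are illustrative only). It shows the repeated index labels, how an empty list turns into a missing value, and the ignore_index parameter (pandas 1.1+), which renumbers the result instead:

```python
import pandas as pd

# Hypothetical sample frame: column 'B' holds lists of different lengths
df = pd.DataFrame({'A': ['x', 'y', 'z'], 'B': [[1, 2], [3], []]})

# One row per list element; index labels 0, 0, 1, 2 (the empty list gives NaN)
out = df.explode('B')
print(out)

# Same rows, but with a fresh 0..n-1 index
renumbered = df.explode('B', ignore_index=True)
print(renumbered)
```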

Method 2: Using Repeat and DataFrame Constructor

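This method combines Series.repeat with the DataFrame constructor. Each value in column A is repeated once per element of the corresponding list in B (the counts come from B.str.len()), and np.concatenate flattens all the lists in B into a single array of the same total length.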

df = pd.DataFrame({'A': df.A.repeat(df.B.str.len()), 'B': np.concatenate(df.B.values)})
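A runnable version of the same idea, on a hypothetical two-column frame, makes the two halves explicit: the repeat counts and the flattened array must have the same length for the constructor to line them up:

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame with one list column
df = pd.DataFrame({'A': ['x', 'y'], 'B': [[1, 2], [3]]})

# Number of elements in each list: 2 and 1
lengths = df['B'].str.len()

out = pd.DataFrame({
    # Repeat each 'A' value once per list element (index labels repeat too)
    'A': df['A'].repeat(lengths),
    # Flatten every list in 'B' into one array of matching length
    'B': np.concatenate(df['B'].values),
})
print(out)
```

Only the columns named in the dictionary survive, so any other column has to be repeated and added explicitly.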

Method 3: Recreate the List

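Re-creating the list uses a nested list comprehension over df.values: for each row, it pairs the value in A with every element of the list in B, and the resulting rows are passed to the DataFrame constructor. This assumes the DataFrame has exactly two columns with the list column last, and the original index is not preserved.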

pd.DataFrame([[x, z] for x, y in df.values for z in y], columns=df.columns)
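Here is the comprehension applied to a hypothetical two-column frame, with descriptive loop variable names. The result gets a fresh 0..n-1 index because the rows are built from scratch:

```python
import pandas as pd

# Hypothetical sample frame: exactly two columns, list column last
df = pd.DataFrame({'A': ['x', 'y'], 'B': [[1, 2], [3]]})

# For each row (a, lst), emit one [a, item] row per element of lst
rows = [[a, item] for a, lst in df.values for item in lst]
out = pd.DataFrame(rows, columns=df.columns)
print(out)
```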

Method 4: Using Reindex

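Reindexing with df.index.repeat(df.B.str.len()) duplicates each row once per element of its list, carrying every other column along. Column B is then overwritten with the flattened elements from np.concatenate. Note that reindex requires the original index to be unique.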

df.reindex(df.index.repeat(df.B.str.len())).assign(B=np.concatenate(df.B.values))
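The advantage of this approach is that extra columns come along without being named. The sketch below uses a hypothetical third column C to show that:

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame: 'C' is an extra scalar column, 'B' holds lists
df = pd.DataFrame({'A': ['x', 'y'], 'C': [10.0, 20.0], 'B': [[1, 2], [3]]})

# Repeat each index label once per list element: 0, 0, 1
new_index = df.index.repeat(df['B'].str.len())

# reindex duplicates whole rows (every column); assign then replaces 'B'
# with the flattened elements, which line up positionally
out = df.reindex(new_index).assign(B=np.concatenate(df['B'].values))
print(out)
```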

Generalizing to Multiple Columns

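For unnesting several columns at once, a custom function can be defined. It takes the DataFrame and a list of column names to explode, and it assumes that within each row the lists in those columns all have the same length (the repeat count is taken from the first column in the list).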

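import numpy as np
import pandas as pd

def unnesting(df, explode):
    # Repeat each index label once per element in the first list column
    idx = df.index.repeat(df[explode[0]].str.len())
    # Flatten every list column and place the results side by side
    df1 = pd.concat([
        pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
    df1.index = idx
    # Re-attach the remaining (non-list) columns by index label
    return df1.join(df.drop(columns=explode), how='left')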
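On pandas 1.3 or newer, DataFrame.explode also accepts a list of columns, which covers the same case without a helper function. The sketch below assumes such a pandas version and a hypothetical frame whose list columns B and C have matching lengths in every row; if they did not, explode would raise a ValueError:

```python
import pandas as pd

# Hypothetical sample frame: 'B' and 'C' have equal list lengths per row
df = pd.DataFrame({
    'A': ['x', 'y'],
    'B': [[1, 2], [3]],
    'C': [['a', 'b'], ['c']],
})

# Explode both list columns in lockstep (pandas >= 1.3)
out = df.explode(['B', 'C'])
print(out)
```

Unlike the unnesting function, this keeps the original column order instead of moving the exploded columns to the front.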

Horizontal Unnesting

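Horizontal unnesting spreads the list elements across new columns instead of new rows. Building a DataFrame from df.B.tolist() gives one column per list position (shorter lists are padded with missing values), add_prefix('B_') renames those columns to B_0, B_1, and so on, and join attaches them to the original frame by index.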

df.join(pd.DataFrame(df.B.tolist(), index=df.index).add_prefix('B_'))
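The sketch below runs the horizontal version on a hypothetical frame with ragged lists and additionally drops the original list column, which the one-liner above keeps:

```python
import pandas as pd

# Hypothetical sample frame with lists of different lengths
df = pd.DataFrame({'A': ['x', 'y'], 'B': [[1, 2], [3]]})

# One column per list position (0, 1, ...), renamed to B_0, B_1, ...;
# the shorter list is padded with a missing value
wide = pd.DataFrame(df['B'].tolist(), index=df.index).add_prefix('B_')

# Replace the list column with the new positional columns
out = df.drop(columns='B').join(wide)
print(out)
```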

