How to Split Comma-Separated Strings in a Pandas DataFrame into Separate Rows?

Linda Hamilton
2024-12-25 21:50:14

Splitting Comma-Separated String Entries in a Pandas DataFrame to Create Separate Rows

Problem:
We have a pandas DataFrame in which one column holds strings of comma-separated values. We want to split each such string and create one row per value, repeating the values of the other columns. For instance, a row whose entry is "a,b,c" should become three rows containing "a", "b", and "c".
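The article does not show how df is constructed. The following is a minimal sketch reconstructed from the outputs shown in the solutions, so the exact values are an assumption rather than the author's original data.

```python
import pandas as pd

# Sample data inferred from the outputs in this article (an assumption).
df = pd.DataFrame({
    "var1": ["a,b,c", "d,e,f,x,y"],  # comma-separated strings to split
    "var2": [1, 2],
    "var3": ["XX", "ZZ"],
})
print(df)
```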

Solution:

Option 1: DataFrame.explode() (pandas 0.25.0 and later)

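DataFrame.explode() is designed for this purpose: it turns each element of a list-like column into its own row. It does not split strings itself, so we first convert the comma-separated strings to lists with str.split(','). Because explode() repeats the original index labels, we reset the index to get the sequential numbering shown here.

In [1]: df.assign(var1=df['var1'].str.split(',')).explode('var1').reset_index(drop=True)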
Out[1]:
  var1  var2 var3
0    a     1   XX
1    b     1   XX
2    c     1   XX
3    d     2   ZZ
4    e     2   ZZ
5    f     2   ZZ
6    x     2   ZZ
7    y     2   ZZ
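Real data often has stray spaces around the commas. The sketch below, which uses hypothetical sample values, shows one way to combine str.split(), explode(), and str.strip(); it is an illustration under those assumptions, not the only way to clean the values.

```python
import pandas as pd

# Hypothetical data with inconsistent whitespace around the commas.
df = pd.DataFrame({
    "var1": ["a, b,c", "d,e"],
    "var2": [1, 2],
})

# Split each string into a list, give every list element its own row,
# then replace the repeated index labels with a fresh 0..n-1 index.
long_df = (
    df.assign(var1=df["var1"].str.split(","))
      .explode("var1")
      .reset_index(drop=True)
)

# Remove the leftover whitespace from each value.
long_df["var1"] = long_df["var1"].str.strip()
print(long_df)
```

On pandas 1.1 or later, explode('var1', ignore_index=True) can replace the separate reset_index(drop=True) call.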

Option 2: Custom Vectorized Function

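If DataFrame.explode() is not available (pandas older than 0.25.0) or we need more control, such as a fill value for rows with empty lists, we can write our own vectorized function. Like DataFrame.explode(), it expects the column to hold lists, not raw strings: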

import numpy as np
import pandas as pd

def explode(df, lst_cols, fill_value='', preserve_index=False):
    # Accept a single column name as well as a list of names.
    if isinstance(lst_cols, str):
        lst_cols = [lst_cols]

    # Columns that are repeated rather than exploded.
    idx_cols = df.columns.difference(lst_cols)

    # Number of elements in each list, taken from the first list column
    # (all list columns must have matching lengths row by row).
    lens = df[lst_cols[0]].str.len()

    # Repeat each index label once per list element.
    idx = np.repeat(df.index.values, lens)

    # Repeat the ordinary columns and flatten the list columns.
    exp_df = pd.DataFrame({
        col: np.repeat(df[col].values, lens)
        for col in idx_cols
    }, index=idx).assign(**{
        col: np.concatenate(df.loc[lens > 0, col].values)
        for col in lst_cols
    })

    # Re-add rows whose lists are empty, filling the list columns with `fill_value`.
    # DataFrame.append() was removed in pandas 2.0, so use pd.concat() instead.
    if (lens == 0).any():
        exp_df = pd.concat([exp_df, df.loc[lens == 0, idx_cols]], sort=False).fillna(fill_value)

    # Restore the original row order (stable sort) and column order.
    exp_df = exp_df.sort_index(kind='mergesort')[df.columns]
    if not preserve_index:
        exp_df = exp_df.reset_index(drop=True)

    return exp_df

Example usage:

In [2]: explode(df.assign(var1=df['var1'].str.split(',')), 'var1')
Out[2]:
  var1  var2 var3
0    a     1   XX
1    b     1   XX
2    c     1   XX
3    d     2   ZZ
4    e     2   ZZ
5    f     2   ZZ
6    x     2   ZZ
7    y     2   ZZ
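The custom function rests on two NumPy calls: np.repeat() to duplicate the ordinary columns and np.concatenate() to flatten the lists. The sketch below isolates that idea on hypothetical stand-in arrays (the names lists and other are illustrative and do not come from the function above).

```python
import numpy as np

# Hypothetical stand-ins for a list column and an ordinary column.
lists = [["a", "b", "c"], ["d", "e"]]
other = np.array([1, 2])

# One length per row: how many output rows each input row produces.
lens = np.array([len(x) for x in lists])

# Repeat each scalar once per element of the matching list.
repeated = np.repeat(other, lens)

# Flatten all lists into a single one-dimensional array.
flat = np.concatenate(lists)

print(repeated.tolist())  # [1, 1, 1, 2, 2]
print(flat.tolist())      # ['a', 'b', 'c', 'd', 'e']
```

Because both arrays are built in whole-array operations, no Python-level loop over the output rows is needed; only computing the lengths touches each row individually.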
