How to Split Comma-Separated Values in a Pandas DataFrame into Separate Rows?

Mary-Kate Olsen
2024-12-24 18:06:15

Split (Explode) Pandas Dataframe String Entry to Separate Rows

Problem:
Given a pandas DataFrame with a column of comma-separated values, the goal is to split each value into its own row while repeating the values of the other columns for every new row.

Solution:

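The recommended solution is DataFrame.explode() (or Series.explode() when working with a single Series). Both were introduced in pandas 0.25.0, and DataFrame.explode() gained multi-column support in pandas 1.3.0. Because explode() operates on list-like cells, first convert each comma-separated string into a list with Series.str.split(',').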

To explode a single column, pass its name to DataFrame.explode():

df.explode('column_name')

To explode several columns at once (pandas 1.3.0 or later), pass a list of column names; the lists within each row must have the same length:

df.explode(['column1', 'column2'])
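As a minimal sketch of the multi-column form, the snippet below uses a made-up DataFrame (the column names id, letters and numbers are illustrative only) whose two list columns have the same length in every row. If the lengths differed within a row, pandas would raise a ValueError instead.

```python
import pandas as pd

# Hypothetical data: each row holds two lists of equal length.
df = pd.DataFrame({
    'id': [1, 2],
    'letters': [['a', 'b'], ['c']],
    'numbers': [[10, 20], [30]],
})

# Explode both list columns in lockstep (requires pandas >= 1.3.0).
# ignore_index=True (pandas >= 1.1.0) renumbers the rows 0..n-1.
out = df.explode(['letters', 'numbers'], ignore_index=True)
print(out)
```

Each element of letters stays paired with the element of numbers at the same position, and id is repeated for every new row.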

Example:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [[0, 1, 2], 'foo', [], [3, 4]],
    'B': 1,
    'C': [['a', 'b', 'c'], np.nan, [], ['d', 'e']]
})

df.explode('A')

Output:

     A  B          C
0    0  1  [a, b, c]
0    1  1  [a, b, c]
0    2  1  [a, b, c]
1  foo  1        NaN
2  NaN  1         []
3    3  1     [d, e]
3    4  1     [d, e]

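For pandas versions that predate explode(), the following helper function (which needs numpy imported as np and pandas as pd) handles one or more list columns alongside any number of ordinary columns. All list columns must hold lists of the same length in each row, because the row counts are taken from the first list column: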

def explode(df, lst_cols, fill_value='', preserve_index=False):
    # Ensure `lst_cols` is list-like
    if lst_cols and not isinstance(lst_cols, (list, tuple, np.ndarray, pd.Series)):
        lst_cols = [lst_cols]

    # Calculate lengths of lists
    lens = df[lst_cols[0]].str.len()

    # Preserve original index values
    idx = np.repeat(df.index.values, lens)

    # Create an "exploded" DataFrame
    res = (pd.DataFrame({
                col:np.repeat(df[col].values, lens)
                for col in df.columns.difference(lst_cols)
            },
            index=idx)
             .assign(**{col:np.concatenate(df.loc[lens>0, col].values)
                            for col in lst_cols}))

    # Append rows with empty lists (DataFrame.append was removed in pandas 2.0)
    if (lens == 0).any():
        res = (pd.concat([res, df.loc[lens==0, df.columns.difference(lst_cols)]], sort=False)
                 .fillna(fill_value))

    # Restore the original index and column order; reset the index if requested
    res = res.sort_index()[df.columns]
    if not preserve_index:
        res = res.reset_index(drop=True)

    return res

Example of exploding a CSV-like column:

df = pd.DataFrame({
    'var1': 'a,b,c d,e,f,x,y'.split(),
    'var2': [1, 2]
})

explode(df.assign(var1=df.var1.str.split(',')), 'var1')

Output:

  var1  var2
0    a     1
1    b     1
2    c     1
3    d     2
4    e     2
5    f     2
6    x     2
7    y     2
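On pandas 1.1.0 or later, the same result can likely be obtained without a helper function by chaining Series.str.split() and DataFrame.explode(). The sketch below rebuilds the example data from above; the variable name out is illustrative only.

```python
import pandas as pd

# Same data as the example above: one comma-separated string per row.
df = pd.DataFrame({
    'var1': ['a,b,c', 'd,e,f,x,y'],
    'var2': [1, 2],
})

out = (
    # Turn each comma-separated string into a list of strings...
    df.assign(var1=df['var1'].str.split(','))
      # ...then emit one row per list element, renumbering the index.
      .explode('var1', ignore_index=True)
)
print(out)
```

Omitting ignore_index=True keeps the original index labels, so every row produced from the same source row shares that row's label.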
