
How can I split a comma-separated cell into multiple rows in a Pandas DataFrame?

Mary-Kate Olsen
2024-11-03 05:05:03


Splitting a Cell into Multiple Rows in a Pandas DataFrame

pandas can split a cell that holds several comma-separated values into one row per value. This guide covers two approaches, chosen by pandas version: one for pandas 0.25 and later, and one for pandas 0.24 and earlier.

pandas >= 0.25

For pandas 0.25 and later, you can combine DataFrame.apply, Series.str.split, and Series.explode:

<code class="python">(df.set_index(['order_id', 'order_date'])
   .apply(lambda x: x.str.split(',').explode())
   .reset_index())</code>

Explanation:

  1. set_index(['order_id', 'order_date']): Sets the order_id and order_date columns as the index to preserve them during subsequent operations.
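  2. apply(lambda x: x.str.split(',').explode()): Applies the lambda to each remaining column (package and package_code), since apply works column by column by default. Each cell is split on the comma into a list, and explode turns every list element into its own row, repeating the index values. This relies on both columns holding the same number of values in a given row.
  3. reset_index(): Moves order_id and order_date from the index back into regular columns, so each exploded value sits on its own row next to its order details.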
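
To make the steps above concrete, here is a minimal sketch on a small made-up DataFrame. The sample values are hypothetical; only the column names (order_id, order_date, package, package_code) come from the snippet above, and pandas 0.25 or later is assumed.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the article.
df = pd.DataFrame({
    "order_id": [1, 2],
    "order_date": ["2024-01-01", "2024-01-02"],
    "package": ["p1,p2", "p3"],
    "package_code": ["#1,#2", "#3"],
})

result = (
    df.set_index(["order_id", "order_date"])                   # keep the key columns safe in the index
      .apply(lambda col: col.str.split(",").explode())         # split and explode each column
      .reset_index()                                           # bring the key columns back
)

print(result)
```

The first order carries two values per cell, so it should expand into two rows, while the second order stays as a single row.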

pandas <= 0.24

For pandas versions 0.24 and below, a more complex approach involving stack, unstack, and str.split is necessary:

<code class="python">(df.set_index(['order_date', 'order_id'])
   .stack()
   .str.split(',', expand=True)
   .stack()
   .unstack(-2)
   .reset_index(-1, drop=True)
   .reset_index()
)</code>

Explanation:

  1. Similar to the previous approach, set_index sets order_date and order_id as the index.
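  2. stack(): Moves the column labels (package and package_code) into a new innermost index level, producing a Series with one string per entry.
  3. str.split(',', expand=True): Splits each string on the comma and spreads the pieces across numbered columns, one column per position.
  4. stack(): Moves those numbered columns into another innermost index level, giving a Series with one split value per entry. The padding created for shorter lists is dropped.
  5. unstack(-2): Pivots the second-to-last index level, which holds the original column names, back into columns, so package and package_code are columns again.
  6. reset_index(-1, drop=True): Discards the innermost index level, which only records each value's position within its original cell.
  7. reset_index(): Turns order_date and order_id back into regular columns.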
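
The chain can be easier to follow when each step is given a name. The sketch below does that on hypothetical sample data; the intermediate variable names are illustrative, not part of the original snippet. The sample keeps the same number of values in every row so that the result does not depend on how stack() treats missing values, which differs across pandas versions.

```python
import pandas as pd

# Hypothetical sample data; every cell holds exactly two values.
df = pd.DataFrame({
    "order_id": [1, 2],
    "order_date": ["2024-01-01", "2024-01-02"],
    "package": ["p1,p2", "p3,p4"],
    "package_code": ["#1,#2", "#3,#4"],
})

# Steps 1-2: key columns to the index, then column labels into the index.
stacked = df.set_index(["order_date", "order_id"]).stack()

# Step 3: one numbered column per comma-separated piece.
pieces = stacked.str.split(",", expand=True)

# Step 4: piece positions become the innermost index level.
long_form = pieces.stack()

# Step 5: original column names (second-to-last level) become columns again.
wide = long_form.unstack(-2)

# Steps 6-7: drop the piece-position level, then restore the key columns.
result = wide.reset_index(-1, drop=True).reset_index()

print(result)
```

Printing stacked, pieces, long_form and wide one at a time shows how the index gains and loses levels at each step.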

Both methods return a new DataFrame in which each comma-separated value occupies its own row, with the order_id and order_date values repeated as needed.
