How to Efficiently Split a String Column in a Pandas DataFrame into Two New Columns?
TL;DR version:
For the simple case of having a text column with a delimiter and wanting to create two columns, the simplest solution is:
df[['A', 'B']] = df['AB'].str.split(' ', n=1, expand=True)
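As a minimal sketch of the one-liner above, here it is applied to a small made-up DataFrame. Only the column name 'AB' comes from the text. The sample values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data with space-separated strings.
df = pd.DataFrame({'AB': ['A1 B1', 'A2 B2 extra', 'A3 B3']})

# n=1 splits on the first space only, so each row yields at most two pieces.
df[['A', 'B']] = df['AB'].str.split(' ', n=1, expand=True)

print(df)
```

Because of n=1, anything after the first space stays together in column 'B', including any further spaces.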
In detail:
Andy Hayden's approach effectively demonstrates the power of the str.extract() method. However, for a simple split over a known separator, the .str.split() method is sufficient. It operates on a column (Series) of strings and returns a column (Series) of lists.
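For comparison, a str.extract()-based split could look roughly like the sketch below. The regular expression and the sample data are assumptions for illustration and are not taken from the answer referred to above.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({'AB': ['A1-B1', 'A2-B2']})

# Each named capture group becomes a column in the result.
# [^-]* matches everything before the first '-', and .* takes the rest.
extracted = df['AB'].str.extract(r'(?P<A>[^-]*)-(?P<B>.*)')
df[['A', 'B']] = extracted

print(df)
```

A regular expression pays off when the pieces need validation or the separator varies. For a fixed separator, .str.split() is shorter and easier to read.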
The .str attribute of a column lets us treat each element as a string and apply string methods to the whole column at once. It also has an indexing interface for getting each element of a string by its index, and the same interface works on the lists returned from .str.split(), so we can index and slice them.
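The following sketch illustrates both points, assuming a small made-up DataFrame: .str.split() without expand=True gives a Series of lists, and .str[i] indexes into each list or string.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({'AB': ['A1-B1', 'A2-B2']})

# Without expand=True, each element of the result is a Python list.
parts = df['AB'].str.split('-', n=1)

# .str[i] picks the i-th item of every list (or the i-th character of a string).
first = parts.str[0]
second = parts.str[1]
first_char = df['AB'].str[0]

print(parts.tolist())
print(first.tolist(), second.tolist(), first_char.tolist())
```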
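Older pandas versions allowed Python tuple unpacking directly on the .str accessor to create two separate columns from the lists. Iterating over .str has since been deprecated and removed, so on current versions index the lists explicitly instead:
parts = df['AB'].str.split('-', n=1)
df['A'], df['B'] = parts.str[0], parts.str[1]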
Alternatively, one can utilize the expand=True parameter in .str.split() to directly generate two columns:
df[['A', 'B']] = df['AB'].str.split('-', n=1, expand=True)
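The expand=True version is advantageous when rows split into different numbers of pieces: it pads the rows that have fewer pieces with None, so every row ends up with the same number of columns. With n=1, no row can produce more than two pieces, so the result always fits the two target columns.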
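A short sketch of that padding behavior, using made-up data in which one row has no separator at all. The missing entry may display as None or NaN depending on the pandas version and the column dtype, so the check below uses isna() rather than comparing against None.

```python
import pandas as pd

# Hypothetical sample data; the second row has no '-' separator.
df = pd.DataFrame({'AB': ['A1-B1', 'A2', 'A3-B3-C3']})

df[['A', 'B']] = df['AB'].str.split('-', n=1, expand=True)

# Rows with a missing second piece show up as missing values in 'B'.
print(df)
print(df['B'].isna().tolist())
```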