How to Create a Pandas DataFrame from a Text File with Specific Patterns?

Barbara Streisand
2024-11-02 13:14:02

Creating a Pandas DataFrame from a Text File with Specific Patterns

You need to construct a Pandas DataFrame from a text file with the following structure:

Alabama[edit]
Auburn (Auburn University)[1]
Florence (University of North Alabama)
Jacksonville (Jacksonville State University)[2]
Livingston (University of West Alabama)[2]
Montevallo (University of Montevallo)[2]
Troy (Troy University)[2]
Tuscaloosa (University of Alabama, Stillman College, Shelton State)[3][4]
Tuskegee (Tuskegee University)[5]

Rows ending in "[edit]" are states. Every other row is a region: a name followed by a university list in parentheses and, optionally, one or more footnote markers such as "[1]". The task is to split the file on these patterns and repeat the state name for each region row.
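
Before turning to Pandas, it can help to see the two row types told apart in plain Python. The following is only a sketch under assumptions: the helper name `parse_lines` and both regular expressions are illustrative and are based solely on the excerpt above, not on the full file.

```python
import re

# Assumed pattern: a state row is any text ending in "[edit]".
STATE_RE = re.compile(r"^(.*)\[edit\]\s*$")
# Assumed pattern: a region name is everything before the first "(" or "[".
REGION_RE = re.compile(r"^([^(\[]+)")


def parse_lines(lines):
    """Yield (state, region) pairs, repeating the most recent state."""
    state = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        state_match = STATE_RE.match(line)
        if state_match:
            state = state_match.group(1).strip()
            continue
        region_match = REGION_RE.match(line)
        if region_match:
            yield state, region_match.group(1).strip()


sample = [
    "Alabama[edit]",
    "Auburn (Auburn University)[1]",
    "Florence (University of North Alabama)",
    "Tuscaloosa (University of Alabama, Stillman College, Shelton State)[3][4]",
]
pairs = list(parse_lines(sample))
print(pairs)
```

Each region row is paired with the last state seen, which is the same "repeat the state name" behaviour that the Pandas solution achieves with a forward fill.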

Solution:

  1. Read the text file with Pandas' read_csv function, passing a separator that does not occur in the data (";" in the example) so that each line lands in a single column, and name that column "Region Name".
  2. Create a new column named "State": use string extraction (str.extract) to capture the state names from the rows ending in "[edit]", then forward-fill (ffill) the values so that every region row carries its state.
  3. Remove everything from the opening parenthesis "(" to the end of the string in the "Region Name" column by replacing it with an empty string.
  4. Filter out the rows containing "[edit]" using boolean indexing with a mask built by str.contains.

This process will result in the desired Pandas DataFrame with "State" and "Region Name" columns.

Example:

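<code class="python">import pandas as pd

# ";" is assumed not to occur in the file, so each line is read whole into one column
df = pd.read_csv("filename.txt", sep=";", names=['Region Name'])
# State rows end in "[edit]": capture the name and forward-fill it onto the region rows below
df.insert(0, 'State', df['Region Name'].str.extract(r'(.*)\[edit\]', expand=False).ffill())
# Drop everything from " (" onward; regex=True is needed on pandas 2.0+, where it defaults to False
df['Region Name'] = df['Region Name'].str.replace(r' \(.+$', '', regex=True)
# Remove the state rows themselves, which still contain "[edit]"
df = df[~df['Region Name'].str.contains(r'\[edit\]')].reset_index(drop=True)

print(df)</code>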

Output, for an input file that continues with further states beyond the Alabama excerpt shown above:

      State   Region Name
0   Alabama        Auburn
1   Alabama      Florence
2   Alabama  Jacksonville
3   Alabama    Livingston
4   Alabama    Montevallo
5   Alabama          Troy
6   Alabama    Tuscaloosa
7   Alabama      Tuskegee
8    Alaska     Fairbanks
9   Arizona     Flagstaff
10  Arizona         Tempe
11  Arizona        Tucson
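
To try the approach without a file on disk, here is a self-contained sketch that reads a shortened copy of the excerpt from an in-memory string. Two things are assumptions rather than part of the original solution: the Alaska lines are reconstructed from the output table (the source excerpt shows only Alabama), and the cleanup pattern is widened so that a region with a footnote but no parentheses would also be trimmed. Because that wider pattern would also strip "[edit]" from the state rows, the mask is computed before the cleanup.

```python
import io

import pandas as pd

# Shortened sample; the Alaska lines are an assumption reconstructed from the output table.
text = """Alabama[edit]
Auburn (Auburn University)[1]
Florence (University of North Alabama)
Tuscaloosa (University of Alabama, Stillman College, Shelton State)[3][4]
Alaska[edit]
Fairbanks (University of Alaska Fairbanks)
"""

# ";" does not occur in the sample, so every line becomes one cell.
df = pd.read_csv(io.StringIO(text), sep=";", names=["Region Name"])

# Mark the state rows first, before the text is modified.
is_state = df["Region Name"].str.contains(r"\[edit\]", regex=True)

# Capture the state names and forward-fill them onto the region rows.
state = df["Region Name"].str.extract(r"(.*)\[edit\]", expand=False).ffill()
df.insert(0, "State", state)

# Trim from the first "(" or "[" (and any whitespace before it) to the end.
df["Region Name"] = df["Region Name"].str.replace(r"\s*[(\[].*$", "", regex=True)

# Keep only the region rows.
df = df[~is_state].reset_index(drop=True)

print(df)
```

Computing the mask up front keeps the filtering step independent of how aggressively the region names are cleaned.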
