How to Create a Pandas DataFrame from a Text File with a Specific Pattern?

Mary-Kate Olsen
2024-11-03 09:20:02

Problem: You have a text file in which each state name, marked with [edit], is followed by the regions in that state, one per line. You need to turn it into a Pandas DataFrame. The file looks like this:

Alabama[edit]
Auburn (Auburn University)[1]
Florence (University of North Alabama)
Jacksonville (Jacksonville State University)[2]
Livingston (University of West Alabama)[2]
Montevallo (University of Montevallo)[2]
Troy (Troy University)[2]
Tuscaloosa (University of Alabama, Stillman College, Shelton State)[3][4]
Tuskegee (Tuskegee University)[5]
...

In general, the file follows this pattern:

<State>[edit]
<Region Name 1>
<Region Name 2>
...

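In the resulting DataFrame, each region should appear on its own row next to the state it belongs to, so the state name is repeated for every region. The [edit] markers, the parenthesised university names, and footnote references such as [1] should be removed.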
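To make the rule concrete, here is a minimal plain-Python sketch of the pairing the DataFrame should capture: a line ending in [edit] starts a new state, and every other line is a region belonging to the most recent state. The helper name `pair_regions_with_states` and the inline sample list are hypothetical, used only for illustration. This is not the pandas solution itself.

```python
def pair_regions_with_states(lines):
    """Return (state, raw_region_line) tuples for lines following the pattern."""
    marker = '[edit]'
    pairs = []
    current_state = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.endswith(marker):
            # A state header: remember it for the region lines that follow
            current_state = line[:-len(marker)]
        else:
            # A region line: pair it with the most recent state header
            pairs.append((current_state, line))
    return pairs


# Hypothetical sample taken from the first lines of the file shown above
sample = [
    'Alabama[edit]',
    'Auburn (Auburn University)[1]',
    'Florence (University of North Alabama)',
]
print(pair_regions_with_states(sample))
```

The region text is left untouched here; removing the parenthesised part is a separate step.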

Solution:

<code class="python">import pandas as pd

# Read each line of the file into a single column named 'Region Name'.
# sep=";" assumes no line contains a semicolon, so commas inside a line do not split it.
df = pd.read_csv('filename.txt', sep=";", names=['Region Name'])

# Extract the state name from the rows containing '[edit]' (NaN for all other rows)
state_names = df['Region Name'].str.extract(r'(.*)\[edit\]', expand=False)

# Insert a new column 'State', forward-filling so every region row carries its state
df.insert(0, 'State', state_names.ffill())

# Remove everything from ' (' to the end of the line:
# this drops the university names and any trailing footnote markers such as [1]
df['Region Name'] = df['Region Name'].str.replace(r' \(.+$', '', regex=True)

# Drop the state header rows (those containing '[edit]'), leaving the columns State and Region Name
df = df[~df['Region Name'].str.contains(r'\[edit\]')].reset_index(drop=True)

print(df)</code>

For the full file, the printed DataFrame begins like this:

      State   Region Name
0   Alabama        Auburn
1   Alabama      Florence
2   Alabama  Jacksonville
3   Alabama    Livingston
4   Alabama    Montevallo
5   Alabama          Troy
6   Alabama    Tuscaloosa
7   Alabama      Tuskegee
8    Alaska     Fairbanks
9   Arizona     Flagstaff
10  Arizona         Tempe
11  Arizona        Tucson
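To try the approach without a file on disk, the following sketch runs the same kind of steps on an in-memory sample. The function name `parse_regions` and the `sample_text` string are hypothetical stand-ins for illustration, and the semicolon separator again assumes that no line in the data contains a semicolon.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for 'filename.txt', built from sample lines above
sample_text = """Alabama[edit]
Auburn (Auburn University)[1]
Florence (University of North Alabama)
Tuscaloosa (University of Alabama, Stillman College, Shelton State)[3][4]
"""


def parse_regions(source):
    """Build a State / Region Name DataFrame from a path or file-like object."""
    # sep=';' assumes no line contains a semicolon, so each line lands in one column
    df = pd.read_csv(source, sep=';', names=['Region Name'])
    # State headers end in '[edit]'; extract them and forward-fill onto region rows
    state = df['Region Name'].str.extract(r'(.*)\[edit\]', expand=False).ffill()
    df.insert(0, 'State', state)
    # Keep only the region rows
    df = df[~df['Region Name'].str.contains(r'\[edit\]')].reset_index(drop=True)
    # Drop the parenthesised institutions and any trailing footnote markers
    df['Region Name'] = df['Region Name'].str.replace(r' \(.+$', '', regex=True)
    return df


result = parse_regions(io.StringIO(sample_text))
print(result)
```

Because `pd.read_csv` accepts both paths and file-like objects, the same function can be called with 'filename.txt' once the logic has been checked on the sample.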
