如何从具有包含州和地区模式的特定结构的文本文件创建 Pandas DataFrame？-Python教程-PHP中文网

首页

后端开发

Python教程

如何从具有包含州和地区模式的特定结构的文本文件创建 Pandas DataFrame？

Barbara Streisand

Nov 03, 2024 am 03:05 AM

How can I create a Pandas DataFrame from a text file with a specific structure that includes state and region patterns?

从具有状态和区域模式的文本文件中读取和调整 Pandas DataFrame

从具有特定结构的文本文件创建 Pandas DataFrame 需要策略性的数据操作。让我们深入研究这个问题并探索一种解决方案，将提供的文本转换为所需的 DataFrame。

数据结构

文本文件遵循分层结构，其中：

带有“[edit]”的行是州名称。
带有“[number]”的行是地区名称。
同一州的地区名称应该重复。

解决方案

1.读取文本文件

首先，读取文本文件并使用 read_csv() 创建一个 DataFrame。由于没有特定的分隔符，请指定数据中不存在的自定义分隔符，例如分号：

<code class="python">df = pd.read_csv('filename.txt', sep=";", names=['Region Name'])</code>

2。提取州名称

使用 str.extract() 方法和正则表达式识别包含州名称的行，以捕获最多“[edit]”的州名称。使用以下值创建一个名为“State”的新列：

<code class="python">df.insert(0, 'State', df['Region Name'].str.extract('(.*)\[edit\]', expand=False).ffill())</code>

3。从区域名称中删除括号信息

从“区域名称”列中删除括号及其中包含的任何字符：

<code class="python">df['Region Name'] = df['Region Name'].str.replace(r' \(.+$', '')</code>

4.删除州标题行

删除“区域名称”列中出现“[edit]”的行。使用 str.contains() 创建掩码：

<code class="python">df = df[~df['Region Name'].str.contains('\[edit\]')].reset_index(drop=True)</code>

5。最终 DataFrame

此时，您已根据需要获得了包含“州”和“地区名称”列的 DataFrame。

<code class="python">print(df)</code>

扩展解决方案

如果您更喜欢在“区域名称”列中包含括号内的文本，这里是修改后的解决方案：

<code class="python">df.insert(0, 'State', df['Region Name'].str.extract('(.*)\[edit\]', expand=False).ffill())
df = df[~df['Region Name'].str.contains('\[edit\]')].reset_index(drop=True)

print(df)</code>

这将生成一个包含“州”和“区域名称”列的 DataFrame，其中区域名称包括括号内的文本。

以上是如何从具有包含州和地区模式的特定结构的文本文件创建 Pandas DataFrame？的详细内容。更多信息请关注PHP中文网其他相关文章！

声明

本文内容由网友自发贡献，版权归原作者所有，本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容，请联系admin@php.cn

Python：深入研究汇编和解释May 12, 2025 am 12:14 AM

pythonisehybridmodelofcompilationand interpretation：1）thepythoninterspretercompilesourcececodeintoplatform- interpententbybytecode.2）thepytythonvirtualmachine（pvm）thenexecuteCutestestestesteSteSteSteSteSteSthisByTecode，BelancingEaseofuseWithPerformance。

Python是一种解释或编译语言，为什么重要？May 12, 2025 am 12:09 AM

pythonisbothinterpretedAndCompiled.1）它的compiledTobyTecodeForportabilityAcrosplatforms.2）bytecodeisthenInterpreted，允许fordingfordforderynamictynamictymictymictymictyandrapiddefupment，尽管Ititmaybeslowerthananeflowerthanancompiledcompiledlanguages。

对于python中的循环时循环与循环：解释了关键差异May 12, 2025 am 12:08 AM

在您的知识之际，而foroopsareideal insinAdvance中，而WhileLoopSareBetterForsituations则youneedtoloopuntilaconditionismet

循环时：实用指南May 12, 2025 am 12:07 AM

ForboopSareSusedwhenthentheneMberofiterationsiskNownInAdvance，而WhileLoopSareSareDestrationsDepportonAcondition.1）ForloopSareIdealForiteratingOverSequencesLikelistSorarrays.2）whileLeleLooleSuitableApeableableableableableableforscenarioscenarioswhereTheLeTheLeTheLeTeLoopContinusunuesuntilaspecificiccificcificCondond

Python：它是真正的解释吗？揭穿神话May 12, 2025 am 12:05 AM

pythonisnotpuroly interpred; itosisehybridablectofbytecodecompilationandruntimeinterpretation.1）PythonCompiLessourceceCeceDintobyTecode，whitsthenexecececected bytybytybythepythepythepythonvirtirtualmachine（pvm）.2）

您可以使用Python中的循环加入列表吗？May 10, 2025 am 12:14 AM

是的，YouCanconCatenatElistsusingAloopInpyThon.1）使用eparateLoopsForeachListToAppendIteMstoaresultList.2）useanestedlooptoiterateOverMultipliplipliplipliplipliplipliplipliplipliplistforamoreConciseApprace.3）

condenate列表python：使用，扩展（）等May 10, 2025 am 12:12 AM

ThemostefficientmethodsforconcatenatinglistsinPythonare:1)theextend()methodforin-placemodification,2)itertools.chain()formemoryefficiencywithlargedatasets.Theextend()methodmodifiestheoriginallist,makingitmemory-efficientbutrequirescautionifpreserving

Python循环：示例和最佳实践May 10, 2025 am 12:05 AM

pythonboopsincludeforandwhileloops，with forloopsidealforequencessand and whileloopsforcondition repetition.bestpracticesinvolve：1）使用listComprehensionsforshensionsforsimpletranspletransformations，2）obseringEnumerateForIndex-valuepairs，3）optingftingftingfortermornemoremoremoremore

See all articles