ETL：从文本中提取人名-Python教程-PHP中文网

首页

后端开发

Python教程

ETL：从文本中提取人名

Linda Hamilton

Oct 08, 2024 am 06:20 AM

假设我们想要抓取chicagomusiccompass.com。

如您所见，它有几张卡片，每张卡片代表一个事件。现在，让我们看看下一篇：

ETL: Extracting a Person

请注意，活动名称是：


jazmin bean: the traumatic livelihood tour

现在的问题是：我们如何从文本中提取艺术家的名字？

作为一个人，我可以“轻松地”看出 jazmin bean 是艺术家——只需查看他们的 wiki 页面即可。但编写代码来提取该名称可能会很棘手。

我们可能会想，“嘿，: 之前的任何内容都应该是艺术家的名字”，这看起来很聪明，对吧？它适用于这种情况，但是这个呢：


happy hour on the patio: kathryn & chris

这里，顺序颠倒了。我们可以不断添加逻辑来处理不同的情况，但很快我们就会得到大量脆弱的规则，并且可能无法涵盖所有内容。

这就是命名实体识别（NER）模型派上用场的地方。它们是开源的，可以帮助我们从文本中提取名称。它不会捕获所有案例，但大多数时候，他们都会为我们提供所需的信息。

通过这种方法，提取变得更加容易。我选择 Python 是因为 Python 机器学习社区是无与伦比的。


from gliner import GLiNER

model = GLiNER.from_pretrained("urchade/gliner_base")

text = "jazmin bean: the traumatic livelihood tour"
labels = ["person", "bands", "projects"]
entities = model.predict_entities(text, labels)

for entity in entities:
    print(entity["text"], "=>", entity["label"])

生成输出：


jazmin bean => person

现在，让我们看看另一个案例：


happy hour on the patio: kathryn & chris

输出：


kathryn => person
chris => person

来源-GLiNER

太棒了，对吧？不再需要繁琐的逻辑来提取名称，只需使用模型即可。当然，它不会涵盖所有可能的情况，但对于我的项目来说，这种灵活性就很好了。如果您需要更高的准确性，您可以随时：

尝试不同的模型
对现有模型做出贡献
分叉项目并调整它以满足您的需求

结论

作为软件开发人员，强烈建议及时了解机器学习领域的工具。并不是所有事情都可以通过简单的编程和逻辑来解决 - 有些挑战可以使用模型和统计更好地解决。

以上是ETL：从文本中提取人名的详细内容。更多信息请关注PHP中文网其他相关文章！

声明

本文内容由网友自发贡献，版权归原作者所有，本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容，请联系admin@php.cn

Python：深入研究汇编和解释May 12, 2025 am 12:14 AM

pythonisehybridmodelofcompilationand interpretation：1）thepythoninterspretercompilesourcececodeintoplatform- interpententbybytecode.2）thepytythonvirtualmachine（pvm）thenexecuteCutestestestesteSteSteSteSteSteSthisByTecode，BelancingEaseofuseWithPerformance。

Python是一种解释或编译语言，为什么重要？May 12, 2025 am 12:09 AM

pythonisbothinterpretedAndCompiled.1）它的compiledTobyTecodeForportabilityAcrosplatforms.2）bytecodeisthenInterpreted，允许fordingfordforderynamictynamictymictymictymictyandrapiddefupment，尽管Ititmaybeslowerthananeflowerthanancompiledcompiledlanguages。

对于python中的循环时循环与循环：解释了关键差异May 12, 2025 am 12:08 AM

在您的知识之际，而foroopsareideal insinAdvance中，而WhileLoopSareBetterForsituations则youneedtoloopuntilaconditionismet

循环时：实用指南May 12, 2025 am 12:07 AM

ForboopSareSusedwhenthentheneMberofiterationsiskNownInAdvance，而WhileLoopSareSareDestrationsDepportonAcondition.1）ForloopSareIdealForiteratingOverSequencesLikelistSorarrays.2）whileLeleLooleSuitableApeableableableableableableforscenarioscenarioswhereTheLeTheLeTheLeTeLoopContinusunuesuntilaspecificiccificcificCondond

Python：它是真正的解释吗？揭穿神话May 12, 2025 am 12:05 AM

pythonisnotpuroly interpred; itosisehybridablectofbytecodecompilationandruntimeinterpretation.1）PythonCompiLessourceceCeceDintobyTecode，whitsthenexecececected bytybytybythepythepythepythonvirtirtualmachine（pvm）.2）

与同一元素的Python串联列表May 11, 2025 am 12:08 AM

concateNateListsinpythonwithTheSamelements，使用：1）operatototakeepduplicates，2）asettoremavelemavphicates，or3）listCompreanspearensionforcontroloverduplicates，每个methodhasdhasdifferentperferentperferentperforentperforentperforentperfortenceandordormplications。

解释与编译语言：Python的位置May 11, 2025 am 12:07 AM

pythonisanterpretedlanguage，offeringosofuseandflexibilitybutfacingperformancelanceLimitationsInCricapplications.1）drightingedlanguageslikeLikeLikeLikeLikeLikeLikeLikeThonexecuteline-by-line，允许ImmediaMediaMediaMediaMediaMediateFeedBackAndBackAndRapidPrototypiD.2）compiledLanguagesLanguagesLagagesLikagesLikec/c thresst

循环时：您什么时候在Python中使用？May 11, 2025 am 12:05 AM

Useforloopswhenthenumberofiterationsisknowninadvance,andwhileloopswheniterationsdependonacondition.1)Forloopsareidealforsequenceslikelistsorranges.2)Whileloopssuitscenarioswheretheloopcontinuesuntilaspecificconditionismet,usefulforuserinputsoralgorit

See all articles