
How Can I Efficiently Check for Multiple Substrings Within a Pandas Series?

Patricia Arquette (original), 2024-12-14 15:04:11


Testing Strings for Substrings with Pandas

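When working with string data in Python's Pandas library, you may need to determine whether each string contains any substring from a given list. Pandas offers related functions such as df.isin(), which tests for exact matches, and df[col].str.contains(), which tests for a substring or regular expression, but combining them for this task can be awkward.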
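To make the difference between the two functions concrete, here is a minimal sketch. The Series `s` and the names `exact` and `partial` are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Illustrative sample data (assumed for this sketch).
s = pd.Series(["cat", "hat", "dog", "fog", "pet"])

# isin() tests whole-value equality: neither "og" nor "at" is an element of s.
exact = s.isin(["og", "at"])

# str.contains() tests whether the pattern occurs anywhere inside each string.
partial = s.str.contains("og")

print(exact.tolist())    # [False, False, False, False, False]
print(partial.tolist())  # [False, False, True, True, False]
```

Because isin() only compares complete values, substring checks need str.contains().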

Suppose we have a Pandas Series containing the strings "cat", "hat", "dog", "fog", and "pet", and we want to select every string that contains either "og" or "at".

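One solution is to build a regular expression that matches any substring in the list, using the "|" (alternation) character. For example, joining the substrings in searchfor with "|" produces the pattern: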

>>> import pandas as pd
>>> s = pd.Series(['cat', 'hat', 'dog', 'fog', 'pet'])
>>> searchfor = ['og', 'at']
>>> regex_pattern = '|'.join(searchfor)  # 'og|at'
>>> s[s.str.contains(regex_pattern)]
0    cat
1    hat
2    dog
3    fog
dtype: object

This selects every string in s that contains "og" or "at". It is a concise and efficient approach.
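Real data often contains missing values or mixed case. The sketch below is one possible way to handle both with the same joined pattern, using the `case` and `na` parameters of str.contains. The sample data and variable names are assumptions made for illustration.

```python
import pandas as pd

# Illustrative data with a missing value and an upper-case entry (assumed).
s = pd.Series(["cat", "hat", None, "DOG", "pet"])

pattern = "|".join(["og", "at"])

# case=False ignores letter case; na=False treats missing values as non-matches,
# so the result is a clean boolean mask that can be used for indexing.
mask = s.str.contains(pattern, case=False, na=False)

print(s[mask].tolist())  # ['cat', 'hat', 'DOG']
```

Without na=False, the missing entry would produce a missing value in the mask, which cannot be used directly for boolean indexing.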

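However, if the substrings in searchfor contain characters that have a special meaning in regular expressions, such as "$" or "^", escape them with re.escape() so that they are matched literally. For example: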

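>>> import re
>>> matches = ['$money', 'x^y']
>>> safe_matches = [re.escape(m) for m in matches]
>>> safe_matches
['\\$money', 'x\\^y']
>>> regex_pattern = '|'.join(safe_matches)
>>> # None of the sample strings in s contains '$money' or 'x^y',
>>> # so this selection comes back empty for that Series.
>>> matched = s[s.str.contains(regex_pattern)]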

With the special characters escaped, str.contains matches each character literally. This makes the approach a robust way to detect substrings in a Pandas Series.
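To see why escaping matters, the following sketch compares an unescaped and an escaped pattern on a small Series that actually contains the special characters. The Series `texts` and its values are hypothetical, chosen only to illustrate the effect.

```python
import re
import pandas as pd

# Hypothetical data containing regex metacharacters (assumed for this sketch).
texts = pd.Series(["$money", "x^y", "money", "xy"])
matches = ["$money", "x^y"]

# Unescaped: "$" and "^" act as end-of-string and start-of-string anchors,
# so neither alternative can ever match.
raw_pattern = "|".join(matches)

# Escaped: every character is matched literally.
safe_pattern = "|".join(re.escape(m) for m in matches)

raw_hits = texts[texts.str.contains(raw_pattern)]
safe_hits = texts[texts.str.contains(safe_pattern)]

print(raw_hits.tolist())   # []
print(safe_hits.tolist())  # ['$money', 'x^y']
```

The unescaped pattern silently returns nothing rather than raising an error, which is why this kind of bug is easy to miss.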
