How to use Python regular expressions for Excel file processing

王林

Jun 22, 2023 pm 09:48 PM

python, regular expression, excel processing

Excel files are a widely used data source in data processing. Because Python is a common language for data processing and analysis, being able to read and manipulate Excel files from Python is important. Regular expressions are an equally indispensable tool for the text-cleaning part of data preprocessing. This article explains how to use Python regular expressions to process Excel files.

1. Working with Excel files in Python

Commonly used Python libraries for reading and writing Excel files include openpyxl, pandas, xlwt, and xlrd. This article mainly uses openpyxl, which reads and writes xlsx/xlsm/xltx/xltm files.

Install it with pip install openpyxl before use.

To read an Excel file, specify the path of the file and the name of the sheet you want to work with; the sheet's contents can then be read into memory. Here is an example:

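from openpyxl import load_workbook

# Open the workbook
wb = load_workbook(filename='example.xlsx', read_only=True)
# Select the worksheet
ws = wb['Sheet1']
# Read the content of a cell
cell_value = ws['A1'].value
# A read-only workbook keeps the file open until it is closed explicitly
wb.close()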

Here, filename is the path of the Excel file to read. Setting read_only=True opens the file in read-only mode, which loads cells lazily and keeps memory use low for large files. ws is the worksheet being operated on.

When the data is tabular, it is often more convenient to load it with pandas: import pandas as pd, then call pd.read_excel(), which uses openpyxl to read xlsx files, as shown below:
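Reading one cell at a time is rarely enough in practice. The following is a minimal sketch of streaming every row of a sheet in read-only mode. It builds a small workbook in memory instead of using example.xlsx, so the sample data and the variable names (buffer, wb_out, rows) are assumptions made for illustration only.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook

# Build a tiny workbook in memory so the sketch runs without example.xlsx.
# The sheet contents below are made-up sample data.
buffer = BytesIO()
wb_out = Workbook()
ws_out = wb_out.active
ws_out.title = "Sheet1"
ws_out.append(["id", "color"])
ws_out.append([101, "Green"])
ws_out.append([202, "Red"])
wb_out.save(buffer)
buffer.seek(0)

# Reopen it in read-only mode and stream the rows as plain tuples.
wb = load_workbook(buffer, read_only=True)
ws = wb["Sheet1"]
rows = [row for row in ws.iter_rows(values_only=True)]
wb.close()  # read-only workbooks hold the file open until closed

for row in rows:
    print(row)
```

Passing values_only=True yields cell values directly rather than cell objects, which is usually what text processing needs.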

import pandas as pd

df = pd.read_excel('example.xlsx', sheet_name='Sheet1')

Here, the sheet_name parameter specifies the sheet to read.

2. Regular expressions

A regular expression is a pattern that describes a set of strings; it is used to find, extract, or replace the parts of a text that match the pattern. Python implements regular expressions in the built-in re module.
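A few other read_excel parameters are useful when the data will be processed with regular expressions. The sketch below is an illustration under assumptions: it writes a made-up two-sheet workbook to memory (the names buffer, data, and all_sheets are hypothetical) and then reads it back with dtype=str, so that every cell arrives as text, and with sheet_name=None, which loads all sheets at once.

```python
from io import BytesIO

import pandas as pd

# Write a small two-sheet workbook to memory so no file on disk is needed.
# The sheet contents are made-up sample data.
buffer = BytesIO()
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    pd.DataFrame({"A": [1001, 2002], "B": ["Green", "Red"]}).to_excel(
        writer, sheet_name="Sheet1", index=False
    )
    pd.DataFrame({"A": [3003]}).to_excel(writer, sheet_name="Sheet2", index=False)
data = buffer.getvalue()

# dtype=str keeps every cell as text, which suits the .str regex methods.
df = pd.read_excel(BytesIO(data), sheet_name="Sheet1", dtype=str)

# sheet_name=None loads every sheet into a dict keyed by sheet name.
all_sheets = pd.read_excel(BytesIO(data), sheet_name=None)

print(df)
print(list(all_sheets))
```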
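As a quick illustration of the re module, the sketch below uses three of its most common functions on a made-up sample string. The string and variable names are assumptions chosen for the example.

```python
import re

# Made-up sample text for illustration.
text = "Order 1001 shipped to Green Street on 2023-06-22"

# re.search returns the first match object, or None when nothing matches.
first_number = re.search(r"\d+", text)

# re.findall returns every non-overlapping match as a list of strings.
all_numbers = re.findall(r"\d+", text)

# re.sub replaces each match with the given replacement text.
masked = re.sub(r"\d", "#", text)

print(first_number.group())  # 1001
print(all_numbers)           # ['1001', '2023', '06', '22']
print(masked)                # Order #### shipped to Green Street on ####-##-##
```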

When using regular expressions in Python, we need to pay attention to the following points:

Characters such as ., *, +, ? and \ have special meanings in regular expressions and must be escaped with a backslash to match them literally;
Operator precedence: parentheses bind most tightly, followed by the repetition operators *, + and ?, and finally | (alternation), which binds most loosely;
Matching mode: by default, ^ and $ match only at the start and end of the whole string. To make them match at the start and end of every line in multi-line text, pass re.MULTILINE.
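The three points above can be checked with a short sketch. The sample strings and variable names are assumptions chosen only to make each behavior visible.

```python
import re

# Escaping: an unescaped "." matches any character; "\." matches only a dot.
loose = bool(re.fullmatch(r"3.14", "3x14"))    # True
strict = bool(re.fullmatch(r"3\.14", "3x14"))  # False

# re.escape builds a pattern that matches arbitrary text literally.
literal = re.escape("1+1=2")
literal_ok = bool(re.fullmatch(literal, "1+1=2"))  # True

# Precedence: "|" binds most loosely, so "^10|20$" means (^10) or (20$).
# Parentheses restrict the alternation to part of the pattern.
ungrouped = bool(re.search(r"^10|20$", "1020x"))   # True: the text starts with 10
grouped = bool(re.search(r"^(10|20)$", "1020x"))   # False: whole text must be 10 or 20

# Multi-line text: by default ^ matches only at the start of the string.
cell = "10 apples\n10 pears"
default_hits = re.findall(r"^10", cell)                        # ['10']
multiline_hits = re.findall(r"^10", cell, flags=re.MULTILINE)  # ['10', '10']

print(loose, strict, literal_ok, ungrouped, grouped)
print(default_hits, multiline_hits)
```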

Common metacharacters and symbols are as follows:

Symbols/Metacharacters	Meaning
.	Any character except a newline
\w	Letters, digits, and underscores
\W	Any character that is not a letter, digit, or underscore
\d	Digits
\D	Non-digits
\s	Whitespace characters, including spaces, tabs, and newlines
\S	Non-whitespace characters
^	Matches the beginning of the string
$	Matches the end of the string
*	Matches the previous character 0 or more times
+	Matches the previous character 1 or more times
?	Matches the previous character 0 or 1 times
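To see the metacharacters in the table at work, here is a small sketch on a made-up cell value. All names and sample strings are assumptions for illustration.

```python
import re

# Made-up cell value for illustration.
value = "Item_42:  Green apple"

# \w+ collects runs of letters, digits, and underscores.
word_chars = re.findall(r"\w+", value)        # ['Item_42', 'Green', 'apple']

# \d+ collects runs of digits.
digits = re.findall(r"\d+", value)            # ['42']

# \D* at the start consumes everything up to the first digit.
prefix = re.match(r"\D*", value).group()      # 'Item_'

# \s+ matches runs of whitespace; here they are collapsed to one space.
collapsed = re.sub(r"\s+", " ", value)        # 'Item_42: Green apple'

# ? makes the previous character optional (0 or 1 times).
optional = [bool(re.fullmatch(r"colou?r", w)) for w in ("color", "colour", "colouur")]

# * allows 0 repetitions, + requires at least 1.
star_vs_plus = (bool(re.fullmatch(r"ab*", "a")), bool(re.fullmatch(r"ab+", "a")))

print(word_chars, digits, prefix, collapsed, optional, star_vs_plus)
```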

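3. Processing Excel files with regular expressions

With the background above, we can start processing Excel files with regular expressions.

To apply regular expressions to an Excel file, first read the file into a pandas DataFrame and then operate on the DataFrame. Here is an example: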

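import pandas as pd

# Read the Excel file, specifying the sheet to process
df = pd.read_excel('example.xlsx', sheet_name='Sheet1')

# Replace a leading '10' in column A with 'Hello' (column A is assumed to hold text).
# regex=True is required: since pandas 2.0, str.replace treats the pattern
# as a literal string unless told otherwise.
df['A'] = df['A'].str.replace(r'^10', 'Hello', regex=True)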

In the code above, the regular expression '^10' matches values in column A that start with '10', and the matched '10' prefix is replaced with 'Hello'.
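The effect of this replacement is easier to see on a small in-memory DataFrame. The sketch below uses made-up values in a column named A; it also shows the literal-string behavior of regex=False and one way to handle a column that was read as numbers (converting it with astype(str) first, which is an assumption about how you want numbers treated).

```python
import pandas as pd

# Made-up sample values standing in for column A of the spreadsheet.
df = pd.DataFrame({"A": ["1001", "2010", "10", "abc10"]})

# With regex=True, ^10 only matches "10" at the start of a value.
df["A"] = df["A"].str.replace(r"^10", "Hello", regex=True)
print(df["A"].tolist())  # ['Hello01', '2010', 'Hello', 'abc10']

# With regex=False the pattern is a literal string, so "^10" is never found.
unchanged = pd.Series(["1001"]).str.replace(r"^10", "Hello", regex=False)
print(unchanged.tolist())  # ['1001']

# A numeric column has no .str accessor; convert it to text first.
numeric = pd.Series([1001, 2010])
cleaned = numeric.astype(str).str.replace(r"^10", "Hello", regex=True)
print(cleaned.tolist())  # ['Hello01', '2010']
```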

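Python offers many ways to apply regular expressions; they are not all covered here, and readers can choose whichever suits their situation.

4. Common Excel file processing operations

Besides the replacement shown above, common operations on Excel data include filtering and deduplication. The following sections show how regular expressions fit into these operations.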
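As a rough illustration of some of those other options, the sketch below applies three of them to a made-up Series: str.extract, str.findall, and a compiled re pattern applied cell by cell. The sample values and the SKU naming scheme are assumptions invented for the example.

```python
import re

import pandas as pd

# Made-up sample values; the "SKU-" format is hypothetical.
s = pd.Series(["SKU-1001 x2", "SKU-2002 x10", "n/a"])

# str.extract pulls the first capture group into a new Series (NaN when no match).
sku = s.str.extract(r"SKU-(\d+)", expand=False)

# str.findall returns every match in each cell as a list.
numbers = s.str.findall(r"\d+")

# A compiled pattern from the re module can be applied cell by cell.
pattern = re.compile(r"^SKU-\d{4}\b")
is_sku = s.map(lambda v: bool(pattern.search(v)))

print(sku.tolist())
print(numbers.tolist())
print(is_sku.tolist())
```

The vectorized .str methods are usually the more concise choice; mapping a compiled pattern is handy when the logic per cell is more involved.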

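Filtering rows that match a regular expression

Rows can be selected with a regular expression in two ways. The DataFrame filter method, called with axis=0, matches the pattern against the row labels (the index); it does not look at cell values. To select rows by the content of a column, build a boolean mask with str.contains instead. Here is sample code:

import pandas as pd

# Read the Excel file, specifying the sheet to process
df = pd.read_excel('example.xlsx', sheet_name='Sheet1')

# Keep the rows whose value in column A starts with '1' or contains 'Green'
mask = df['A'].astype(str).str.contains(r'^1|Green', regex=True)
df = df[mask]

In the code above, '^1' matches values that start with '1', and 'Green' after the | matches values that contain 'Green' anywhere. Because str.contains searches anywhere in the string, no surrounding '.*' is needed. Adjust the regular expression to select the rows you need.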
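One detail is worth checking before relying on filter: with axis=0 it tests the regular expression against the row labels, not the cell values. The sketch below contrasts it with a value-based mask built from str.contains, using a made-up DataFrame whose index labels and values are assumptions chosen to make the difference visible.

```python
import pandas as pd

# Made-up data; the index labels are hypothetical row names.
df = pd.DataFrame(
    {"A": ["100", "205", "Green tea", "310"], "B": [1, 2, 3, 4]},
    index=["r1", "r10", "r2", "x1"],
)

# DataFrame.filter with axis=0 tests the regex against the index labels.
by_label = df.filter(regex=r"^r1", axis=0)

# A boolean mask from str.contains tests the values in column A.
mask = df["A"].astype(str).str.contains(r"^1|Green", regex=True, na=False)
by_value = df[mask]

print(by_label.index.tolist())   # ['r1', 'r10']
print(by_value["A"].tolist())    # ['100', 'Green tea']
```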

Removing duplicate rows

To remove duplicate rows, use the drop_duplicates method of the pandas DataFrame. Here is sample code:

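import pandas as pd

# Read the Excel file, specifying the sheet to process
df = pd.read_excel('example.xlsx', sheet_name='Sheet1')

# Drop rows that repeat the same values in columns A and B.
# drop_duplicates returns a new DataFrame, so assign the result.
df = df.drop_duplicates(subset=['A', 'B'])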

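In the code above, the subset parameter lists the columns that are compared when looking for duplicates. Adjust it to get the deduplication you need. Note that drop_duplicates compares exact values and does not accept a regular expression, so any regex-based normalization has to be applied to the columns first.

5. Summary

This article introduced the openpyxl library and regular expressions, and explained how to preprocess Excel files with Python. Once you understand the syntax rules of regular expressions, you can apply them flexibly to Excel data as your situation requires.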
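Since drop_duplicates compares exact values, a regular expression only comes into play if the values are normalized first. The following is one possible sketch under assumptions: the sample data, the normalization rule (strip non-alphanumeric characters and lowercase), and the temporary column name _key are all hypothetical choices for illustration.

```python
import pandas as pd

# Made-up data in which "ab-001" and "AB 001" should count as the same key.
df = pd.DataFrame({
    "A": ["ab-001", "AB 001", "cd-002", "ab-001"],
    "B": ["x", "x", "y", "x"],
})

# Exact-value deduplication: only the literal repeat (the last row) is dropped.
exact = df.drop_duplicates(subset=["A", "B"])

# Regex-normalized key: remove non-alphanumeric characters, then lowercase.
key = df["A"].str.replace(r"[^0-9A-Za-z]+", "", regex=True).str.lower()

# Deduplicate on the normalized key plus column B, then drop the helper column.
normalized = (
    df.assign(_key=key)
      .drop_duplicates(subset=["_key", "B"])
      .drop(columns="_key")
)

print(exact["A"].tolist())       # ['ab-001', 'AB 001', 'cd-002']
print(normalized["A"].tolist())  # ['ab-001', 'cd-002']
```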

