How to process Excel data with Python's Pandas library?-Python Tutorial-php.cn

Home

Backend Development

Python Tutorial

How to process Excel data with Python's Pandas library?

WBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWB

May 08, 2023 pm 09:49 PM

excelpythonpandas

1. Read xlsx table: pd.read_excel()

The original content is as follows:

How to process Excel data with Pythons Pandas library?

a) Read the nth Sheet (sub-table, you can view or add or delete sub-tables in the lower left) data

import pandas as pd
# 每次都需要修改的路径
path = "test.xlsx"
# sheet_name默认为0，即读取第一个sheet的数据
sheet = pd.read_excel(path, sheet_name=0)
print(sheet)
"""
  Unnamed: 0  name1  name2  name3
0       row1      1    2.0      3
1       row2      4    NaN      6
2       row3      7    8.0      9
"""

It can be noticed that there is no content in the upper left corner of the original form, and the read result is "Unnamed : 0", this is because the read_excel function will default the first row of the table as the column index name . In addition, for row index names, numbering starts from the second row by default (because the default first row is the column index name, so the default first row is not data). If not specifically specified, numbering starts from 0 automatically, as follows.

sheet = pd.read_excel(path)
# 查看列索引名，返回列表形式
print(sheet.columns.values)
# 查看行索引名，默认从第二行开始编号，如果不特意指定，则自动从0开始编号，返回列表形式
print(sheet.index.values)
"""
[&#39;Unnamed: 0&#39; &#39;name1&#39; &#39;name2&#39; &#39;name3&#39;]
[0 1 2]
"""

b) The column index name can also be customized, as follows:

sheet = pd.read_excel(path, names=[&#39;col1&#39;, &#39;col2&#39;, &#39;col3&#39;, &#39;col4&#39;])
print(sheet)
# 查看列索引名，返回列表形式
print(sheet.columns.values)
"""
   col1  col2  col3  col4
0  row1     1   2.0     3
1  row2     4   NaN     6
2  row3     7   8.0     9
[&#39;col1&#39; &#39;col2&#39; &#39;col3&#39; &#39;col4&#39;]
"""

c) You can also specify the nth column as the row index name , as follows:

# 指定第一列为行索引
sheet = pd.read_excel(path, index_col=0)
print(sheet)
"""
      name1  name2  name3
row1      1    2.0      3
row2      4    NaN      6
row3      7    8.0      9
"""

d) Skip the nth row of data when reading

# 跳过第2行的数据（第一行索引为0）
sheet = pd.read_excel(path, skiprows=[1])
print(sheet)
"""
  Unnamed: 0  name1  name2  name3
0       row2      4    NaN      6
1       row3      7    8.0      9
"""

2. Get the data size of the table: shape

path = "test.xlsx"
# 指定第一列为行索引
sheet = pd.read_excel(path, index_col=0)
print(sheet)
print(&#39;==========================&#39;)
print(&#39;shape of sheet:&#39;, sheet.shape)
"""
      name1  name2  name3
row1      1    2.0      3
row2      4    NaN      6
row3      7    8.0      9
==========================
shape of sheet: (3, 3)
"""

3. Method of indexing data: [ ] / loc[] / iloc[]

1. Directly add square brackets to index

You can use square brackets to add column names The method [col_name] is used to extract the data of a certain column, and then use square brackets plus the index number [index] to index the value of the specific position of this column. Here, the column named name1 is indexed, and then the data located in row 1 of the column (index is 1) is printed: 4, as follows:

sheet = pd.read_excel(path)
# 读取列名为 name1 的列数据
col = sheet[&#39;name1&#39;]
print(col)
# 打印该列第二个数据
print(col[1]) # 4
"""
0    1
1    4
2    7
Name: name1, dtype: int64
4
"""

2, iloc method, index by integer number

Use sheet.iloc[ ] index, the square brackets are the integer position numbers of the rows and columns (starting from 0 after excluding the column as the row index and the row as the column index) serial number).
a) sheet.iloc[1, 2]: Extract row 2, column 3 data. The first is the row index, the second is the column index

b) sheet.iloc[0: 2]: Extract the first two rowsdata

c) sheet.iloc[0:2, 0:2]: Extract the first two columns data of the first two rows through sharding

# 指定第一列数据为行索引
sheet = pd.read_excel(path, index_col=0)
# 读取第2行（row2）的第3列（6）数据
# 第一个是行索引，第二个是列索引
data = sheet.iloc[1, 2]
print(data)  # 6
print(&#39;================================&#39;)
# 通过分片的方式提取 前两行 数据
data_slice = sheet.iloc[0:2]
print(data_slice)
print(&#39;================================&#39;)
# 通过分片的方式提取 前两行 的 前两列 数据
data_slice = sheet.iloc[0:2, 0:2]
print(data_slice)
"""
6
================================
      name1  name2  name3
row1      1    2.0      3
row2      4    NaN      6
================================
      name1  name2
row1      1    2.0
row2      4    NaN
"""

3. loc method, index by row and column name

Use sheet.loc[ ] index, the square brackets are row and row The name string . The specific usage is the same as iloc , except that the integer index of iloc is replaced by the name index of the row and column. This indexing method is more intuitive to use.

Note: iloc[1: 2] does not contain 2, but loc['row1': 'row2'] does Contains 'row2'.

# 指定第一列数据为行索引
sheet = pd.read_excel(path, index_col=0)
# 读取第2行（row2）的第3列（6）数据
# 第一个是行索引，第二个是列索引
data = sheet.loc[&#39;row2&#39;, &#39;name3&#39;]
print(data)  # 1
print(&#39;================================&#39;)
# 通过分片的方式提取 前两行 数据
data_slice = sheet.loc[&#39;row1&#39;: &#39;row2&#39;]
print(data_slice)
print(&#39;================================&#39;)
# 通过分片的方式提取 前两行 的 前两列 数据
data_slice1 = sheet.loc[&#39;row1&#39;: &#39;row2&#39;, &#39;name1&#39;: &#39;name2&#39;]
print(data_slice1)
"""
6
================================
      name1  name2  name3
row1      1    2.0      3
row2      4    NaN      6
================================
      name1  name2
row1      1    2.0
row2      4    NaN
"""

4. Determine whether the data is empty: np.isnan() / pd.isnull()

1. Use isnan() or of the numpy library The isnull() method of the pandas library determines whether it is equal to nan .

sheet = pd.read_excel(path)
# 读取列名为 name1 的列数据
col = sheet[&#39;name2&#39;]
 
print(np.isnan(col[1]))  # True
print(pd.isnull(col[1]))  # True
"""
True
True
"""

2. Use str() to convert it to a string and determine whether it is equal to 'nan' .

sheet = pd.read_excel(path)
# 读取列名为 name1 的列数据
col = sheet[&#39;name2&#39;]
print(col)
# 打印该列第二个数据
if str(col[1]) == &#39;nan&#39;:
    print(&#39;col[1] is nan&#39;)
"""
0    2.0
1    NaN
2    8.0
Name: name2, dtype: float64
col[1] is nan
"""

5. Find data that meets the conditions

Let’s understand the following code

# 提取name1 == 1 的行
mask = (sheet[&#39;name1&#39;] == 1)
x = sheet.loc[mask]
print(x)
"""
      name1  name2  name3
row1      1    2.0      3
"""

6. Modify element value: replace()

sheet['name2'].replace(2, 100, inplace=True) : Change element 2 of column name2 to element 100, and operate in place.

sheet[&#39;name2&#39;].replace(2, 100, inplace=True)
print(sheet)
"""
      name1  name2  name3
row1      1  100.0      3
row2      4    NaN      6
row3      7    8.0      9
"""

sheet['name2'].replace(np.nan, 100, inplace=True) : Change the empty element (nan) in the name2 column to element 100, operate in place .

import numpy as np 
sheet[&#39;name2&#39;].replace(np.nan, 100, inplace=True)
print(sheet)
print(type(sheet.loc[&#39;row2&#39;, &#39;name2&#39;]))
"""
      name1  name2  name3
row1      1    2.0      3
row2      4  100.0      6
row3      7    8.0      9
"""

7. Add data: [ ]

To add a column, directly use square brackets [name to add] to add.

sheet['name_add'] = [55, 66, 77]: Add a column named name_add with a value of [55, 66, 77]

path = "test.xlsx"
# 指定第一列为行索引
sheet = pd.read_excel(path, index_col=0)
print(sheet)
print(&#39;====================================&#39;)
# 添加名为 name_add 的列，值为[55, 66, 77]
sheet[&#39;name_add&#39;] = [55, 66, 77]
print(sheet)
"""
      name1  name2  name3
row1      1    2.0      3
row2      4    NaN      6
row3      7    8.0      9
====================================
      name1  name2  name3  name_add
row1      1    2.0      3        55
row2      4    NaN      6        66
row3      7    8.0      9        77
"""

8. Delete data: del() / drop()

a) del(sheet['name3']): Use the del method to delete

sheet = pd.read_excel(path, index_col=0)
# 使用 del 方法删除 &#39;name3&#39; 的列
del(sheet[&#39;name3&#39;])
print(sheet)
"""
      name1  name2
row1      1    2.0
row2      4    NaN
row3      7    8.0
"""

b) sheet.drop('row1', axis=0)

Use the drop method to delete the row1 row. If the column is deleted, the corresponding axis=1.

When the inplace parameter is True, the parameter will not be returned and will be deleted directly on the original data.

When the inplace parameter is False (default), the original data will not be modified, but the modified data will be returned. Data

sheet.drop(&#39;row1&#39;, axis=0, inplace=True)
print(sheet)
"""
      name1  name2  name3
row2      4    NaN      6
row3      7    8.0      9
"""

c)sheet.drop(labels=['name1', 'name2'], axis=1)

Use label=[ ] parameter to delete Multiple rows or columns

# 删除多列，默认 inplace 参数位 False，即会返回结果
print(sheet.drop(labels=[&#39;name1&#39;, &#39;name2&#39;], axis=1))
"""
      name3
row1      3
row2      6
row3      9
"""

9. Save to excel file: to_excel()

1. Save the data in pandas format as an .xlsx file

names = [&#39;a&#39;, &#39;b&#39;, &#39;c&#39;]
scores = [99, 100, 99]
result_excel = pd.DataFrame()
result_excel["姓名"] = names
result_excel["评分"] = scores
# 写入excel
result_excel.to_excel(&#39;test3.xlsx&#39;)

How to process Excel data with Pythons Pandas library?

2. Save the modified excel file as an .xlsx file.

For example, after modifying nan in the original table to 100, save the file:

import numpy as np 
# 指定第一列为行索引
sheet = pd.read_excel(path, index_col=0)
sheet[&#39;name2&#39;].replace(np.nan, 100, inplace=True)
sheet.to_excel(&#39;test2.xlsx&#39;)

Open test2.xlsx and the result is as follows:

How to process Excel data with Pythons Pandas library?

The above is the detailed content of How to process Excel data with Python's Pandas library?. For more information, please follow other related articles on the PHP Chinese website!

Statement

This article is reproduced at:亿速云. If there is any infringement, please contact admin@php.cn delete

Python: compiler or Interpreter?May 13, 2025 am 12:10 AM

Python is an interpreted language, but it also includes the compilation process. 1) Python code is first compiled into bytecode. 2) Bytecode is interpreted and executed by Python virtual machine. 3) This hybrid mechanism makes Python both flexible and efficient, but not as fast as a fully compiled language.

Python For Loop vs While Loop: When to Use Which?May 13, 2025 am 12:07 AM

Useaforloopwheniteratingoverasequenceorforaspecificnumberoftimes;useawhileloopwhencontinuinguntilaconditionismet.Forloopsareidealforknownsequences,whilewhileloopssuitsituationswithundeterminediterations.

Python loops: The most common errorsMay 13, 2025 am 12:07 AM

Pythonloopscanleadtoerrorslikeinfiniteloops,modifyinglistsduringiteration,off-by-oneerrors,zero-indexingissues,andnestedloopinefficiencies.Toavoidthese:1)Use'i

For loop and while loop in Python: What are the advantages of each?May 13, 2025 am 12:01 AM

Forloopsareadvantageousforknowniterationsandsequences,offeringsimplicityandreadability;whileloopsareidealfordynamicconditionsandunknowniterations,providingcontrolovertermination.1)Forloopsareperfectforiteratingoverlists,tuples,orstrings,directlyacces

Python: A Deep Dive into Compilation and InterpretationMay 12, 2025 am 12:14 AM

Pythonusesahybridmodelofcompilationandinterpretation:1)ThePythoninterpretercompilessourcecodeintoplatform-independentbytecode.2)ThePythonVirtualMachine(PVM)thenexecutesthisbytecode,balancingeaseofusewithperformance.

Is Python an interpreted or a compiled language, and why does it matter?May 12, 2025 am 12:09 AM

Pythonisbothinterpretedandcompiled.1)It'scompiledtobytecodeforportabilityacrossplatforms.2)Thebytecodeistheninterpreted,allowingfordynamictypingandrapiddevelopment,thoughitmaybeslowerthanfullycompiledlanguages.

For Loop vs While Loop in Python: Key Differences ExplainedMay 12, 2025 am 12:08 AM

Forloopsareidealwhenyouknowthenumberofiterationsinadvance,whilewhileloopsarebetterforsituationswhereyouneedtoloopuntilaconditionismet.Forloopsaremoreefficientandreadable,suitableforiteratingoversequences,whereaswhileloopsoffermorecontrolandareusefulf

For and While loops: a practical guideMay 12, 2025 am 12:07 AM

Forloopsareusedwhenthenumberofiterationsisknowninadvance,whilewhileloopsareusedwhentheiterationsdependonacondition.1)Forloopsareidealforiteratingoversequenceslikelistsorarrays.2)Whileloopsaresuitableforscenarioswheretheloopcontinuesuntilaspecificcond

See all articles

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Roblox: Grow A Garden - Complete Mutation Guide

3 weeks agoByDDD

Roblox: Bubble Gum Simulator Infinity - How To Get And Use Royal Keys

3 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

How to fix KB5055612 fails to install in Windows 10?

3 weeks agoByDDD

Nordhold: Fusion System, Explained

3 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

Mandragora: Whispers Of The Witch Tree - How To Unlock The Grappling Hook

3 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

Hot Tools

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Dreamweaver CS6

Visual web development tools

WebStorm Mac version

Useful JavaScript development tools

PhpStorm Mac version

The latest (2018.2.1) professional PHP integrated development tool

mPDF

mPDF is a PHP library that can generate PDF files from UTF-8 encoded HTML. The original author, Ian Back, wrote mPDF to output PDF files "on the fly" from his website and handle different languages. It is slower than original scripts like HTML2FPDF and produces larger files when using Unicode fonts, but supports CSS styles etc. and has a lot of enhancements. Supports almost all languages, including RTL (Arabic and Hebrew) and CJK (Chinese, Japanese and Korean). Supports nested block-level elements (such as P, DIV),

Hot Topics

1666

1425

1328

1273

1253