A Detailed 4,000-Character Walkthrough: 20 Extremely Handy Pandas Functions and Methods

Python当打之年

Aug 10, 2023 pm 02:52 PM

python, pandas

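Today we share a few lesser-known pandas functions. You may not come across them often, but they are very convenient to use and can substantially improve the efficiency of data analysts. We hope you find them useful.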

items() method
iterrows() method
insert() method
assign() method
eval() method
pop() method
truncate() method
count() method
add_prefix() / add_suffix() methods
clip() method
filter() method
first() method
isin() method
df.plot.area() method
df.plot.bar() method
df.plot.box() method
df.plot.pie() method

`items()` method

The items() method in pandas iterates over the columns of a DataFrame, yielding each column as a (column name, column contents) tuple. For example:

import pandas as pd

df = pd.DataFrame({'species': ['bear', 'bear', 'marsupial'],
                   'population': [1864, 22000, 80000]},
                  index=['panda', 'polar', 'koala'])
df

output

         species  population
panda       bear        1864
polar       bear       22000
koala  marsupial       80000

Next, we call items() on it

for label, content in df.items():
    print(f'label: {label}')
    print(f'content: {content}')
    print("=" * 50)

output

label: species
content: panda         bear
polar         bear
koala    marsupial
Name: species, dtype: object
==================================================
label: population
content: panda     1864
polar    22000
koala    80000
Name: population, dtype: int64
==================================================

The loop prints the name and the contents of the two columns, 'species' and 'population', one after the other.
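As a small sketch of where this loop is handy, the snippet below uses items() to build a per-column summary. The `summary` dictionary and the choice of `nunique()` are illustrative assumptions, not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({'species': ['bear', 'bear', 'marsupial'],
                   'population': [1864, 22000, 80000]},
                  index=['panda', 'polar', 'koala'])

# items() yields one (column name, column Series) pair per column
summary = {}
for label, content in df.items():
    summary[label] = content.nunique()  # number of distinct values in the column

print(summary)
```

Because each `content` is an ordinary Series, any Series method can be used inside the loop.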

`iterrows()` method

The iterrows() method, by contrast, iterates over the rows of a DataFrame, yielding each row's index label together with the row's contents as a Series keyed by column name. For example:

for label, content in df.iterrows():
    print(f'label: {label}')
    print(f'content: {content}')
    print("=" * 50)

output

label: panda
content: species       bear
population    1864
Name: panda, dtype: object
==================================================
label: polar
content: species        bear
population    22000
Name: polar, dtype: object
==================================================
label: koala
content: species       marsupial
population        80000
Name: koala, dtype: object
==================================================
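Note that every row above comes back with dtype object, because a row mixes text and numbers in a single Series. The sketch below contrasts iterrows() with itertuples(), a related pandas method that yields each row as a namedtuple; the variable names are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({'species': ['bear', 'bear', 'marsupial'],
                   'population': [1864, 22000, 80000]},
                  index=['panda', 'polar', 'koala'])

# iterrows(): each row is a Series, looked up by column name
total_population = 0
for label, row in df.iterrows():
    total_population += row['population']

# itertuples(): each row is a namedtuple; the Index field holds the row label
names = [t.Index for t in df.itertuples()]
populations = [t.population for t in df.itertuples()]

print(total_population, names, populations)
```

For large DataFrames, itertuples() is usually the lighter of the two, since it does not build a Series per row.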

`insert()` method

The insert() method inserts a new column at a given position in a DataFrame, modifying the DataFrame in place. For example:

df.insert(1, "size", [2000, 3000, 4000])
df

output

         species  size  population
panda       bear  2000        1864
polar       bear  3000       22000
koala  marsupial  4000       80000

As the result shows, column positions in a DataFrame are counted from 0, so position 1 places the new column second.
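Two details of insert() are easy to trip over: it returns None because it works in place, and it refuses to insert a column whose name already exists unless allow_duplicates=True is passed. A minimal sketch, where the 'id' column and its values are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'species': ['bear', 'bear', 'marsupial'],
                   'population': [1864, 22000, 80000]},
                  index=['panda', 'polar', 'koala'])

result = df.insert(0, 'id', [1, 2, 3])  # position 0 = first column
print(result)                           # None: the DataFrame itself was changed
print(list(df.columns))

# Inserting the same column name again raises ValueError by default
try:
    df.insert(1, 'id', [4, 5, 6])
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

print(duplicate_rejected)
```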

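`assign()` method

The assign() method adds new columns to a DataFrame and returns the result as a new DataFrame, leaving the original unchanged. The outputs below are shown for the DataFrame without the 'size' column inserted earlier. For example:

df.assign(size_1=lambda x: x.population * 9 / 5 + 32)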

output

         species  population    size_1
panda       bear        1864    3387.2
polar       bear       22000   39632.0
koala  marsupial       80000  144032.0

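In the example above, a lambda function adds a new column named 'size_1'. assign() can also create more than one column in a single call. Here the result is assigned back to df so that the new columns are kept:

df = df.assign(size_1=lambda x: x.population * 9 / 5 + 32,
               size_2=lambda x: x.population * 8 / 5 + 10)
df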

output

         species  population    size_1    size_2
panda       bear        1864    3387.2    2992.4
polar       bear       22000   39632.0   35210.0
koala  marsupial       80000  144032.0  128010.0
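The sketch below illustrates two properties of assign(): the original DataFrame is left untouched, and a later keyword argument can refer to a column created earlier in the same call. The `size_ratio` column is a hypothetical name introduced only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'species': ['bear', 'bear', 'marsupial'],
                   'population': [1864, 22000, 80000]},
                  index=['panda', 'polar', 'koala'])

df2 = df.assign(
    size_1=lambda x: x.population * 9 / 5 + 32,
    size_ratio=lambda x: x.size_1 / x.population,  # uses size_1 created just above
)

print(list(df.columns))   # the original columns only
print(list(df2.columns))  # the original columns plus the two new ones
```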

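`eval()` method

The eval() method evaluates an operation written as a string. The example below operates on the size_1 and size_2 columns created with assign() above, so the result of that call must have been assigned back to df. For example:

df.eval("size_3 = size_1 + size_2")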

output

         species  population    size_1    size_2    size_3
panda       bear        1864    3387.2    2992.4    6379.6
polar       bear       22000   39632.0   35210.0   74842.0
koala  marsupial       80000  144032.0  128010.0  272042.0

We can also evaluate several expressions in one call

df = df.eval('''
size_3 = size_1 + size_2
size_4 = size_1 - size_2
''')
df

output

         species  population    size_1    size_2    size_3   size_4
panda       bear        1864    3387.2    2992.4    6379.6    394.8
polar       bear       22000   39632.0   35210.0   74842.0   4422.0
koala  marsupial       80000  144032.0  128010.0  272042.0  16022.0
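What eval() returns depends on the expression. As a rough sketch on a small made-up DataFrame (columns `a` and `b` are assumptions for illustration): an expression without an assignment returns the computed Series, while an expression with an assignment returns a new DataFrame and leaves the original as it was.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

total = df.eval("a + b")         # no assignment: returns a Series
with_sum = df.eval("c = a + b")  # assignment: returns a new DataFrame

print(total.tolist())
print(list(with_sum.columns))
print(list(df.columns))          # df itself has no column 'c'
```

This is why the multi-expression example above assigns the result back with `df = df.eval(...)`.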

`pop()` method

The pop() method removes a given column from a DataFrame and returns it as a Series

df.pop("size_3")

output

panda      6379.6
polar     74842.0
koala    272042.0
Name: size_3, dtype: float64

After the call, the original DataFrame no longer contains the 'size_3' column.
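Since pop() both returns the column and removes it, one common use is splitting a label column off from the remaining columns. The sketch below uses a made-up DataFrame; the column names `feature_1`, `feature_2`, and `target` are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({'feature_1': [1, 2, 3],
                   'feature_2': [4, 5, 6],
                   'target': [0, 1, 0]})

y = df.pop('target')  # y is the removed column; df is modified in place

print(list(df.columns))
print(y.name, y.tolist())
```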

`truncate()` method

The truncate() method selects rows by index label, dropping everything before and after the given bounds. For example:

df = pd.DataFrame({'A': ['a', 'b', 'c', 'd', 'e'],
                   'B': ['f', 'g', 'h', 'i', 'j'],
                   'C': ['k', 'l', 'm', 'n', 'o']},
                  index=[1, 2, 3, 4, 5])
df

output

   A  B  C
1  a  f  k
2  b  g  l
3  c  h  m
4  d  i  n
5  e  j  o

Let's try out the truncate() method

df.truncate(before=2, after=4)

output

   A  B  C
2  b  g  l
3  c  h  m
4  d  i  n

The before and after parameters of truncate() drop the rows before index label 2 and after index label 4 and keep everything in between; both bounds are inclusive.
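truncate() also accepts an axis argument, so the same before/after logic can be applied to column labels. A minimal sketch on the same data (the variable names `rows` and `cols` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': list('abcde'),
                   'B': list('fghij'),
                   'C': list('klmno')},
                  index=[1, 2, 3, 4, 5])

rows = df.truncate(before=2, after=4)               # keep index labels 2..4
cols = df.truncate(before='A', after='B', axis=1)   # keep column labels 'A'..'B'

print(rows.index.tolist())
print(list(cols.columns))
```

truncate() expects the index it works on to be sorted, which is the case here.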

`count()` method

The count() method counts the non-null values in each column. For example:

import numpy as np

df = pd.DataFrame({"Name": ["John", "Myla", "Lewis", "John", "John"],
                   "Age": [24., np.nan, 25, 33, 26],
                   "Single": [True, True, np.nan, True, False]})
df

output

    Name   Age Single
0   John  24.0   True
1   Myla   NaN   True
2  Lewis  25.0    NaN
3   John  33.0   True
4   John  26.0  False

We use count() to count the non-null values in each column of the DataFrame

df.count()

output

Name      5
Age       4
Single    4
dtype: int64
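count() can also run across rows with axis=1, and it pairs naturally with isna().sum(), which counts the missing values instead. A short sketch on the same data (the variable names `per_row` and `missing` are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Name": ["John", "Myla", "Lewis", "John", "John"],
                   "Age": [24., np.nan, 25, 33, 26],
                   "Single": [True, True, np.nan, True, False]})

per_row = df.count(axis=1)  # non-null values in each row
missing = df.isna().sum()   # null values in each column

print(per_row.tolist())
print(missing.to_dict())
```

For every column, the non-null count plus the null count equals the number of rows.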

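`add_prefix()` / `add_suffix()` methods

The add_prefix() and add_suffix() methods add a prefix and a suffix, respectively, to the labels of a pandas object. For a Series the prefix or suffix is added to the row index labels, while for a DataFrame it is added to the column labels. For example:

s = pd.Series([1, 2, 3, 4])
s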

output

0    1
1    2
2    3
3    4
dtype: int64

We apply add_prefix() and add_suffix() to the Series

s.add_prefix('row_')

output

row_0    1
row_1    2
row_2    3
row_3    4
dtype: int64

Another example

s.add_suffix('_row')

output

0_row    1
1_row    2
2_row    3
3_row    4
dtype: int64

For a DataFrame, by contrast, add_prefix() and add_suffix() add the prefix or suffix to the column labels

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [3, 4, 5, 6]})
df

output

   A  B
0  1  3
1  2  4
2  3  5
3  4  6

For example

df.add_prefix("column_")

output

   column_A  column_B
0         1         3
1         2         4
2         3         5
3         4         6

Another example

df.add_suffix("_column")

output

   A_column  B_column
0         1         3
1         2         4
2         3         5
3         4         6
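One way to think about add_prefix() on a DataFrame is as a shorthand for rename() with a function applied to every column label. The sketch below checks that the two give the same result on this data; it illustrates the equivalence and makes no claim about how pandas implements it internally.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [3, 4, 5, 6]})

prefixed = df.add_prefix('column_')
renamed = df.rename(columns=lambda name: f'column_{name}')

print(prefixed.equals(renamed))  # same labels and same data
print(list(df.columns))          # the original DataFrame keeps its labels
```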

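`clip()` method

The clip() method limits the values in a DataFrame to a given range: any value that lies beyond a threshold is replaced by that threshold

data = {'col_0': [9, -3, 0, -1, 5], 'col_1': [-2, -7, 6, 8, -5]}
df = pd.DataFrame(data)
df

output

   col_0  col_1
0      9     -2
1     -3     -7
2      0      6
3     -1      8
4      5     -5

df.clip(lower=-4, upper=4)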

output

   col_0  col_1
0      4     -2
1     -3     -4
2      0      4
3     -1      4
4      4     -4

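The lower and upper parameters set the lower and upper thresholds, respectively: values below lower are replaced by lower, and values above upper are replaced by upper.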
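Either bound can be omitted, which gives one-sided clipping, and clip() works on a single column (a Series) as well. A minimal sketch on the same data (the variable names are illustrative):

```python
import pandas as pd

data = {'col_0': [9, -3, 0, -1, 5], 'col_1': [-2, -7, 6, 8, -5]}
df = pd.DataFrame(data)

non_negative = df.clip(lower=0)      # only a lower bound: negatives become 0
capped = df['col_0'].clip(upper=4)   # only an upper bound, on one column

print(non_negative['col_0'].tolist(), non_negative['col_1'].tolist())
print(capped.tolist())
```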

`filter()` method

The filter() method in pandas selects rows or columns by their labels; it does not filter on the values in the data. For example:

df = pd.DataFrame(np.array(([1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12])),
                  index=['A', 'B', 'C', 'D'],
                  columns=['one', 'two', 'three'])
df

output

   one  two  three
A    1    2      3
B    4    5      6
C    7    8      9
D   10   11     12

We use filter() to select columns by name

df.filter(items=['one', 'three'])

output

   one  three
A    1      3
B    4      6
C    7      9
D   10     12

We can also select labels with a regular expression; here, the columns whose name ends in 'e'

df.filter(regex='e$', axis=1)

output

   one  three
A    1      3
B    4      6
C    7      9
D   10     12

The axis parameter controls whether the filter is applied to the row labels or to the column labels

df.filter(like='B', axis=0)

output

   one  two  three
B    4    5      6
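Because filter() only looks at labels, selecting rows by their values calls for boolean indexing instead. The sketch below places the two side by side on the same data; the threshold of 4 is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array(([1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12])),
                  index=['A', 'B', 'C', 'D'],
                  columns=['one', 'two', 'three'])

by_label = df.filter(like='t', axis=1)  # columns whose name contains 't'
by_value = df[df['one'] > 4]            # rows whose value in 'one' exceeds 4

print(list(by_label.columns))
print(by_value.index.tolist())
```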

`first()` method

When the row index of a DataFrame consists of dates, first() selects the rows that fall within an initial period of the data

index_1 = pd.date_range('2021-11-11', periods=5, freq='2D')
ts = pd.DataFrame({'A': [1, 2, 3, 4, 5]}, index=index_1)
ts

output

            A
2021-11-11  1
2021-11-13  2
2021-11-15  3
2021-11-17  4
2021-11-19  5

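We use first() to select, for example, the data for the first 3 days. Because the index advances in 2-day steps, the first 3 calendar days contain only two rows

ts.first('3D')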

output

            A
2021-11-11  1
2021-11-13  2
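Recent pandas releases deprecate first() (a deprecation warning was added in the 2.1 series), so it may be worth knowing an equivalent that relies only on ordinary indexing. The sketch below builds a cutoff date by hand and filters with .loc; the names `cutoff` and `first_3_days` are illustrative assumptions.

```python
import pandas as pd

index_1 = pd.date_range('2021-11-11', periods=5, freq='2D')
ts = pd.DataFrame({'A': [1, 2, 3, 4, 5]}, index=index_1)

# Everything strictly before (first timestamp + 3 days) is "the first 3 days"
cutoff = ts.index[0] + pd.Timedelta(days=3)
first_3_days = ts.loc[ts.index < cutoff]

print(first_3_days)
```

On this data the result contains the rows for 2021-11-11 and 2021-11-13, matching the output shown above.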

`isin()` method

The isin() method checks, element by element, whether each value in a DataFrame is contained in a given list

df = pd.DataFrame(np.array(([1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12])),
                  index=['A', 'B', 'C', 'D'],
                  columns=['one', 'two', 'three'])
df.isin([3, 5, 12])

output

     one    two  three
A  False  False   True
B  False   True  False
C  False  False  False
D  False  False   True

Where a value is one of the listed values (3, 5, or 12), the result is True; otherwise it is False.
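In practice isin() is often used on a single column to build a boolean mask for selecting rows, and on a DataFrame it also accepts a dict that gives a separate list of values per column. A brief sketch on the same data; the particular values checked are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array(([1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12])),
                  index=['A', 'B', 'C', 'D'],
                  columns=['one', 'two', 'three'])

rows = df[df['two'].isin([5, 11])]                    # keep rows where 'two' is 5 or 11
per_column = df.isin({'one': [1, 4], 'three': [12]})  # different values per column

print(rows.index.tolist())
print(per_column)
```

Columns that are not mentioned in the dict (here 'two') come back as all False.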

`df.plot.area()` method

Next, let's see how to draw charts in pandas with a single line of code. Here all columns are drawn as an area chart

df = pd.DataFrame({
    'sales': [30, 20, 38, 95, 106, 65],
    'signups': [7, 9, 6, 12, 18, 13],
    'visits': [20, 42, 28, 62, 81, 50],
}, index=pd.date_range(start='2021/01/01', end='2021/07/01', freq='M'))

ax = df.plot.area(figsize=(10, 5))

output

[Figure: area chart of sales, signups, and visits]

`df.plot.bar()` method

Now let's see how to draw a bar chart with a single line of code

df = pd.DataFrame({'label': ['A', 'B', 'C', 'D'], 'values': [10, 30, 50, 70]})
ax = df.plot.bar(x='label', y='values', rot=20)

output

[Figure: bar chart of values by label]

We can also draw a grouped bar chart, with one bar per column for each category

age = [0.1, 17.5, 40, 48, 52, 69, 88]
weight = [2, 8, 70, 1.5, 25, 12, 28]
index = ['A', 'B', 'C', 'D', 'E', 'F', 'G']
df = pd.DataFrame({'age': age, 'weight': weight}, index=index)
ax = df.plot.bar(rot=0)

output

[Figure: grouped bar chart of age and weight for categories A to G]

The chart can also be drawn horizontally

ax = df.plot.barh(rot=0)

output

[Figure: horizontal grouped bar chart of age and weight]

`df.plot.box()` method

Let's look at how to draw a box plot, again with a single line of pandas code

data = np.random.randn(25, 3)
df = pd.DataFrame(data, columns=list('ABC'))
ax = df.plot.box()

output

[Figure: box plot of columns A, B, and C]

`df.plot.pie()` method

Next comes the pie chart

df = pd.DataFrame({'mass': [1.33, 4.87, 5.97],
                   'radius': [2439.7, 6051.8, 6378.1]},
                  index=['Mercury', 'Venus', 'Earth'])
plot = df.plot.pie(y='mass', figsize=(8, 8))

output

[Figure: pie chart of mass for Mercury, Venus, and Earth]

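Beyond these, there are line charts, histograms, scatter plots, and more. They are drawn in much the same way as the charts above, and interested readers can try them on their own.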


Statement

This article is reproduced from Python当打之年. In case of any infringement, please contact admin@php.cn to have it removed.
