search
HomeBackend DevelopmentPython Tutorial4000 words of detailed description, recommending 20 useful Pandas function methods

##Today I share a few unknown

You may not see much of the pandas function, but it is very convenient to use. It can also help our data analysts greatly improve their work efficiency. I also hope that everyone will gain something after reading it

  • items()method
  • ##iterrows()method
  • insert()Method
  • assign()Method
  • eval()method
  • ##pop()
    method
  • truncate()
    Method
  • count()
    Method
  • ##add_prefix()
  • Method/
    add_suffix ()method
  • clip()
  • method
  • filter()
  • method
    <li><section style="margin-top: 5px;margin-bottom: 5px;color: rgb(1, 1, 1);line-height: 2em;"><code style="font-size: 14px;padding: 2px 4px;border-radius: 4px;margin-right: 2px;margin-left: 2px;color: rgb(30, 107, 184);background-color: rgba(27, 31, 35, 0.05);font-family: " operator mono consolas monaco menlo monospace break-all>first()method
  • isin()method
  • ##df.plot.area()Method
  • df.plot.bar()Method
  • df.plot.box()Method
  • ##df.plot.pie()
    Method

items()method

items()
method in pandas can be used to traverse data For each column in the set, return the column name and the content of each column at the same time, in the form of a tuple, the example is as follows <pre class='brush:php;toolbar:false;'>df = pd.DataFrame({&amp;#39;species&amp;#39;: [&amp;#39;bear&amp;#39;, &amp;#39;bear&amp;#39;, &amp;#39;marsupial&amp;#39;], &amp;#39;population&amp;#39;: [1864, 22000, 80000]}, index=[&amp;#39;panda&amp;#39;, &amp;#39;polar&amp;#39;, &amp;#39;koala&amp;#39;]) df</pre>output
         species  population
panda       bear        1864
polar       bear       22000
koala  marsupial       80000

Then we use

items()

Method<pre class='brush:php;toolbar:false;'>for label, content in df.items(): print(f&amp;#39;label: {label}&amp;#39;) print(f&amp;#39;content: {content}&amp;#39;, sep=&amp;#39;\n&amp;#39;) print(&quot;=&quot; * 50)</pre>output

label: species
content: panda         bear
polar         bear
koala    marsupial
Name: species, dtype: object
==================================================
label: population
content: panda     1864
polar    22000
koala    80000
Name: population, dtype: int64
==================================================

The column names and corresponding contents of the 'species' and 'population' columns are printed out one after another

iterrows()Method As for the

iterrows()
method, its function is to traverse each row in the data set , returns the index of each row and the content of each row with column names, the example is as follows<pre class='brush:php;toolbar:false;'>for label, content in df.iterrows(): print(f&amp;#39;label: {label}&amp;#39;) print(f&amp;#39;content: {content}&amp;#39;, sep=&amp;#39;\n&amp;#39;) print(&quot;=&quot; * 50)</pre>output
label: panda
content: species       bear
population    1864
Name: panda, dtype: object
==================================================
label: polar
content: species        bear
population    22000
Name: polar, dtype: object
==================================================
label: koala
content: species       marsupial
population        80000
Name: koala, dtype: object
==================================================

insert()方法

insert()方法主要是用于在数据集当中的特定位置处插入数据,示例如下

df.insert(1, "size", [2000, 3000, 4000])

output

         species  size  population
panda       bear  2000        1864
polar       bear  3000       22000
koala  marsupial  4000       80000

可见在DataFrame数据集当中,列的索引也是从0开始的

assign()方法

assign()方法可以用来在数据集当中添加新的列,示例如下

df.assign(size_1=lambda x: x.population * 9 / 5 + 32)

output

         species  population    size_1
panda       bear        1864    3387.2
polar       bear       22000   39632.0
koala  marsupial       80000  144032.0
从上面的例子中可以看出,我们通过一个lambda匿名函数,在数据集当中添加一个新的列,命名为‘size_1’,当然我们也可以通过assign()方法来创建不止一个列
df.assign(size_1 = lambda x: x.population * 9 / 5 + 32,
          size_2 = lambda x: x.population * 8 / 5 + 10)

output

         species  population    size_1    size_2
panda       bear        1864    3387.2    2992.4
polar       bear       22000   39632.0   35210.0
koala  marsupial       80000  144032.0  128010.0

eval()方法

eval()方法主要是用来执行用字符串来表示的运算过程的,例如

df.eval("size_3 = size_1 + size_2")

output

         species  population    size_1    size_2    size_3
panda       bear        1864    3387.2    2992.4    6379.6
polar       bear       22000   39632.0   35210.0   74842.0
koala  marsupial       80000  144032.0  128010.0  272042.0

当然我们也可以同时对执行多个运算过程

df = df.eval(&#39;&#39;&#39;
size_3 = size_1 + size_2
size_4 = size_1 - size_2
&#39;&#39;&#39;)

output

         species  population    size_1    size_2    size_3   size_4
panda       bear        1864    3387.2    2992.4    6379.6    394.8
polar       bear       22000   39632.0   35210.0   74842.0   4422.0
koala  marsupial       80000  144032.0  128010.0  272042.0  16022.0

pop()方法

pop()方法主要是用来删除掉数据集中特定的某一列数据

df.pop("size_3")

output

panda      6379.6
polar     74842.0
koala    272042.0
Name: size_3, dtype: float64

而原先的数据集当中就没有这个‘size_3’这一例的数据了

truncate()方法

truncate()方法主要是根据行索引来筛选指定行的数据的,示例如下

df = pd.DataFrame({&#39;A&#39;: [&#39;a&#39;, &#39;b&#39;, &#39;c&#39;, &#39;d&#39;, &#39;e&#39;],
                   &#39;B&#39;: [&#39;f&#39;, &#39;g&#39;, &#39;h&#39;, &#39;i&#39;, &#39;j&#39;],
                   &#39;C&#39;: [&#39;k&#39;, &#39;l&#39;, &#39;m&#39;, &#39;n&#39;, &#39;o&#39;]},
                  index=[1, 2, 3, 4, 5])

output

   A  B  C
1  a  f  k
2  b  g  l
3  c  h  m
4  d  i  n
5  e  j  o

我们使用truncate()方法来做一下尝试

df.truncate(before=2, after=4)

output

   A  B  C
2  b  g  l
3  c  h  m
4  d  i  n
我们看到参数beforeafter存在于truncate()方法中,目的就是把行索引2之前和行索引4之后的数据排除在外,筛选出剩余的数据

count()方法

count()方法主要是用来计算某一列当中非空值的个数,示例如下

df = pd.DataFrame({"Name": ["John", "Myla", "Lewis", "John", "John"],
                   "Age": [24., np.nan, 25, 33, 26],
                   "Single": [True, True, np.nan, True, False]})

output

    Name   Age Single
0   John  24.0   True
1   Myla   NaN   True
2  Lewis  25.0    NaN
3   John  33.0   True
4   John  26.0  False

我们使用count()方法来计算一下数据集当中非空值的个数

df.count()

output

Name      5
Age       4
Single    4
dtype: int64

add_prefix()方法/add_suffix()方法

add_prefix()方法和add_suffix()方法分别会给列名以及行索引添加后缀和前缀,对于Series()数据集而言,前缀与后缀是添加在行索引处,而对于DataFrame()数据集而言,前缀与后缀是添加在列索引处,示例如下
s = pd.Series([1, 2, 3, 4])

output

0    1
1    2
2    3
3    4
dtype: int64

我们使用add_prefix()方法与add_suffix()方法在Series()数据集上

s.add_prefix(&#39;row_&#39;)

output

row_0    1
row_1    2
row_2    3
row_3    4
dtype: int64

又例如

s.add_suffix(&#39;_row&#39;)

output

0_row    1
1_row    2
2_row    3
3_row    4
dtype: int64
而对于DataFrame()形式数据集而言,add_prefix()方法以及add_suffix()方法是将前缀与后缀添加在列索引处的
df = pd.DataFrame({&#39;A&#39;: [1, 2, 3, 4], &#39;B&#39;: [3, 4, 5, 6]})

output

   A  B
0  1  3
1  2  4
2  3  5
3  4  6

示例如下

df.add_prefix("column_")

output

   column_A  column_B
0         1         3
1         2         4
2         3         5
3         4         6

又例如

df.add_suffix("_column")

output

   A_column  B_column
0         1         3
1         2         4
2         3         5
3         4         6

clip()方法

clip()方法主要是通过设置阈值来改变数据集当中的数值,当数值超过阈值的时候,就做出相应的调整
data = {&#39;col_0&#39;: [9, -3, 0, -1, 5], &#39;col_1&#39;: [-2, -7, 6, 8, -5]}
df = pd.DataFrame(data)

output

df.clip(lower = -4, upper = 4)

output

   col_0  col_1
0      4     -2
1     -3     -4
2      0      4
3     -1      4
4      4     -4
我们看到参数lowerupper分别代表阈值的上限与下限,数据集当中超过上限与下限的值会被替代。

filter()方法

pandas当中的filter()方法是用来筛选出特定范围的数据的,示例如下

df = pd.DataFrame(np.array(([1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12])),
                  index=[&#39;A&#39;, &#39;B&#39;, &#39;C&#39;, &#39;D&#39;],
                  columns=[&#39;one&#39;, &#39;two&#39;, &#39;three&#39;])

output

   one  two  three
A    1    2      3
B    4    5      6
C    7    8      9
D   10   11     12

我们使用filter()方法来筛选数据

df.filter(items=[&#39;one&#39;, &#39;three&#39;])

output

   one  three
A    1      3
B    4      6
C    7      9
D   10     12

我们还可以使用正则表达式来筛选数据

df.filter(regex=&#39;e$&#39;, axis=1)

output

   one  three
A    1      3
B    4      6
C    7      9
D   10     12

当然通过参数axis来调整筛选行方向或者是列方向的数据

df.filter(like=&#39;B&#39;, axis=0)

output

   one  two  three
B    4    5      6

first()方法

当数据集当中的行索引是日期的时候,可以通过该方法来筛选前面几行的数据

index_1 = pd.date_range(&#39;2021-11-11&#39;, periods=5, freq=&#39;2D&#39;)
ts = pd.DataFrame({&#39;A&#39;: [1, 2, 3, 4, 5]}, index=index_1)
ts

output

            A
2021-11-11  1
2021-11-13  2
2021-11-15  3
2021-11-17  4
2021-11-19  5

我们使用first()方法来进行一些操作,例如筛选出前面3天的数据

ts.first(&#39;3D&#39;)

output

            A
2021-11-11  1
2021-11-13  2

isin()方法

isin()方法主要是用来确认数据集当中的数值是否被包含在给定的列表当中

df = pd.DataFrame(np.array(([1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12])),
                  index=[&#39;A&#39;, &#39;B&#39;, &#39;C&#39;, &#39;D&#39;],
                  columns=[&#39;one&#39;, &#39;two&#39;, &#39;three&#39;])
df.isin([3, 5, 12])

output

     one    two  three
A  False  False   True
B  False   True  False
C  False  False  False
D  False  False   True
若是数值被包含在列表当中了,也就是3、5、12当中,返回的是True,否则就返回False

df.plot.area()方法

下面我们来讲一下如何在Pandas当中通过一行代码来绘制图表,将所有的列都通过面积图的方式来绘制
df = pd.DataFrame({
    &#39;sales&#39;: [30, 20, 38, 95, 106, 65],
    &#39;signups&#39;: [7, 9, 6, 12, 18, 13],
    &#39;visits&#39;: [20, 42, 28, 62, 81, 50],
}, index=pd.date_range(start=&#39;2021/01/01&#39;, end=&#39;2021/07/01&#39;, freq=&#39;M&#39;))

ax = df.plot.area(figsize = (10, 5))

output

4000 words of detailed description, recommending 20 useful Pandas function methods

df.plot.bar()方法

下面我们看一下如何通过一行代码来绘制柱状图

df = pd.DataFrame({&#39;label&#39;:[&#39;A&#39;, &#39;B&#39;, &#39;C&#39;, &#39;D&#39;], &#39;values&#39;:[10, 30, 50, 70]})
ax = df.plot.bar(x=&#39;label&#39;, y=&#39;values&#39;, rot=20)

output

4000 words of detailed description, recommending 20 useful Pandas function methods

当然我们也可以根据不同的类别来绘制柱状图

age = [0.1, 17.5, 40, 48, 52, 69, 88]
weight = [2, 8, 70, 1.5, 25, 12, 28]
index = [&#39;A&#39;, &#39;B&#39;, &#39;C&#39;, &#39;D&#39;, &#39;E&#39;, &#39;F&#39;, &#39;G&#39;]
df = pd.DataFrame({&#39;age&#39;: age, &#39;weight&#39;: weight}, index=index)
ax = df.plot.bar(rot=0)

output

4000 words of detailed description, recommending 20 useful Pandas function methods

当然我们也可以横向来绘制图表

ax = df.plot.barh(rot=0)

output

4000 words of detailed description, recommending 20 useful Pandas function methods

df.plot.box()方法

我们来看一下箱型图的具体的绘制,通过pandas一行代码来实现

data = np.random.randn(25, 3)
df = pd.DataFrame(data, columns=list(&#39;ABC&#39;))
ax = df.plot.box()

output

4000 words of detailed description, recommending 20 useful Pandas function methods

df.plot.pie()方法

接下来是饼图的绘制

df = pd.DataFrame({&#39;mass&#39;: [1.33, 4.87 , 5.97],
                   &#39;radius&#39;: [2439.7, 6051.8, 6378.1]},
                  index=[&#39;Mercury&#39;, &#39;Venus&#39;, &#39;Earth&#39;])
plot = df.plot.pie(y=&#39;mass&#39;, figsize=(8, 8))

output

4000 words of detailed description, recommending 20 useful Pandas function methods

除此之外,还有折线图、直方图、散点图等等,步骤与方式都与上述的技巧有异曲同工之妙,大家感兴趣的可以自己另外去尝试。

The above is the detailed content of 4000 words of detailed description, recommending 20 useful Pandas function methods. For more information, please follow other related articles on the PHP Chinese website!

Statement
This article is reproduced at:Python当打之年. If there is any infringement, please contact admin@php.cn delete
详细讲解Python之Seaborn(数据可视化)详细讲解Python之Seaborn(数据可视化)Apr 21, 2022 pm 06:08 PM

本篇文章给大家带来了关于Python的相关知识,其中主要介绍了关于Seaborn的相关问题,包括了数据可视化处理的散点图、折线图、条形图等等内容,下面一起来看一下,希望对大家有帮助。

详细了解Python进程池与进程锁详细了解Python进程池与进程锁May 10, 2022 pm 06:11 PM

本篇文章给大家带来了关于Python的相关知识,其中主要介绍了关于进程池与进程锁的相关问题,包括进程池的创建模块,进程池函数等等内容,下面一起来看一下,希望对大家有帮助。

Python自动化实践之筛选简历Python自动化实践之筛选简历Jun 07, 2022 pm 06:59 PM

本篇文章给大家带来了关于Python的相关知识,其中主要介绍了关于简历筛选的相关问题,包括了定义 ReadDoc 类用以读取 word 文件以及定义 search_word 函数用以筛选的相关内容,下面一起来看一下,希望对大家有帮助。

归纳总结Python标准库归纳总结Python标准库May 03, 2022 am 09:00 AM

本篇文章给大家带来了关于Python的相关知识,其中主要介绍了关于标准库总结的相关问题,下面一起来看一下,希望对大家有帮助。

Python数据类型详解之字符串、数字Python数据类型详解之字符串、数字Apr 27, 2022 pm 07:27 PM

本篇文章给大家带来了关于Python的相关知识,其中主要介绍了关于数据类型之字符串、数字的相关问题,下面一起来看一下,希望对大家有帮助。

分享10款高效的VSCode插件,总有一款能够惊艳到你!!分享10款高效的VSCode插件,总有一款能够惊艳到你!!Mar 09, 2021 am 10:15 AM

VS Code的确是一款非常热门、有强大用户基础的一款开发工具。本文给大家介绍一下10款高效、好用的插件,能够让原本单薄的VS Code如虎添翼,开发效率顿时提升到一个新的阶段。

详细介绍python的numpy模块详细介绍python的numpy模块May 19, 2022 am 11:43 AM

本篇文章给大家带来了关于Python的相关知识,其中主要介绍了关于numpy模块的相关问题,Numpy是Numerical Python extensions的缩写,字面意思是Python数值计算扩展,下面一起来看一下,希望对大家有帮助。

python中文是什么意思python中文是什么意思Jun 24, 2019 pm 02:22 PM

pythn的中文意思是巨蟒、蟒蛇。1989年圣诞节期间,Guido van Rossum在家闲的没事干,为了跟朋友庆祝圣诞节,决定发明一种全新的脚本语言。他很喜欢一个肥皂剧叫Monty Python,所以便把这门语言叫做python。

See all articles

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

AI Hentai Generator

AI Hentai Generator

Generate AI Hentai for free.

Hot Tools

MinGW - Minimalist GNU for Windows

MinGW - Minimalist GNU for Windows

This project is in the process of being migrated to osdn.net/projects/mingw, you can continue to follow us there. MinGW: A native Windows port of the GNU Compiler Collection (GCC), freely distributable import libraries and header files for building native Windows applications; includes extensions to the MSVC runtime to support C99 functionality. All MinGW software can run on 64-bit Windows platforms.

mPDF

mPDF

mPDF is a PHP library that can generate PDF files from UTF-8 encoded HTML. The original author, Ian Back, wrote mPDF to output PDF files "on the fly" from his website and handle different languages. It is slower than original scripts like HTML2FPDF and produces larger files when using Unicode fonts, but supports CSS styles etc. and has a lot of enhancements. Supports almost all languages, including RTL (Arabic and Hebrew) and CJK (Chinese, Japanese and Korean). Supports nested block-level elements (such as P, DIV),

WebStorm Mac version

WebStorm Mac version

Useful JavaScript development tools

Atom editor mac version download

Atom editor mac version download

The most popular open source editor

ZendStudio 13.5.1 Mac

ZendStudio 13.5.1 Mac

Powerful PHP integrated development environment