%group_by(cyl)%>%summarise(meanmeanOfdisp=mean(disp))%>%ggplot(aes(x=as.factor(cyl),y=meanOfdisp,fill=as"/> %group_by(cyl)%>%summarise(meanmeanOfdisp=mean(disp))%>%ggplot(aes(x=as.factor(cyl),y=meanOfdisp,fill=as">
Home >Backend Development >Python Tutorial >How to implement chained calls in Python
Why is it a chain call?
Chained calls, or also called method chaining, literally mean a code method that strings together a series of operations or function methods like a chain.
I first realized the "beauty" of chain calls, and I started by using the pipeline operator of R language.
library(tidyverse) mtcars %>% group_by(cyl) %>% summarise(meanmeanOfdisp = mean(disp)) %>% ggplot(aes(x=as.factor(cyl), y=meanOfdisp, fill=as.factor(seq(1,3))))+ geom_bar(stat = 'identity') + guides(fill=F)
For R users, this piece of code can quickly understand what the entire process steps are. It all starts with the symbol %>% (the pipe operator).
Through the pipe operator, we can pass the thing on the left to the next thing. Here I pass the mtcars data set to the group_by function, then pass the result to the summarize function, and finally pass it to the ggplot function for visual drawing.
If I had not learned chain calling, then I would have written like this when I first learned the R language:
library(tidyverse) cyl4 <p>If I did not use the pipe operator, then I would make unnecessary assignments , and overwrite the original data object, but in fact, the cyl# and data generated in it are actually only used to serve the graph picture, so the problem is that the code will become redundant. </p><p>Chain calls not only greatly simplify the code, but also improve the readability of the code, allowing you to quickly understand what each step is doing. This method is very useful for data analysis or data processing. It can reduce the creation of unnecessary variables and enable exploration in a fast and simple way. </p><p>You can see chain calls or pipeline operations in many places. Here I will give two typical examples besides R language. </p><p>One is the Shell statement: </p><pre class="brush:php;toolbar:false">echo "`seq 1 100`" | grep -e "^[3-4].*" | tr "3" "*"
Using the "|" pipe operator in the shell statement can quickly implement chain calls. Here I first print all integers from 1-100, and then Pass it into the grep method, extract all parts starting with 3 or 4, pass this part into the tr method, and replace the part of the number containing 3 with asterisks. The results are as follows:
The other is the Scala language:
object Test { def main(args: Array[String]): Unit = { val numOfseq = (1 to 100).toList val chain = numOfseq.filter(_%2==0) .map(_*2) .take(10) } }
In this example, first the numOfseq variable contains all integers from 1-100, and then starting from the chain part, I first The filter method is called on the basis of numOfseq to filter the even numbers among these numbers. Then the map method is called to multiply the filtered numbers by 2. Finally, the take method is used to take out the first 10 numbers from the newly formed numbers. numbers, these numbers are jointly assigned to the chain variable.
Through the above description, I believe you can have a preliminary impression of chain calls, but once you master chain calls, in addition to changing your coding style, your programming thinking There will also be different improvements.
Chained calls in Python
To implement a simple chained call in Python is to build a class method and return the object itself or return the belonging class (@classmethod )
class Chain: def __init__(self, name): self.name = name def introduce(self): print("hello, my name is %s" % self.name) return self def talk(self): print("Can we make a friend?") return self def greet(self): print("Hey! How are you?") return self if __name__ == '__main__': chain = Chain(name = "jobs") chain.introduce() print("-"*20) chain.introduce().talk() print("-"*20) chain.introduce().talk().greet()
Here we create a Chain class and need to pass a name string parameter to create an instance object; there are three methods in this class, namely introduce, talk and greet.
Since self is returned each time, we can continuously call methods in the object's belonging class. The results are as follows:
hello, my name is jobs -------------------- hello, my name is jobs Can we make a friend? -------------------- hello, my name is jobs Can we make a friend? Hey! How are you?
Use chain calls in Pandas
After paving the way so much, we finally talk about the chain call part of Pandas
Most methods in Pandas are suitable for operation using chain methods because they are processed by the API. What is returned is often Series type or DataFrame type, so we can directly call the corresponding method. Here I take the Huanong Brothers B station video data that I crawled when I was doing a case demonstration for others around February this year as an example. It can be obtained through the link.
The data field information is as follows, there are 300 pieces of data and 20 fields:
Field information
But in use Before this part of the data, we still need to do a preliminary cleaning of this part of the data. Here I mainly selected the following fields:
aid: the AV number corresponding to the video
comment: number of comments
play: number of views
title: title
video_review: number of comments
created: upload date
length: video duration
1. Data cleaning
The corresponding values of each field are as follows:
Field values
We can see from the data:
The title field will be preceded by the four words "Huanong Brothers". If the number of words in the title is counted, it needs to be removed in advance;
created The upload date seems to be displayed as a long series of values, but it is actually a timestamp from 1970 to the present. We need to process it into a readable year, month and day format;
length 播放量长度只显示了分秒,但是小时并未用「00」来进行补全,因此这里我们一方面需要将其补全,另一方面要将其转换成对应的时间格式
链式调用操作如下:
import re import pandas as pd # 定义字数统计函数 def word_count(text): return len(re.findall(r"[\u4e00-\u9fa5]", text)) tidy_data = ( pd.read_csv('~/Desktop/huanong.csv') .loc[:, ['aid', 'title', 'created', 'length', 'play', 'comment', 'video_review']] .assign(title = lambda df: df['title'].str.replace("华农兄弟:", ""), title_count = lambda df: df['title'].apply(word_count), created = lambda df: df['created'].pipe(pd.to_datetime, unit='s'), created_date = lambda df: df['created'].dt.date, length = lambda df: "00:" + df['length'], video_length = lambda df: df['length'].pipe(pd.to_timedelta).dt.seconds ) )
这里首先是通过loc方法挑出其中的列,然后调用assign方法来创建新的字段,新的字段其字段名如果和原来的字段相一致,那么就会进行覆盖,从assign中我们可以很清楚地看到当中字段的产生过程,同lambda 表达式进行交互:
1.title 和title_count:
原有的title字段因为属于字符串类型,可以直接很方便的调用str.* 方法来进行处理,这里我就直接调用当中的replace方法将「华农兄弟:」字符进行清洗
基于清洗好的title 字段,再对该字段使用apply方法,该方法传递我们前面实现定义好的字数统计的函数,对每一条记录的标题中,对属于\u4e00到\u9fa5这一区间内的所有 Unicode 中文字符进行提取,并进行长度计算
2.created和created_date:
对原有的created 字段调用一个pipe方法,该方法会将created 字段传递进pd.to_datetime 参数中,这里需要将unit时间单位设置成s秒才能显示出正确的时间,否则仍以 Unix 时间错的样式显示
基于处理好的created 字段,我们可以通过其属于datetime64 的性质来获取其对应的时间,这里 Pandas 给我们提供了一个很方便的 API 方法,通过dt.*来拿到当中的属性值
3.length 和video_length:
原有的length 字段我们直接让字符串00:和该字段进行直接拼接,用以做下一步转换
基于完整的length时间字符串,我们再次调用pipe方法将该字段作为参数隐式传递到pd.to_timedelta方法中转化,然后同理和create_date字段一样获取到相应的属性值,这里我取的是秒数。
2、播放量趋势图
基于前面稍作清洗后得到的tidy_data数据,我们可以快速地做一个播放量走势的探索。这里我们需要用到created这个属于datetime64的字段为 X 轴,播放量play 字段为 Y 轴做可视化展示。
# 播放量走势 %matplotlib inline %config InlineBackend.figure_format = 'retina' import matplotlib.pyplot as plt (tidy_data[['created', 'play']] .set_index('created') .resample('1M') .sum() .plot( kind='line', figsize=(16, 8), title='Video Play Prend(2018-2020)', grid=True, legend=False ) ) plt.xlabel("") plt.ylabel('The Number Of Playing')
这里我们将上传日期和播放量两个选出来后,需要先将created设定为索引,才能接着使用resample重采样的方法进行聚合操作,这里我们以月为统计颗粒度,对每个月播放量进行加总,之后再调用plot 接口实现可视化。
链式调用的一个小技巧就是,可以利用括号作用域连续的特性使整个链式调用的操作不会报错,当然如果不喜欢这种方式也可以手动在每条操作后面追加一个\符号,所以上面的整个操作就会变成这样:
tidy_data[['created', 'play']] \ .set_index('created') \ .resample('1M') .sum() .plot( \ kind='line', \ figsize=(16, 8), \ title='Video Play Prend(2018-2020)', \ grid=True, \ legend=False \ )
但是相比于追加一对括号来说,这种尾部追加\符号的方式并不推荐,也不优雅。
但是如果既没有在括号作用域或未追加\ 符号,那么在运行时 Python 解释器就会报错。
3、链式调用性能
通过前两个案例我们可以看出链式调用可以说是比较优雅且快速地能实现一套数据操作的流程,但是链式调用也会因为不同的写法而存在性能上的差异。
这里我们继续基于前面的tidy_data操作,这里我们基于created_date 来对play、comment和video_review进行求和后的数值进一步以 10 为底作对数化。最后需要得到以下结果:
统计表格
写法一:一般写法
一般写法
这种写法就是基于tidy_data拷贝后进行操作,操作得到的结果会不断地覆盖原有的数据对象
写法二:链式调用写法
链式调用写法
可以看到,链式调用的写法相比于一般写法而言会快上一点,不过由于数据量比较小,因此二者时间的差异并不大;但链式调用由于不需要额外的中间变量已经覆盖写入步骤,在内存开销上会少一些。
结尾:链式调用的优劣
从本文的只言片语中,你能领略到链式调用使得代码在可读性上大大的增强,同时以尽肯能少的代码量去实现更多操作。
当然,链式调用并不算是完美的,它也存在着一定缺陷。比如说当链式调用的方法超过 10 步以上时,那么出错的几率就会大幅度提高,从而造成调试或 Debug 的困难。比如这样:
(data .method1(...) .method2(...) .method3(...) .method4(...) .method5(...) .method6(...) .method7(...) # Something Error .method8(...) .method9(...) .method10(...) .method11(...) )
The above is the detailed content of How to implement chained calls in Python. For more information, please follow other related articles on the PHP Chinese website!