search
HomeBackend DevelopmentPython Tutorial14 common operations in Excel using Python

14 common operations in Excel using Python

Hello everyone, I am a rookie!

The data is the sales data found online. It looks like this:

14 common operations in Excel using Python

1. Association formula: Vlookup

vlookup is almost the most commonly used formula in Excel, and is generally used for related queries between two tables. So I first divided this table into two tables.

df1=sale[['订单明细号','单据日期','地区名称', '业务员名称','客户分类', '存货编码', '客户名称', '业务员编码', '存货名称', '订单号',
'客户编码', '部门名称', '部门编码']]
df2=sale[['订单明细号','存货分类', '税费', '不含税金额', '订单金额', '利润', '单价','数量']]

Demand: I want to know the profit corresponding to each order of df1.

The profit column exists in the table of df2, so I want to know the profit corresponding to each order of df1. When using excel, first confirm that the order detail number is a unique value, then add a new column in df1 and write: =vlookup(a2,df2!a:h,6,0), and then pull it down and it will be OK. (I won’t write excel for the remaining 13)

How to implement it using python?

#查看订单明细号是否重复,结果是没。
df1["订单明细号"].duplicated().value_counts()
df2["订单明细号"].duplicated().value_counts()
df_c=pd.merge(df1,df2,on="订单明细号",how="left")

2. Pivot table

Requirement: I want to know the total profit and average profit earned by salesmen in each region.

pd.pivot_table(sale,index="地区名称",columns="业务员名称",values="利润",aggfunc=[np.sum,np.mean])

3. Compare the differences between the two columns

Because the data dimensions in each column of this table are different, comparison is meaningless, so I first made an order detail The differences in numbers are then compared.

Requirement: Compare the difference between the order detail number and order detail number 2 and display it.

sale["订单明细号2"]=sale["订单明细号"]
#在订单明细号2里前10个都+1.
sale["订单明细号2"][1:10]=sale["订单明细号2"][1:10]+1
#差异输出
result=sale.loc[sale["订单明细号"].isin(sale["订单明细号2"])==False]

4. Remove duplicate values

Requirement: Remove duplicate values ​​coded by the salesperson

sale.drop_duplicates("业务员编码",inplace=True)

5. Missing value processing

First check which columns of sales data have missing values.

#列的行数小于index的行数的说明有缺失值,这里客户名称329<335,说明有缺失值
sale.info()

14 common operations in Excel using Python

Requirements: fill in missing values ​​with 0 or delete rows with missing values ​​in customer codes. In fact, the method of processing missing values ​​is very complicated. Here we only introduce simple processing methods. If it is a numerical variable, the most commonly used method is the average, median or mode. For more complex ones, the random forest model can be used to predict based on other dimensions. The result is populated. If it is a categorical variable, it is more accurate to fill it in based on business logic. For example, the requirement here is to fill in the missing value of the customer name: it can be filled according to the customer name corresponding to the inventory with the highest frequency of occurrence in the inventory classification.

Here we use a simple solution: fill the missing values ​​with 0 or delete the rows with missing values ​​in the customer code.

#用0填充缺失值
sale["客户名称"]=sale["客户名称"].fillna(0)
#删除有客户编码缺失值的行
sale.dropna(subset=["客户编码"])

6. Multi-condition filtering

Demand: I would like to know the information about salesman Zhang Ai, who sells goods in the Beijing area with an order amount greater than 6,000.

sale.loc[(sale["地区名称"]=="北京")&(sale["业务员名称"]=="张爱")&(sale["订单金额"]>5000)]

7. Fuzzy filtering data

Requirement: Filter information whose inventory name contains "Samsung" or "Sony".

sale.loc[sale["存货名称"].str.contains("三星|索尼")]

8. Classification and summary

Demand: The total profit of each salesperson in the Beijing area.

sale.groupby(["地区名称","业务员名称"])["利润"].sum()

9. Condition calculation

Demand: How many orders have the inventory name containing "Samsung" and the tax is higher than 1,000? What is the sum and average profit of these orders? (Or minimum value, maximum value, quartile, label difference)

sale.loc[sale["存货名称"].str.contains("三星")&(sale["税费"]>=1000)][["订单明细号","利润"]].describe()

14 common operations in Excel using Python

10. Delete spaces between data

Requirement: Delete the spaces on both sides of the inventory name.

sale["Inventory name"].map(lambda s:s.strip(""))

11. Data separation

14 common operations in Excel using Python

Requirement: Separate date and time.

sale=pd.merge(sale,pd.DataFrame(sale["单据日期"].str.split(" ",expand=True)),how="inner",left_index=True,right_index=True)

12. Replacement of outliers

First, use the describe() function to briefly check whether there are any outliers in the data.

#You can see that the output tax has a negative number. This is generally not the case and is regarded as an outlier.

sale.describe()

14 common operations in Excel using Python

Requirement: Use 0 to replace outliers.

sale["订单金额"]=sale["订单金额"].replace(min(sale["订单金额"]),0)

13. Grouping

Requirements: Group regions according to profit data distribution: "Poor", "Medium", "Better", "Very Good" Okay"

First of all, of course, we need to look at the data distribution of profits. Here we use quartiles to judge.

sale.groupby("地区名称")["利润"].sum().describe()

14 common operations in Excel using Python

Group the regional total profit in the [-9,7091] interval as "poor" according to the quartile, (7091,10952] interval The grouping is "moderate" (10952,17656] is grouped as good, (17656,37556] is grouped as very good.

#先建立一个Dataframe
sale_area=pd.DataFrame(sale.groupby("地区名称")["利润"].sum()).reset_index()
#设置bins,和分组名称
bins=[-10,7091,10952,17656,37556]
groups=["较差","中等","较好","非常好"]
#使用cut分组
#sale_area["分组"]=pd.cut(sale_area["利润"],bins,labels=groups)

14. Define tags according to business logic

Demand: Product information with a sales profit margin (i.e. profit/order amount) greater than 30% and marking it as a high-quality product, and less than 5% as an ordinary product.

sale.loc[(sale["利润"]/sale["订单金额"])>0.3,"label"]="优质商品"
sale.loc[(sale["利润"]/sale["订单金额"])<0.05,"label"]="一般商品"

In fact, there are many commonly used operations in excel, I just I have listed 14 commonly used ones. If you want to implement any other operations, you can comment and discuss them together. In addition, I also know that my writing of Python is not streamlined enough, so I use loc inertly. (In fact, the query will be more streamlined). If you are familiar with these few If you have a better way to write the operation, please be sure to comment and let me know, thank you!

Finally, I would like to say that I think it is best not to compare excel and python to study which one is easier to use. In fact, they are both tools. As the most widespread data processing tool, excel has been monopolized for so many years and must be quite convenient in data processing. Excellent, some operations are indeed easier in python, but there are also many operations in excel that are easier than python.

For example, a very simple operation: sum each column and display it on the bottom line. Excel just adds a sum() function to a column, and then pulls it to the left to solve it, while python Then you need to define a function (because python needs to determine the format and will directly report an error if it is not a numeric data.)

The above is the detailed content of 14 common operations in Excel using Python. For more information, please follow other related articles on the PHP Chinese website!

Statement
This article is reproduced at:51CTO.COM. If there is any infringement, please contact admin@php.cn delete
详细讲解Python之Seaborn(数据可视化)详细讲解Python之Seaborn(数据可视化)Apr 21, 2022 pm 06:08 PM

本篇文章给大家带来了关于Python的相关知识,其中主要介绍了关于Seaborn的相关问题,包括了数据可视化处理的散点图、折线图、条形图等等内容,下面一起来看一下,希望对大家有帮助。

详细了解Python进程池与进程锁详细了解Python进程池与进程锁May 10, 2022 pm 06:11 PM

本篇文章给大家带来了关于Python的相关知识,其中主要介绍了关于进程池与进程锁的相关问题,包括进程池的创建模块,进程池函数等等内容,下面一起来看一下,希望对大家有帮助。

Python自动化实践之筛选简历Python自动化实践之筛选简历Jun 07, 2022 pm 06:59 PM

本篇文章给大家带来了关于Python的相关知识,其中主要介绍了关于简历筛选的相关问题,包括了定义 ReadDoc 类用以读取 word 文件以及定义 search_word 函数用以筛选的相关内容,下面一起来看一下,希望对大家有帮助。

归纳总结Python标准库归纳总结Python标准库May 03, 2022 am 09:00 AM

本篇文章给大家带来了关于Python的相关知识,其中主要介绍了关于标准库总结的相关问题,下面一起来看一下,希望对大家有帮助。

Python数据类型详解之字符串、数字Python数据类型详解之字符串、数字Apr 27, 2022 pm 07:27 PM

本篇文章给大家带来了关于Python的相关知识,其中主要介绍了关于数据类型之字符串、数字的相关问题,下面一起来看一下,希望对大家有帮助。

分享10款高效的VSCode插件,总有一款能够惊艳到你!!分享10款高效的VSCode插件,总有一款能够惊艳到你!!Mar 09, 2021 am 10:15 AM

VS Code的确是一款非常热门、有强大用户基础的一款开发工具。本文给大家介绍一下10款高效、好用的插件,能够让原本单薄的VS Code如虎添翼,开发效率顿时提升到一个新的阶段。

详细介绍python的numpy模块详细介绍python的numpy模块May 19, 2022 am 11:43 AM

本篇文章给大家带来了关于Python的相关知识,其中主要介绍了关于numpy模块的相关问题,Numpy是Numerical Python extensions的缩写,字面意思是Python数值计算扩展,下面一起来看一下,希望对大家有帮助。

python中文是什么意思python中文是什么意思Jun 24, 2019 pm 02:22 PM

pythn的中文意思是巨蟒、蟒蛇。1989年圣诞节期间,Guido van Rossum在家闲的没事干,为了跟朋友庆祝圣诞节,决定发明一种全新的脚本语言。他很喜欢一个肥皂剧叫Monty Python,所以便把这门语言叫做python。

See all articles

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

AI Hentai Generator

AI Hentai Generator

Generate AI Hentai for free.

Hot Article

R.E.P.O. Energy Crystals Explained and What They Do (Yellow Crystal)
2 weeks agoBy尊渡假赌尊渡假赌尊渡假赌
R.E.P.O. Best Graphic Settings
2 weeks agoBy尊渡假赌尊渡假赌尊渡假赌
R.E.P.O. How to Fix Audio if You Can't Hear Anyone
2 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

Hot Tools

SublimeText3 English version

SublimeText3 English version

Recommended: Win version, supports code prompts!

VSCode Windows 64-bit Download

VSCode Windows 64-bit Download

A free and powerful IDE editor launched by Microsoft

MantisBT

MantisBT

Mantis is an easy-to-deploy web-based defect tracking tool designed to aid in product defect tracking. It requires PHP, MySQL and a web server. Check out our demo and hosting services.

WebStorm Mac version

WebStorm Mac version

Useful JavaScript development tools

EditPlus Chinese cracked version

EditPlus Chinese cracked version

Small size, syntax highlighting, does not support code prompt function