


Hello everyone! I am Brother Tiger.
Project Background
As data analysts, we often have to produce statistical charts and tables. When there are too many reports, building them eats up most of our time and leaves little room for actual analysis. Yet our real job is to dig out the information hidden behind the data in those tables and charts, not simply to assemble them and send out reports.
1. Purpose of report automation
1. Save time and improve efficiency
Automation saves time and improves efficiency. By keeping the code for each function as loosely coupled as possible, the scripts stay easy to maintain, and the hours saved can be spent on more valuable, more meaningful work.
2. Reduce errors
Once the code is verified to produce correct results, it can be reused indefinitely. Manual work inevitably lets mistakes slip in, so handing the job to a fixed program is more reassuring. When the requirements change, modifying only the affected part of the code is enough.
2. Scope of report automation
First of all, we need to decide which reports to build according to business needs. Not every report should be automated: indicators that require complex secondary development are hard to automate and may hide all kinds of bugs. So we need to summarize the characteristics of the reports we actually use in our work. The following are the aspects to weigh:
1. Frequency
Tables that the business uses frequently are the ones we want to bring into the scope of automation: for example, the customer information list, the sales flow report, the churn report, and month-on-month / year-on-year comparison reports.
These frequently used reports are worth automating. Reports that are only needed occasionally, that depend on secondary-development indicators, or that require copy-and-paste statistics do not need to be automated.
2. Development time
This is essentially a cost-benefit question. If automating a report is difficult and would take longer than doing the statistical analysis by hand, there is no need to automate it. So before starting, weigh whether developing the script or producing the table manually takes less time. I will provide a set of implementation solutions below, but only for common, relatively simple reports.
3. Process
Every company's reporting process and steps are different, so we have to code each step according to the business scenario. The process we build should be consistent with the business logic, and the program should follow that same logic.
3. Implementation steps
First we need to know which indicators we need (a short pandas sketch of how these can be computed follows the list):
Indicators
- Overall overview indicator
  Reflects the overall size of a data indicator
- Comparative indicators
  - MoM (month-on-month)
    The direct difference between the indicator in adjacent time periods
  - YoY (year-on-year)
    Comparison of the indicator at the same point in adjacent periods (e.g. this March vs. last March)
- Central tendency indicators
  - Median
  - Mode
  - Mean / weighted average
- Dispersion indicators
  - Standard deviation
  - Variance
  - Quartiles
  - Range: maximum value minus minimum value
- Correlation indicator
  - Correlation coefficient r
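As a quick reference, here is a minimal sketch of how these indicators map to pandas. The DataFrame, its columns (month, sales), and the values are purely illustrative assumptions, not data from the report in this article:

import pandas as pd

# Hypothetical monthly sales data (illustrative only)
df = pd.DataFrame({
    'month': pd.date_range('2021-01-01', periods=24, freq='MS'),
    'sales': range(100, 124),
})

total = df['sales'].sum()                    # overall overview: total size of the indicator
mom = df['sales'].pct_change()               # MoM: change vs. the previous period
yoy = df['sales'].pct_change(periods=12)     # YoY: change vs. the same month one year earlier

median = df['sales'].median()                # central tendency
mode = df['sales'].mode()
mean = df['sales'].mean()

std = df['sales'].std()                      # dispersion
var = df['sales'].var()
quartiles = df['sales'].quantile([0.25, 0.5, 0.75])
value_range = df['sales'].max() - df['sales'].min()

r = df['sales'].corr(df['sales'].shift(1))   # correlation coefficient between two series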
Let's simulate the process with a simple report:
Step 1: Read the data source file
First we need to know where our data comes from, i.e. the data source. All of the later processing works on a DataFrame, so each data source needs to be converted into DataFrame form:
import pandas as pd
import json
import pymysql
from sqlalchemy import create_engine

# Open the database connection
conn = pymysql.connect(host='localhost', port=3306, user='root',
                       passwd='xxxx', charset='utf8')
engine = create_engine('mysql+pymysql://root:xxxx@localhost/mysql?charset=utf8')

def read_excel(file):
    df_excel = pd.read_excel(file)
    return df_excel

def read_json(file):
    with open(file, 'r') as json_f:
        df_json = pd.read_json(json_f)
    return df_json

def read_sql(table):
    sql_cmd = 'SELECT * FROM %s' % table
    df_sql = pd.read_sql(sql_cmd, engine)
    return df_sql

def read_csv(file):
    df_csv = pd.read_csv(file)
    return df_csv
The code above works after testing, but each pandas read function targets a different file format, and its parameters mean different things, so they need to be adjusted according to the actual shape of the table.
Other read functions can be added as needed. Apart from read_sql, which requires a database connection, they are all fairly simple.
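If you want the data-source layer to expose a single entry point, one option is to dispatch on the file extension. This is only a sketch built on the readers above; the read_data helper and the extension-to-reader mapping are my own additions, not part of the original article:

import os

# Map file extensions to the reader functions defined above (illustrative helper)
READERS = {
    '.xlsx': read_excel,
    '.xls': read_excel,
    '.json': read_json,
    '.csv': read_csv,
}

def read_data(source):
    """Return a DataFrame for a file path; fall back to read_sql for a table name."""
    ext = os.path.splitext(source)[1].lower()
    if ext in READERS:
        return READERS[ext](source)
    # No recognized extension: treat the argument as a database table name
    return read_sql(source)

# Example: df = read_data('user_info.csv')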
Step 2: DataFrame calculations
We take user information as an example:
The indicator we need to compute is:
- Indicator description
  - Single-table chart:
    - The top 10 regions with the largest product audience
# Drop the rows where the city is null
df = df[df['city_num'].notna()]
# Drop the 'error' rows
df = df.drop(df[df['city_num'] == 'error'].index)
# Count the number of users per city
df = df.city_num.value_counts()
We only need the top 10 cities, so wrap the logic into a pie-chart function:
import matplotlib.pyplot as plt

def pie_chart(df):
    # Drop the rows where the city is null
    df = df[df['city_num'].notna()]
    # Drop the 'error' rows
    df = df.drop(df[df['city_num'] == 'error'].index)
    # Count the number of users per city
    df = df.city_num.value_counts()
    # Plot the top 10 cities as a pie chart
    df.head(10).plot.pie(subplots=True, figsize=(5, 6), autopct='%.2f%%',
                         radius=1.2, startangle=250, legend=False)

pie_chart(read_csv('user_info.csv'))
Save the chart:
plt.savefig('fig_cat.png')
If you find the matplotlib figure not pretty enough, you can switch to an ECharts (pyecharts) chart instead, which looks nicer:
from pyecharts.charts import Pie
from pyecharts import options as opts

# words is a list of (city, count) pairs; it is built inside the wrapped-up version below
pie = Pie()
pie.add("", words)
pie.set_global_opts(title_opts=opts.TitleOpts(title="前十地区"))
# pie.set_series_opts(label_opts=opts.LabelOpts(user_df))
pie.render_notebook()
Wrapped into a function, it can be used directly:
def echart_pie(user_df):
    # Same cleaning as before: drop nulls and 'error' rows, then count per city
    user_df = user_df[user_df['city_num'].notna()]
    user_df = user_df.drop(user_df[user_df['city_num'] == 'error'].index)
    user_df = user_df.city_num.value_counts()
    name = user_df.head(10).index.tolist()
    value = user_df.head(10).values.tolist()
    words = list(zip(list(name), list(value)))
    pie = Pie()
    pie.add("", words)
    pie.set_global_opts(title_opts=opts.TitleOpts(title="前十地区"))
    # pie.set_series_opts(label_opts=opts.LabelOpts(user_df))
    # Return the chart object so it can be rendered, saved, or exported later
    return pie

user_df = read_csv('user_info.csv')
echart_pie(user_df).render_notebook()
The chart can be saved as an image, though unfortunately it will no longer be interactive:
from pyecharts.render import make_snapshot
from snapshot_selenium import snapshot

make_snapshot(snapshot, echart_pie(user_df).render(), "test.png")
If you save it as a web page instead, the JavaScript is loaded automatically and the chart renders fully:
import os

echart_pie(user_df).render('problem.html')
# On Windows this opens the file with the browser associated with .html
os.system('problem.html')
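Note that os.system('problem.html') relies on the Windows file association for .html. As a more portable alternative (my suggestion, not from the original article), the standard-library webbrowser module can open the page on any platform:

import os
import webbrowser

# Open the rendered chart in the system's default browser (cross-platform)
webbrowser.open('file://' + os.path.realpath('problem.html'))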
Step 3: Automatically send the email
The reports we produce usually have to be sent to other people. For reports that must go to a specified mailbox every day, or when several reports have to be sent, Python can send the emails automatically.
Sending email from Python relies mainly on two modules, smtplib and email.
- smtplib: used to establish and close the connection to the mail server.
- email: used to set up the contents of the message itself.
Different mailbox providers use different server addresses, so configure the server that matches the mailbox you normally use. Here I will demonstrate with a NetEase (163.com) mailbox:
First, enable the POP3/SMTP/IMAP service in the mailbox settings:
After that, you can log in from Python using the authorization code.
import smtplib
from email import encoders
from email.header import Header
from email.utils import parseaddr, formataddr
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Sender's mailbox
asender = "fanstuck@163.com"
# Recipient's mailbox
areceiver = "1079944650@qq.com"
# Cc mailbox
acc = "fanstuck@163.com"
# Mail subject
asubject = "谢谢关注"
# Sender address
from_addr = "fanstuck@163.com"
# Mailbox authorization code
password = "####"

# Build the message
msg = MIMEMultipart()
msg['Subject'] = asubject
msg['to'] = areceiver
msg['Cc'] = acc
msg['from'] = "fanstuck"
# Mail body
body = "你好,欢迎关注fanstuck,您的关注就是我继续创作的动力!"
msg.attach(MIMEText(body, 'plain', 'utf-8'))

# Add the attachment
htmlFile = 'C:/Users/10799/problem.html'
html = MIMEApplication(open(htmlFile, 'rb').read())
html.add_header('Content-Disposition', 'attachment', filename='html')
msg.attach(html)

# SMTP server address and port
smtp_server = "smtp.163.com"
server = smtplib.SMTP(smtp_server, 25)
server.set_debuglevel(1)
# Log in to the mailbox
server.login(from_addr, password)
# Send the email
server.sendmail(from_addr, areceiver.split(',') + acc.split(','), msg.as_string())
# Disconnect from the server
server.quit()
Run the test:
Download the file:
No problems at all!
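To tie the three steps together, the whole flow can be wrapped into one script that reads the data, draws the chart, and mails the result. The sketch below is only an illustration of how the pieces above might be combined: send_report is a hypothetical helper that condenses the mail-sending script, and the addresses, password, and file names are the placeholders already used in this article.

import smtplib
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
import matplotlib.pyplot as plt


def send_report(attachment_path, sender, password, receiver, smtp_server='smtp.163.com'):
    """Send one report file as an e-mail attachment (wrapper around the script above)."""
    msg = MIMEMultipart()
    msg['Subject'] = 'Daily report'
    msg['to'] = receiver
    msg['from'] = sender
    msg.attach(MIMEText("Please find today's report attached.", 'plain', 'utf-8'))

    with open(attachment_path, 'rb') as f:
        part = MIMEApplication(f.read())
    part.add_header('Content-Disposition', 'attachment', filename=attachment_path)
    msg.attach(part)

    server = smtplib.SMTP(smtp_server, 25)
    server.login(sender, password)
    server.sendmail(sender, [receiver], msg.as_string())
    server.quit()


def run_daily_report():
    # read_csv() and pie_chart() are the functions defined in the earlier steps
    df = read_csv('user_info.csv')
    pie_chart(df)                        # draw the top-10 regions pie chart
    plt.savefig('fig_cat.png')           # save the chart to disk
    send_report('fig_cat.png',
                sender='fanstuck@163.com',      # placeholder credentials
                password='####',
                receiver='1079944650@qq.com')


if __name__ == '__main__':
    run_daily_report()

A script like this can then be triggered every day by cron or the Windows Task Scheduler, which is where the real time savings of report automation come from.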

