Home  >  Article  >  Backend Development  >  An introduction to how python can quickly find the difference between data in two spreadsheets

An introduction to how python can quickly find the difference between data in two spreadsheets

黄舟
黄舟Original
2017-05-28 11:15:212074browse

The following editor will bring you an articlepythonHow to quickly find the difference in data in two spreadsheets. The editor thinks it’s pretty good, so I’ll share it with you now and give it as a reference. Let’s follow the editor to take a look

I have just come into contact with python recently. I am looking for some small tasks to practice my skills. I hope that I can continue to exercise my problem-solving ability in practice.

There will be such a scenario in the company: there is an electronic form content used by two, three or more departments, and these employees will maintain these forms from time to time. After updating some of the data of your own department, over time, everyone's data will start to fight, which is very detrimental to management. How to quickly find the differences in data in two or more spreadsheets?

Solution:

1. Excel’s own method (if interested, use Baidu by yourself)

2. Write a small script in python

#!/usr/bin/env python
# -*- coding: utf-8 -*-

#导入模块 openpyxl 
import openpyxl
from openpyxl.styles import PatternFill
from openpyxl.styles import colors
from openpyxl.styles import Font, Color

#读取excel文件
#括号中的字符串为你要比较的两个excel的路径,注意用“/”
wb_a = openpyxl.load_workbook('d:/BAKFILE/d046532/Desktop/check excel/test.xlsx')
wb_b = openpyxl.load_workbook('d:/BAKFILE/d046532/Desktop/check excel/test2.xlsx')
#定义一个方法来获取表格中某一列的内容,返回一个列表
#在这里,我的表格中:IP是具有唯一性的,所以我用它来区分数据的差异,而IP这一列在我的表格中是第“G”列
def getIP(wb):
  sheet = wb.get_active_sheet()
  ip = []
  for cellobj in sheet['G']:
    ip.append(cellobj.value)

  return ip
#获得ip列表
ip_a = getIP(wb_a)
ip_b = getIP(wb_b)
#将两个列表转换成集合
aa = set(ip_a)
bb = set(ip_b)
#找出两个列表的不同行,并转换成列表
difference = list(aa ^ bb)
#打印出列表中的元素
#到这一步,两个表格中不同的数据已经被找出来了
for i in difference:
  print (i)

#将不同行高亮显示
print ("开始第一张表" + "----" *10)
a = wb_a.get_active_sheet()['G']
for cellobj in a:
  if cellobj.value in difference:
    print (cellobj.value)
    cellobj.font = Font(color=colors.BLACK, italic=True ,bold = True)
    cellobj.fill = PatternFill("solid", fgColor="DDDDDD")
print ("开始第二张表" + "----" *10)
b = wb_b.get_active_sheet()['G']
for cellobj in b:
  if cellobj.value in difference:
    print (cellobj.value)
    cellobj.font = Font(color=colors.BLACK, italic=True ,bold = True)
    cellobj.fill = PatternFill("solid", fgColor="DDDDDD")

wb_a.save('d:/BAKFILE/d046532/Desktop/a.xlsx')
wb_b.save('d:/BAKFILE/d046532/Desktop/b.xlsx')

In this way, two copies of excel will be saved, and in this copy Use cell fill color and font color to mark the different data differences in the two tables

Unresolved:

1. How to add these different data into a table to form a complete table

2. How to optimize and streamline the code

The above is the detailed content of An introduction to how python can quickly find the difference between data in two spreadsheets. For more information, please follow other related articles on the PHP Chinese website!

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn