How does Python process Excel files?

WBOY

May 08, 2023 pm 05:58 PM


『Problem Description』

The Excel workbook to be processed this time has two sheets, and the values in one sheet must be calculated from the data in the other. The catch is that the sheet to be calculated contains not only numeric values but also formulas. Let's take a look:

[Figure: screenshot of the workbook with its two sheets, CP and DS; formula cells are outlined in blue]

As shown in the picture above, this workbook has two sheets, CP and DS. Following certain business rules, we must calculate the corresponding cells of DS from the data in CP. The blue boxes in the picture contain formulas, while the other areas contain plain values.

Suppose we follow the processing logic described earlier: read the Excel data into a dataframe in one batch, then write it back in one batch. What goes wrong? The code for this approach is as follows:

import pandas as pd
import xlwings as xw

# Path of the file to process
fpath = "data/DS_format.xlsm"

# Read the CP and DS sheets into separate pandas dataframes
cp_df = pd.read_excel(fpath, sheet_name="CP", header=[0])
ds_df = pd.read_excel(fpath, sheet_name="DS", header=[0, 1])

# Calculation omitted ...

# Save the result to Excel
app = xw.App(visible=False, add_book=False)
ds_format_workbook = app.books.open(fpath)
ds_worksheet = ds_format_workbook.sheets["DS"]
ds_worksheet.range("A1").expand().options(index=False).value = ds_df
ds_format_workbook.save()
ds_format_workbook.close()
app.quit()

The problem with the above code is that when pd.read_excel() reads the data into the dataframe, it does not read the formula of a formula cell. It reads the calculated result instead (or NaN if there is no result). When we then write the whole dataframe back to Excel in one batch, every cell that used to hold a formula is overwritten with that calculated value or NaN, and the formula is lost.

So we have a problem. How should we solve it? Two ideas come to mind:

When writing the dataframe back to Excel, do not write it back in one batch. Instead, iterate over the rows and columns and write back only the calculated data, leaving the formula cells untouched.
When reading Excel, find a way to read the formula itself from formula cells rather than the result of the calculation.

I tried both ideas in turn. Let's take a look.

『Option 1』

The following code traverses the dataframe and writes each value to its corresponding cell. Cells that contain formulas are left untouched.

import datetime  # needed for the datetime.datetime type check below

# Write to Excel based on ds_df, touching only the cells that should be written
for row_idx, row in ds_df.iterrows():
    total_capabity_val = row[('Total', 'Capabity')].strip()
    total_capabity1_val = row[('Total', 'Capabity.1')].strip()
    # Do not write the rows labelled "Total" or "1Gb  Eqv."
    if total_capabity_val != 'Total' and total_capabity_val != '1Gb  Eqv.':
        # Assign values for Delta and LOI
        if total_capabity1_val == 'LOI' or total_capabity1_val == 'Delta':
            ds_worksheet.range((row_idx + 3, 3)).value = row[('Current week', 'BOH')]
            print(f"ds_sheet row {row_idx + 3}, column 3 set to {row[('Current week', 'BOH')]}")
        # Assign values for Demand and Supply
        if total_capabity1_val == 'Demand' or total_capabity1_val == 'Supply':
            cp_datetime_columns = cp_df.columns[53:]
            for col_idx in range(4, len(ds_df.columns)):
                ds_datetime = ds_df.columns.get_level_values(1)[col_idx]
                ds_month = ds_df.columns.get_level_values(0)[col_idx]
                if type(ds_datetime) == str and ds_datetime != 'TTL' and ds_datetime != 'Total' and (ds_datetime in cp_datetime_columns):
                    ds_worksheet.range((row_idx + 3, col_idx + 1)).value = row[(f'{ds_month}', f'{ds_datetime}')]
                    print(f"ds_sheet row {row_idx + 3}, column {col_idx + 1} set to {row[(f'{ds_month}', f'{ds_datetime}')]}")
                elif type(ds_datetime) == datetime.datetime and (ds_datetime in cp_datetime_columns):
                    ds_worksheet.range((row_idx + 3, col_idx + 1)).value = row[(f'{ds_month}', ds_datetime)]
                    print(f"ds_sheet row {row_idx + 3}, column {col_idx + 1} set to {row[(f'{ds_month}', ds_datetime)]}")

The above code does solve the problem: the formulas in formula cells are retained. However, as the advice on processing Excel with Python mentioned at the beginning of this article warns, this code has a serious performance problem. It touches Excel cells through the API one at a time, so writing is very slow. On my old Mac it took 40 minutes to run, which was simply unacceptable, so this option had to be abandoned.
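To illustrate why the number of API calls matters, here is a minimal sketch of one way the per-cell writes could be reduced. It is not what the author did. The helper `contiguous_runs` and the sample column positions are hypothetical; the idea is only that neighbouring writable cells in a row can be grouped into blocks, so each block needs one write instead of one write per cell.

```python
def contiguous_runs(indices):
    """Group integers into (start, end) runs of consecutive values."""
    runs = []
    for i in sorted(indices):
        if runs and i == runs[-1][1] + 1:
            # Extend the current run by one position.
            runs[-1] = (runs[-1][0], i)
        else:
            # Start a new run.
            runs.append((i, i))
    return runs


# Hypothetical 0-based column positions that hold plain values;
# position 8 is skipped because it is assumed to hold a formula.
writable_columns = [4, 5, 6, 7, 9, 10, 11, 12]
runs = contiguous_runs(writable_columns)
print(runs)  # [(4, 7), (9, 12)]
```

With runs like these, each row would need two range assignments rather than eight single-cell assignments. Whether that is fast enough for a real workbook is untested here.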

『Option 2』

This option aims to preserve the formula itself when reading formula cells from Excel. That depends on whether any of the Python Excel libraries exposes a suitable API. I looked carefully at pandas' read_excel() method and found no parameter that supports it. openpyxl, however, does have one:

import openpyxl

# fpath and pd are defined in the first code listing
ds_format_workbook = openpyxl.load_workbook(fpath, data_only=False)
ds_worksheet = ds_format_workbook['DS']
ds_df = pd.DataFrame(ds_worksheet.values)

The key is the data_only parameter. If it is True, openpyxl returns the value that was last calculated and stored for each formula cell. If it is False, the formula itself is kept.
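The difference can be seen on a tiny workbook. The sketch below builds one in memory with openpyxl and loads it twice; the cell addresses and values are made up for illustration. Because the file is written by openpyxl and never opened in Excel, no calculated result is stored, so the data_only=True load gives None for the formula cell. This matches the "no result" case described earlier.

```python
import io

import openpyxl

# Build a small workbook in memory (illustrative values only).
wb = openpyxl.Workbook()
ws = wb.active
ws["A1"] = 2
ws["A2"] = 3
ws["A3"] = "=SUM(A1:A2)"
buffer = io.BytesIO()
wb.save(buffer)

# data_only=False keeps the formula text.
buffer.seek(0)
formula_text = openpyxl.load_workbook(buffer, data_only=False).active["A3"].value

# data_only=True returns the stored result, which does not exist here.
buffer.seek(0)
cached_value = openpyxl.load_workbook(buffer, data_only=True).active["A3"].value

print(formula_text)  # =SUM(A1:A2)
print(cached_value)  # None
```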

I thought I had found the solution and was overjoyed, but when I saw the structure of the dataframe read through openpyxl, I was shocked. The header of my Excel table is a fairly complex two-level header, with merged and split cells. When openpyxl reads such a header into a dataframe, it is not turned into a pandas multi-level column index. The columns simply get a plain numeric index: 0, 1, 2, 3, ...

My calculations on the dataframe rely on the multi-level index, so with openpyxl's output the subsequent calculations cannot proceed.

If openpyxl doesn't work, what about xlwings? After searching through the xlwings API documentation, I did find something, as shown below:
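For readers who want to stay with openpyxl, the two header rows could in principle be turned into a multi-level index by hand. The sketch below is not what the author did. It assumes that a merged header cell shows its text only in the first position and None in the rest, and the helper `frame_with_two_level_header` and the sample rows are hypothetical stand-ins for `worksheet.values`.

```python
import pandas as pd


def frame_with_two_level_header(rows):
    """Build a DataFrame whose columns are a two-level MultiIndex.

    The first two rows are treated as header rows; gaps in the top
    row (assumed to come from merged cells) are filled from the left.
    """
    rows = list(rows)
    top, second, body = rows[0], rows[1], rows[2:]
    filled_top = []
    last = None
    for cell in top:
        if cell is not None:
            last = cell
        filled_top.append(last)
    columns = pd.MultiIndex.from_arrays([filled_top, list(second)])
    return pd.DataFrame(body, columns=columns)


# Hypothetical rows, shaped like the tuples yielded by worksheet.values.
sample_rows = [
    ("Total", None, "Current week"),
    ("Capabity", "Capabity.1", "BOH"),
    ("A", "Demand", 10),
    ("Total", "Demand", "=SUM(C3:C3)"),
]
df = frame_with_two_level_header(sample_rows)
print(list(df.columns))
```

A real header with splits inside merged regions may need more careful handling than a simple left-to-right fill.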

[Figure: xlwings API documentation for the formula property of the Range class]

The Range class provides a property called formula, which can both get and set a formula.

When I saw this, I felt I had struck gold and quickly started coding. Perhaps out of habit, or perhaps because I had been scared off by how slow it was to operate on Excel row by row and cell by cell, my first thought was to do it in bulk: read all the formulas in the sheet in one go, then write them all back in one go. So my initial code looked like this:

# Use xlwings to read the formulas
app = xw.App(visible=False, add_book=False)
ds_format_workbook = app.books.open(fpath)
ds_worksheet = ds_format_workbook.sheets["DS"]
# First read and save all formulas in one go
formulas = ds_worksheet.used_range.formula

# Calculation omitted ...

# Write all formulas back in one go
ds_worksheet.used_range.formula = formulas

But I was wrong. I had assumed that ds_worksheet.used_range.formula would return only the cells that contain formulas. In fact it returns every cell in the range: formula cells come back as formulas, and all other cells come back as their values. So when I wrote the formulas back, they overwrote the values I had calculated in the dataframe and written to Excel.

Given this, I had to handle the formula cells separately instead of all at once, so the code has to be written like this:
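The shape of the returned data makes the problem easier to see. The sketch below uses a plain nested tuple as a stand-in for what `used_range.formula` returned; the contents are hypothetical, and it assumes that formula cells are strings beginning with "=" while every other cell holds its old content. The helper `formula_cells` is likewise hypothetical and simply picks out the positions that actually hold formulas.

```python
def formula_cells(formula_grid):
    """Return {(row, col): formula} for cells whose text starts with '='."""
    return {
        (r, c): cell
        for r, row in enumerate(formula_grid)
        for c, cell in enumerate(row)
        if isinstance(cell, str) and cell.startswith("=")
    }


# Hypothetical snapshot with one entry per cell in the used range.
snapshot = (
    ("Demand", "10", "20", "=B1+C1"),
    ("Supply", "5", "7", "=B2+C2"),
)
print(formula_cells(snapshot))
# {(0, 3): '=B1+C1', (1, 3): '=B2+C2'}
```

Writing `snapshot` back as a whole would also restore the old "10", "20", "5" and "7", which is exactly the overwrite described above. Only the last column needs to be restored.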

# Use xlwings to read the formulas
app = xw.App(visible=False, add_book=False)
ds_format_workbook = app.books.open(fpath)
ds_worksheet = ds_format_workbook.sheets["DS"]

# Preserve the formulas in Excel
# Find the Total row in DS; every row from Total onward holds formulas
row = ds_df.loc[ds_df[('Total', 'Capabity')] == 'Total ']
total_row_index = row.index.values[0]
# Convert to an Excel row number. The dataframe treats the two header rows as
# the column index and counts data rows from 0; Excel counts from the first
# header row, starting at 1. So dataframe row i is Excel row i + 3, and
# total_row_index + 2 is the last Excel row above the Total row.
excel_total_row_idx = int(total_row_index + 2)
# Index of the last row in Excel
excel_last_row_idx = ds_worksheet.used_range.rows.count
# Preserve the formulas of the columns calculated by date
I_col_formula = ds_worksheet.range(f'I3:I{excel_total_row_idx}').formula
N_col_formula = ds_worksheet.range(f'N3:N{excel_total_row_idx}').formula
T_col_formula = ds_worksheet.range(f'T3:T{excel_total_row_idx}').formula
U_col_formula = ds_worksheet.range(f'U3:U{excel_total_row_idx}').formula
Z_col_formula = ds_worksheet.range(f'Z3:Z{excel_total_row_idx}').formula
AE_col_formula = ds_worksheet.range(f'AE3:AE{excel_total_row_idx}').formula
AK_col_formula = ds_worksheet.range(f'AK3:AK{excel_total_row_idx}').formula
AL_col_formula = ds_worksheet.range(f'AL3:AL{excel_total_row_idx}').formula
# Preserve the formulas of every row from the Total row to the end
total_to_last_formula = ds_worksheet.range(f'A{excel_total_row_idx+1}:AL{excel_last_row_idx}').formula

# Calculation omitted ...

# Save the result to Excel
# Assigning ds_df to the sheet in full overwrites the original formulas with values
ds_worksheet.range("A1").expand().options(index=False).value = ds_df
# Restore the formulas from the snapshots taken earlier
ds_worksheet.range(f'I3:I{excel_total_row_idx}').formula = I_col_formula
ds_worksheet.range(f'N3:N{excel_total_row_idx}').formula = N_col_formula
ds_worksheet.range(f'T3:T{excel_total_row_idx}').formula = T_col_formula
ds_worksheet.range(f'U3:U{excel_total_row_idx}').formula = U_col_formula
ds_worksheet.range(f'Z3:Z{excel_total_row_idx}').formula = Z_col_formula
ds_worksheet.range(f'AE3:AE{excel_total_row_idx}').formula = AE_col_formula
ds_worksheet.range(f'AK3:AK{excel_total_row_idx}').formula = AK_col_formula
ds_worksheet.range(f'AL3:AL{excel_total_row_idx}').formula = AL_col_formula
ds_worksheet.range(f'A{excel_total_row_idx+1}:AL{excel_last_row_idx}').formula = total_to_last_formula

ds_format_workbook.save()
ds_format_workbook.close()
app.quit()

After testing, the above code solved the problem: it meets my needs, and the performance is perfectly acceptable.
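The save-then-restore pattern above repeats the same line once per column. As a possible tidy-up, the sketch below drives it from a list of column letters. The helpers `column_addresses`, `snapshot_formulas` and `restore_formulas` are hypothetical names. They rely only on `sheet.range(address).formula`, which is the xlwings property used above. `FakeSheet` and `FakeRange` are small stand-ins so the example runs without Excel; with xlwings, the real `ds_worksheet` would be passed instead.

```python
def column_addresses(columns, first_row, last_row):
    """Build addresses such as 'I3:I20' for each column letter."""
    return [f"{col}{first_row}:{col}{last_row}" for col in columns]


def snapshot_formulas(sheet, addresses):
    """Read the formula property of each address into a dict."""
    return {addr: sheet.range(addr).formula for addr in addresses}


def restore_formulas(sheet, snapshot):
    """Write every saved formula block back to its address."""
    for addr, formulas in snapshot.items():
        sheet.range(addr).formula = formulas


class FakeRange:
    """Stand-in for a range: only stores a `formula` attribute."""

    def __init__(self):
        self.formula = None


class FakeSheet:
    """Stand-in for a worksheet: one FakeRange per address."""

    def __init__(self):
        self._ranges = {}

    def range(self, address):
        return self._ranges.setdefault(address, FakeRange())


sheet = FakeSheet()
addresses = column_addresses(["I", "N"], 3, 5)
sheet.range("I3:I5").formula = (("=SUM(D3:H3)",), ("=SUM(D4:H4)",), ("=SUM(D5:H5)",))
sheet.range("N3:N5").formula = (("=SUM(J3:M3)",), ("=SUM(J4:M4)",), ("=SUM(J5:M5)",))

saved = snapshot_formulas(sheet, addresses)
sheet.range("I3:I5").formula = None  # simulate the bulk write wiping formulas
restore_formulas(sheet, saved)
print(addresses)  # ['I3:I5', 'N3:N5']
```

The block from the Total row to the last row can be added to the same list as one more address, since the helpers accept any address string.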


Statement

This article is reproduced from 亿速云. If there is any infringement, please contact admin@php.cn to have it removed.
