
Detailed explanation of read_excel in Python 2.7 pandas

不言
2018-05-04 14:19:35

This article explains how to use read_excel in pandas under Python 2.7: reading an Excel file into a DataFrame, inspecting its columns, removing duplicate rows, and saving the result to a new Excel file.

Import the pandas module:

import pandas as pd

The import statement loads the pandas module and binds it to the conventional short alias pd.

Read the Excel file to be processed:

df = pd.read_excel('log.xls')

The read_excel function reads the Excel file; replace 'log.xls' with the path to your own file. The result is a pandas DataFrame, a column-oriented two-dimensional table with column names and row labels. From this point on, operations on the Excel file become operations on the DataFrame. If the workbook contains several sheets and you only want to read one of them:

df = pd.read_excel('log.xls', sheetname=1)

The sheetname parameter selects a sheet by position, counting from 0, so the value 1 above reads the second sheet. In pandas 0.21 and later this parameter is named sheet_name, and the older sheetname spelling is deprecated.

After reading the file, first check the column headers and the data type of each column:

df.dtypes

The output is as follows:

Member   object
Unnamed: 1 float64
Unnamed: 2 float64
Unnamed: 3 float64
Unnamed: 4 float64
Unnamed: 5 float64
家内外活动类型  object
Unnamed: 7  object
activity  object
dtype: object
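
The same inspection can be tried without the Excel file. The following is a minimal sketch that assumes a small hand-built DataFrame as a stand-in for the sheet; the values are invented for illustration, and only the column names Member, Unnamed: 1, and activity come from the output above. "Unnamed: N" is the placeholder name pandas assigns to a column whose header cell is empty.

```python
import pandas as pd

# Hypothetical stand-in for the sheet read from log.xls (values are made up)
df = pd.DataFrame(
    {
        'Member': ['m1', 'm2'],
        'Unnamed: 1': [1.0, 2.5],
        'activity': ['read', 'cook'],
    },
    columns=['Member', 'Unnamed: 1', 'activity'],
)

# dtypes is a Series indexed by column name
types = df.dtypes
print(types)

# Columns whose header cell was empty carry a placeholder name
unnamed = [c for c in df.columns if str(c).startswith('Unnamed')]

# Columns holding numbers, selected by dtype rather than by name
numeric_cols = list(df.select_dtypes(include=['number']).columns)
```

Listing the placeholder columns this way makes it easier to decide which ones to rename or drop before further processing.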

Keep only the last row of data for each member:

new_df = df.drop_duplicates(subset='Member', keep='last')

The statement above removes duplicate rows based on the Member column and, among rows that share the same Member value, keeps only the last one. It returns a new, filtered DataFrame that holds the last row for each member.
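
To make the effect of keep='last' concrete, here is a small sketch on invented data (the rows below are not from log.xls). It also runs keep='first' for comparison.

```python
import pandas as pd

# Made-up rows: member A appears three times, member B twice
df = pd.DataFrame(
    {
        'Member': ['A', 'A', 'B', 'A', 'B'],
        'activity': ['a1', 'a2', 'b1', 'a3', 'b2'],
    },
    columns=['Member', 'activity'],
)

# Keep the last row seen for each Member value
last_rows = df.drop_duplicates(subset='Member', keep='last')

# For comparison, keep the first row seen for each Member value
first_rows = df.drop_duplicates(subset='Member', keep='first')

print(last_rows)
print(first_rows)
```

Note that the rows for member A are not adjacent here, yet only one of them survives: drop_duplicates compares values across the whole column, not just within consecutive runs. The original df is left unchanged.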

Next, save the processed result as an Excel file:

out = pd.ExcelWriter('output.xls')
new_df.to_excel(out)
out.save()

output.xls is the name of the file to be saved, and you can choose any name you like. The to_excel call writes the contents of the DataFrame to the writer, and save() finally writes the file to disk.

A new file then appears in the current directory and can be opened directly in Excel.

pandas provides many more functions than the ones shown here. Search the API documentation to find the function that fits your specific task.

Attached: A complete example

# coding=utf-8
import pandas as pd

# Read the second sheet of the Excel file
df = pd.read_excel('log.xls', sheetname=1)
# Show the data type of each column
print(df.dtypes)
# Show the data in the Member column
print(df['Member'])

'''
# Add a new column whose value in each row is the Member value plus the activity value of that row
df['activity_2'] = df['Member'] + df['activity']
'''

# Remove duplicate rows based on the Member column, keeping the last row for each value
new_df = df.drop_duplicates(subset='Member', keep='last')
# Export the result
out = pd.ExcelWriter('output.xls')
new_df.to_excel(out)
out.save()
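
The disabled step in the example, which builds a new column from Member and activity, can be written as a single column-wise expression. Writing to a column that does not exist yet one cell at a time raises a KeyError, so assigning the whole column at once is the usual approach. A minimal sketch, assuming made-up text values in both columns:

```python
import pandas as pd

# Made-up rows standing in for the sheet; both columns hold text
df = pd.DataFrame(
    {
        'Member': ['A', 'B'],
        'activity': ['run', 'walk'],
    },
    columns=['Member', 'activity'],
)

# Adding two text columns concatenates them row by row
# and creates the activity_2 column in one step
df['activity_2'] = df['Member'] + df['activity']

print(df)
```

With text columns, as in the dtypes output earlier, the "sum" is string concatenation; with numeric columns the same expression would add the numbers.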
