
Reading a CSV file in Python, removing a column, and writing a new file

小云云 (original)
2017-12-30 13:23:55, 2330 views

This article shares an example of reading a CSV file in Python, removing a column, and writing the result to a new file. Two methods are used to solve this problem, both of which are existing solutions found on the Internet.

Scenario description:

There is a data file saved as text with three columns: user_id, plan_id, and mobile_id. The goal is to produce a new file that contains only mobile_id and plan_id.
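To make the scenario concrete, here is a small sketch that writes a sample file with the three columns. The function name `write_sample`, the file name `sample_input.csv`, and the values are made up for illustration, and the column order simply follows the scenario text, which may not match the real file.

```python
import csv

def write_sample(path):
    """Write a tiny three-column CSV for experimenting (hypothetical data)."""
    rows = [
        ("user_id", "plan_id", "mobile_id"),  # header row; column order is an assumption
        ("u001", "p10", "m501"),
        ("u002", "p11", "m502"),
    ]
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)

if __name__ == "__main__":
    write_sample("sample_input.csv")  # hypothetical file name
```

A file like this is enough to try both options before running them on the full data set.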

Solution

Option 1: Use Python's built-in file reading and writing. Simply iterate over the data, process each line in a for loop, and write it to a new file.

The code is as follows:


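def readwrite1(input_file, output_file):
    # The with statement closes both files, even if an error occurs.
    with open(input_file, 'r') as f, open(output_file, 'w') as out:
        for line in f:
            a = line.rstrip("\n").split(",")
            # Keep the first two fields; adjust the indexes to match
            # where the wanted columns sit in your file.
            x = a[0] + "," + a[1] + "\n"
            out.write(x)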
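Picking fields by position is fragile if the column order changes. As an alternative sketch (not the original author's method), the standard library `csv` module can select columns by header name. The function name `select_columns` is hypothetical, and the sketch assumes the first row of the file is a header.

```python
import csv

def select_columns(input_file, output_file, columns):
    """Copy only the named columns, in the given order, to a new CSV file."""
    with open(input_file, newline="") as src, \
         open(output_file, "w", newline="") as dst:
        reader = csv.DictReader(src)  # the first row is taken as the header
        # extrasaction="ignore" drops any key that is not listed in fieldnames
        writer = csv.DictWriter(dst, fieldnames=columns, extrasaction="ignore")
        writer.writeheader()
        for row in reader:
            writer.writerow(row)
```

For the scenario above, a call might look like `select_columns(input_file, output_file, ["mobile_id", "plan_id"])`, assuming those names appear in the header.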

Option 2: Use pandas. Read the data into a DataFrame, select the columns you need, and write them to the new file directly with the DataFrame's to_csv method.

The code is as follows:


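import pandas as pd

def readwrite2(input_file, output_file):
    # Load the whole file; the first row is treated as the header.
    date_1 = pd.read_csv(input_file, header=0, sep=',')
    # Select the two columns by name and write them without the index.
    # The names must match the file's header (the scenario calls the first one mobile_id).
    date_1[['mobile', 'plan_id']].to_csv(output_file, sep=',', header=True, index=False)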

Looking at the code, the pandas logic is clearer.
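A possible variation on the pandas approach, sketched here under the assumption that the header contains the requested names: `read_csv` accepts a `usecols` argument, so the unwanted column is never loaded. The function name `readwrite_usecols` is hypothetical.

```python
import pandas as pd

def readwrite_usecols(input_file, output_file, columns):
    """Load only the wanted columns, then write them in the requested order."""
    # usecols skips the other columns while parsing
    df = pd.read_csv(input_file, usecols=columns)
    # usecols does not reorder columns, so index by the list to fix the order
    df[columns].to_csv(output_file, index=False)
```

Whether this is faster than loading every column depends on the file, so it is worth measuring on your own data.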

Let’s take a look at the execution efficiency!


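import time

def getRunTimes(fun, input_file, output_file):
    begin_time = int(round(time.time() * 1000))  # start time in ms
    fun(input_file, output_file)
    end_time = int(round(time.time() * 1000))    # end time in ms
    print("Read and write running time:", (end_time - begin_time), "ms")

# input_file, output_file and output_file1 are file paths defined elsewhere.
getRunTimes(readwrite1, input_file, output_file)   # plain for loop over the lines
getRunTimes(readwrite2, input_file, output_file1)  # read and write with a DataFrame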

Read and write running time: 976 ms

Read and write running time: 777 ms
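A single run measured with `time.time()` can be noisy. As a hedged sketch of a slightly more robust measurement, the helper below uses `time.perf_counter()` and keeps the best of several runs. The name `time_call` and the `repeats` parameter are assumptions for illustration, not part of the original code.

```python
import time

def time_call(fun, *args, repeats=3):
    """Return the best wall-clock time in milliseconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fun(*args)
        elapsed_ms = (time.perf_counter() - start) * 1000
        best = min(best, elapsed_ms)
    return best
```

It could be called as `time_call(readwrite1, input_file, output_file)`. Note that repeated runs may benefit from the operating system's file cache, so the first run is often the slowest.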

input_file contains about 270,000 records, and the DataFrame approach is faster than the for loop. If the amount of data is larger, will the effect be more obvious?

Next, try increasing the number of records in input_file. The results are as follows (W = 10,000 records; times in ms):

input_file    readwrite1    readwrite2
27W           976           777
55W           1989          1509
110W          4312          3158

Judging from the test results above, the DataFrame approach is about 30% faster.
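The "about 30%" figure can be checked with a little arithmetic. The sketch below uses the measurements reported above; the helper name `speedup_percent` is hypothetical, and the speedup is defined here as how much longer the for loop takes relative to the pandas time.

```python
# Timings in ms from the measurements above: (records, readwrite1, readwrite2)
timings = [
    (270_000, 976, 777),
    (550_000, 1989, 1509),
    (1_100_000, 4312, 3158),
]

def speedup_percent(loop_ms, pandas_ms):
    """Extra time the for loop needs, as a percentage of the pandas time."""
    return (loop_ms / pandas_ms - 1) * 100

for records, loop_ms, pandas_ms in timings:
    pct = speedup_percent(loop_ms, pandas_ms)
    print(f"{records:>9,} records: pandas is {pct:.1f}% faster")
```

By this definition the gap grows from roughly 26% to roughly 37% as the file gets larger, which is consistent with the rounded 30% figure.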
