When using Python's pandas module, looking up and modifying values in a DataFrame is very slow. What is the cause, and is there a good way to fix it?

Question

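I have recently been using pandas for a machine learning project, and the training set is about 2 GB. I manipulate the data with DataFrames. I ran a groupby followed by mean on the training set, and that was quite fast, but when I assign the results to the user parameters (also a DataFrame), it is especially...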

阿神 · Answer

pandas provides the df.iterrows() generator for looping through the rows of a DataFrame, which is the most efficient way to loop over them.

For details, please see the documentation:
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.iterrows.html
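A minimal sketch of looping with iterrows(), assuming a small hypothetical table with `user` and `bias` columns (these names are illustrative, not from the question). For comparison it also shows itertuples(), which is usually faster than iterrows() because it does not build a Series per row, and the equivalent whole-column operation, which is typically faster than either loop.

```python
import pandas as pd

# Hypothetical user-parameter table; column names are illustrative only.
df = pd.DataFrame({"user": ["u1", "u2", "u3"], "bias": [0.1, -0.2, 0.3]})

# iterrows() yields (index, row) pairs, where each row is a pandas Series.
totals = {}
for idx, row in df.iterrows():
    totals[row["user"]] = row["bias"] * 2

# itertuples() yields lightweight namedtuples instead of Series.
doubled = [r.bias * 2 for r in df.itertuples(index=False)]

# The same computation as a single whole-column (vectorized) operation.
vectorized = (df["bias"] * 2).tolist()
```

All three produce the same values; the loops are only worth it when each row needs logic that cannot be expressed as a column operation.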

ringa_lee · Answer

I have never worked with data on this scale, but in my experience it is best not to operate on a DataFrame one element at a time, which is inherently slow; operating on whole columns is much faster.
1. Appending: it is best to write all the new values into an empty DataFrame and then merge it in, although sometimes appending directly is unavoidable.
2. Deleting: it is faster to delete directly with the del statement.
3. Changing: also use the merge approach, overwriting the original values.
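The three suggestions above can be sketched as whole-table operations. This is a minimal illustration with hypothetical tables (`user_params`, `new_values`, and the `tmp` column are made up for the example). For appending rows, it uses `pd.concat` as the single combining step; for changing values, it left-merges the new values and overwrites only where a new value exists.

```python
import pandas as pd

# Hypothetical data: a user-parameter table and newly computed per-user values.
user_params = pd.DataFrame(
    {"user": ["u1", "u2", "u3"], "bias": [0.0, 0.0, 0.0], "tmp": [1, 2, 3]}
)
new_values = pd.DataFrame({"user": ["u1", "u3"], "bias": [0.5, -0.5]})

# 1. Append: collect the new rows in one DataFrame, then combine once.
extra = pd.DataFrame({"user": ["u4"], "bias": [0.0], "tmp": [4]})
user_params = pd.concat([user_params, extra], ignore_index=True)

# 2. Delete: remove a whole column with the del statement.
del user_params["tmp"]

# 3. Change: merge the new values in, then overwrite wherever a new value exists.
merged = user_params.merge(new_values, on="user", how="left", suffixes=("", "_new"))
merged["bias"] = merged["bias_new"].fillna(merged["bias"])
user_params = merged.drop(columns="bias_new")
```

After these steps `user_params` has one row per user and a single `bias` column, with `u1` and `u3` updated and the other users left at their original values.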

高洛峰 · Answer

I don't think the assignment itself is slow.
self.user_params.loc[user,'bias'] amounts to looking up the second-level index from the first-level index, and that lookup is likely what is very slow.
Could item and user be split into two separate DataFrames?
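A sketch of the split suggested above, under assumptions: the question is truncated, so the table layout here (one rating per user/item row, and separate `user_params` and `item_params` tables each with a single-level index) is hypothetical. With plain indexes, the groupby result can be assigned to a column in one step, and pandas aligns the values on the index instead of looking each one up individually.

```python
import pandas as pd

# Hypothetical training data: one row per (user, item) rating.
train = pd.DataFrame({
    "user": ["u1", "u1", "u2", "u3", "u3"],
    "item": ["i1", "i2", "i1", "i2", "i3"],
    "rating": [4.0, 2.0, 5.0, 3.0, 1.0],
})

# Separate per-user and per-item parameter tables, each with a single-level index.
user_params = pd.DataFrame({"bias": 0.0}, index=pd.Index(["u1", "u2", "u3"], name="user"))
item_params = pd.DataFrame({"bias": 0.0}, index=pd.Index(["i1", "i2", "i3"], name="item"))

# Assign each groupby-mean result in one vectorized step; values align on the index.
user_params["bias"] = train.groupby("user")["rating"].mean()
item_params["bias"] = train.groupby("item")["rating"].mean()
```

Here `user_params["bias"]` ends up as 3.0, 5.0 and 2.0 for `u1`, `u2` and `u3`, without any per-row `.loc` calls.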

PHP中文网 · Answer

loc is the slowest. Try using ix instead. It's best to build the loop with iterrows.
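Note that `.ix` was deprecated in pandas 0.20 and removed in pandas 1.0, so it is not available in current versions. For setting individual values by label, pandas provides `.at`, an accessor restricted to single scalars that usually carries less overhead than the general `.loc`. A minimal sketch, assuming a hypothetical table indexed by user id and a hypothetical `updates` dict:

```python
import pandas as pd

# Hypothetical user-parameter table indexed by user id.
user_params = pd.DataFrame({"bias": [0.0, 0.0, 0.0]}, index=["u1", "u2", "u3"])
updates = {"u1": 0.5, "u3": -0.5}

# .at accesses a single value by row label and column label.
for user, value in updates.items():
    user_params.at[user, "bias"] = value
```

Even with `.at`, a per-value loop is slower than a single vectorized assignment when many values change at once.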

