I have recently been working on how to handle missing values in a project. The data being analysed is diverse, and a small share of it is missing. Two things are giving me a headache:
1. R has the mice package, which specialises in handling missing values. Is there anything similar in general-purpose Python?
2. How can missing values of string type be filled in? Clustering and regression both work on numerical data, so are there good algorithms or well-packaged libraries for string columns?
Any help from the experts here would be appreciated.
PS: An example is hard to describe in words, so here is one:
name,password,age,address
Zhang San,123456,15.3,sichuang
李思,12,12.2, wuhan
王五,232,12,
钱六,,23,nanchang
haha,123456,,lal
拉拉,123123,,mmm
I am hoping to fill in missing values quickly in Python, the way the mice package does in R. (The columns in this example are not really correlated, but the real data to be processed has more correlation between columns.) Filling in address, a string-type column, from the other attributes, as in the example, is the second problem.
三叔 2017-06-22 11:53:31
# Save the text below as 1.txt, with the address value on the last row removed
name,password,age,address
张三,123456,15.3,sichuang
李四,12,12.2,wuhan
王五,232,12,
钱六,,23,nanchang
哈哈,123456,,lal
啦啦,123123,,
import pandas as pd
# Read the comma-separated file; the first row is the header
df = pd.read_csv('1.txt')
# Add a column holding the fill value ('新值' means "new value")
df['new'] = '新值'
# Fill each missing address from another column of the same row
df['address'] = df['address'].fillna(df['new'])
print(df)
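The code above fills every missing address with one constant. As a simple baseline before trying anything model-based, a rough sketch could fill numeric columns with the median and the string column with its most frequent value. The choice of median and mode here is my assumption, not something the question asks for, and the sample is embedded with io.StringIO only so the snippet runs without 1.txt.

```python
import io

import pandas as pd

# Same sample as above, embedded so the snippet is self-contained.
raw = """name,password,age,address
张三,123456,15.3,sichuang
李四,12,12.2,wuhan
王五,232,12,
钱六,,23,nanchang
哈哈,123456,,lal
啦啦,123123,,
"""
df = pd.read_csv(io.StringIO(raw))

# Numeric columns: fill with the column median (assumed baseline).
for col in ["password", "age"]:
    df[col] = df[col].fillna(df[col].median())

# String column: fill with the most frequent value (mode).
# Assumption: one global mode is acceptable; on ties, the first value
# returned by mode() is used.
df["address"] = df["address"].fillna(df["address"].mode().iloc[0])

print(df)
```

On this tiny sample every address occurs once, so the mode is arbitrary; on real data with repeated categories it is a more meaningful default.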
仅有的幸福 2017-06-22 11:53:31
PyMICE is a Python® library for mice behavioral data analysis. Could you check whether it is what you are looking for?
https://neuroinflab.wordpress...
http://neuroinflab.github.io/...
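Going by the description above, PyMICE is about mouse behaviour data rather than imputation. For something closer to R's mice, one possible route is scikit-learn's IterativeImputer for the numeric columns (still marked experimental, hence the extra import), followed by treating the string column as a classification target. The sketch below uses made-up data, and the column names age, income and city as well as the random forest are my own assumptions for illustration. statsmodels also has a statsmodels.imputation.mice module that may be worth a look.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (exposes IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestClassifier

# Made-up data for illustration only: two correlated numeric columns
# and one string column that depends on age.
rng = np.random.default_rng(0)
age = rng.uniform(10, 60, size=40)
income = age * 100 + rng.normal(0, 50, size=40)
city = np.where(age < 35, "wuhan", "nanchang")
df = pd.DataFrame({"age": age, "income": income, "city": city})

# Knock out a few values to simulate missing data.
df.loc[[1, 5, 9], "income"] = np.nan
df.loc[[2, 7], "city"] = np.nan

# Step 1: numeric columns, each one modelled from the others in turn.
num_cols = ["age", "income"]
imputer = IterativeImputer(random_state=0)
df[num_cols] = imputer.fit_transform(df[num_cols])

# Step 2: string column, predicted by a classifier trained on the rows
# where it is known.
known = df["city"].notna()
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(df.loc[known, num_cols], df.loc[known, "city"])
df.loc[~known, "city"] = clf.predict(df.loc[~known, num_cols])

print(df.isna().sum())
```

This only helps when the other columns actually carry information about the missing one, which the question says is the case for the real data.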