
How to handle missing values using Python

I have recently been working on how to deal with missing values. The data I analyze is diverse, and missing values make up a small part of it. Two things are giving me a headache:
1. R has the mice package, which specializes in handling missing values. Is there anything similar in general-purpose Python?
2. How do I fill in missing values in string-typed columns? Clustering and regression both work on numeric types, so what are good algorithms or well-packaged libraries for string data?
Any answers from the experts would be appreciated.
PS: An example is hard to describe in words, so here is one:
name,password,age,address
Zhang San,123456,15.3,sichuang
李思,12,12.2, wuhan
王五,232,12,
钱六,,23,nanchang
haha,123456,,lal
拉拉,123123,,mmm

I would like to fill in missing values quickly in Python, the way the mice package does in R. (The fields in this example are not strongly related, but the data I actually need to process has more correlation between columns.) The second problem is then, as in the example, filling in a string-typed field such as address from the other attributes.

PHP中文网PHP中文网2652 days ago826

Replies (2)

  • 三叔

    三叔 2017-06-22 11:53:31

    # Save the text below as 1.txt, with the address value in the last row removed
    name,password,age,address
    张三,123456,15.3,sichuang
    李四,12,12.2,wuhan
    王五,232,12,
    钱六,,23,nanchang
    哈哈,123456,,lal
    啦啦,123123,,
    
    import pandas as pd

    # Read the comma-separated file; the first row is the header
    df = pd.read_csv('1.txt')
    # Add a column holding the fill value ('新值' means "new value")
    df['new'] = '新值'
    # Fill each missing address from another column of the same row
    df['address'] = df['address'].fillna(df['new'])

    print(df)
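
The code above fills every gap with one constant. A common alternative for string columns is to fill with the column's most frequent value. The following is only a minimal sketch of that idea, not the poster's method; the sample rows are made up for illustration (with a repeated address so that a most frequent value exists), and the data is read from an in-memory string instead of 1.txt.

```python
import io

import pandas as pd

# Hypothetical sample data (not from the thread); "wuhan" appears twice
# so that the address column has a clear most frequent value.
CSV = """name,password,age,address
a,123456,15.3,wuhan
b,12,12.2,wuhan
c,232,12,
d,,23,nanchang
e,123456,,
"""

df = pd.read_csv(io.StringIO(CSV))

# mode() ignores missing values by default and returns the most
# frequent value(s); take the first one.
most_common = df["address"].mode().iloc[0]

# Replace only the missing addresses with that value.
df["address"] = df["address"].fillna(most_common)

print(df)
```

This ignores the other attributes entirely. If the columns are correlated, computing the most frequent value within groups of a related column, or training a classifier on the rows where address is known, may give better fills.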
    

  • 仅有的幸福

    仅有的幸福 2017-06-22 11:53:31

    PyMICE is a Python® library for analyzing the behavioral data of mice. Could you check whether it is what you want?
    https://neuroinflab.wordpress...
    http://neuroinflab.github.io/...
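
PyMICE concerns the behavior of laboratory mice rather than imputation. For the imputation side of the question, one option not mentioned in this thread is scikit-learn's `IterativeImputer`, which its documentation describes as experimental and inspired by the R MICE package. The following is a minimal sketch, assuming scikit-learn is installed; the numeric columns and values are made up for illustration. It produces a single imputation rather than the multiple imputations R's mice returns, and it handles numeric columns only.

```python
import numpy as np
import pandas as pd
# IterativeImputer is experimental and must be enabled explicitly.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Made-up numeric data (an assumption, not the poster's data).
df = pd.DataFrame({
    "age":    [15.3, 12.2, 12.0, 23.0, np.nan, np.nan],
    "height": [160.0, 150.0, np.nan, 175.0, 158.0, 170.0],
    "weight": [50.0, 42.0, 41.0, np.nan, 49.0, 63.0],
})

# Each column with gaps is modelled from the other columns, round-robin,
# for up to max_iter rounds.
imputer = IterativeImputer(max_iter=10, random_state=0)

# fit_transform returns a NumPy array, so wrap it back into a DataFrame.
filled = pd.DataFrame(
    imputer.fit_transform(df), columns=df.columns, index=df.index
)

print(filled)
```

Values that were already present are left unchanged; only the gaps are estimated.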
