Home >Backend Development >Python Tutorial >Detailed explanation of 16 Pandas functions to improve your 'data cleaning' ability by 100 times!
import pandas as pd df ={'姓名':[' 黄同学','黄至尊','黄老邪 ','陈大美','孙尚香'], '英文名':['Huang tong_xue','huang zhi_zun','Huang Lao_xie','Chen Da_mei','sun shang_xiang'], '性别':['男','women','men','女','男'], '身份证':['463895200003128433','429475199912122345','420934199110102311','431085200005230122','420953199509082345'], '身高':['mid:175_good','low:165_bad','low:159_bad','high:180_verygood','low:172_bad'], '家庭住址':['湖北广水','河南信阳','广西桂林','湖北孝感','广东广州'], '电话号码':['13434813546','19748672895','16728613064','14561586431','19384683910'], '收入':['1.1万','8.5千','0.9万','6.5千','2.0万']} df = pd.DataFrame(df) dfThe results are as follows:
df["姓名"].str.cat(df["家庭住址"],sep='-'*3)
df["家庭住址"].str.contains("广")
# 第一个行的“ 黄伟”是以空格开头的 df["姓名"].str.startswith("黄") df["英文名"].str.endswith("e")
df["电话号码"].str.count("3")
df["姓名"].str.get(-1) df["身高"].str.split(":") df["身高"].str.split(":").str.get(0)
df["性别"].str.len()
df["英文名"].str.upper() df["英文名"].str.lower()
df["家庭住址"].str.pad(10,fillchar="*") # 相当于ljust() df["家庭住址"].str.pad(10,side="right",fillchar="*") # 相当于rjust() df["家庭住址"].str.center(10,fillchar="*")
df["性别"].str.repeat(3)
df["电话号码"].str.slice_replace(4,8,"*"*4)
df["身高"].str.replace(":","-")
df["收入"].str.replace("\d+\.\d+","正则")
# 普通用法 df["身高"].str.split(":") # split方法,搭配expand参数 df[["身高描述","final身高"]] = df["身高"].str.split(":",expand=True) df # split方法搭配join方法 df["身高"].str.split(":").str.join("?"*5)
df["姓名"].str.len() df["姓名"] = df["姓名"].str.strip() df["姓名"].str.len()
df["身高"] df["身高"].str.findall("[a-zA-Z]+")
df["身高"].str.extract("([a-zA-Z]+)") # extractall提取得到复合索引 df["身高"].str.extractall("([a-zA-Z]+)") # extract搭配expand参数 df["身高"].str.extract("([a-zA-Z]+).*?([a-zA-Z]+)",expand=True)
The above is the detailed content of Detailed explanation of 16 Pandas functions to improve your 'data cleaning' ability by 100 times!. For more information, please follow other related articles on the PHP Chinese website!