
Deduplicating multi-attribute data in Python with pandas

不言 (Original)
2018-04-18 15:29:53, 2374 views

This article shows, with an example, how to remove duplicate data in Python when duplicates are judged on one or more attributes (columns). It may serve as a useful reference.

Steps for removing duplicate data with the pandas module in Python:

1) Call the DataFrame's duplicated method. It returns a Boolean Series indicating, for each row, whether that row repeats an earlier one: False for rows that are not duplicates, True for rows that are.

2) Call the DataFrame's drop_duplicates method. It returns a new DataFrame with the duplicate rows removed.
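
As a minimal sketch of how the two steps relate, the snippet below builds a small frame (the sample values are made up for illustration) in which one row fully repeats another, and checks that filtering with the inverted Boolean Series keeps the same rows as drop_duplicates.

```python
import pandas as pd

# Hypothetical sample data: rows 0 and 2 are identical in every column.
frame = pd.DataFrame({"state": [1, 2, 1], "pop": ["a", "b", "a"]})

# duplicated() marks a row True when an earlier row has the same values.
mask = frame.duplicated()
print(mask.tolist())  # [False, False, True]

# Filtering with the inverted mask keeps the same rows as drop_duplicates().
via_mask = frame[~mask]
via_method = frame.drop_duplicates()
print(via_mask.equals(via_method))  # True
```

The mask form is handy when the duplicate rows themselves need to be inspected (frame[mask]) before they are dropped.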

Note:

If no arguments are passed, duplicated and drop_duplicates compare all columns by default. If you pass attribute names (column names), for example frame.drop_duplicates(['state']), only the specified columns (here, the state column) are used to decide whether rows are duplicates.
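
The note above also applies to several attributes at once. The following sketch assumes a hypothetical frame with an extra city column (not part of the original example) and judges duplicates on the pair (state, city).

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
frame = pd.DataFrame({
    "state": [1, 1, 1, 2],
    "city": ["x", "x", "y", "x"],
    "pop": ["a", "b", "c", "d"],
})

# No arguments: all three columns are compared, so no row counts as a duplicate.
print(frame.duplicated().any())  # False

# Two attributes: rows 0 and 1 share the same (state, city) pair.
dup_pair = frame.duplicated(["state", "city"])
print(dup_pair.tolist())  # [False, True, False, False]

# Only the second row of the repeated pair is removed.
deduped = frame.drop_duplicates(["state", "city"])
print(deduped.index.tolist())  # [0, 2, 3]
```

Row 2 survives because it differs in city, even though its state matches rows 0 and 1.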

Specific examples are as follows:

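>>> import pandas as pd
>>> data = {'state': [1, 1, 2, 2], 'pop': ['a', 'b', 'c', 'd']}
>>> frame = pd.DataFrame(data)
>>> frame
   state pop
0      1   a
1      1   b
2      2   c
3      2   d
>>> # No arguments: all columns are compared, so no row is a duplicate
>>> IsDuplicated = frame.duplicated()
>>> print(IsDuplicated)
0    False
1    False
2    False
3    False
dtype: bool
>>> # Judge duplicates on the state column only; the first row per state is kept
>>> frame = frame.drop_duplicates(['state'])
>>> frame
   state pop
0      1   a
2      2   c
>>> # After deduplication, no state value repeats
>>> IsDuplicated = frame.duplicated(['state'])
>>> print(IsDuplicated)
0    False
2    False
dtype: bool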
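
In the example, drop_duplicates(['state']) keeps the first row for each state. Which row survives is controlled by the keep parameter that pandas provides on both methods; the sketch below reuses the example's data to compare its three settings. The variable names are chosen for illustration.

```python
import pandas as pd

frame = pd.DataFrame({"state": [1, 1, 2, 2], "pop": ["a", "b", "c", "d"]})

# keep="first" (the default) retains the first row of each group of duplicates.
first = frame.drop_duplicates(["state"], keep="first")
print(first["pop"].tolist())  # ['a', 'c']

# keep="last" retains the last row of each group instead.
last = frame.drop_duplicates(["state"], keep="last")
print(last["pop"].tolist())  # ['b', 'd']

# keep=False drops every row that has a duplicate, which here leaves nothing.
none_kept = frame.drop_duplicates(["state"], keep=False)
print(len(none_kept))  # 0
```

The surviving rows keep their original index labels (0 and 2 in the example); calling reset_index(drop=True) on the result renumbers them from 0 if that is preferred.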

