How Can I Efficiently Remove Duplicate Rows Across Specific Columns in Pandas?

DDD
2024-12-12 19:39:16

Dropping Duplicate Rows across Multiple Columns in Python Pandas

The pandas drop_duplicates method removes duplicated rows from a DataFrame, which makes it a staple of data cleaning. By default it compares all columns and keeps the first occurrence of each duplicated row. To narrow the comparison, you can specify which columns to check for uniqueness.

For instance, consider the following DataFrame:

     A  B  C
0  foo  0  A
1  foo  1  A
2  foo  1  B
3  bar  1  A

Suppose you want to remove every row whose values in columns 'A' and 'C' also appear in another row. Rows 0 and 1 both hold the pair ('foo', 'A'), so both would be eliminated.

Previously, this task required manual filtering or more complex operations. drop_duplicates now handles it directly: its keep parameter controls which members of each group of duplicates survive. It accepts 'first' (the default, keep the first occurrence), 'last' (keep the last occurrence), or False (keep none of them).
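
To make the three keep settings concrete, here is a minimal sketch that applies each one to the example DataFrame, comparing rows on columns 'A' and 'C' only. The variable names (keep_first, keep_last, drop_all) are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "bar"],
    "B": [0, 1, 1, 1],
    "C": ["A", "A", "B", "A"],
})

# keep="first" (the default): retain the first row of each duplicate group.
keep_first = df.drop_duplicates(subset=["A", "C"], keep="first")
# keep="last": retain the last row of each duplicate group.
keep_last = df.drop_duplicates(subset=["A", "C"], keep="last")
# keep=False: drop every row that belongs to a duplicate group.
drop_all = df.drop_duplicates(subset=["A", "C"], keep=False)

print(keep_first.index.tolist())  # [0, 2, 3]
print(keep_last.index.tolist())   # [1, 2, 3]
print(drop_all.index.tolist())    # [2, 3]
```

Only keep=False removes both rows 0 and 1; the other two settings leave one representative of the ('foo', 'A') pair in place.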

To compare rows on specific columns only, pass those columns to the subset parameter. Setting keep to False tells pandas to drop every row in a duplicate group instead of keeping one representative:

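import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "bar"], "B": [0, 1, 1, 1], "C": ["A", "A", "B", "A"]})
# Compare rows on columns A and C only; keep=False drops every row in a duplicate group.
result = df.drop_duplicates(subset=["A", "C"], keep=False)
print(result)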

Output:

     A  B  C
2  foo  1  B
3  bar  1  A

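As you can see, rows 0 and 1 are removed, leaving only the rows whose combination of values in columns 'A' and 'C' is unique. Note that drop_duplicates returns a new DataFrame and leaves the original unchanged unless you reassign the result.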
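
If you would like to inspect the affected rows before discarding them, one option is the companion method DataFrame.duplicated, which returns a boolean mask rather than a filtered frame. The sketch below is an illustration built on the same example data, with illustrative variable names (mask, dropped, kept); it is not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "bar"],
    "B": [0, 1, 1, 1],
    "C": ["A", "A", "B", "A"],
})

# True for every row whose (A, C) pair appears more than once.
mask = df.duplicated(subset=["A", "C"], keep=False)
print(mask.tolist())  # [True, True, False, False]

# Rows that would be removed, available for inspection.
dropped = df[mask]
# Rows that remain; this matches drop_duplicates(subset=["A", "C"], keep=False).
kept = df[~mask]
print(kept)
```

Filtering with the inverted mask gives the same rows as the drop_duplicates call above, while also leaving you a handle on the rows that were flagged.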
