How to Retrieve Rows Based on Distinct Column Values in Pandas?

Barbara Streisand
2024-11-04 04:43:01

Retrieving Rows Based on Distinct Column Values

A common data-manipulation task is to keep one row for each distinct value in a particular column. This article shows how to do this with Pandas, a popular Python library for data manipulation and analysis.

Problem Statement

Consider a dataframe with two columns, COL1 and COL2. The task is to keep one row for each distinct value in COL2, namely the first row in which that value appears. For instance, given the dataframe below:

COL1 COL2
a.com 22
b.com 45
c.com 34
e.com 45
f.com 56
g.com 22
h.com 45

The desired output contains one row per distinct COL2 value:

COL1 COL2
a.com 22
b.com 45
c.com 34
f.com 56

Solution: Using Pandas' drop_duplicates() Method

Pandas provides the DataFrame method drop_duplicates() for this task. Its first argument, subset, names the column (or list of columns) to compare when looking for duplicates; without it, entire rows are compared. The keep parameter then decides which occurrence of each duplicated value survives.
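To see why the column argument matters, here is a small sketch using the sample data (the variable names whole_rows and by_col2 are only for illustration). Without a subset, drop_duplicates() compares entire rows, and because every COL1 value in the sample is different, nothing is dropped; restricting the comparison to COL2 leaves one row per value.

```python
import pandas as pd

df = pd.DataFrame({'COL1': ['a.com', 'b.com', 'c.com', 'e.com', 'f.com', 'g.com', 'h.com'],
                   'COL2': [22, 45, 34, 45, 56, 22, 45]})

# No subset: duplicates are detected on whole rows, so all 7 rows survive
whole_rows = df.drop_duplicates()

# subset='COL2': only COL2 is compared, so later repeats of 22 and 45 are dropped
by_col2 = df.drop_duplicates(subset='COL2')

print(len(whole_rows), len(by_col2))  # 7 4
```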

For example, to keep only the first row for each COL2 value and drop the later repeats, use the following code:

<code class="python">import pandas as pd

df = pd.DataFrame({'COL1': ['a.com', 'b.com', 'c.com', 'e.com', 'f.com', 'g.com', 'h.com'],
                   'COL2': [22, 45, 34, 45, 56, 22, 45]})

df = df.drop_duplicates('COL2')

# Displaying the result
print(df)</code>

This prints the rows with distinct COL2 values. The surviving rows keep their original index labels (0, 1, 2 and 4):

    COL1  COL2
0  a.com    22
1  b.com    45
2  c.com    34
4  f.com    56
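The same selection can be written as a boolean mask with DataFrame.duplicated(), which accepts the same subset and keep arguments and flags every row whose COL2 value already appeared earlier. This is a sketch of an equivalent formulation, not a claim about how drop_duplicates() is implemented internally; the mask form is handy when you also want to inspect or count the repeated rows.

```python
import pandas as pd

df = pd.DataFrame({'COL1': ['a.com', 'b.com', 'c.com', 'e.com', 'f.com', 'g.com', 'h.com'],
                   'COL2': [22, 45, 34, 45, 56, 22, 45]})

# True for each row whose COL2 value already occurred in an earlier row
mask = df.duplicated(subset='COL2', keep='first')
print(mask.tolist())  # [False, False, False, True, False, True, True]

# Keep the rows that are not repeats; this matches df.drop_duplicates('COL2')
result = df[~mask]
print(result.equals(df.drop_duplicates('COL2')))  # True
```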

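The keep parameter controls which of the duplicated rows survive. The default, keep='first', keeps the first occurrence of each value; keep='last' keeps the last occurrence; and keep=False (the Boolean, not the string 'False') drops every row whose value appears more than once, leaving only the rows whose COL2 value is unique.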

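<code class="python">import pandas as pd

df = pd.DataFrame({'COL1': ['a.com', 'b.com', 'c.com', 'e.com', 'f.com', 'g.com', 'h.com'],
                   'COL2': [22, 45, 34, 45, 56, 22, 45]})

# Keep first occurrence (the default): a.com, b.com, c.com, f.com
first = df.drop_duplicates('COL2', keep='first')

# Keep last occurrence: c.com, f.com, g.com, h.com
last = df.drop_duplicates('COL2', keep='last')

# Drop every row whose COL2 value is repeated: c.com, f.com
unique_only = df.drop_duplicates('COL2', keep=False)</code>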
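In each of these variants the surviving rows keep their original index labels, so keeping the first occurrence returns labels 0, 1, 2 and 4 rather than a consecutive range. If a fresh 0..n-1 index is preferred, drop_duplicates() accepts ignore_index=True (available since pandas 1.0); on older versions, chaining .reset_index(drop=True) gives the same result. A minimal sketch (the variable names are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({'COL1': ['a.com', 'b.com', 'c.com', 'e.com', 'f.com', 'g.com', 'h.com'],
                   'COL2': [22, 45, 34, 45, 56, 22, 45]})

# ignore_index=True relabels the result 0, 1, ..., n-1 (requires pandas >= 1.0)
relabeled = df.drop_duplicates('COL2', ignore_index=True)

# Equivalent on older pandas: discard the old index after deduplicating
reset = df.drop_duplicates('COL2').reset_index(drop=True)

print(relabeled.index.tolist())  # [0, 1, 2, 3]
print(relabeled.equals(reset))   # True
```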
