Retrieving Rows Based on Distinct Column Values
A common data manipulation task is extracting one row for each unique value in a particular column. This article demonstrates how to do this with Pandas, a popular Python library for data manipulation and analysis.
Problem Statement
Consider a dataframe with two columns, COL1 and COL2. The task is to retrieve one row for each distinct value in COL2. For instance, given the dataframe below:
COL1 | COL2 |
---|---|
a.com | 22 |
b.com | 45 |
c.com | 34 |
e.com | 45 |
f.com | 56 |
g.com | 22 |
h.com | 45 |
The desired output keeps one row, the first one encountered, for each distinct value in COL2:
COL1 | COL2 |
---|---|
a.com | 22 |
b.com | 45 |
c.com | 34 |
f.com | 56 |
Solution: Using Pandas' drop_duplicates() Method
The Pandas library provides a convenient method called drop_duplicates() to accomplish this task. Passing a column name as its first argument (subset) restricts the duplicate check to that column, so two rows count as duplicates when they share a value there, regardless of what the other columns contain.
For example, to keep only the first row for each COL2 value, use the following code:
<code class="python">import pandas as pd

df = pd.DataFrame({'COL1': ['a.com', 'b.com', 'c.com', 'e.com', 'f.com', 'g.com', 'h.com'],
                   'COL2': [22, 45, 34, 45, 56, 22, 45]})
df = df.drop_duplicates('COL2')

# Displaying the result
print(df)</code>
This leaves the following rows, one for each distinct value in COL2:
COL1 | COL2 |
---|---|
a.com | 22 |
b.com | 45 |
c.com | 34 |
f.com | 56 |
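One detail the table does not show is that the surviving rows keep their original index labels. The sketch below rebuilds the example dataframe, names the column through the subset keyword, and then renumbers the rows with reset_index(drop=True). The variable names unique_rows, kept_labels, and renumbered are illustrative only and do not come from the example above.

```python
import pandas as pd

df = pd.DataFrame({
    "COL1": ["a.com", "b.com", "c.com", "e.com", "f.com", "g.com", "h.com"],
    "COL2": [22, 45, 34, 45, 56, 22, 45],
})

# subset= names the column(s) compared when looking for duplicates
unique_rows = df.drop_duplicates(subset="COL2")

# The surviving rows keep the index labels they had in df
kept_labels = unique_rows.index.tolist()
print(kept_labels)  # [0, 1, 2, 4]

# reset_index(drop=True) renumbers the rows from 0 and discards the old labels
renumbered = unique_rows.reset_index(drop=True)
print(renumbered)
```

Renumbering is optional. Keeping the original labels can be useful when the filtered rows need to be matched back to the full dataframe later.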
Additionally, the keep parameter controls which of the duplicate rows survive. The default, 'first', keeps the first occurrence of each value, and 'last' keeps the last occurrence. Passing the boolean False (not the string 'False') drops every row whose value appears more than once.
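<code class="python"># df here is the original seven-row dataframe, before any deduplication

# Keep first occurrence (the default)
first = df.drop_duplicates('COL2', keep='first')

# Keep last occurrence
last = df.drop_duplicates('COL2', keep='last')

# Drop every row whose COL2 value appears more than once
no_repeats = df.drop_duplicates('COL2', keep=False)</code>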
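To make the difference between the three settings concrete, the following sketch applies each one to the same sample data and prints the COL1 values that remain. It is a self-contained illustration that rebuilds the dataframe from the problem statement. The result names first, last, and no_repeats are chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "COL1": ["a.com", "b.com", "c.com", "e.com", "f.com", "g.com", "h.com"],
    "COL2": [22, 45, 34, 45, 56, 22, 45],
})

# Each call starts from the full dataframe, so the results are independent
first = df.drop_duplicates("COL2", keep="first")
last = df.drop_duplicates("COL2", keep="last")
no_repeats = df.drop_duplicates("COL2", keep=False)

print(first["COL1"].tolist())       # ['a.com', 'b.com', 'c.com', 'f.com']
print(last["COL1"].tolist())        # ['c.com', 'f.com', 'g.com', 'h.com']
print(no_repeats["COL1"].tolist())  # ['c.com', 'f.com']
```

With keep=False only 34 and 56 remain, because 22 and 45 each occur more than once in COL2.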