How to Extract Rows with Distinct Values in a Pandas DataFrame?
Distinct Values Row Retrieval
To keep one row per distinct value in a column (COL2 in this example), call drop_duplicates on that column and use the keep parameter to choose which of the repeated rows is retained:
drop_duplicates with Keep First:
df = df.drop_duplicates('COL2', keep='first')
This retains the first occurrence of each unique value in COL2.
drop_duplicates with Keep Last:
df = df.drop_duplicates('COL2', keep='last')
This retains the last occurrence of each unique value in COL2.
drop_duplicates with keep=False:
df = df.drop_duplicates('COL2', keep=False)
This drops every row whose COL2 value appears more than once, so only rows whose COL2 value occurs exactly once remain.
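To see which rows each keep setting would drop, the related duplicated method can help: it returns a boolean mask that is True for the rows drop_duplicates would remove. The sketch below uses a small made-up frame (the name demo and its values are assumptions for illustration, not data from this article).

```python
import pandas as pd

# Hypothetical data, not from the article: the value 1 appears twice in COL2.
demo = pd.DataFrame({"COL1": ["x", "y", "z"], "COL2": [1, 2, 1]})

# duplicated() marks the rows that drop_duplicates() would remove.
mask_first = demo.duplicated(subset="COL2", keep="first")  # marks later repeats
mask_last = demo.duplicated(subset="COL2", keep="last")    # marks earlier repeats
mask_none = demo.duplicated(subset="COL2", keep=False)     # marks every repeated value

print(mask_first.tolist())  # [False, False, True]
print(mask_last.tolist())   # [True, False, False]
print(mask_none.tolist())   # [True, False, True]

# drop_duplicates returns a new DataFrame; demo itself is left unchanged.
unique_only = demo.drop_duplicates(subset="COL2", keep=False)
print(unique_only["COL1"].tolist())  # ['y']
```

Because drop_duplicates returns a new DataFrame by default, the snippets above assign the result back to df.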
Example:
Consider the following dataframe:
COL1 | COL2 |
---|---|
a.com | 22 |
b.com | 45 |
c.com | 34 |
e.com | 45 |
f.com | 56 |
g.com | 22 |
h.com | 45 |
Using keep='first' produces:
COL1 | COL2 |
---|---|
a.com | 22 |
b.com | 45 |
c.com | 34 |
f.com | 56 |
Using keep='last' yields:
COL1 | COL2 |
---|---|
c.com | 34 |
f.com | 56 |
g.com | 22 |
h.com | 45 |
Lastly, using keep=False produces:
COL1 | COL2 |
---|---|
c.com | 34 |
f.com | 56 |
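The example can be reproduced with a short script. This is a minimal sketch: the dataframe is built from the table shown earlier, and the variable names first, last, and unique are chosen here for illustration.

```python
import pandas as pd

# Rebuild the example dataframe from the article's table.
df = pd.DataFrame({
    "COL1": ["a.com", "b.com", "c.com", "e.com", "f.com", "g.com", "h.com"],
    "COL2": [22, 45, 34, 45, 56, 22, 45],
})

# Each call returns a new DataFrame; df itself is not modified.
first = df.drop_duplicates("COL2", keep="first")
last = df.drop_duplicates("COL2", keep="last")
unique = df.drop_duplicates("COL2", keep=False)

print(first["COL1"].tolist())   # ['a.com', 'b.com', 'c.com', 'f.com']
print(last["COL1"].tolist())    # ['c.com', 'f.com', 'g.com', 'h.com']
print(unique["COL1"].tolist())  # ['c.com', 'f.com']

# The surviving rows keep their original index labels.
print(first.index.tolist())     # [0, 1, 2, 4]
```

If a fresh 0-based index is wanted on the result, calling reset_index(drop=True) on it is one option.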