Home >Backend Development >Python Tutorial >How to Count Distinct Values by Group in Pandas?

How to Count Distinct Values by Group in Pandas?

Linda Hamilton
Linda HamiltonOriginal
2024-10-23 13:23:01687browse

How to Count Distinct Values by Group in Pandas?

Pandas Equivalent: Distinct Count by Group

When using Pandas as a database alternative, it's often necessary to perform complex operations like distinct counting by group. In this case, we want to count the number of distinct clients per year month.

In SQL, this can be achieved using the count(distinct) aggregate function. However, Pandas provides a slightly different syntax for this operation.

To count the distinct clients per year month in Pandas, we can use the following code:

<code class="python">table.groupby('YEARMONTH').CLIENTCODE.nunique()</code>

The groupby() function splits the DataFrame into groups based on the specified column (YEARMONTH in this case). The nunique() function then counts the number of unique values within each group.

Here's an example to illustrate:

<code class="python">import pandas as pd

# Create a DataFrame with sample data
data = {
    'YEARMONTH': ['201301', '201301', '201301', '201302', '201302', '201302', '201302'],
    'CLIENTCODE': [1, 1, 2, 1, 2, 2, 3]
}
table = pd.DataFrame(data)

# Count distinct clients per year month
result = table.groupby('YEARMONTH').CLIENTCODE.nunique()

print(result)</code>

Output:

YEARMONTH
201301    2
201302    3

As you can see, the result matches the expected output from the SQL query.

The above is the detailed content of How to Count Distinct Values by Group in Pandas?. For more information, please follow other related articles on the PHP Chinese website!

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn