Count unique values per group in Pandas

In this article, we will discuss how to count unique values per group in a Pandas DataFrame. We are going to cover multiple approaches to achieve this as listed below.

Table of Contents

To quickly get started, let’s create a sample dataframe to experiment. We’ll use the pandas library with some random data.

import pandas as pd
import numpy as np

# List of Tuples
employees = [('Shubham', 'Data Scientist', 'Tech',   5),
            ('Riti', 'Data Engineer', 'Tech' ,   7),
            ('Shanky', 'Program Manager', 'PMO' ,   2),
            ('Shreya', 'Graphic Designer', 'Design' ,   2),
            ('Aadi', 'Backend Developer', 'Tech', 11),
            ('Sim', 'Data Engineer', 'Tech', 4)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'Team', 'Experience'],
                  index=[0, 1, 2, 3, 4, 5])
print(df)

Contents of the created dataframe are,

      Name        Designation    Team  Experience
0  Shubham     Data Scientist    Tech           5
1     Riti      Data Engineer    Tech           7
2   Shanky    Program Manager     PMO           2
3   Shreya   Graphic Designer  Design           2
4     Aadi  Backend Developer    Tech          11
5      Sim      Data Engineer    Tech           4

Here, let’s try to calculate the unique designations present in each Team using different approaches.

Advertisements

Count unique values per group Using nunique() method

Let’s start with the simplest method to count the unique values per group in a pandas DataFrame. As the name “nunique” suggests, it counts the unique values, so we are going to use it with groupby() function.

# unique designations by Team
result = df.groupby("Team")["Designation"].nunique()

print(result)

print(type(result))

Output

Team
Design    1
PMO       1
Tech      3
Name: Designation, dtype: int64

<class 'pandas.core.series.Series'>

Here, the output shows the number of unique designations by each Team value. For example, “Tech” contains 3 unique designations, whereas, “Design” and “PMO” only contain 1 unique designation.

In case we need to sort the Series based on the values, simply use sort_values() method at the end.

Count unique values per group using the aggregate function

Let’s try to achieve the same thing using the aggregation function. It is generally used whenever we want to apply one or multiple functions on one or multiple columns. Basically, it provides flexibility to apply custom functions in minimal lines of code. Let’s quickly look at the code below.

# using the agg function
result = df.groupby("Team").agg({"Designation":pd.Series.nunique})

print(result)

print(type(result))

Output

        Designation
Team               
Design            1
PMO               1
Tech              3

<class 'pandas.core.frame.DataFrame'>

Here, we are again using the nunique function only, but the aggregation function is very useful whenever we want to do multiple aggregations together.

Count unique values per group using drop_duplicates() method

Another method is to drop the duplicates rows, which would create a unique combination set of values. And post that, we can directly count the values without any groupby function. Let’s understand from the example below.

# drop duplicates based on Team and Designation columns
result = df[['Team', 'Designation']].drop_duplicates()

print(result)

print(type(result))

Output

     Team        Designation
0    Tech     Data Scientist
1    Tech      Data Engineer
2     PMO    Program Manager
3  Design   Graphic Designer
4    Tech  Backend Developer

<class 'pandas.core.frame.DataFrame'>

As observed, it has dropped the last row containing the Team as “Tech” and Designation as “Data Engineer” because that combination was already covered above. Now, since we already have a unique set of combinations for Team and Designation, we can simply use “value_counts” on the Team column to get the total unique values by each group.

# get the unique values for each Team
print (result["Team"].value_counts())

Output

Tech      3
PMO       1
Design    1
Name: Team, dtype: int64

Here you do, we have a similar output from this method as well.

The complete example is as follows,

import pandas as pd
import numpy as np

# List of Tuples
employees = [('Shubham', 'Data Scientist', 'Tech',   5),
            ('Riti', 'Data Engineer', 'Tech' ,   7),
            ('Shanky', 'Program Manager', 'PMO' ,   2),
            ('Shreya', 'Graphic Designer', 'Design' ,   2),
            ('Aadi', 'Backend Developer', 'Tech', 11),
            ('Sim', 'Data Engineer', 'Tech', 4)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'Team', 'Experience'],
                  index=[0, 1, 2, 3, 4, 5])
print(df)

print('*** Using nunique() method ***')

# unique designations by Team
result = df.groupby("Team")["Designation"].nunique()

print(result)

print(type(result))

print('*** Using the aggregate function ***')

# using the agg function
result = df.groupby("Team").agg({"Designation":pd.Series.nunique})

print(result)

print(type(result))

print('*** using drop_duplicates() method ***')

# drop duplicates based on Team and Designation columns
result = df[['Team', 'Designation']].drop_duplicates()

print(result)

print(type(result))

# get the unique values for each Team
print (result["Team"].value_counts())

Output:

      Name        Designation    Team  Experience
0  Shubham     Data Scientist    Tech           5
1     Riti      Data Engineer    Tech           7
2   Shanky    Program Manager     PMO           2
3   Shreya   Graphic Designer  Design           2
4     Aadi  Backend Developer    Tech          11
5      Sim      Data Engineer    Tech           4

*** Using nunique() method ***
Team
Design    1
PMO       1
Tech      3
Name: Designation, dtype: int64

<class 'pandas.core.series.Series'>

*** Using the aggregate function ***
        Designation
Team               
Design            1
PMO               1
Tech              3

<class 'pandas.core.frame.DataFrame'>

*** using drop_duplicates() method ***

     Team        Designation
0    Tech     Data Scientist
1    Tech      Data Engineer
2     PMO    Program Manager
3  Design   Graphic Designer
4    Tech  Backend Developer

<class 'pandas.core.frame.DataFrame'>

Tech      3
PMO       1
Design    1
Name: Team, dtype: int64

Summary

Great, you made it! In this article, we have discussed multiple ways to count the unique values per group in a pandas DataFrame. Thanks.

Pandas Tutorials -Learn Data Analysis with Python

   

Are you looking to make a career in Data Science with Python?

Data Science is the future, and the future is here now. Data Scientists are now the most sought-after professionals today. To become a good Data Scientist or to make a career switch in Data Science one must possess the right skill set. We have curated a list of Best Professional Certificate in Data Science with Python. These courses will teach you the programming tools for Data Science like Pandas, NumPy, Matplotlib, Seaborn and how to use these libraries to implement Machine learning models.

Checkout the Detailed Review of Best Professional Certificate in Data Science with Python.

Remember, Data Science requires a lot of patience, persistence, and practice. So, start learning today.

Join a LinkedIn Community of Python Developers

Leave a Comment

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Scroll to Top