In this article, we will discuss how to count unique values per group in a Pandas DataFrame. We are going to cover multiple approaches to achieve this.
To quickly get started, let’s create a sample dataframe to experiment with. We’ll use the pandas library with some sample employee data.
import pandas as pd
import numpy as np

# List of Tuples
employees = [('Shubham', 'Data Scientist', 'Tech', 5),
             ('Riti', 'Data Engineer', 'Tech', 7),
             ('Shanky', 'Program Manager', 'PMO', 2),
             ('Shreya', 'Graphic Designer', 'Design', 2),
             ('Aadi', 'Backend Developer', 'Tech', 11),
             ('Sim', 'Data Engineer', 'Tech', 4)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'Team', 'Experience'],
                  index=[0, 1, 2, 3, 4, 5])

print(df)
Contents of the created dataframe are,
      Name        Designation    Team  Experience
0  Shubham     Data Scientist    Tech           5
1     Riti      Data Engineer    Tech           7
2   Shanky    Program Manager     PMO           2
3   Shreya   Graphic Designer  Design           2
4     Aadi  Backend Developer    Tech          11
5      Sim      Data Engineer    Tech           4
Here, let’s try to count the unique designations present in each Team using different approaches.
Count unique values per group using nunique() method
Let’s start with the simplest method to count the unique values per group in a pandas DataFrame. As the name “nunique” suggests, it counts the unique values, so we are going to use it together with the groupby() function.
# unique designations by Team
result = df.groupby("Team")["Designation"].nunique()

print(result)
print(type(result))
Output
Team
Design    1
PMO       1
Tech      3
Name: Designation, dtype: int64
<class 'pandas.core.series.Series'>
Here, the output shows the number of unique designations for each Team value. For example, “Tech” contains 3 unique designations, whereas “Design” and “PMO” each contain only 1 unique designation.
In case we need to sort the Series based on the values, simply chain the sort_values() method at the end.
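As a quick illustration, a minimal sketch of sorting the per-group counts in descending order (reusing the same dataframe, with the variable name sorted_result chosen here just for this example) could look like this,

# unique designations by Team, sorted from most to fewest
sorted_result = df.groupby("Team")["Designation"].nunique().sort_values(ascending=False)

print(sorted_result)

With the sample data above, this would list “Tech” first, since it has the most unique designations.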
Count unique values per group using the aggregate function
Let’s try to achieve the same thing using the aggregation function. It is generally used whenever we want to apply one or more functions to one or more columns, and it provides the flexibility to apply custom functions in minimal lines of code. Let’s quickly look at the code below.
# using the agg function
result = df.groupby("Team").agg({"Designation": pd.Series.nunique})

print(result)
print(type(result))
Output
        Designation
Team
Design            1
PMO               1
Tech              3
<class 'pandas.core.frame.DataFrame'>
Here, we are again using the nunique function, but the aggregation function is especially useful whenever we want to perform multiple aggregations together, as shown in the sketch below.
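For example, a small sketch (on the same dataframe) that counts the unique designations and also averages the years of experience per Team in a single agg() call could look like this; the output column names unique_designations and avg_experience are just illustrative choices,

# multiple aggregations per Team in a single agg() call
multi_result = df.groupby("Team").agg(
    unique_designations=("Designation", "nunique"),
    avg_experience=("Experience", "mean"),
)

print(multi_result)

This named-aggregation style returns a DataFrame with one row per Team and one column per aggregation.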
Count unique values per group using drop_duplicates() method
Another method is to drop the duplicate rows, which leaves us with the unique combinations of values. After that, we can directly count the values without any groupby() call. Let’s understand this with the example below.
# drop duplicates based on Team and Designation columns
result = df[['Team', 'Designation']].drop_duplicates()

print(result)
print(type(result))
Output
     Team        Designation
0    Tech     Data Scientist
1    Tech      Data Engineer
2     PMO    Program Manager
3  Design   Graphic Designer
4    Tech  Backend Developer
<class 'pandas.core.frame.DataFrame'>
As observed, it has dropped the last row containing the Team “Tech” and the Designation “Data Engineer”, because that combination was already present above. Now, since we have a unique set of Team and Designation combinations, we can simply use value_counts() on the Team column to get the number of unique values for each group.
# get the unique values for each Team
print(result["Team"].value_counts())
Output
Tech      3
PMO       1
Design    1
Name: Team, dtype: int64
Here you go, we have the same result from this method as well.
The complete example is as follows,
import pandas as pd
import numpy as np

# List of Tuples
employees = [('Shubham', 'Data Scientist', 'Tech', 5),
             ('Riti', 'Data Engineer', 'Tech', 7),
             ('Shanky', 'Program Manager', 'PMO', 2),
             ('Shreya', 'Graphic Designer', 'Design', 2),
             ('Aadi', 'Backend Developer', 'Tech', 11),
             ('Sim', 'Data Engineer', 'Tech', 4)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'Team', 'Experience'],
                  index=[0, 1, 2, 3, 4, 5])

print(df)

print('*** Using nunique() method ***')

# unique designations by Team
result = df.groupby("Team")["Designation"].nunique()

print(result)
print(type(result))

print('*** Using the aggregate function ***')

# using the agg function
result = df.groupby("Team").agg({"Designation": pd.Series.nunique})

print(result)
print(type(result))

print('*** using drop_duplicates() method ***')

# drop duplicates based on Team and Designation columns
result = df[['Team', 'Designation']].drop_duplicates()

print(result)
print(type(result))

# get the unique values for each Team
print(result["Team"].value_counts())
Output:
      Name        Designation    Team  Experience
0  Shubham     Data Scientist    Tech           5
1     Riti      Data Engineer    Tech           7
2   Shanky    Program Manager     PMO           2
3   Shreya   Graphic Designer  Design           2
4     Aadi  Backend Developer    Tech          11
5      Sim      Data Engineer    Tech           4
*** Using nunique() method ***
Team
Design    1
PMO       1
Tech      3
Name: Designation, dtype: int64
<class 'pandas.core.series.Series'>
*** Using the aggregate function ***
        Designation
Team
Design            1
PMO               1
Tech              3
<class 'pandas.core.frame.DataFrame'>
*** using drop_duplicates() method ***
     Team        Designation
0    Tech     Data Scientist
1    Tech      Data Engineer
2     PMO    Program Manager
3  Design   Graphic Designer
4    Tech  Backend Developer
<class 'pandas.core.frame.DataFrame'>
Tech      3
PMO       1
Design    1
Name: Team, dtype: int64
Summary
Great, you made it! In this article, we have discussed multiple ways to count the unique values per group in a pandas DataFrame. Thanks.