In this article, we will discuss how to groupby two columns and get counts in Pandas DataFrame. We will cover different methods along with some checks to keep in mind.
Table of Contents
To quickly get started, let’s create a sample dataframe to experiment. We’ll use the pandas library with some random data.
import pandas as pd import numpy as np # List of Tuples employees= [('Shubham', 'Data Scientist', 'Tech', 5), ('Riti', 'Data Engineer', 'Tech' , 7), ('Shanky', 'Program Manager', 'PMO' , 2), ('Shreya', 'Graphic Designer', 'Design' , 2), ('Aadi', 'Backend Developer', 'Tech', 11), ('Sim', 'Data Engineer', 'Tech', 4)] # Create a DataFrame object from list of tuples df = pd.DataFrame(employees, columns=['Name', 'Designation', 'Team', 'Experience'], index=[0, 1, 2, 3, 4, 5]) print(df)
Contents of the created dataframe are,
Name Designation Team Experience 0 Shubham Data Scientist Tech 5 1 Riti Data Engineer Tech 7 2 Shanky Program Manager PMO 2 3 Shreya Graphic Designer Design 2 4 Aadi Backend Developer Tech 11 5 Sim Data Engineer Tech 4
Groupby two columns and get counts in Pandas using size() function
The size() function returns the total number of elements, when used in combined with GroupBy, it could give the counts per group. Let’s look at an example for better understanding.
Here, we need to groupby on the columns “Team” and “Designation” and get the counts for each combination.
# count of unique combinations print (df.groupby(['Team', 'Designation']).size())
Output
Frequently Asked:
Team Designation Design Graphic Designer 1 PMO Program Manager 1 Tech Backend Developer 1 Data Engineer 2 Data Scientist 1 dtype: int64
The output shows all the possible combinations of the “Team” and “Designation” column values and their counts. We can simply convert them into pandas DataFrame using “reset_index” and assign a column name as below.
# count of unique combinations print (df.groupby(['Team', 'Designation']).size().reset_index(name = 'count'))
Output
Team Designation count 0 Design Graphic Designer 1 1 PMO Program Manager 1 2 Tech Backend Developer 1 3 Tech Data Engineer 2 4 Tech Data Scientist 1
Groupby two columns and get counts in Pandas using count() function
This method is also quite similar to the above one, here we just need to specify any column to count the values. Let’s try to count the values for each combination of the “Team” and “Designation” columns.
# count of unique combinations print (df.groupby(['Team', 'Designation'])['Name'].count())
Output
Team Designation Design Graphic Designer 1 PMO Program Manager 1 Tech Backend Developer 1 Data Engineer 2 Data Scientist 1 Name: Name, dtype: int64
As observed, we have the same output as the above method. Note that we need to be a little cautious that the count() will skip any row containing missing values (NaN).
Groupby two columns and get counts in Pandas using aggregate function
The aggregate function i.e. agg(), is also used to perform any function on the pandas DataFrame. We can here use the count function within the aggregate function to perform the above task.
# count using agg function print (df.groupby(['Team', 'Designation']).agg({'Name':'count'}))
Output
Team Designation Design Graphic Designer 1 PMO Program Manager 1 Tech Backend Developer 1 Data Engineer 2 Data Scientist 1
Here you go, we have a similar output from this method as well.
Groupby two columns and get counts using value_counts() function
The final method is to directly use value_counts to get the count for each combination. For this method, we don’t even need the groupby function. Let’s look at the code below.
# value counts on each combination print (df[['Team', 'Designation']].value_counts())
Output
Team Designation Tech Data Engineer 2 Data Scientist 1 Backend Developer 1 PMO Program Manager 1 Design Graphic Designer 1 dtype: int64
The complete example is as follows,
import pandas as pd import numpy as np # List of Tuples employees= [('Shubham', 'Data Scientist', 'Tech', 5), ('Riti', 'Data Engineer', 'Tech' , 7), ('Shanky', 'Program Manager', 'PMO' , 2), ('Shreya', 'Graphic Designer', 'Design' , 2), ('Aadi', 'Backend Developer', 'Tech', 11), ('Sim', 'Data Engineer', 'Tech', 4)] # Create a DataFrame object from list of tuples df = pd.DataFrame(employees, columns=['Name', 'Designation', 'Team', 'Experience'], index=[0, 1, 2, 3, 4, 5]) print(df) print('** Groupby two columns and get counts in Pandas using size() function **') # count of unique combinations print (df.groupby(['Team', 'Designation']).size()) # count of unique combinations print (df.groupby(['Team', 'Designation']).size().reset_index(name = 'count')) print('** Groupby two columns and get counts in Pandas using count() function **') # count of unique combinations print (df.groupby(['Team', 'Designation'])['Name'].count()) print('**Groupby two columns and get counts in Pandas using aggregate function**') # count using agg function print (df.groupby(['Team', 'Designation']).agg({'Name':'count'})) print('**Groupby two columns and get counts using value_counts() function**') # value counts on each combination print (df[['Team', 'Designation']].value_counts())
Output:
Name Designation Team Experience 0 Shubham Data Scientist Tech 5 1 Riti Data Engineer Tech 7 2 Shanky Program Manager PMO 2 3 Shreya Graphic Designer Design 2 4 Aadi Backend Developer Tech 11 5 Sim Data Engineer Tech 4 ** Groupby two columns and get counts in Pandas using size() function ** Team Designation Design Graphic Designer 1 PMO Program Manager 1 Tech Backend Developer 1 Data Engineer 2 Data Scientist 1 dtype: int64 Team Designation count 0 Design Graphic Designer 1 1 PMO Program Manager 1 2 Tech Backend Developer 1 3 Tech Data Engineer 2 4 Tech Data Scientist 1 ** Groupby two columns and get counts in Pandas using count() function ** Team Designation Design Graphic Designer 1 PMO Program Manager 1 Tech Backend Developer 1 Data Engineer 2 Data Scientist 1 Name: Name, dtype: int64 **Groupby two columns and get counts in Pandas using aggregate function** Name Team Designation Design Graphic Designer 1 PMO Program Manager 1 Tech Backend Developer 1 Data Engineer 2 Data Scientist 1 **Groupby two columns and get counts using value_counts() function** Team Designation Tech Data Engineer 2 Design Graphic Designer 1 PMO Program Manager 1 Tech Backend Developer 1 Data Scientist 1 dtype: int64
Summary
In this article, we have discussed multiple ways to group two columns and get the counts in a pandas DataFrame. Thanks.