GroupBy two columns and get count in Pandas DataFrame

In this article, we will discuss how to groupby two columns and get counts in Pandas DataFrame. We will cover different methods along with some checks to keep in mind.

Table of Contents

To quickly get started, let’s create a sample dataframe to experiment. We’ll use the pandas library with some random data.

import pandas as pd
import numpy as np

# List of Tuples
employees= [('Shubham', 'Data Scientist', 'Tech',   5),
            ('Riti', 'Data Engineer', 'Tech' ,   7),
            ('Shanky', 'Program Manager', 'PMO' ,   2),
            ('Shreya', 'Graphic Designer', 'Design' ,   2),
            ('Aadi', 'Backend Developer', 'Tech', 11),
            ('Sim', 'Data Engineer', 'Tech', 4)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'Team', 'Experience'],
                  index=[0, 1, 2, 3, 4, 5])
print(df)

Contents of the created dataframe are,

      Name        Designation    Team  Experience
0  Shubham     Data Scientist    Tech           5
1     Riti      Data Engineer    Tech           7
2   Shanky    Program Manager     PMO           2
3   Shreya   Graphic Designer  Design           2
4     Aadi  Backend Developer    Tech          11
5      Sim      Data Engineer    Tech           4

Groupby two columns and get counts in Pandas using size() function

The size() function returns the total number of elements, when used in combined with GroupBy, it could give the counts per group. Let’s look at an example for better understanding.

Advertisements

Here, we need to groupby on the columns “Team” and “Designation” and get the counts for each combination.

# count of unique combinations
print (df.groupby(['Team', 'Designation']).size())

Output

Team    Designation      
Design  Graphic Designer     1
PMO     Program Manager      1
Tech    Backend Developer    1
        Data Engineer        2
        Data Scientist       1
dtype: int64

The output shows all the possible combinations of the “Team” and “Designation” column values and their counts. We can simply convert them into pandas DataFrame using “reset_index” and assign a column name as below.

# count of unique combinations
print (df.groupby(['Team', 'Designation']).size().reset_index(name = 'count'))

Output

     Team        Designation  count
0  Design   Graphic Designer      1
1     PMO    Program Manager      1
2    Tech  Backend Developer      1
3    Tech      Data Engineer      2
4    Tech     Data Scientist      1

Groupby two columns and get counts in Pandas using count() function

This method is also quite similar to the above one, here we just need to specify any column to count the values. Let’s try to count the values for each combination of the “Team” and “Designation” columns.

# count of unique combinations
print (df.groupby(['Team', 'Designation'])['Name'].count())

Output

Team    Designation      
Design  Graphic Designer     1
PMO     Program Manager      1
Tech    Backend Developer    1
        Data Engineer        2
        Data Scientist       1
Name: Name, dtype: int64

As observed, we have the same output as the above method. Note that we need to be a little cautious that the count() will skip any row containing missing values (NaN).

Groupby two columns and get counts in Pandas using aggregate function

The aggregate function i.e. agg(), is also used to perform any function on the pandas DataFrame. We can here use the count function within the aggregate function to perform the above task.

# count using agg function
print (df.groupby(['Team', 'Designation']).agg({'Name':'count'}))

Output

Team   Designation            
Design Graphic Designer      1
PMO    Program Manager       1
Tech   Backend Developer     1
       Data Engineer         2
       Data Scientist        1

Here you go, we have a similar output from this method as well.

Groupby two columns and get counts using value_counts() function

The final method is to directly use value_counts to get the count for each combination. For this method, we don’t even need the groupby function. Let’s look at the code below.

# value counts on each combination
print (df[['Team', 'Designation']].value_counts())

Output

Team    Designation      
Tech    Data Engineer        2
        Data Scientist       1
        Backend Developer    1
PMO     Program Manager      1
Design  Graphic Designer     1
dtype: int64

The complete example is as follows,

import pandas as pd
import numpy as np

# List of Tuples
employees= [('Shubham', 'Data Scientist', 'Tech',   5),
            ('Riti', 'Data Engineer', 'Tech' ,   7),
            ('Shanky', 'Program Manager', 'PMO' ,   2),
            ('Shreya', 'Graphic Designer', 'Design' ,   2),
            ('Aadi', 'Backend Developer', 'Tech', 11),
            ('Sim', 'Data Engineer', 'Tech', 4)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'Team', 'Experience'],
                  index=[0, 1, 2, 3, 4, 5])
print(df)

print('** Groupby two columns and get counts in Pandas using size() function **')

# count of unique combinations
print (df.groupby(['Team', 'Designation']).size())

# count of unique combinations
print (df.groupby(['Team', 'Designation']).size().reset_index(name = 'count'))

print('** Groupby two columns and get counts in Pandas using count() function **')

# count of unique combinations
print (df.groupby(['Team', 'Designation'])['Name'].count())

print('**Groupby two columns and get counts in Pandas using aggregate function**')

# count using agg function
print (df.groupby(['Team', 'Designation']).agg({'Name':'count'}))

print('**Groupby two columns and get counts using value_counts() function**')

# value counts on each combination
print (df[['Team', 'Designation']].value_counts())

Output:

      Name        Designation    Team  Experience
0  Shubham     Data Scientist    Tech           5
1     Riti      Data Engineer    Tech           7
2   Shanky    Program Manager     PMO           2
3   Shreya   Graphic Designer  Design           2
4     Aadi  Backend Developer    Tech          11
5      Sim      Data Engineer    Tech           4

** Groupby two columns and get counts in Pandas using size() function **

Team    Designation      
Design  Graphic Designer     1
PMO     Program Manager      1
Tech    Backend Developer    1
        Data Engineer        2
        Data Scientist       1
dtype: int64

     Team        Designation  count
0  Design   Graphic Designer      1
1     PMO    Program Manager      1
2    Tech  Backend Developer      1
3    Tech      Data Engineer      2
4    Tech     Data Scientist      1

** Groupby two columns and get counts in Pandas using count() function **

Team    Designation      
Design  Graphic Designer     1
PMO     Program Manager      1
Tech    Backend Developer    1
        Data Engineer        2
        Data Scientist       1
Name: Name, dtype: int64

**Groupby two columns and get counts in Pandas using aggregate function**

                          Name
Team   Designation            
Design Graphic Designer      1
PMO    Program Manager       1
Tech   Backend Developer     1
       Data Engineer         2
       Data Scientist        1

**Groupby two columns and get counts using value_counts() function**

Team    Designation      
Tech    Data Engineer        2
Design  Graphic Designer     1
PMO     Program Manager      1
Tech    Backend Developer    1
        Data Scientist       1
dtype: int64

Summary

In this article, we have discussed multiple ways to group two columns and get the counts in a pandas DataFrame. Thanks.

Pandas Tutorials -Learn Data Analysis with Python

   

Are you looking to make a career in Data Science with Python?

Data Science is the future, and the future is here now. Data Scientists are now the most sought-after professionals today. To become a good Data Scientist or to make a career switch in Data Science one must possess the right skill set. We have curated a list of Best Professional Certificate in Data Science with Python. These courses will teach you the programming tools for Data Science like Pandas, NumPy, Matplotlib, Seaborn and how to use these libraries to implement Machine learning models.

Checkout the Detailed Review of Best Professional Certificate in Data Science with Python.

Remember, Data Science requires a lot of patience, persistence, and practice. So, start learning today.

Join a LinkedIn Community of Python Developers

Leave a Comment

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Scroll to Top