How to use Pandas groupby() with sum()?

The combination of pandas groupby() and sum() function comes in handy multiple times in data science to summarize any statistics or feature engineering. In this article, we will look at how to use these functions together.

Table of Contents

Preparing DataSet

To quickly get started, let’s create a sample dataframe to experiment. We’ll use the pandas library with some random data.

import pandas as pd

# List of Tuples
employees = [('Shubham', 24, 'Tech',  5),
             ('Riti', 45, 'Tech' ,  7),
             ('Shanky', 12, 'PMO' ,  2),
             ('Shreya', 8, 'Design' ,  2),
             ('Aadi', 15, 'Tech', 11),
             ('Sim', 18, 'Tech', 4)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(  employees,
                    columns=['Name', 'Salary', 'Team', 'Experience'])

print(df)

Contents of the created dataframe are,

      Name  Salary    Team  Experience
0  Shubham      24    Tech           5
1     Riti      45    Tech           7
2   Shanky      12     PMO           2
3   Shreya       8  Design           2
4     Aadi      15    Tech          11
5      Sim      18    Tech           4

Using pandas groupby and sum for all columns

Both these functions are very straightforward, so let’s implement them directly on the above DataFrame. Consider that we need to groupby on the “Team” column and get the sum for the rest of the columns.

Advertisements
# group by team and sum all the columns
print (df.groupby('Team').sum())

Output

        Salary  Experience
Team                      
Design       8           2
PMO         12           2
Tech       102          27

As observed, we got the sum of all the numerical columns (“Salary” and “Experience”) by the “Team” column.

Using pandas groupby and sum for a single column

In cases, where we want to get the sum of any specific column instead of all the columns, we can tweak our code as below.

# group by team and get the sum of Experience column
print (df.groupby('Team')['Experience'].sum())

Output

Team
Design     2
PMO        2
Tech      27
Name: Experience, dtype: int64

We can convert it to pandas DataFrame by performing reset_index() on the above output.

Sorting the output from pandas groupby and sum

In cases where we want the output to be sorted by the grouped column, we will use the sort attribute in groupby.

# sort the output by Team in descending order
print (df.groupby('Team', sort=False).agg({'Experience':'sum'}))

Output

        Experience
Team              
Tech            27
PMO              2
Design           2

Here you go, we have the output sorted by the grouped column in descending order. We can also sort it by the values by using the sort_values function.

Summary

In this article, we have discussed how to use the pandas groupby with the sum function.

Pandas Tutorials -Learn Data Analysis with Python

   

Are you looking to make a career in Data Science with Python?

Data Science is the future, and the future is here now. Data Scientists are now the most sought-after professionals today. To become a good Data Scientist or to make a career switch in Data Science one must possess the right skill set. We have curated a list of Best Professional Certificate in Data Science with Python. These courses will teach you the programming tools for Data Science like Pandas, NumPy, Matplotlib, Seaborn and how to use these libraries to implement Machine learning models.

Checkout the Detailed Review of Best Professional Certificate in Data Science with Python.

Remember, Data Science requires a lot of patience, persistence, and practice. So, start learning today.

Join a LinkedIn Community of Python Developers

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Scroll to Top