The combination of pandas groupby() and sum() function comes in handy multiple times in data science to summarize any statistics or feature engineering. In this article, we will look at how to use these functions together.
Table of Contents
Preparing DataSet
To quickly get started, let’s create a sample dataframe to experiment. We’ll use the pandas library with some random data.
import pandas as pd # List of Tuples employees = [('Shubham', 24, 'Tech', 5), ('Riti', 45, 'Tech' , 7), ('Shanky', 12, 'PMO' , 2), ('Shreya', 8, 'Design' , 2), ('Aadi', 15, 'Tech', 11), ('Sim', 18, 'Tech', 4)] # Create a DataFrame object from list of tuples df = pd.DataFrame( employees, columns=['Name', 'Salary', 'Team', 'Experience']) print(df)
Contents of the created dataframe are,
Name Salary Team Experience 0 Shubham 24 Tech 5 1 Riti 45 Tech 7 2 Shanky 12 PMO 2 3 Shreya 8 Design 2 4 Aadi 15 Tech 11 5 Sim 18 Tech 4
Using pandas groupby and sum for all columns
Both these functions are very straightforward, so let’s implement them directly on the above DataFrame. Consider that we need to groupby on the “Team” column and get the sum for the rest of the columns.
# group by team and sum all the columns print (df.groupby('Team').sum())
Output
Salary Experience Team Design 8 2 PMO 12 2 Tech 102 27
As observed, we got the sum of all the numerical columns (“Salary” and “Experience”) by the “Team” column.
Frequently Asked:
Using pandas groupby and sum for a single column
In cases, where we want to get the sum of any specific column instead of all the columns, we can tweak our code as below.
# group by team and get the sum of Experience column print (df.groupby('Team')['Experience'].sum())
Output
Team Design 2 PMO 2 Tech 27 Name: Experience, dtype: int64
We can convert it to pandas DataFrame by performing reset_index() on the above output.
Sorting the output from pandas groupby and sum
In cases where we want the output to be sorted by the grouped column, we will use the sort attribute in groupby.
# sort the output by Team in descending order print (df.groupby('Team', sort=False).agg({'Experience':'sum'}))
Output
Experience Team Tech 27 PMO 2 Design 2
Here you go, we have the output sorted by the grouped column in descending order. We can also sort it by the values by using the sort_values function.
Summary
In this article, we have discussed how to use the pandas groupby with the sum function.