Pandas: Sum rows in Dataframe ( all or certain rows)

In this article we will discuss how to sum up rows in a dataframe and add the values as a new row in the same dataframe.

First of all, we will create a Dataframe,

import pandas as pd
import numpy as np

# List of Tuples
employees_salary = [('Jack', 2000, 2010, 2050, 2134, 2111),
                    ('Riti', 3000, 3022, 3456, 3111, 2109),
                    ('Aadi', np.NaN, 2334, 2077, 2134, 3122),
                    ('Mohit', 3012, 3050, 2010, 2122, 1111),
                    ('Veena', 2023, 2232, 3050, 2112, 1099),
                    ('Shaun', 2123, 2510, np.NaN, 3134, 2122),
                    ('Mark', 4000, 2000, 2050, 2122, 2111)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees_salary,
                  columns=['Name', 'Jan', 'Feb', 'March', 'April', 'May'])

# Set column Name as the index of dataframe
df.set_index('Name', inplace=True)

print(df)

Output:

          Jan   Feb   March  April   May
Name                                    
Jack   2000.0  2010  2050.0   2134  2111
Riti   3000.0  3022  3456.0   3111  2109
Aadi      NaN  2334  2077.0   2134  3122
Mohit  3012.0  3050  2010.0   2122  1111
Veena  2023.0  2232  3050.0   2112  1099
Shaun  2123.0  2510     NaN   3134  2122
Mark   4000.0  2000  2050.0   2122  2111

This Dataframe contains the salary of employees from month Jan to May. We made the column Name as the index of the dataframe. Each row of this dataframe contains the salary of an employee from Jan to May.

Get the sum of all rows in a Pandas Dataframe

Suppose in the above dataframe we want to get the information about the total salary paid in each month. Basically, we want a Series containing the sum of rows along with the columns i.e. each item in the Series should contain the sum of values of a column. Let’s see how to get that series,

# Get sum of all rows in the Dataframe as a Series
total = df.sum()

print('Total salary paid in each month:')
print(total)

Output:

Total salary paid in each month:
Jan      16158.0
Feb      17158.0
March    14693.0
April    16869.0
May      13785.0
dtype: float64

We called the sum() function on the dataframe without any parameter. So, by default it considered the axis as 0 and added all the rows column wise i.e. added all the values in each column and returned a Series object containing those values. Each item in this Series object contains the total salary paid to in a month and the month name is in the index label for that entry.

We can add this Series as a new row in the dataframe i.e.

# Get sum of all rows as a new row in Dataframe

total = df.sum()
total.name = 'Total'

# Assign sum of all rows of DataFrame as a new Row
df = df.append(total.transpose())

print(df)

Output:

           Jan      Feb    March    April      May
Name                                              
Jack    2000.0   2010.0   2050.0   2134.0   2111.0
Riti    3000.0   3022.0   3456.0   3111.0   2109.0
Aadi       NaN   2334.0   2077.0   2134.0   3122.0
Mohit   3012.0   3050.0   2010.0   2122.0   1111.0
Veena   2023.0   2232.0   3050.0   2112.0   1099.0
Shaun   2123.0   2510.0      NaN   3134.0   2122.0
Mark    4000.0   2000.0   2050.0   2122.0   2111.0
Total  16158.0  17158.0  14693.0  16869.0  13785.0

It added a new row to the dataframe with the index label ‘Total’. Each entry in this row contains the information total salary paid in a month.

How did it work?

We transposed the Series to create a Dataframe with a single row. All the indexes in the Series became the columns in the new dataframe. Then we added this new dataframe to the original dataframe. It gave an effect that we have added a new row in the dataframe.

Get Sum of certain rows in Dataframe by row numbers

In the previous example we added all the rows of the dataframe but what if we want to get a sum of a few lines of the dataframe only? Like for the above dataframe we want the sum of values in the top 3 rows i.e. get the total salary paid by the month to 3 employees only from the top,

# Get sum of values of top 3 DataFrame rows,
total = df.iloc[0:3].sum()

print(total)

Output:

Jan 5000.0
Feb 7366.0
March 7583.0
April 7379.0
May 7342.0
dtype: float64

We selected the first 3 rows of the dataframe and called the sum() on that. It returned a Series containing total salary paid by the month for those selected employees only i.e. for the first 3 rows of the original dataframe.

Get the sum of specific rows in Pandas Dataframe by index/row label

Unlike the previous example, we can select specific rows by index label and then get a sump of values in those selected rows only i.e.

# Get sum of 3 DataFrame rows (selected by index labels)
total = df.loc[['Aadi', 'Mohit', 'Mark']].sum()

print(total)

Output:

Jan 7012.0
Feb 7384.0
March 6137.0
April 6378.0
May 6344.0
dtype: float64

We selected the 3 rows of the dataframe by index label i.e. ‘Aadi’, ‘Mohit’ and ‘Mark’. Then we added the values of the rows for these selected employees only. It returned a Series containing total salary paid by the month to those selected employees only month wise.

Conclusion:

This is how we can sum up the values of multiple rows in a dataframe.

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Scroll to Top