Pandas: Sum rows in Dataframe ( all or certain rows)
In this article we will discuss how to sum up rows in a dataframe and add the values as a new row in the same dataframe.
First of all, we will create a Dataframe,
import pandas as pd import numpy as np # List of Tuples employees_salary = [('Jack', 2000, 2010, 2050, 2134, 2111), ('Riti', 3000, 3022, 3456, 3111, 2109), ('Aadi', np.NaN, 2334, 2077, 2134, 3122), ('Mohit', 3012, 3050, 2010, 2122, 1111), ('Veena', 2023, 2232, 3050, 2112, 1099), ('Shaun', 2123, 2510, np.NaN, 3134, 2122), ('Mark', 4000, 2000, 2050, 2122, 2111)] # Create a DataFrame object from list of tuples df = pd.DataFrame(employees_salary, columns=['Name', 'Jan', 'Feb', 'March', 'April', 'May']) # Set column Name as the index of dataframe df.set_index('Name', inplace=True) print(df)
Jan Feb March April May Name Jack 2000.0 2010 2050.0 2134 2111 Riti 3000.0 3022 3456.0 3111 2109 Aadi NaN 2334 2077.0 2134 3122 Mohit 3012.0 3050 2010.0 2122 1111 Veena 2023.0 2232 3050.0 2112 1099 Shaun 2123.0 2510 NaN 3134 2122 Mark 4000.0 2000 2050.0 2122 2111
This Dataframe contains the salary of employees from month Jan to May. We made the column Name as the index of the dataframe. Each row of this dataframe contains the salary of an employee from Jan to May.
Get the sum of all rows in a Pandas Dataframe
Suppose in the above dataframe we want to get the information about the total salary paid in each month. Basically, we want a Series containing the sum of rows along with the columns i.e. each item in the Series should contain the sum of values of a column. Let’s see how to get that series,
# Get sum of all rows in the Dataframe as a Series total = df.sum() print('Total salary paid in each month:') print(total)
Total salary paid in each month: Jan 16158.0 Feb 17158.0 March 14693.0 April 16869.0 May 13785.0 dtype: float64
We called the sum() function on the dataframe without any parameter. So, by default it considered the axis as 0 and added all the rows column wise i.e. added all the values in each column and returned a Series object containing those values. Each item in this Series object contains the total salary paid to in a month and the month name is in the index label for that entry.
We can add this Series as a new row in the dataframe i.e.
# Get sum of all rows as a new row in Dataframe total = df.sum() total.name = 'Total' # Assign sum of all rows of DataFrame as a new Row df = df.append(total.transpose()) print(df)
Jan Feb March April May Name Jack 2000.0 2010.0 2050.0 2134.0 2111.0 Riti 3000.0 3022.0 3456.0 3111.0 2109.0 Aadi NaN 2334.0 2077.0 2134.0 3122.0 Mohit 3012.0 3050.0 2010.0 2122.0 1111.0 Veena 2023.0 2232.0 3050.0 2112.0 1099.0 Shaun 2123.0 2510.0 NaN 3134.0 2122.0 Mark 4000.0 2000.0 2050.0 2122.0 2111.0 Total 16158.0 17158.0 14693.0 16869.0 13785.0
It added a new row to the dataframe with the index label ‘Total’. Each entry in this row contains the information total salary paid in a month.
How did it work?
We transposed the Series to create a Dataframe with a single row. All the indexes in the Series became the columns in the new dataframe. Then we added this new dataframe to the original dataframe. It gave an effect that we have added a new row in the dataframe.
Get Sum of certain rows in Dataframe by row numbers
In the previous example we added all the rows of the dataframe but what if we want to get a sum of a few lines of the dataframe only? Like for the above dataframe we want the sum of values in the top 3 rows i.e. get the total salary paid by the month to 3 employees only from the top,
# Get sum of values of top 3 DataFrame rows, total = df.iloc[0:3].sum() print(total)
Jan 5000.0 Feb 7366.0 March 7583.0 April 7379.0 May 7342.0 dtype: float64
We selected the first 3 rows of the dataframe and called the sum() on that. It returned a Series containing total salary paid by the month for those selected employees only i.e. for the first 3 rows of the original dataframe.
Get the sum of specific rows in Pandas Dataframe by index/row label
Unlike the previous example, we can select specific rows by index label and then get a sump of values in those selected rows only i.e.
# Get sum of 3 DataFrame rows (selected by index labels) total = df.loc[['Aadi', 'Mohit', 'Mark']].sum() print(total)
Jan 7012.0 Feb 7384.0 March 6137.0 April 6378.0 May 6344.0 dtype: float64
We selected the 3 rows of the dataframe by index label i.e. ‘Aadi’, ‘Mohit’ and ‘Mark’. Then we added the values of the rows for these selected employees only. It returned a Series containing total salary paid by the month to those selected employees only month wise.
This is how we can sum up the values of multiple rows in a dataframe.