In this article, we will discuss how to get the total (sum) of any DataFrame column in Pandas. We will also see how to store the total as a new row in the DataFrame.
Preparing DataSet
To quickly get started, let’s create a sample DataFrame to experiment with. We’ll use the pandas library with some sample data.
import pandas as pd
import numpy as np

# List of Tuples
employees = [('Shubham', 25, 5, 4),
             ('Riti', 30, 7, 7),
             ('Shanky', 23, 2, 2),
             ('Shreya', 24, 2, 0),
             ('Aadi', 33, 11, 5),
             ('Sim', 28, 4, 4)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Age', 'Experience', 'RelevantExperience'],
                  index=['A', 'B', 'C', 'D', 'E', 'F'])

print(df)
The contents of the created DataFrame are:

      Name  Age  Experience  RelevantExperience
A  Shubham   25           5                   4
B     Riti   30           7                   7
C   Shanky   23           2                   2
D   Shreya   24           2                   0
E     Aadi   33          11                   5
F      Sim   28           4                   4

Now, we will perform operations on this DataFrame.
Get total of a DataFrame column in Pandas
To get the total of a pandas column, we can select the column and call its sum() method, as in DataFrame['column'].sum(). Let’s look at the key parameters of the method, shown here for DataFrame.sum().
DataFrame.sum(axis=None, skipna=None, numeric_only=None, min_count=0, **kwargs)
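- axis: 0 to sum down each column (one total per column), 1 to sum across each row (one total per row)
- skipna: Whether to exclude NA/null values when computing the sum; they are excluded by default
- numeric_only: If True, only the numeric columns are included
- min_count: Minimum number of valid (non-NA) values required to perform the operation; if fewer are present, the result is NaN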
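To make the skipna and min_count parameters concrete, here is a minimal sketch. It uses a small hypothetical Series named scores, which is an assumption for illustration and not part of the employee data used in this article.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with one missing value (not part of the employees data)
scores = pd.Series([5.0, np.nan, 7.0])

# Missing values are skipped by default, so only 5.0 and 7.0 are added
total_skipping_na = scores.sum()

# With skipna=False, a single NaN makes the whole sum NaN
total_with_na = scores.sum(skipna=False)

# min_count=3 requires at least three non-missing values; only two exist
# here, so the result is NaN instead of a partial total
total_min_count = scores.sum(min_count=3)

print(total_skipping_na)
print(total_with_na)
print(total_min_count)
```

The min_count parameter is mainly useful when a total computed from too few valid values would be misleading.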
Let’s understand it by getting the total of the “Experience” column.
# get sum of Experience column
print(df['Experience'].sum())
Output
31
As observed, we have the total of the “Experience” column.
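The same method also works on several columns at once, and the axis parameter controls the direction of the sum. The sketch below rebuilds the sample DataFrame so that it runs on its own. The variable names column_totals and row_totals are illustrative assumptions, and adding the two experience columns per employee is done only to show how axis=1 behaves.

```python
import pandas as pd

# Rebuild the sample DataFrame from this article
employees = [('Shubham', 25, 5, 4),
             ('Riti', 30, 7, 7),
             ('Shanky', 23, 2, 2),
             ('Shreya', 24, 2, 0),
             ('Aadi', 33, 11, 5),
             ('Sim', 28, 4, 4)]
df = pd.DataFrame(employees,
                  columns=['Name', 'Age', 'Experience', 'RelevantExperience'],
                  index=['A', 'B', 'C', 'D', 'E', 'F'])

experience_columns = df[['Experience', 'RelevantExperience']]

# axis=0 (the default): one total per selected column
column_totals = experience_columns.sum(axis=0)

# axis=1: one total per row, i.e. per employee (illustrative only)
row_totals = experience_columns.sum(axis=1)

print(column_totals)
print(row_totals)
```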
Store the column total in the DataFrame
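Now, let’s understand how to store this total as a new row in the DataFrame. Here, we are going to use the .loc property of the DataFrame. Assigning to a row label that does not exist yet, such as “Total”, adds that row. The cells we do not set are filled with NaN, which is why the integer columns are displayed as floats in the output.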
# store the total in DataFrame
df.loc["Total", "Experience"] = df['Experience'].sum()

print(df)
Output
          Name   Age  Experience  RelevantExperience
A      Shubham  25.0         5.0                 4.0
B         Riti  30.0         7.0                 7.0
C       Shanky  23.0         2.0                 2.0
D       Shreya  24.0         2.0                 0.0
E         Aadi  33.0        11.0                 5.0
F          Sim  28.0         4.0                 4.0
Total      NaN   NaN        31.0                 NaN
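As observed, we have a new row “Total” which contains the total of the Experience column. Alternatively, we can use the “at” property instead of loc. Starting again from the original DataFrame (without the “Total” row, so that the total is not counted twice), this looks as follows.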
# store the total in DataFrame
df.at["Total", "Experience"] = df['Experience'].sum()

print(df)
Output
          Name   Age  Experience  RelevantExperience
A      Shubham  25.0         5.0                 4.0
B         Riti  30.0         7.0                 7.0
C       Shanky  23.0         2.0                 2.0
D       Shreya  24.0         2.0                 0.0
E         Aadi  33.0        11.0                 5.0
F          Sim  28.0         4.0                 4.0
Total      NaN   NaN        31.0                 NaN
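The outputs above show the integer columns as floats once the “Total” row exists. The following sketch checks this on a reduced two-row version of the sample data. The reduced data and the names dtype_before and dtype_after are assumptions made for brevity.

```python
import pandas as pd

# Reduced two-row version of the sample data (an assumption for brevity)
df = pd.DataFrame({'Name': ['Shubham', 'Riti'],
                   'Age': [25, 30],
                   'Experience': [5, 7]},
                  index=['A', 'B'])

dtype_before = df['Age'].dtype

# Adding the "Total" row sets only the Experience cell;
# the other cells in the new row are filled with NaN
df.loc['Total', 'Experience'] = df['Experience'].sum()

dtype_after = df['Age'].dtype

print(dtype_before, '->', dtype_after)
print(df)
```

Because NaN is a floating-point value, an integer column that receives a NaN is stored as floats afterwards.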
Store the total for all columns
Instead of storing the total for just one column, suppose we need to store the totals for all numeric columns. Here, we will again use the DataFrame.sum() method, but instead of selecting a single column, we pass numeric_only=True so that the non-numeric “Name” column is skipped. As before, we start from the original DataFrame without the “Total” row.
# store total for all columns
df.loc['Total'] = df.sum(numeric_only=True)

print(df)
Output
          Name    Age  Experience  RelevantExperience
A      Shubham   25.0         5.0                 4.0
B         Riti   30.0         7.0                 7.0
C       Shanky   23.0         2.0                 2.0
D       Shreya   24.0         2.0                 0.0
E         Aadi   33.0        11.0                 5.0
F          Sim   28.0         4.0                 4.0
Total      NaN  163.0        31.0                22.0
As observed, we have the totals for all the numeric columns stored in a new row.
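Once a “Total” row is stored in the DataFrame, calling sum() again would include that row and count every value twice. One possible way around this, sketched below under the assumption that a separate copy is acceptable, is to keep the original data untouched and add the totals to a copy. The name report is hypothetical.

```python
import pandas as pd

# Rebuild the sample DataFrame from this article
employees = [('Shubham', 25, 5, 4),
             ('Riti', 30, 7, 7),
             ('Shanky', 23, 2, 2),
             ('Shreya', 24, 2, 0),
             ('Aadi', 33, 11, 5),
             ('Sim', 28, 4, 4)]
df = pd.DataFrame(employees,
                  columns=['Name', 'Age', 'Experience', 'RelevantExperience'],
                  index=['A', 'B', 'C', 'D', 'E', 'F'])

# Add the totals to a copy so that df keeps only the data rows
report = df.copy()
report.loc['Total'] = df.sum(numeric_only=True)

# If only the combined frame is available, drop the "Total" row
# before summing again to avoid double counting
experience_total = report.drop(index='Total')['Experience'].sum()

print(report)
print(experience_total)
```

With this layout, df can be summed again at any time, and report is used only for display.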
Summary
In this article, we have discussed how to get the total of Pandas columns and how to store the totals as a new row in the DataFrame.