Pandas: Get sum of column values in a Dataframe

In this article we will discuss how to get the sum of column values in a pandas dataframe. We will cover the following topics in detail,

  • Get the sum of all column values in a dataframe
    • Select the column by name and get the sum of all values in that column
    • Select the column by position and get the sum of all values in that column
  • Get the sum of column values for selected rows only in a Dataframe
  • Get the sum of column values in a dataframe based on condition

First of all, we will create a dataframe from a list of tuples,

import pandas as pd
import numpy as np

# List of Tuples
students = [('jack',    34,     'Sydney',  155),
            ('Riti',    31,     'Delhi',   177.5),
            ('Aadi',    16,     'Mumbai',  81),
            ('Mohit',   31,     'Delhi',   np.nan),
            ('Veena',   np.nan, 'Delhi',   144),
            ('Shaunak', 35,     'Mumbai',  135),
            ('Shaun',   35,     'Colombo', 111) ]

# Create a DataFrame object
df = pd.DataFrame(students,
                  columns=['Name', 'Age', 'City', 'Score'])

print(df)

Output:

      Name   Age     City  Score
0     jack  34.0   Sydney  155.0
1     Riti  31.0    Delhi  177.5
2     Aadi  16.0   Mumbai   81.0
3    Mohit  31.0    Delhi    NaN
4    Veena   NaN    Delhi  144.0
5  Shaunak  35.0   Mumbai  135.0
6    Shaun  35.0  Colombo  111.0

This dataframe contains information about students like their name, age, city and score.

Now let’s see how to get the sum of values in the column ‘Score’ of this dataframe.

Get the sum of column values in a dataframe

Select the column by name and get the sum of all values in that column

Select a column from a dataframe by the column name and then get the sum of values in that column using the sum() function,

# Get the total of all values in column 'Score' of the DataFrame
total = df['Score'].sum()

print(total)

Output:

803.5

Here we selected the column ‘Score’ from the dataframe using the [] operator, which returned all its values as a Pandas Series object. Then we called the sum() function on that Series to get the sum of its values. By default, sum() skips NaN values, so the missing score is ignored and we get the sum of the remaining values in the column ‘Score’.
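The ‘Score’ column contains a missing value, so it can help to see how sum() treats NaN. The following is a minimal sketch, assuming the same scores as in the dataframe above; the Series named scores is rebuilt here only to keep the example self-contained. The skipna parameter is part of the pandas Series.sum() function.

```python
import numpy as np
import pandas as pd

# Same values as the 'Score' column above (one entry is missing)
scores = pd.Series([155, 177.5, 81, np.nan, 144, 135, 111], name='Score')

# Default behaviour: NaN values are skipped
total_skipping_nan = scores.sum()

# With skipna=False a single NaN makes the whole sum NaN
total_with_nan = scores.sum(skipna=False)

print(total_skipping_nan)  # 803.5
print(total_with_nan)      # nan
```

Passing skipna=False can be useful when a missing value should be treated as an error rather than silently ignored.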

We can also select the column using loc[] and then get the sum of values in that column. For example,

# Select column 'Score' using loc[] and calculate sum of all
# values in that column
total = df.loc[:, 'Score'].sum()

print(total)

Output:

803.5

Here we selected the column ‘Score’ as a Series object using loc[] and then called the sum() function on that Series to get the sum of all values in the column ‘Score’ of the dataframe.
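The same approach extends to several columns at once. As a minimal sketch, assuming the same student data as above (rebuilt here so the example runs on its own), selecting a list of column names returns a DataFrame, and calling sum() on it returns one total per column:

```python
import numpy as np
import pandas as pd

students = [('jack', 34, 'Sydney', 155),
            ('Riti', 31, 'Delhi', 177.5),
            ('Aadi', 16, 'Mumbai', 81),
            ('Mohit', 31, 'Delhi', np.nan),
            ('Veena', np.nan, 'Delhi', 144),
            ('Shaunak', 35, 'Mumbai', 135),
            ('Shaun', 35, 'Colombo', 111)]
df = pd.DataFrame(students, columns=['Name', 'Age', 'City', 'Score'])

# Sum two numeric columns in one call; the result is a Series
# indexed by column name
totals = df[['Age', 'Score']].sum()

print(totals['Age'])    # 182.0
print(totals['Score'])  # 803.5
```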

Select the column by position and get the sum of all values in that column

Suppose we don’t know the column name but we know the position of the column in the dataframe, and we want the sum of values in that column. For that we will select the column by position using iloc[]. Because the column is selected here with a slice (column_number-1:column_number), iloc[] returns a single-column DataFrame rather than a Series. Calling sum() on that DataFrame returns a Series containing one sum for each selected column,

# Get sum of all values in 4th column
column_number = 4
total = df.iloc[:, column_number-1:column_number].sum()

print(total)

Output:

Score    803.5
dtype: float64

It returned a Series with a single value.

Here we selected the 4th column of the dataframe as a single-column DataFrame using iloc[] and then called the sum() function on it. So, it returned the sum of values in the 4th column i.e. column ‘Score’, labelled with the column name.
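If a plain number is preferred over a one-element Series, the column can be selected with a single integer position instead of a slice. This is a minimal sketch, assuming the same student data as above (rebuilt here so the example runs on its own); with an integer position, iloc[] returns a Series and sum() returns a scalar:

```python
import numpy as np
import pandas as pd

students = [('jack', 34, 'Sydney', 155),
            ('Riti', 31, 'Delhi', 177.5),
            ('Aadi', 16, 'Mumbai', 81),
            ('Mohit', 31, 'Delhi', np.nan),
            ('Veena', np.nan, 'Delhi', 144),
            ('Shaunak', 35, 'Mumbai', 135),
            ('Shaun', 35, 'Colombo', 111)]
df = pd.DataFrame(students, columns=['Name', 'Age', 'City', 'Score'])

# Positions are zero-based, so the 4th column is at index 3
column_number = 4
column = df.iloc[:, column_number - 1]  # a Series, not a DataFrame

total = column.sum()

print(total)  # 803.5
```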

Get the sum of column values for selected rows only in a Dataframe

Select a column from the Dataframe and get the sum of specific entries in that column. For example,

# Select 4th column of dataframe and get sum of first 3 values in that column
total = df.iloc[0:3, 3:4].sum()

print(total)

Output:

Score    413.5
dtype: float64

It returned a Series with a single value.

Here we selected the first 3 rows of the 4th column of the dataframe and then calculated their sum.
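The selected rows do not have to be contiguous. As a minimal sketch, assuming the same student data as above (rebuilt here so the example runs on its own), the rows can be given either as a positional slice or as an explicit list of index labels. The labels 0, 2 and 5 below are chosen only for illustration:

```python
import numpy as np
import pandas as pd

students = [('jack', 34, 'Sydney', 155),
            ('Riti', 31, 'Delhi', 177.5),
            ('Aadi', 16, 'Mumbai', 81),
            ('Mohit', 31, 'Delhi', np.nan),
            ('Veena', np.nan, 'Delhi', 144),
            ('Shaunak', 35, 'Mumbai', 135),
            ('Shaun', 35, 'Colombo', 111)]
df = pd.DataFrame(students, columns=['Name', 'Age', 'City', 'Score'])

# First 3 rows of column 'Score', returned as a scalar
first_three = df['Score'].iloc[0:3].sum()

# Rows with index labels 0, 2 and 5 of column 'Score'
picked = df.loc[[0, 2, 5], 'Score'].sum()

print(first_three)  # 413.5
print(picked)       # 371.0
```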

Get the sum of column values in a dataframe based on condition

Suppose in the above dataframe we want to get the sum of the scores of students from Delhi only. For that we need to select only those values from the column ‘Score’ where ‘City’ is Delhi. Let’s see how to do that,

# Get sum of values in a column 'Score'
# for those rows only where 'City' is 'Delhi'
total = df.loc[df['City'] == 'Delhi', 'Score'].sum()

print(total)

Output:

321.5

Using loc[] we selected the column ‘Score’ but for only those rows where column ‘City’ has value ‘Delhi’. Then we called the sum() function on the series object to get the sum of scores of students from ‘Delhi’. So, basically we selected rows from a dataframe that satisfy our condition and then selected the values of column ‘Score’ for those rows only. We did that in a single expression using loc[].
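The condition passed to loc[] can also combine several tests. The following is a minimal sketch, assuming the same student data as above (rebuilt here so the example runs on its own); the age threshold of 30 and the pair of cities are chosen only for illustration. Each comparison needs its own parentheses when combined with &, and isin() checks membership in a list of values:

```python
import numpy as np
import pandas as pd

students = [('jack', 34, 'Sydney', 155),
            ('Riti', 31, 'Delhi', 177.5),
            ('Aadi', 16, 'Mumbai', 81),
            ('Mohit', 31, 'Delhi', np.nan),
            ('Veena', np.nan, 'Delhi', 144),
            ('Shaunak', 35, 'Mumbai', 135),
            ('Shaun', 35, 'Colombo', 111)]
df = pd.DataFrame(students, columns=['Name', 'Age', 'City', 'Score'])

# Students from Delhi who are older than 30.
# A NaN age compares as False, so that row is excluded.
mask = (df['City'] == 'Delhi') & (df['Age'] > 30)
delhi_over_30 = df.loc[mask, 'Score'].sum()

# Students from either Delhi or Mumbai
delhi_or_mumbai = df.loc[df['City'].isin(['Delhi', 'Mumbai']), 'Score'].sum()

print(delhi_over_30)    # 177.5
print(delhi_or_mumbai)  # 537.5
```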

Conclusion:

These were the different ways to get the sum of all or specific values in a dataframe column in Pandas.
