In this article we will discuss how to use the sum() function of Dataframe to sum the values in a Dataframe along different axes. We will also discuss all the parameters of the sum() function in detail.

In Pandas, the Dataframe provides a member function sum() that can be used to get the sum of values in a Dataframe along the requested axis, i.e. the sum of the values in each column or in each row of the Dataframe.

Let’s learn more about this function.

Syntax of Dataframe.sum()

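DataFrame.sum(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)

This is the signature in pandas 1.x. Starting with pandas 2.0, the level parameter has been removed and the signature is:

DataFrame.sum(axis=0, skipna=True, numeric_only=False, min_count=0, **kwargs)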
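Because the exact signature depends on the installed pandas version, a quick way to see what your own installation accepts is to inspect the method with the standard-library inspect module. This is a small sketch; the printed signature will differ between pandas versions.

```python
import inspect

import pandas as pd

# Signature of DataFrame.sum() for the pandas version that is installed
sig = inspect.signature(pd.DataFrame.sum)
print(sig)

# Names of the accepted parameters, e.g. to check whether 'level' is still supported
param_names = list(sig.parameters)
print(param_names)
print('level' in param_names)
```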

Parameters:

  • axis: The axis along which the sum of values will be calculated.
    • 0: Sum the values along the index/rows, giving one total per column
    • 1: Sum the values along the columns, giving one total per row
  • skipna: bool. The default value is True.
    • If True, NaN values are skipped while calculating the sum.
  • level: int or level name. The default value is None.
    • If the axis is a Multi-Index, sum the values within the given level only. This parameter was deprecated in pandas 1.3 and removed in pandas 2.0; use groupby(level=...).sum() instead.
  • numeric_only: bool. The default value is None (False in pandas 2.0 and later).
    • If True, include only int, float or Boolean columns.
  • min_count: int. The default value is 0.
    • The required number of non-NaN values. If fewer than min_count non-NaN values are present, the result is NaN.
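To see numeric_only in action, here is a minimal sketch on a small, made-up Dataframe (the 'Dept', 'Salary' and 'Bonus' columns are purely illustrative) that mixes a text column with numeric ones. With numeric_only=True, the text column is left out of the result.

```python
import pandas as pd

# Small illustrative Dataframe: one text column and two numeric columns
df = pd.DataFrame({
    'Dept': ['Sales', 'IT', 'HR'],
    'Salary': [2000, 3000, 4000],
    'Bonus': [100.5, 200.0, 50.0],
})

# Only int/float/bool columns are summed; the text column 'Dept' is skipped
totals = df.sum(numeric_only=True)
print(totals)
```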

Returns:

  • If no level is given, or the Dataframe has a single-level index, sum() returns a Series containing the sum of values along the given axis. If the Dataframe has a Multi-Index and a level is given, sum() returns a Dataframe.

Let’s understand this with some examples,

Example 1: Pandas Dataframe.sum() without any parameter

Suppose we have a Dataframe,

import pandas as pd
import numpy as np

# List of Tuples
empSalary = [('jack', 2000, 2010, 2050, 2134, 2111),
             ('Riti', 3000, 3022, 3456, 3111, 2109),
             ('Aadi', 4022, np.nan, 2077, 2134, 3122),
             ('Mohit', 3012, 3050, 2010, 2122, 1111),
             ('Veena', 2023, 2232, np.nan, 2112, 1099),
             ('Shaun', 2123, 2510, 3050, 3134, 2122),
             ('Mark', 4000, 2000, 2050, 2122, 2111)
             ]

# Create a DataFrame object
emp_salary_df = pd.DataFrame(empSalary,
                             columns=['Name', 'Jan', 'Feb', 'March', 'April', 'May'])
emp_salary_df.set_index('Name', inplace=True)

print('Dataframe Contents:')
print(emp_salary_df)

If we call the sum() function on this Dataframe without any axis parameter, the axis defaults to 0 and it returns a Series containing the sum of values along the index axis, i.e. it adds up the values in each column and returns a Series of these totals.

# Get the sum of values along the default axis i.e. index/rows
result = emp_salary_df.sum()

print('Series containing sum of values in each column:')
print(result)

Output:

Series containing sum of values in each column:
Jan      20180.0
Feb      14824.0
March    14693.0
April    16869.0
May      13785.0
dtype: float64

Because the values were summed along the index axis, i.e. down the rows, it returned a Series in which each value is the sum of one column and the index holds the corresponding column names.
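Since skipna defaults to True, the NaNs in ‘Feb’ and ‘March’ simply do not contribute to the column totals. One way to convince yourself of this is to replace the NaNs with 0 before summing; with the default min_count=0 the two results match. The sketch below rebuilds the same salary data with a dict instead of a list of tuples.

```python
import numpy as np
import pandas as pd

# Same monthly salary data as above, indexed by employee name
emp_salary_df = pd.DataFrame(
    {
        'Jan':   [2000, 3000, 4022, 3012, 2023, 2123, 4000],
        'Feb':   [2010, 3022, np.nan, 3050, 2232, 2510, 2000],
        'March': [2050, 3456, 2077, 2010, np.nan, 3050, 2050],
        'April': [2134, 3111, 2134, 2122, 2112, 3134, 2122],
        'May':   [2111, 2109, 3122, 1111, 1099, 2122, 2111],
    },
    index=pd.Index(['jack', 'Riti', 'Aadi', 'Mohit', 'Veena', 'Shaun', 'Mark'], name='Name'),
)

# Default sum(): NaNs are skipped
skipped = emp_salary_df.sum()

# Treating NaN as 0 explicitly gives the same column totals
zero_filled = emp_salary_df.fillna(0).sum()

print(skipped.equals(zero_filled))  # True
```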

Example 2: Dataframe.sum() with axis value 1

If we pass axis=1, it returns a Series containing the sum of values along the column axis, i.e. it adds up the values in each row and returns a Series of these totals.

# Get the sum of values along the axis 1 i.e. columns
result = emp_salary_df.sum(axis=1)

print('Series containing sum of values in each row:')
print(result)

Output:

Series containing sum of values in each row:
Name
jack     10305.0
Riti     14698.0
Aadi     11355.0
Mohit    11305.0
Veena     7466.0
Shaun    12939.0
Mark     12283.0
dtype: float64

Because the values were summed along axis 1, i.e. across the columns, it returned a Series in which each value is the sum of one row and the index holds the corresponding row labels of the Dataframe.
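pandas also accepts the axis names as strings: axis='index' is the same as axis=0, and axis='columns' is the same as axis=1, which can be easier to remember. A short sketch on a tiny, made-up Dataframe:

```python
import pandas as pd

# Tiny illustrative Dataframe (2 rows x 2 columns)
df = pd.DataFrame({'A': [1, 2], 'B': [10, 20]}, index=['r1', 'r2'])

# Column totals: axis=0 and axis='index' are equivalent
col_totals = df.sum(axis='index')
# Row totals: axis=1 and axis='columns' are equivalent
row_totals = df.sum(axis='columns')

print(col_totals.equals(df.sum(axis=0)))  # True
print(row_totals.equals(df.sum(axis=1)))  # True
print(row_totals)
```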

Example 3: Dataframe.sum() without skipping NaN

The default value of the skipna parameter is True, so if we call sum() without it, all NaN values are skipped. If we don’t want to skip NaNs, we can pass skipna=False, i.e.

# Get the sum of values along the default axis (index/rows)
# without skipping NaNs
result = emp_salary_df.sum(skipna=False)

print('Series containing sum of values in each column:')
print(result)

Output:

Series containing sum of values in each column:
Jan      20180.0
Feb          NaN
March        NaN
April    16869.0
May      13785.0
dtype: float64

It returned a Series containing the sum of values in each column. However, if a column contains a NaN, sum() returns NaN as the total for that column. In the above example, the ‘Feb’ & ‘March’ columns contain NaN values and skipna is False, so the sums of these columns are NaN too.
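To check in advance which columns will come out as NaN with skipna=False, you can look for missing values directly with isna().any(). A minimal sketch, using only the ‘Jan’, ‘Feb’ and ‘March’ columns from the data above:

```python
import numpy as np
import pandas as pd

# Subset of the salary data above: 'Feb' and 'March' each contain one NaN
df = pd.DataFrame({
    'Jan':   [2000, 3000, 4022, 3012, 2023, 2123, 4000],
    'Feb':   [2010, 3022, np.nan, 3050, 2232, 2510, 2000],
    'March': [2050, 3456, 2077, 2010, np.nan, 3050, 2050],
})

# True for every column that holds at least one NaN
has_nan = df.isna().any()
print(has_nan)

# Exactly those columns become NaN when skipna=False
strict = df.sum(skipna=False)
print(strict[strict.isna()].index.tolist())  # ['Feb', 'March']
```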

Example 4: Dataframe.sum() with min_count

If min_count is provided, sum() adds up the values in a column or row only if the number of non-NaN values is equal to or greater than min_count; otherwise the result is NaN. For example,

# Get the sum of values in each column, but only if
# the column has at least 7 non-NaN values
result = emp_salary_df.sum(min_count=7)

print('Series containing sum of values in each column:')
print(result)

Output:

Series containing sum of values in each column:
Jan      20180.0
Feb          NaN
March        NaN
April    16869.0
May      13785.0
dtype: float64

Here, the ‘Feb’ & ‘March’ columns have only 6 non-NaN values each, so they don’t meet the minimum of 7 non-NaN values. Therefore their sums were not calculated, and NaN is returned instead.
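The rule behind min_count can be written out by hand: count() gives the number of non-NaN values per column, and wherever that count is below min_count the sum is replaced by NaN. The sketch below, on a smaller made-up Dataframe, checks that this matches sum(min_count=...); it only illustrates the behavior and is not meant to describe how pandas implements it internally.

```python
import numpy as np
import pandas as pd

# Illustrative data: column 'x' has 3 non-NaN values, 'y' has 2, 'z' has none
df = pd.DataFrame({
    'x': [1.0, 2.0, 3.0],
    'y': [4.0, np.nan, 5.0],
    'z': [np.nan, np.nan, np.nan],
})

min_count = 3

# Built-in behavior
builtin = df.sum(min_count=min_count)

# Same rule spelled out: keep the sum only where enough values are present
manual = df.sum().where(df.count() >= min_count)

print(builtin)
print(builtin.equals(manual))  # True
```

Note that with the default min_count=0, the all-NaN column 'z' sums to 0.0 rather than NaN; passing min_count=1 is the usual way to get NaN for such a column.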

Example 5: Dataframe.sum() with a specific level in Multi-Index Dataframe

Suppose we have a Multi-Index Dataframe,

# List of Tuples
empSalary = [('jack',  'Delhi',  2000, 2010,   2050,   2134, 2111),
             ('Riti',  'Mumbai', 3000, 3022,   3456,   3111, 2109),
             ('Aadi',  'Delhi',  4022, np.nan, 2077,   2134, 3122),
             ('Mohit', 'Mumbai', 3012, 3050,   2010,   2122, 1111),
             ('Veena', 'Delhi',  2023, 2232,   np.nan, 2112, 1099),
             ('Shaun', 'Mumbai', 2123, 2510,   3050,   3134, 2122),
             ('Mark',  'Mumbai', 4000, 2000,   2050,   2122, 2111)
             ]

# Create a DataFrame object
emp_salary_df = pd.DataFrame(empSalary, columns=['Name', 'City', 'Jan', 'Feb', 'March', 'April', 'May'])
emp_salary_df.set_index(['Name', 'City'], inplace=True)

print(emp_salary_df)

Output:

               Jan     Feb   March  April   May
Name  City                                     
jack  Delhi   2000  2010.0  2050.0   2134  2111
Riti  Mumbai  3000  3022.0  3456.0   3111  2109
Aadi  Delhi   4022     NaN  2077.0   2134  3122
Mohit Mumbai  3012  3050.0  2010.0   2122  1111
Veena Delhi   2023  2232.0     NaN   2112  1099
Shaun Mumbai  2123  2510.0  3050.0   3134  2122
Mark  Mumbai  4000  2000.0  2050.0   2122  2111

Now, if we provide the level parameter, the values are added up within that particular level only. For example,

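# Get the sum of values for the level 'City' only.
# emp_salary_df.sum(level='City') worked in pandas 1.x but was removed in 2.0;
# groupby(level='City').sum() gives the same result in all versions.
df = emp_salary_df.groupby(level='City').sum()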

print('Summed up values for level "City": ')
print(df)

Output:

Summed up values for level "City": 
          Jan      Feb    March  April   May
City                                        
Delhi    8045   4242.0   4127.0   6380  6332
Mumbai  12135  10582.0  10566.0  10489  7453

Our Multi-Index dataframe had two levels, i.e. ‘Name’ & ‘City’. We wanted to calculate the sum of values along the index/rows, but for one level only, i.e. ‘City’. So we provided ‘City’ as the level parameter, and it returned a Dataframe whose index contains the unique values of the ‘City’ level from the original dataframe and whose columns contain the sum of the column values for each of those cities.
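Because the per-city result is an ordinary Dataframe, it can be summed again. For example, chaining a sum along axis=1 gives the total salary per city across all five months. This sketch rebuilds the same Multi-Index data and uses groupby(level='City').sum(), which produces the same per-city table as sum(level='City') and also works in pandas 2.0 and later.

```python
import numpy as np
import pandas as pd

# Same Multi-Index salary data as above, indexed by (Name, City)
index = pd.MultiIndex.from_tuples(
    [('jack', 'Delhi'), ('Riti', 'Mumbai'), ('Aadi', 'Delhi'), ('Mohit', 'Mumbai'),
     ('Veena', 'Delhi'), ('Shaun', 'Mumbai'), ('Mark', 'Mumbai')],
    names=['Name', 'City'],
)
emp_salary_df = pd.DataFrame(
    {
        'Jan':   [2000, 3000, 4022, 3012, 2023, 2123, 4000],
        'Feb':   [2010, 3022, np.nan, 3050, 2232, 2510, 2000],
        'March': [2050, 3456, 2077, 2010, np.nan, 3050, 2050],
        'April': [2134, 3111, 2134, 2122, 2112, 3134, 2122],
        'May':   [2111, 2109, 3122, 1111, 1099, 2122, 2111],
    },
    index=index,
)

# Step 1: per-city monthly totals (rows summed within each 'City' value)
city_monthly = emp_salary_df.groupby(level='City').sum()

# Step 2: add across the month columns to get one total per city
city_total = city_monthly.sum(axis=1)
print(city_total)
```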

Conclusion:

We can use Dataframe.sum() to add up the values in a Dataframe along different axes and levels. Its other parameters, such as skipna, numeric_only and min_count, give a lot more control over its behavior.
