Get Column Average or Mean in Pandas

In this article, we will look at different ways to get the column average/mean in the pandas DataFrame. The pandas.DataFrame.mean() function has simplified the things, however, we will cover different scenarios in this tutorial.

Table of Contents

To quickly get started, let’s create a sample dataframe for experimentation. We’ll use the pandas library with some random data, say, the sales for each store at the month level.

import pandas as pd
import numpy as np

# List of Tuples
employees= [('Store1', 20, 10, 5, 5),
            ('Store2', 69, 47, 37, 32),
            ('Store3', 33, 100, 121, 30),
            ('Store4', 47, 31, 22, 22),
            ('Store5', 89, 90, 11, 15)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Jan', 'Feb', 'March', 'April'],
                  index=[0, 1, 2, 3, 4])
print(df)

Contents of the created dataframe are,

     Name  Jan  Feb  March  April
0  Store1   20   10      5      5
1  Store2   69   47     37     32
2  Store3   33  100    121     30
3  Store4   47   31     22     22
4  Store5   89   90     11     15

Now, let’s look at different ways in which we could calculate the average/mean of the column.

Advertisements

Get average for all the columns in a DataFrame

To calculate the average for all the columns, we simply need to use the pd.DataFrame.mean() method. Let’s take a look at the code below.

# average for all the columns
print (df.mean(axis = 0))

Output

Jan      51.6
Feb      55.6
March    39.2
April    20.8
dtype: float64

Here you go, we have the average of all the columns displayed using the mean() function. We have used axis=0 to get the column-wise mean, for row-wise mean, we can use axis=1.

Get average for a specific column in DataFrame

In cases, where we need to get the average of just a specific column, we will again use the pd.column_name.mean() function as shown below.

# mean for Jan column
print (df.Jan.mean())

Output

51.6

Here you go, we have the mean displayed for the Jan column. If we want to include say “Feb” column in this, we can do that by subsetting both the columns together.

# mean for Jan and Feb column
print (df[['Jan', 'Feb']].mean())

Output

Jan    51.6
Feb    55.6
dtype: float64

The complete example is as follows,

import pandas as pd
import numpy as np

# List of Tuples
employees= [('Store1', 20, 10, 5, 5),
            ('Store2', 69, 47, 37, 32),
            ('Store3', 33, 100, 121, 30),
            ('Store4', 47, 31, 22, 22),
            ('Store5', 89, 90, 11, 15)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Jan', 'Feb', 'March', 'April'],
                  index=[0, 1, 2, 3, 4])
print(df)

print('** Mean of All Columns **')

# average for all the columns
print (df.mean(axis = 0))


print('** Mean of a Column **')

# mean for Jan column
print (df.Jan.mean())

print('** Mean of multiple Columns **')

# mean for Jan and Feb column
print (df[['Jan', 'Feb']].mean())

Output:

     Name  Jan  Feb  March  April
0  Store1   20   10      5      5
1  Store2   69   47     37     32
2  Store3   33  100    121     30
3  Store4   47   31     22     22
4  Store5   89   90     11     15

** Mean of All Columns **

Jan      51.6
Feb      55.6
March    39.2
April    20.8
dtype: float64

** Mean of a Column **

51.6

** Mean of multiple Columns **

Jan    51.6
Feb    55.6
dtype: float64

Get average for column containing missing values

By default, pandas.DataFrame.mean() ignores the missing values (or NaNs) while calculating the averages across the row/column. In case, we want to change the default settings, we can use the skipna argument.

Let’s quickly add some missing values in the DataFrame to experiment.

import pandas as pd
import numpy as np

# List of Tuples
employees= [('Store1', 20, 10, 5, np.NaN),
            ('Store2', 69, 47, 37, 32),
            ('Store3', 33, 100, np.NaN, 30),
            ('Store4', 47, 31, 22, 22),
            ('Store5', 89, 90, np.NaN, 15)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Jan', 'Feb', 'March', 'April'],
                  index=[0, 1, 2, 3, 4])
print(df)

Output

     Name  Jan  Feb  March  April
0  Store1   20   10    5.0    NaN
1  Store2   69   47   37.0   32.0
2  Store3   33  100    NaN   30.0
3  Store4   47   31   22.0   22.0
4  Store5   89   90    NaN   15.0

Let’s first try getting the average for all the columns as done previously.

# average for all the columns
print (df.mean(axis = 0))

Output

Jan      51.600000
Feb      55.600000
March    21.333333
April    24.750000
dtype: float64

In case, we don’t want to ignore NAs, we will use the skipna argument as below.

# average for all the columns without ignoring NAs
print (df.mean(axis = 0, skipna=False))

Output

Jan      51.6
Feb      55.6
March     NaN
April     NaN
dtype: float64

As observed, the average for the column “March” and “April” is NaN now, since it contains missing values.

The complete example is as follows,

import pandas as pd
import numpy as np

# List of Tuples
employees= [('Store1', 20, 10, 5, np.NaN),
            ('Store2', 69, 47, 37, 32),
            ('Store3', 33, 100, np.NaN, 30),
            ('Store4', 47, 31, 22, 22),
            ('Store5', 89, 90, np.NaN, 15)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Jan', 'Feb', 'March', 'April'],
                  index=[0, 1, 2, 3, 4])
print(df)

print('** Get average of all columns and skip NaN values **')

# average for all the columns
print (df.mean(axis = 0))

print('** Get average of all columns without skiping NaN values **')

# average for all the columns without ignoring NAs
print (df.mean(axis = 0, skipna=False))

Output:

     Name  Jan  Feb  March  April
0  Store1   20   10    5.0    NaN
1  Store2   69   47   37.0   32.0
2  Store3   33  100    NaN   30.0
3  Store4   47   31   22.0   22.0
4  Store5   89   90    NaN   15.0

** Get average of all columns and skip NaN values **

Jan      51.600000
Feb      55.600000
March    21.333333
April    24.750000
dtype: float64

** Get average of all columns without skiping NaN values **

Jan      51.6
Feb      55.6
March     NaN
April     NaN
dtype: float64

Summary

Great, you made it! In this article, we have discussed multiple ways to get the column average/mean in the pandas DataFrame. Thanks.

Pandas Tutorials -Learn Data Analysis with Python

   

Are you looking to make a career in Data Science with Python?

Data Science is the future, and the future is here now. Data Scientists are now the most sought-after professionals today. To become a good Data Scientist or to make a career switch in Data Science one must possess the right skill set. We have curated a list of Best Professional Certificate in Data Science with Python. These courses will teach you the programming tools for Data Science like Pandas, NumPy, Matplotlib, Seaborn and how to use these libraries to implement Machine learning models.

Checkout the Detailed Review of Best Professional Certificate in Data Science with Python.

Remember, Data Science requires a lot of patience, persistence, and practice. So, start learning today.

Join a LinkedIn Community of Python Developers

Leave a Comment

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Scroll to Top