Pandas: Replace NaN with mean or average in Dataframe using fillna()

In this article, we will discuss how to replace NaN values with the mean of the values in a column or row, using the fillna() and mean() methods.

In data analysis, we sometimes need to fill missing values with the column mean or row mean before continuing. pandas provides built-in methods for handling missing (‘NaN’) values and cleaning a data set. The two we use here are:

DataFrame.fillna():

The fillna() method is used to replace the ‘NaN’ in the dataframe. We have discussed the arguments of fillna() in detail in another article.

The mean() method:

mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Parameters:

  • axis: {index (0), columns (1)}
    • Axis for the function to be applied on.
  • skipna: bool, default True
    • Exclude NA/null values when computing the result.
  • level: int or level name, default None
    • If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series. This parameter was removed in pandas 2.0.
  • numeric_only: bool, default None
    • Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series. In pandas 2.0 and later the default is False.
  • **kwargs: Additional keyword arguments to be passed to the function.

We will be using the default values of the arguments of the mean() method in this article.

Returns:

  • It returns the average or mean of the values.
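
To make the effect of the axis and skipna arguments concrete, here is a minimal sketch. The small `demo` frame and the variable names are made up for illustration and are not part of the article's dataset.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (hypothetical data)
demo = pd.DataFrame({'A': [1.0, np.nan, 3.0],
                     'B': [4.0, 5.0, 6.0]})

# Default axis=0: one mean per column, NaN values are skipped
col_means = demo.mean()

# axis=1: one mean per row, computed over the non-NaN values in that row
row_means = demo.mean(axis=1)

# skipna=False: a NaN anywhere in a column makes that column's mean NaN
col_means_strict = demo.mean(skipna=False)

print(col_means)
print(row_means)
print(col_means_strict)
```

With the defaults, column A averages only its two non-NaN values, which is the behaviour the rest of this article relies on.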

Now let’s look at some examples of fillna() used together with mean().

Pandas: Replace NaN with column mean

We can replace the NaN values in a complete dataframe or a particular column with a mean of values in a specific column.

Suppose we have a dataframe that contains the marks of four students, S1 to S4, in different subjects:

import numpy as np
import pandas as pd

# A dictionary with lists as values
# (np.nan is used because the np.NaN alias was removed in NumPy 2.0)
sample_dict = { 'S1': [10, 20, np.nan, np.nan],
                'S2': [5, np.nan, np.nan, 29],
                'S3': [15, np.nan, np.nan, 11],
                'S4': [21, 22, 23, 25],
                'Subjects': ['Maths', 'Finance', 'History', 'Geography']}

# Create a DataFrame from dictionary
df = pd.DataFrame(sample_dict)
# Set column 'Subjects' as Index of DataFrame
df = df.set_index('Subjects')

print(df)

This is the DataFrame that we have created:

             S1    S2    S3  S4
Subjects                       
Maths      10.0   5.0  15.0  21
Finance    20.0   NaN   NaN  22
History     NaN   NaN   NaN  23
Geography   NaN  29.0  11.0  25

If we calculate the mean of the values in the ‘S2’ column, a single float value is returned:

# get mean of values in column S2
mean_value = df['S2'].mean()

print('Mean of values in column S2:')
print(mean_value)

Output:

Mean of values in column S2:
17.0

Replace NaN values in a column with mean of column values

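Now let’s replace the NaN values in column S2 with the mean of the values in the same column. We assign the result back to the column, because calling fillna() with inplace=True on a selected column raises a warning in recent pandas versions and may not update the original dataframe:

# Replace NaNs in column S2 with the
# mean of values in the same column
df['S2'] = df['S2'].fillna(value=df['S2'].mean())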

print('Updated Dataframe:')
print(df)

Output:

Updated Dataframe:
             S1    S2    S3  S4
Subjects                       
Maths      10.0   5.0  15.0  21
Finance    20.0  17.0   NaN  22
History     NaN  17.0   NaN  23
Geography   NaN  29.0  11.0  25

Here the mean() method is called on the ‘S2’ column, so the ‘value’ argument receives the mean of the ‘S2’ values. The ‘NaN’ values in the ‘S2’ column are then replaced with that mean.
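
One way to confirm that a fill did what we expected is to count the NaN values per column before and after. The following is only a sketch on a two-column copy of the data; the names `df_check`, `nan_before` and `nan_after` are illustrative.

```python
import numpy as np
import pandas as pd

# Two columns of the article's data, rebuilt so the sketch is self-contained
df_check = pd.DataFrame({'S1': [10, 20, np.nan, np.nan],
                         'S2': [5, np.nan, np.nan, 29]},
                        index=['Maths', 'Finance', 'History', 'Geography'])

# Number of NaN values in each column before filling
nan_before = df_check.isna().sum()

# Fill S2 with its own mean and assign the result back to the column
df_check['S2'] = df_check['S2'].fillna(df_check['S2'].mean())

# Number of NaN values in each column after filling
nan_after = df_check.isna().sum()

print(nan_before)
print(nan_after)
```

Only the count for S2 should drop to zero; S1 is untouched because fillna() was applied to the S2 column alone.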

Replace all NaN values in a Dataframe with mean of column values

If we want to replace all the NaN values in the DataFrame with the mean of ‘S2’, we can call fillna() on the entire dataframe instead of on a single column. For example:

# Replace all NaNs in a dataframe with
# the mean of values in a single column
df.fillna(value=df['S2'].mean(), inplace=True)

print('Updated Dataframe:')
print(df)

Output:

Updated Dataframe:
             S1    S2    S3  S4
Subjects                       
Maths      10.0   5.0  15.0  21
Finance    20.0  17.0  17.0  22
History    17.0  17.0  17.0  23
Geography  17.0  29.0  11.0  25

Notice that all the NaN values are replaced with the mean of the ‘S2’ column values. In the df.fillna() call above we used ‘inplace=True’ to modify the dataframe itself instead of returning a new one.
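
If the goal is instead to fill each column with its own mean rather than with the mean of ‘S2’, one possible approach is to pass the Series returned by df.mean() as the value. This is a sketch on a rebuilt copy of the data; `df_all` and `filled` are illustrative names.

```python
import numpy as np
import pandas as pd

df_all = pd.DataFrame({'S1': [10, 20, np.nan, np.nan],
                       'S2': [5, np.nan, np.nan, 29],
                       'S3': [15, np.nan, np.nan, 11],
                       'S4': [21, 22, 23, 25]},
                      index=['Maths', 'Finance', 'History', 'Geography'])

# df_all.mean() is a Series indexed by column name; fillna() matches
# each mean to its column, so every column is filled with its own mean
filled = df_all.fillna(df_all.mean())

print(filled)
```

Here the NaNs in S1 become 15.0, those in S2 become 17.0 and those in S3 become 13.0, while `df_all` itself is left unchanged because inplace is not used.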

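We can also use the update() function to apply the change. update() modifies the dataframe in place using the non-NaN values of the object passed to it, so fillna() must return the filled column here and must not be called with inplace=True (which returns None).

df.update(df['S2'].fillna(value=df['S2'].mean()))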

The above line will replace the NaNs in column S2 with the mean of values in column S2.
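
The behaviour of update() can be checked on a fresh copy of the data. This is a minimal sketch; `df_u`, `filled_s2` and `result` are illustrative names.

```python
import numpy as np
import pandas as pd

df_u = pd.DataFrame({'S1': [10, 20, np.nan, np.nan],
                     'S2': [5, np.nan, np.nan, 29]},
                    index=['Maths', 'Finance', 'History', 'Geography'])

# fillna() without inplace returns a new, filled Series named 'S2'
filled_s2 = df_u['S2'].fillna(df_u['S2'].mean())

# update() aligns on index and column labels, modifies df_u in place
# and returns None
result = df_u.update(filled_s2)

print(result)
print(df_u)
```

Because the Series carries the name ‘S2’, only that column of `df_u` is updated; the NaNs in S1 remain.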

Pandas: Replace NaNs with mean of multiple columns

Let’s reinitialize our dataframe with NaN values:

# Create a DataFrame from dictionary
df = pd.DataFrame(sample_dict)
# Set column 'Subjects' as Index of DataFrame
df = df.set_index('Subjects')

# Dataframe with NaNs
print(df)

Output:

             S1    S2    S3  S4
Subjects                       
Maths      10.0   5.0  15.0  21
Finance    20.0   NaN   NaN  22
History     NaN   NaN   NaN  23
Geography   NaN  29.0  11.0  25

To work on multiple columns together, we can select a list of columns and call mean() on that selection:

# Mean of values in column S2 & S3
mean_values = df[['S2', 'S3']].mean()

print(mean_values)

Output:

S2    17.0
S3    13.0
dtype: float64

It returned a Series containing two values: the means of columns S2 and S3.

Now let’s replace the NaN values in the columns ‘S2’ and ‘S3’ with the respective column means returned by mean(). Here the ‘value’ argument is a Series of two means, and fillna() matches each one to its column by label, so NaNs in ‘S2’ are filled with the mean of ‘S2’ and NaNs in ‘S3’ with the mean of ‘S3’:

# Replace the NaNs in column S2 & S3 by the mean of values
# in column S2 & S3 respectively
df[['S2','S3']] = df[['S2','S3']].fillna(value=df[['S2','S3']].mean())

print('Updated Dataframe:')
print(df)

Output:

Updated Dataframe:
             S1    S2    S3  S4
Subjects                       
Maths      10.0   5.0  15.0  21
Finance    20.0  17.0  13.0  22
History     NaN  17.0  13.0  23
Geography   NaN  29.0  11.0  25
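
Because fillna() matches a Series value to columns by label, a possible alternative is to call fillna() on the whole dataframe and pass only the means of the chosen columns; columns without an entry in the Series are left as they are. This is a sketch on a rebuilt copy of the data, and `df_m`, `col_means` and `filled` are illustrative names.

```python
import numpy as np
import pandas as pd

df_m = pd.DataFrame({'S1': [10, 20, np.nan, np.nan],
                     'S2': [5, np.nan, np.nan, 29],
                     'S3': [15, np.nan, np.nan, 11],
                     'S4': [21, 22, 23, 25]},
                    index=['Maths', 'Finance', 'History', 'Geography'])

# Means of the two columns we want to fill (a Series indexed by 'S2', 'S3')
col_means = df_m[['S2', 'S3']].mean()

# Only columns named in col_means are filled; S1 keeps its NaN values
filled = df_m.fillna(col_means)

print(filled)
```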

Pandas: Replace NaNs with row mean

We can fill NaN values with the row mean as well. Here the NaN value in the ‘History’ row will be replaced with the mean of the values in that row. To do this we use .loc['index label'] to access the row and then apply fillna() and mean(). The ‘value’ argument contains a single float: the mean of the non-NaN values in the ‘History’ row.

df.loc['History'] = df.loc['History'].fillna(value=df.loc['History'].mean())

print('Updated Dataframe:')
print(df)

Output:

Updated Dataframe:
                  S1    S2    S3    S4
Subjects                              
Maths      10.000000   5.0  15.0  21.0
Finance    20.000000  17.0  13.0  22.0
History    17.666667  17.0  13.0  23.0
Geography        NaN  29.0  11.0  25.0
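
To apply the same idea to every row at once, one option is apply() with axis=1, which passes each row to a function as a Series. The sketch below uses a rebuilt copy of the original data rather than the partially filled dataframe above; `df_rows` and `filled_rows` are illustrative names.

```python
import numpy as np
import pandas as pd

df_rows = pd.DataFrame({'S1': [10, 20, np.nan, np.nan],
                        'S2': [5, np.nan, np.nan, 29],
                        'S3': [15, np.nan, np.nan, 11],
                        'S4': [21, 22, 23, 25]},
                       index=['Maths', 'Finance', 'History', 'Geography'])

# For each row, replace its NaN values with the mean of
# the non-NaN values in that same row
filled_rows = df_rows.apply(lambda row: row.fillna(row.mean()), axis=1)

print(filled_rows)
```

In this copy, the ‘Finance’ row is filled with 21.0 (the mean of 20 and 22) and the ‘History’ row with 23.0, its only non-NaN value.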

Conclusion:

So, these were different ways to replace NaN values in a column, row or complete dataframe with mean or average values.
