This tutorial will discuss about different ways to replace NaN with zero in multiple columns in pandas.
Table Of Contents
Introduction
Suppose we have a DataFrame with several missing or NaN values, in almost all the columns. Like this,
Name Level_1 Score Level_2 Score 0 Mark 123.0 789.0 1 NaN NaN NaN 2 NaN 132.0 NaN 3 Shreya 789.0 NaN 4 Aadi NaN 666.0 5 Sim 890.0 NaN
We want to replace NaN values with 0, in some specific columns of this DataFrame. For example, in columns, “Level_1 Score” and “Level_2 Score” only. Like this,
Name Level_1 Score Level_2 Score 0 Mark 123.0 789.0 1 NaN 0.0 0.0 2 NaN 132.0 0.0 3 Shreya 789.0 0.0 4 Aadi 0.0 666.0 5 Sim 890.0 0.0
We don’t want to touch the remaining columns. Let’s see how to this
Preparing DataSet
Let’s create a DataFrame with missing or NaN values scattered accross all columns.
import pandas as pd import numpy as np # List of Tuples players = [ ('Mark', 123, 789), ( np.NaN, np.NaN, np.NaN), ( np.NaN, 132, np.NaN), ('Shreya', 789, np.NaN), ('Aadi', np.NaN, 666), ('Sim', 890, np.NaN)] # Create a DataFrame object from list of tuples df = pd.DataFrame(players, columns=['Name', 'Level_1 Score', 'Level_2 Score']) print(df)
Output
Frequently Asked:
Name Level_1 Score Level_2 Score 0 Mark 123.0 789.0 1 NaN NaN NaN 2 NaN 132.0 NaN 3 Shreya 789.0 NaN 4 Aadi NaN 666.0 5 Sim 890.0 NaN
Now Let’s replace NaN values with 0 in multiple columns i.e. in columns ‘Level_1 Score’ and ‘Level_2 Score’.
Using apply() to replace NaN with zero in multiple columns
Steps are,
- Select the specified columns from DataFrame as a sub DataFrame. The call the
apply()
function on it. - Pass a lambda function in the apply() function. It will apply this function on each column of the sub DataFrame i.e. on selected columns from the main DataFrame.
- Inside this Lambda Function, replace all the NaN values in the given column with zero.
- Assign these modified copy of columns, back to the original DataFrame.
Syntax is:
df[['column_name_1','column_name2']] = \ df[['column_name_1','column_name2']].apply(lambda col: col.replace(np.NaN, 0))
Let’s use this logic to replace all the NaNs values in columns ‘Level_1 Score’ and ‘Level_2 Score’ with zero.
# replace all NaN values in Columns # 'Level_1 Score' and 'Level_2 Score' with zero df[['Level_1 Score', 'Level_2 Score']] = \ df[['Level_1 Score', 'Level_2 Score']].apply(lambda x: x.replace(np.nan, 0)) print(df)
Output
Name Level_1 Score Level_2 Score 0 Mark 123.0 789.0 1 NaN 0.0 0.0 2 NaN 132.0 0.0 3 Shreya 789.0 0.0 4 Aadi 0.0 666.0 5 Sim 890.0 0.0
All NaN values in multiple columns i.e. ‘Level_1 Score’ and ‘Level_2 Score’ are replaced by 0.
Summary
We learned about a way to replace NaN values with zeros in multiple columns Columns of a DataFrame. Thanks.