In this article, we will discuss how to remove rows from a dataframe that have missing values (NaN) in any column, in all columns, or in a few selected columns.
Table of Contents:
- Overview of DataFrame.dropna()
- Drop Rows with missing value / NaN in any column.
- Drop Rows in dataframe which has NaN in all columns.
- Drop Rows with any missing value in selected columns only.
- Drop Rows with missing values or NaN in all the selected columns.
- thresh Argument in the dropna() function
- Drop Rows with missing values from a Dataframe in place
Overview of DataFrame.dropna()
Python’s pandas library provides a function to remove the rows or columns of a dataframe that contain missing values or NaN:
DataFrame.dropna(self, axis=0, how='any', thresh=None, subset=None, inplace=False)
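Arguments :
- axis:
  - 0 : drop rows with missing values
  - 1 : drop columns with missing values
- how:
  - ‘any’ : drop if any NaN / missing value is present
  - ‘all’ : drop only if all the values are missing / NaN
- thresh: keep only the rows (or columns) that have at least this many non-NaN values
- subset: labels along the other axis to check for NaN, e.g. a list of column names when dropping rows
- inplace: if True, modify the dataframe itself and return None; if False (the default), return a modified copy
It removes the rows or columns (depending on the arguments) that contain missing values / NaN.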
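The rest of this article only drops rows, but the axis argument above also lets dropna() drop columns. The sketch below uses a small hypothetical dataframe whose column names 'a', 'b' and 'c' are only for illustration. It shows how axis=1 combines with the two values of how:

```python
import numpy as np
import pandas as pd

# Hypothetical data: column 'b' has one NaN, column 'c' is entirely NaN
df = pd.DataFrame({
    'a': [1, 2, 3],
    'b': [4, np.nan, 6],
    'c': [np.nan, np.nan, np.nan],
})

# axis=1 drops columns; how='any' drops a column that has at least one NaN
cols_any = df.dropna(axis=1, how='any')
# how='all' drops a column only if every value in it is NaN
cols_all = df.dropna(axis=1, how='all')

print(list(cols_any.columns))  # ['a']
print(list(cols_all.columns))  # ['a', 'b']
```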
Let’s use the dropna() function to remove rows with missing values from a dataframe. Suppose we have the following dataframe:
import pandas as pd
import numpy as np

# List of Tuples (np.nan marks a missing value)
employees = [('jack', 34, 'Sydney', 5),
             ('Riti', 31, 'Delhi', 7),
             ('Aadi', 16, np.nan, 11),
             (np.nan, np.nan, 'Delhi', np.nan),
             ('Veena', 33, 'Delhi', 4),
             ('Shaunak', 35, 'Mumbai', 5),
             ('Sam', 35, 'Colombo', 11),
             (np.nan, np.nan, np.nan, np.nan)]

# Create a DataFrame object
df = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'])

print(df)
Contents of the dataframe:
      Name   Age     City  Experience
0     jack  34.0   Sydney         5.0
1     Riti  31.0    Delhi         7.0
2     Aadi  16.0      NaN        11.0
3      NaN   NaN    Delhi         NaN
4    Veena  33.0    Delhi         4.0
5  Shaunak  35.0   Mumbai         5.0
6      Sam  35.0  Colombo        11.0
7      NaN   NaN      NaN         NaN
Drop Rows with missing value / NaN in any column
print("Contents of the Dataframe : ")
print(df)

# Drop rows which contain any NaN values
mod_df = df.dropna()

print("Modified Dataframe : ")
print(mod_df)
Output:
Contents of the Dataframe :
      Name   Age     City  Experience
0     jack  34.0   Sydney         5.0
1     Riti  31.0    Delhi         7.0
2     Aadi  16.0      NaN        11.0
3      NaN   NaN    Delhi         NaN
4    Veena  33.0    Delhi         4.0
5  Shaunak  35.0   Mumbai         5.0
6      Sam  35.0  Colombo        11.0
7      NaN   NaN      NaN         NaN
Modified Dataframe :
      Name   Age     City  Experience
0     jack  34.0   Sydney         5.0
1     Riti  31.0    Delhi         7.0
4    Veena  33.0    Delhi         4.0
5  Shaunak  35.0   Mumbai         5.0
6      Sam  35.0  Colombo        11.0
It removed all the rows that had any missing value. It didn’t modify the original dataframe; it just returned a copy with the modified contents.
How did it work?
The default value of the ‘how’ argument in dropna() is ‘any’, and the default for the ‘axis’ argument is 0. This means that even if we don’t pass any arguments to dropna(), it still deletes every row that contains at least one NaN.
We can also pass the ‘how’ and ‘axis’ arguments explicitly:
# Drop rows which contain any NaN values
mod_df = df.dropna(axis=0, how='any')
This works the same way: it removes the rows with any missing value.
P.S. It returns a copy of the original dataframe with the modified contents.
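To see what the default how='any' selects, we can rebuild the same rows with a boolean mask. The sketch below recreates the employees data from above so that it runs on its own. The mask uses the standard pandas methods notna() and all(axis=1). It only illustrates the selection and is not how dropna() is implemented internally.

```python
import numpy as np
import pandas as pd

# Same sample data as above, recreated so this block runs on its own
employees = [('jack', 34, 'Sydney', 5), ('Riti', 31, 'Delhi', 7),
             ('Aadi', 16, np.nan, 11), (np.nan, np.nan, 'Delhi', np.nan),
             ('Veena', 33, 'Delhi', 4), ('Shaunak', 35, 'Mumbai', 5),
             ('Sam', 35, 'Colombo', 11), (np.nan, np.nan, np.nan, np.nan)]
df = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'])

# Default call: axis=0, how='any'
mod_df = df.dropna()

# Same selection with a boolean mask: keep a row only if all its values are non-NaN
manual_df = df[df.notna().all(axis=1)]

print(mod_df.index.tolist())     # [0, 1, 4, 5, 6]
print(mod_df.equals(manual_df))  # True
print(len(df))                   # 8, the original dataframe still has all its rows
```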
Drop Rows in dataframe which has NaN in all columns
What if we want to remove only the rows of a dataframe in which all values are missing, i.e. NaN? For that, we pass how='all':
print("Contents of the Dataframe : ")
print(df)

# Drop rows in which all values are NaN
mod_df = df.dropna(how='all')

print("Modified Dataframe : ")
print(mod_df)
Output:
Contents of the Dataframe :
      Name   Age     City  Experience
0     jack  34.0   Sydney         5.0
1     Riti  31.0    Delhi         7.0
2     Aadi  16.0      NaN        11.0
3      NaN   NaN    Delhi         NaN
4    Veena  33.0    Delhi         4.0
5  Shaunak  35.0   Mumbai         5.0
6      Sam  35.0  Colombo        11.0
7      NaN   NaN      NaN         NaN
Modified Dataframe :
      Name   Age     City  Experience
0     jack  34.0   Sydney         5.0
1     Riti  31.0    Delhi         7.0
2     Aadi  16.0      NaN        11.0
3      NaN   NaN    Delhi         NaN
4    Veena  33.0    Delhi         4.0
5  Shaunak  35.0   Mumbai         5.0
6      Sam  35.0  Colombo        11.0
It removed only the rows in which all values were missing, i.e. all columns contained NaN (only the last row in the above example). It didn’t modify the original dataframe; it just returned a copy with the modified contents.
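With how='all', a row survives as long as at least one of its values is present. The sketch below checks this with a notna().any(axis=1) mask on the same employees data, recreated so that it runs on its own. The mask only illustrates the selection.

```python
import numpy as np
import pandas as pd

# Same sample data as above, recreated so this block runs on its own
employees = [('jack', 34, 'Sydney', 5), ('Riti', 31, 'Delhi', 7),
             ('Aadi', 16, np.nan, 11), (np.nan, np.nan, 'Delhi', np.nan),
             ('Veena', 33, 'Delhi', 4), ('Shaunak', 35, 'Mumbai', 5),
             ('Sam', 35, 'Colombo', 11), (np.nan, np.nan, np.nan, np.nan)]
df = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'])

mod_df = df.dropna(how='all')

# Same selection with a mask: keep a row if at least one of its values is non-NaN
manual_df = df[df.notna().any(axis=1)]

print(mod_df.index.tolist())     # [0, 1, 2, 3, 4, 5, 6]
print(mod_df.equals(manual_df))  # True
```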
Drop Rows with any missing value in selected columns only
What if we want to remove the rows in which a value is missing in any of the selected columns, like the ‘Name’ and ‘Age’ columns? Then we need to pass the subset argument containing the list of column names.
print("Contents of the Dataframe : ")
print(df)

# Drop rows which contain any NaN value in the selected columns
mod_df = df.dropna(how='any', subset=['Name', 'Age'])

print("Modified Dataframe : ")
print(mod_df)
Output:
Contents of the Dataframe :
      Name   Age     City  Experience
0     jack  34.0   Sydney         5.0
1     Riti  31.0    Delhi         7.0
2     Aadi  16.0      NaN        11.0
3      NaN   NaN    Delhi         NaN
4    Veena  33.0    Delhi         4.0
5  Shaunak  35.0   Mumbai         5.0
6      Sam  35.0  Colombo        11.0
7      NaN   NaN      NaN         NaN
Modified Dataframe :
      Name   Age     City  Experience
0     jack  34.0   Sydney         5.0
1     Riti  31.0    Delhi         7.0
2     Aadi  16.0      NaN        11.0
4    Veena  33.0    Delhi         4.0
5  Shaunak  35.0   Mumbai         5.0
6      Sam  35.0  Colombo        11.0
It removed the rows that contain NaN in at least one of the subset columns, i.e. in either the ‘Name’ or the ‘Age’ column. Row 2 was kept because its only NaN is in ‘City’, which is not in the subset.
P.S. It returns a copy of the original dataframe with the modified contents.
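With subset, only the listed columns are checked. The equivalent mask below applies notna().all(axis=1) to the ‘Name’ and ‘Age’ columns only. It is a sketch on the same employees data, recreated so that it runs on its own.

```python
import numpy as np
import pandas as pd

# Same sample data as above, recreated so this block runs on its own
employees = [('jack', 34, 'Sydney', 5), ('Riti', 31, 'Delhi', 7),
             ('Aadi', 16, np.nan, 11), (np.nan, np.nan, 'Delhi', np.nan),
             ('Veena', 33, 'Delhi', 4), ('Shaunak', 35, 'Mumbai', 5),
             ('Sam', 35, 'Colombo', 11), (np.nan, np.nan, np.nan, np.nan)]
df = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'])

mod_df = df.dropna(how='any', subset=['Name', 'Age'])

# Same selection: only 'Name' and 'Age' must both be non-NaN; 'City' is ignored
manual_df = df[df[['Name', 'Age']].notna().all(axis=1)]

print(mod_df.index.tolist())     # [0, 1, 2, 4, 5, 6]
print(mod_df.equals(manual_df))  # True
```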
Drop Rows with missing values or NaN in all the selected columns
What if we want to remove the rows in which values are missing in all of the selected columns, i.e. in both the ‘Name’ and ‘Age’ columns? We pass how='all' together with subset:
print("Contents of the Dataframe : ")
print(df)

# Drop rows which contain all NaN values in the selected columns
mod_df = df.dropna(how='all', subset=['Name', 'Age'])

print("Modified Dataframe : ")
print(mod_df)
Output:
Contents of the Dataframe :
      Name   Age     City  Experience
0     jack  34.0   Sydney         5.0
1     Riti  31.0    Delhi         7.0
2     Aadi  16.0      NaN        11.0
3      NaN   NaN    Delhi         NaN
4    Veena  33.0    Delhi         4.0
5  Shaunak  35.0   Mumbai         5.0
6      Sam  35.0  Colombo        11.0
7      NaN   NaN      NaN         NaN
Modified Dataframe :
      Name   Age     City  Experience
0     jack  34.0   Sydney         5.0
1     Riti  31.0    Delhi         7.0
2     Aadi  16.0      NaN        11.0
4    Veena  33.0    Delhi         4.0
5  Shaunak  35.0   Mumbai         5.0
6      Sam  35.0  Colombo        11.0
It removed the rows that contain NaN in all of the subset columns, i.e. in both the ‘Name’ and ‘Age’ columns in the above example.
P.S. It returns a copy of the original dataframe with the modified contents.
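In the sample data, rows 3 and 7 are missing both ‘Name’ and ‘Age’. As a result, how='any' and how='all' happen to give the same result for this subset. The small hypothetical dataframe below makes the difference visible: one of its rows is missing only ‘Age’. The values are for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 is missing only 'Age', row 2 is missing both 'Name' and 'Age'
df = pd.DataFrame({
    'Name': ['jack', 'Riti', np.nan],
    'Age': [34, np.nan, np.nan],
    'City': ['Sydney', 'Delhi', 'Delhi'],
})

# 'any': drop a row if Name OR Age is missing
any_df = df.dropna(how='any', subset=['Name', 'Age'])
# 'all': drop a row only if Name AND Age are both missing
all_df = df.dropna(how='all', subset=['Name', 'Age'])

print(any_df.index.tolist())  # [0]
print(all_df.index.tolist())  # [0, 1]
```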
thresh Argument in the dropna() function
What if we want to remove the rows of a dataframe that contain fewer than n non-NaN values?
For this, we can pass n as the thresh argument.
For example, to delete the rows that contain fewer than 2 non-NaN values:
print("Contents of the Dataframe : ")
print(df)

# Keep only the rows that have at least 2 non-NaN values
mod_df = df.dropna(thresh=2)

print("Modified Dataframe : ")
print(mod_df)
Output:
Contents of the Dataframe :
      Name   Age     City  Experience
0     jack  34.0   Sydney         5.0
1     Riti  31.0    Delhi         7.0
2     Aadi  16.0      NaN        11.0
3      NaN   NaN    Delhi         NaN
4    Veena  33.0    Delhi         4.0
5  Shaunak  35.0   Mumbai         5.0
6      Sam  35.0  Colombo        11.0
7      NaN   NaN      NaN         NaN
Modified Dataframe :
      Name   Age     City  Experience
0     jack  34.0   Sydney         5.0
1     Riti  31.0    Delhi         7.0
2     Aadi  16.0      NaN        11.0
4    Veena  33.0    Delhi         4.0
5  Shaunak  35.0   Mumbai         5.0
6      Sam  35.0  Colombo        11.0
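Row 3 has only one non-NaN value (‘Delhi’) and row 7 has none, so both fall below the threshold of 2. The sketch below makes this explicit by counting the non-NaN values per row with notna().sum(axis=1) on the same employees data, recreated so that it runs on its own. The comparison mask is only an illustration of the rule.

```python
import numpy as np
import pandas as pd

# Same sample data as above, recreated so this block runs on its own
employees = [('jack', 34, 'Sydney', 5), ('Riti', 31, 'Delhi', 7),
             ('Aadi', 16, np.nan, 11), (np.nan, np.nan, 'Delhi', np.nan),
             ('Veena', 33, 'Delhi', 4), ('Shaunak', 35, 'Mumbai', 5),
             ('Sam', 35, 'Colombo', 11), (np.nan, np.nan, np.nan, np.nan)]
df = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'])

# Number of non-NaN values in each row
non_nan_counts = df.notna().sum(axis=1)

mod_df = df.dropna(thresh=2)
# Same selection: keep rows with at least 2 non-NaN values
manual_df = df[non_nan_counts >= 2]

print(non_nan_counts.tolist())   # [4, 4, 3, 1, 4, 4, 4, 0]
print(mod_df.index.tolist())     # [0, 1, 2, 4, 5, 6]
print(mod_df.equals(manual_df))  # True
```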
Drop Rows with missing values from a Dataframe in place
In the examples we have seen so far, dropna() returns a copy of the original dataframe with the modified contents.
What if we want to drop the rows with missing values from the existing dataframe itself?
We can pass inplace=True to make the changes in the dataframe in place:
print("Contents of the Dataframe : ")
print(df)

# Drop rows with any NaN directly in df (dropna returns None here)
df.dropna(how='any', inplace=True)

print("Modified Dataframe : ")
print(df)
Output:
Contents of the Dataframe :
      Name   Age     City  Experience
0     jack  34.0   Sydney         5.0
1     Riti  31.0    Delhi         7.0
2     Aadi  16.0      NaN        11.0
3      NaN   NaN    Delhi         NaN
4    Veena  33.0    Delhi         4.0
5  Shaunak  35.0   Mumbai         5.0
6      Sam  35.0  Colombo        11.0
7      NaN   NaN      NaN         NaN
Modified Dataframe :
      Name   Age     City  Experience
0     jack  34.0   Sydney         5.0
1     Riti  31.0    Delhi         7.0
4    Veena  33.0    Delhi         4.0
5  Shaunak  35.0   Mumbai         5.0
6      Sam  35.0  Colombo        11.0
Because we passed the inplace argument as True, dropna() modified the dataframe in place and removed the rows that had any missing value. With inplace=True, it returns None instead of a new dataframe.
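With inplace=True, dropna() returns None. A common slip is to write df = df.dropna(inplace=True), which replaces the dataframe with None. The sketch below, on a small hypothetical dataframe, shows that return value. It also shows the alternative of reassigning the returned copy, which gives the same resulting rows.

```python
import numpy as np
import pandas as pd

# Hypothetical data: rows 1 and 2 each contain one NaN
df = pd.DataFrame({'a': [1, np.nan, 3], 'b': [4, 5, np.nan]})

# inplace=True modifies df itself and returns None
result = df.dropna(inplace=True)
print(result)             # None
print(df.index.tolist())  # [0]

# Alternative: reassign the returned copy instead of using inplace
df2 = pd.DataFrame({'a': [1, np.nan, 3], 'b': [4, 5, np.nan]})
df2 = df2.dropna()
print(df2.index.tolist())  # [0]
```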