Pandas : Drop Rows with NaN or Missing values

In this article. we will discuss how to remove rows from a dataframe with missing value or NaN in any, all or few selected columns.

Table of Contents:

Overview of DataFrame.dropna()

Python’s pandas library provides a function to remove rows or columns from a dataframe which contain missing values or NaN i.e.

DataFrame.dropna(self, axis=0, how='any', thresh=None, subset=None, inplace=False)

Arguments :

  • axis:
    • 0 , to drop rows with missing values
    • 1 , to drop columns with missing values
  • how:
    • ‘any’ : drop if any NaN / missing value is present
    • ‘all’ : drop if all the values are missing / NaN
  • thresh: threshold for non NaN values
  • inplace: If True then make changes in the dataplace itself

It removes rows or columns (based on arguments) with missing values / NaN

Let’s use dropna() function to remove rows with missing values in a dataframe,

Suppose we have a dataframe i.e.

import pandas as pd
import numpy as np

# List of Tuples
empoyees = [('jack', 34, 'Sydney', 5) ,
            ('Riti', 31, 'Delhi' , 7) ,
            ('Aadi', 16, np.NaN, 11) ,
            (np.NaN, np.NaN,'Delhi' , np.NaN) ,
            ('Veena', 33, 'Delhi' , 4) ,
            ('Shaunak', 35, 'Mumbai', 5 ),
            ('Sam', 35, 'Colombo', 11),
            (np.NaN, np.NaN, np.NaN, np.NaN)]

# Create a DataFrame object
df = pd.DataFrame(  empoyees, 
                    columns=['Name', 'Age', 'City', 'Experience'])

print(df)

Contents of the dataframe,

      Name   Age     City  Experience
0     jack  34.0   Sydney         5.0
1     Riti  31.0    Delhi         7.0
2     Aadi  16.0      NaN        11.0
3      NaN   NaN    Delhi         NaN
4    Veena  33.0    Delhi         4.0
5  Shaunak  35.0   Mumbai         5.0
6      Sam  35.0  Colombo        11.0
7      NaN   NaN      NaN         NaN

Drop Rows with missing value / NaN in any column

print("Contents of the Dataframe : ")
print(df)

# Drop rows which contain any NaN values
mod_df = df.dropna()

print("Modified Dataframe : ")
print(mod_df)

Output:

Contents of the Dataframe :
      Name   Age     City  Experience
0     jack  34.0   Sydney         5.0
1     Riti  31.0    Delhi         7.0
2     Aadi  16.0      NaN        11.0
3      NaN   NaN    Delhi         NaN
4    Veena  33.0    Delhi         4.0
5  Shaunak  35.0   Mumbai         5.0
6      Sam  35.0  Colombo        11.0
7      NaN   NaN      NaN         NaN
Modified Dataframe :
      Name   Age     City  Experience
0     jack  34.0   Sydney         5.0
1     Riti  31.0    Delhi         7.0
4    Veena  33.0    Delhi         4.0
5  Shaunak  35.0   Mumbai         5.0
6      Sam  35.0  Colombo        11.0

It removed all the rows which had any missing value. It didn’t modified the original dataframe, it just returned a copy with modified contents.

How it worked ?
Default value of ‘how’ argument in dropna() is ‘any’ & for ‘axis’ argument it is 0. It means if we don’t pass any argument in dropna() then still it will delete all the rows with any NaN.

We can also pass the ‘how’ & ‘axis’ arguments explicitly too i.e.

# Drop rows which contain any NaN values
mod_df = df.dropna( axis=0, 
                    how='any')

It will work similarly i.e. it will remove the rows with any missing value.

P.S. It returned a copy of original dataframe with modified contents.

Drop Rows in dataframe which has NaN in all columns

What if we want to remove rows in a dataframe, whose all values are missing i.e. NaN,

print("Contents of the Dataframe : ")
print(df)

# Drop rows which contain any NaN values
mod_df = df.dropna( how='all')

print("Modified Dataframe : ")
print(mod_df)

Output:

Contents of the Dataframe :
      Name   Age     City  Experience
0     jack  34.0   Sydney         5.0
1     Riti  31.0    Delhi         7.0
2     Aadi  16.0      NaN        11.0
3      NaN   NaN    Delhi         NaN
4    Veena  33.0    Delhi         4.0
5  Shaunak  35.0   Mumbai         5.0
6      Sam  35.0  Colombo        11.0
7      NaN   NaN      NaN         NaN
Modified Dataframe :
      Name   Age     City  Experience
0     jack  34.0   Sydney         5.0
1     Riti  31.0    Delhi         7.0
2     Aadi  16.0      NaN        11.0
3      NaN   NaN    Delhi         NaN
4    Veena  33.0    Delhi         4.0
5  Shaunak  35.0   Mumbai         5.0
6      Sam  35.0  Colombo        11.0

It removes the rows in which all values were missing i.e. all columns contains NaN (only last row in above example). It didn’t modified the original dataframe, it just returned a copy with modified contents.

Drop Rows with any missing value in selected columns only

What if we want to remove rows in which values are missing in any of the selected column like, ‘Name’ & ‘Age’ columns, then we need to pass a subset argument containing the list column names.

print("Contents of the Dataframe : ")
print(df)

# Drop rows which contain any NaN value in the selected columns
mod_df = df.dropna( how='any',
                    subset=['Name', 'Age'])

print("Modified Dataframe : ")
print(mod_df)

Output:

Contents of the Dataframe :
      Name   Age     City  Experience
0     jack  34.0   Sydney         5.0
1     Riti  31.0    Delhi         7.0
2     Aadi  16.0      NaN        11.0
3      NaN   NaN    Delhi         NaN
4    Veena  33.0    Delhi         4.0
5  Shaunak  35.0   Mumbai         5.0
6      Sam  35.0  Colombo        11.0
7      NaN   NaN      NaN         NaN
Modified Dataframe :
      Name   Age     City  Experience
0     jack  34.0   Sydney         5.0
1     Riti  31.0    Delhi         7.0
2     Aadi  16.0      NaN        11.0
4    Veena  33.0    Delhi         4.0
5  Shaunak  35.0   Mumbai         5.0
6      Sam  35.0  Colombo        11.0

It removes the rows which contains NaN in either of the subset columns i.e. either ‘Name’ or ‘Age’ column.

P.S. It returned a copy of original dataframe with modified contents.

Drop Rows with missing values or NaN in all the selected columns

What if we want to remove rows in which values are missing in all of the selected column i.e. ‘Name’ & ‘Age’ columns

print("Contents of the Dataframe : ")
print(df)

# Drop rows which contain all NaN values in the selected columns
mod_df = df.dropna( how='all',
                    subset=['Name', 'Age'])

print("Modified Dataframe : ")
print(mod_df)

Output:

Contents of the Dataframe :
      Name   Age     City  Experience
0     jack  34.0   Sydney         5.0
1     Riti  31.0    Delhi         7.0
2     Aadi  16.0      NaN        11.0
3      NaN   NaN    Delhi         NaN
4    Veena  33.0    Delhi         4.0
5  Shaunak  35.0   Mumbai         5.0
6      Sam  35.0  Colombo        11.0
7      NaN   NaN      NaN         NaN
Modified Dataframe :
      Name   Age     City  Experience
0     jack  34.0   Sydney         5.0
1     Riti  31.0    Delhi         7.0
2     Aadi  16.0      NaN        11.0
4    Veena  33.0    Delhi         4.0
5  Shaunak  35.0   Mumbai         5.0
6      Sam  35.0  Colombo        11.0

It removes the rows which contains NaN in both the subset columns i.e. in above example both ‘Name’ or ‘Age’ columns.

P.S. It returned a copy of original dataframe with modified contents.

thresh Argument in the dropna() function

What if we want to remove the rows in a dataframe which contains less than n number of non NaN values ?
For this we can pass the n in thresh argument. For example,

Delete rows which contains less than 2 non NaN values

print("Contents of the Dataframe : ")
print(df)

mod_df = df.dropna( thresh=2 )

print("Modified Dataframe : ")
print(mod_df)

Output:

Contents of the Dataframe :
      Name   Age     City  Experience
0     jack  34.0   Sydney         5.0
1     Riti  31.0    Delhi         7.0
2     Aadi  16.0      NaN        11.0
3      NaN   NaN    Delhi         NaN
4    Veena  33.0    Delhi         4.0
5  Shaunak  35.0   Mumbai         5.0
6      Sam  35.0  Colombo        11.0
7      NaN   NaN      NaN         NaN
Modified Dataframe :
      Name   Age     City  Experience
0     jack  34.0   Sydney         5.0
1     Riti  31.0    Delhi         7.0
2     Aadi  16.0      NaN        11.0
4    Veena  33.0    Delhi         4.0
5  Shaunak  35.0   Mumbai         5.0
6      Sam  35.0  Colombo        11.0

Drop Rows with missing values from a Dataframe in place

In the examples which we saw till now, dropna() returns a copy of the original dataframe with modified contents.
What if we want to drop rows with missing values in existing dataframe ?
Let’s see how to make changes in dataframe in place i.e.

print("Contents of the Dataframe : ")
print(df)

df.dropna( how='any', inplace=True)

print("Modified Dataframe : ")
print(df)

Output:

Contents of the Dataframe :
      Name   Age     City  Experience
0     jack  34.0   Sydney         5.0
1     Riti  31.0    Delhi         7.0
2     Aadi  16.0      NaN        11.0
3      NaN   NaN    Delhi         NaN
4    Veena  33.0    Delhi         4.0
5  Shaunak  35.0   Mumbai         5.0
6      Sam  35.0  Colombo        11.0
7      NaN   NaN      NaN         NaN
Modified Dataframe :
      Name   Age     City  Experience
0     jack  34.0   Sydney         5.0
1     Riti  31.0    Delhi         7.0
4    Veena  33.0    Delhi         4.0
5  Shaunak  35.0   Mumbai         5.0
6      Sam  35.0  Colombo        11.0

As we passed the inplace argument as True. So, it modified the dataframe in place and removed rows from it which had any missing value.

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Scroll to Top