In this article we will discuss how to remove rows from a dataframe that have missing values (NaN) in any, all, or a few selected columns.

DataFrame.dropna()

Python’s pandas library provides a function to remove rows or columns that contain missing values (NaN) from a dataframe, i.e.

DataFrame.dropna(self, axis=0, how='any', thresh=None, subset=None, inplace=False)

Arguments :

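  • axis:
    • 0 : drop rows with missing values
    • 1 : drop columns with missing values
  • how:
    • ‘any’ : drop if any NaN / missing value is present
    • ‘all’ : drop if all the values are missing / NaN
  • thresh: minimum number of non-NaN values a row (or column) must contain to be kept
  • subset: labels on the other axis to check for missing values, e.g. a list of column names when dropping rows
  • inplace: if True, make the changes in the dataframe itself instead of returning a copy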

It removes rows or columns (depending on the arguments) that contain missing values / NaN.
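To make the effect of the axis and how arguments concrete, here is a minimal sketch. The small dataframe and its column names 'a', 'b' and 'c' are hypothetical toy data, not the employee dataframe used in the rest of this article.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: column 'c' is entirely missing, column 'b' partly missing
df = pd.DataFrame({
    'a': [1.0, 2.0, 3.0],
    'b': [np.nan, 5.0, 6.0],
    'c': [np.nan, np.nan, np.nan],
})

# axis=0 works on rows: keep only rows without any NaN
rows_kept = df.dropna(axis=0, how='any')

# axis=1 works on columns: keep only columns without any NaN
cols_kept = df.dropna(axis=1, how='any')

# how='all' only drops columns in which every value is NaN
cols_not_empty = df.dropna(axis=1, how='all')

print(rows_kept.shape)               # (0, 3), every row has a NaN in column 'c'
print(list(cols_kept.columns))       # ['a']
print(list(cols_not_empty.columns))  # ['a', 'b']
```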

Let’s use the dropna() function to remove rows with missing values from a dataframe.

Suppose we have the following dataframe:

import pandas as pd
import numpy as np

# List of Tuples (np.nan is used because the np.NaN alias was removed in NumPy 2.0)
employees = [('jack', 34, 'Sydney', 5),
             ('Riti', 31, 'Delhi', 7),
             ('Aadi', 16, np.nan, 11),
             (np.nan, np.nan, 'Delhi', np.nan),
             ('Veena', 33, 'Delhi', 4),
             ('Shaunak', 35, 'Mumbai', 5),
             ('Sam', 35, 'Colombo', 11),
             (np.nan, np.nan, np.nan, np.nan)
             ]

# Create a DataFrame object
empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'])

Contents of the dataframe,
      Name   Age     City  Experience
0     jack  34.0   Sydney         5.0
1     Riti  31.0    Delhi         7.0
2     Aadi  16.0      NaN        11.0
3      NaN   NaN    Delhi         NaN
4    Veena  33.0    Delhi         4.0
5  Shaunak  35.0   Mumbai         5.0
6      Sam  35.0  Colombo        11.0
7      NaN   NaN      NaN         NaN

Drop Rows with missing value / NaN in any column

# Drop rows which contain any NaN or missing value
modDf = empDfObj.dropna()

print("Contents of the Modified Dataframe : ")
print(modDf)

Output:
Contents of the Modified Dataframe : 
      Name   Age     City  Experience
0     jack  34.0   Sydney         5.0
1     Riti  31.0    Delhi         7.0
4    Veena  33.0    Delhi         4.0
5  Shaunak  35.0   Mumbai         5.0
6      Sam  35.0  Colombo        11.0

It removed all the rows which had any missing value. It didn’t modify the original dataframe; it just returned a copy with modified contents.
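The following sketch checks that the original dataframe is left untouched. It rebuilds the same dataframe so that it runs on its own. The reset_index() call at the end is an optional extra that is not part of the original example; it renumbers the remaining rows, since dropna() keeps the original index labels.

```python
import numpy as np
import pandas as pd

employees = [('jack', 34, 'Sydney', 5),
             ('Riti', 31, 'Delhi', 7),
             ('Aadi', 16, np.nan, 11),
             (np.nan, np.nan, 'Delhi', np.nan),
             ('Veena', 33, 'Delhi', 4),
             ('Shaunak', 35, 'Mumbai', 5),
             ('Sam', 35, 'Colombo', 11),
             (np.nan, np.nan, np.nan, np.nan)]
empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'])

modDf = empDfObj.dropna()

print(len(empDfObj))      # 8, the original still has every row
print(len(modDf))         # 5
print(list(modDf.index))  # [0, 1, 4, 5, 6], the original labels are kept

# Optional: renumber the remaining rows starting from 0
renumbered = modDf.reset_index(drop=True)
print(list(renumbered.index))  # [0, 1, 2, 3, 4]
```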

How did it work?
The default value of the ‘how’ argument in dropna() is ‘any’, and for the ‘axis’ argument it is 0. This means that if we don’t pass any argument to dropna(), it still deletes all the rows that contain any NaN.

We can also pass these arguments explicitly, for example ‘how’:

# Drop rows which contain any NaN or missing value
modDf = empDfObj.dropna(how='any')

It works the same way, i.e. it removes the rows with any missing value.

P.S. It returned a copy of the original dataframe with modified contents.
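As a quick check of the defaults described above, this sketch compares the call without arguments to a call that spells out both axis and how. It rebuilds the same dataframe so that it runs on its own.

```python
import numpy as np
import pandas as pd

employees = [('jack', 34, 'Sydney', 5),
             ('Riti', 31, 'Delhi', 7),
             ('Aadi', 16, np.nan, 11),
             (np.nan, np.nan, 'Delhi', np.nan),
             ('Veena', 33, 'Delhi', 4),
             ('Shaunak', 35, 'Mumbai', 5),
             ('Sam', 35, 'Colombo', 11),
             (np.nan, np.nan, np.nan, np.nan)]
empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'])

default_call = empDfObj.dropna()
explicit_call = empDfObj.dropna(axis=0, how='any')

# Both calls return dataframes with identical contents
print(default_call.equals(explicit_call))  # True
```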

Drop Rows in dataframe which has NaN in all columns

What if we want to remove only the rows in which all values are missing, i.e. NaN?

# Drop rows with NaN or missing value in all columns
modDf = empDfObj.dropna(how='all')

print("Contents of the Modified Dataframe : ")
print(modDf)

Output:
Contents of the Modified Dataframe : 
      Name   Age     City  Experience
0     jack  34.0   Sydney         5.0
1     Riti  31.0    Delhi         7.0
2     Aadi  16.0      NaN        11.0
3      NaN   NaN    Delhi         NaN
4    Veena  33.0    Delhi         4.0
5  Shaunak  35.0   Mumbai         5.0
6      Sam  35.0  Colombo        11.0

It removed the rows in which all values were missing, i.e. every column contains NaN. Here that is only row 7; row 3 is kept because its ‘City’ value is present. It didn’t modify the original dataframe; it just returned a copy with modified contents.
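One way to see what how='all' does is to build the same filter by hand with a boolean mask. This is only an illustrative equivalent, not how pandas implements dropna() internally.

```python
import numpy as np
import pandas as pd

employees = [('jack', 34, 'Sydney', 5),
             ('Riti', 31, 'Delhi', 7),
             ('Aadi', 16, np.nan, 11),
             (np.nan, np.nan, 'Delhi', np.nan),
             ('Veena', 33, 'Delhi', 4),
             ('Shaunak', 35, 'Mumbai', 5),
             ('Sam', 35, 'Colombo', 11),
             (np.nan, np.nan, np.nan, np.nan)]
empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'])

# True for rows in which every column is NaN
all_missing = empDfObj.isna().all(axis=1)

# Keep the rows that are not completely empty
by_mask = empDfObj[~all_missing]
by_dropna = empDfObj.dropna(how='all')

print(list(all_missing[all_missing].index))  # [7]
print(by_mask.equals(by_dropna))             # True
```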

Drop Rows with any missing value in selected columns only

What if we want to remove rows in which a value is missing in any of the selected columns, e.g. the ‘Name’ & ‘Age’ columns?

# Drop rows with any NaN in the selected columns only
modDf = empDfObj.dropna(how='any', subset=['Name', 'Age'])

print("Contents of the Modified Dataframe : ")
print(modDf)

Output:
Contents of the Modified Dataframe : 
      Name   Age     City  Experience
0     jack  34.0   Sydney         5.0
1     Riti  31.0    Delhi         7.0
2     Aadi  16.0      NaN        11.0
4    Veena  33.0    Delhi         4.0
5  Shaunak  35.0   Mumbai         5.0
6      Sam  35.0  Colombo        11.0

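It removed the rows which contain NaN in either the ‘Name’ or the ‘Age’ column. Row 2 is kept because its only missing value is in ‘City’, which is not one of the selected columns.

P.S. It returned a copy of the original dataframe with modified contents.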
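The subset filter can also be written as a boolean mask over the selected columns, which may help to show which rows qualify. This is a sketch of an equivalent expression, not the internal implementation of dropna().

```python
import numpy as np
import pandas as pd

employees = [('jack', 34, 'Sydney', 5),
             ('Riti', 31, 'Delhi', 7),
             ('Aadi', 16, np.nan, 11),
             (np.nan, np.nan, 'Delhi', np.nan),
             ('Veena', 33, 'Delhi', 4),
             ('Shaunak', 35, 'Mumbai', 5),
             ('Sam', 35, 'Colombo', 11),
             (np.nan, np.nan, np.nan, np.nan)]
empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'])

# True for rows in which both 'Name' and 'Age' are present
has_name_and_age = empDfObj[['Name', 'Age']].notna().all(axis=1)

by_mask = empDfObj[has_name_and_age]
by_dropna = empDfObj.dropna(how='any', subset=['Name', 'Age'])

print(list(by_dropna.index))      # [0, 1, 2, 4, 5, 6]
print(by_mask.equals(by_dropna))  # True
```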

Drop Rows with missing values or NaN in all the selected columns

What if we want to remove rows in which values are missing in all of the selected columns, e.g. the ‘Name’ & ‘Age’ columns?

# Drop rows only if all the selected columns contain NaN
modDf = empDfObj.dropna(how='all', subset=['Name', 'Age'])

print("Contents of the Modified Dataframe : ")
print(modDf)

Output:
Contents of the Modified Dataframe : 
      Name   Age     City  Experience
0     jack  34.0   Sydney         5.0
1     Riti  31.0    Delhi         7.0
2     Aadi  16.0      NaN        11.0
4    Veena  33.0    Delhi         4.0
5  Shaunak  35.0   Mumbai         5.0
6      Sam  35.0  Colombo        11.0

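It removed the rows which contain NaN in both the ‘Name’ and ‘Age’ columns. In this dataframe ‘Name’ and ‘Age’ are always missing together (rows 3 and 7), so the output is the same as in the previous example.

P.S. It returned a copy of the original dataframe with modified contents.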
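Because ‘Name’ and ‘Age’ give the same result for how='any' and how='all' in this dataframe, the sketch below uses a different subset, ‘City’ and ‘Experience’, purely to illustrate the difference between the two modes. This choice of columns is an assumption made for illustration and is not part of the original examples.

```python
import numpy as np
import pandas as pd

employees = [('jack', 34, 'Sydney', 5),
             ('Riti', 31, 'Delhi', 7),
             ('Aadi', 16, np.nan, 11),
             (np.nan, np.nan, 'Delhi', np.nan),
             ('Veena', 33, 'Delhi', 4),
             ('Shaunak', 35, 'Mumbai', 5),
             ('Sam', 35, 'Colombo', 11),
             (np.nan, np.nan, np.nan, np.nan)]
empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'])

cols = ['City', 'Experience']

# how='any': a row is dropped if at least one of the two columns is NaN
any_missing = empDfObj.dropna(how='any', subset=cols)

# how='all': a row is dropped only if both columns are NaN
all_missing = empDfObj.dropna(how='all', subset=cols)

print(list(any_missing.index))  # [0, 1, 4, 5, 6], rows 2, 3 and 7 dropped
print(list(all_missing.index))  # [0, 1, 2, 3, 4, 5, 6], only row 7 dropped
```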

thresh Argument in the dropna() function

What if we want to remove the rows in a dataframe which contain fewer than n non-NaN values?
For this we can pass n as the thresh argument. For example,

Delete rows which contain fewer than 2 non-NaN values

# Delete rows which contain fewer than 2 non-NaN values
modObj = empDfObj.dropna(thresh=2)

print("Contents of the Dataframe : ")
print(modObj)

Output:
Contents of the Dataframe : 
      Name   Age     City  Experience
0     jack  34.0   Sydney         5.0
1     Riti  31.0    Delhi         7.0
2     Aadi  16.0      NaN        11.0
4    Veena  33.0    Delhi         4.0
5  Shaunak  35.0   Mumbai         5.0
6      Sam  35.0  Colombo        11.0
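To see why rows 3 and 7 were removed, it can help to count the non-NaN values in each row. The sketch below does that with notna() and also tries thresh=4 as an extra, hypothetical setting that is not part of the original example.

```python
import numpy as np
import pandas as pd

employees = [('jack', 34, 'Sydney', 5),
             ('Riti', 31, 'Delhi', 7),
             ('Aadi', 16, np.nan, 11),
             (np.nan, np.nan, 'Delhi', np.nan),
             ('Veena', 33, 'Delhi', 4),
             ('Shaunak', 35, 'Mumbai', 5),
             ('Sam', 35, 'Colombo', 11),
             (np.nan, np.nan, np.nan, np.nan)]
empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'])

# Number of non-NaN values in each row
non_nan_counts = empDfObj.notna().sum(axis=1)
print(list(non_nan_counts))  # [4, 4, 3, 1, 4, 4, 4, 0]

# thresh=2 keeps rows with at least 2 non-NaN values
at_least_2 = empDfObj.dropna(thresh=2)
print(list(at_least_2.index))  # [0, 1, 2, 4, 5, 6]

# thresh=4 keeps only rows in which all 4 columns are present
at_least_4 = empDfObj.dropna(thresh=4)
print(list(at_least_4.index))  # [0, 1, 4, 5, 6]
```

Row 3 has only one non-NaN value (‘City’) and row 7 has none, so both fall below the threshold of 2.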

Drop Rows with missing values from a Dataframe in place

In the examples we have seen so far, dropna() returns a copy of the original dataframe with modified contents.
What if we want to drop rows with missing values from the existing dataframe itself?
Let’s see how to make the changes in the dataframe in place, i.e.

empDfObj.dropna(how='any', inplace=True)

print("Contents of the Dataframe : ")
print(empDfObj)

Output:
Contents of the Dataframe : 
      Name   Age     City  Experience
0     jack  34.0   Sydney         5.0
1     Riti  31.0    Delhi         7.0
4    Veena  33.0    Delhi         4.0
5  Shaunak  35.0   Mumbai         5.0
6      Sam  35.0  Colombo        11.0

Because we passed the inplace argument as True, dropna() modified the dataframe in place and removed the rows which had any missing value.
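One detail worth knowing is that with inplace=True the call returns None, so its result should not be assigned back to a variable. The sketch below shows this, along with the equivalent reassignment style; the variable name otherDf is only an illustrative choice.

```python
import numpy as np
import pandas as pd

employees = [('jack', 34, 'Sydney', 5),
             ('Riti', 31, 'Delhi', 7),
             ('Aadi', 16, np.nan, 11),
             (np.nan, np.nan, 'Delhi', np.nan),
             ('Veena', 33, 'Delhi', 4),
             ('Shaunak', 35, 'Mumbai', 5),
             ('Sam', 35, 'Colombo', 11),
             (np.nan, np.nan, np.nan, np.nan)]
columns = ['Name', 'Age', 'City', 'Experience']

# Style 1: modify the dataframe in place; the call itself returns None
empDfObj = pd.DataFrame(employees, columns=columns)
result = empDfObj.dropna(how='any', inplace=True)
print(result)         # None
print(len(empDfObj))  # 5

# Style 2: reassign the returned copy to the same name
otherDf = pd.DataFrame(employees, columns=columns)
otherDf = otherDf.dropna(how='any')
print(otherDf.equals(empDfObj))  # True
```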

The complete example is as follows,

import pandas as pd
import numpy as np

def main():

    # List of Tuples (np.nan is used because the np.NaN alias was removed in NumPy 2.0)
    employees = [('jack', 34, 'Sydney', 5),
                 ('Riti', 31, 'Delhi', 7),
                 ('Aadi', 16, np.nan, 11),
                 (np.nan, np.nan, 'Delhi', np.nan),
                 ('Veena', 33, 'Delhi', 4),
                 ('Shaunak', 35, 'Mumbai', 5),
                 ('Sam', 35, 'Colombo', 11),
                 (np.nan, np.nan, np.nan, np.nan)
                 ]

    # Create a DataFrame object
    empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'])

    print("Contents of the Dataframe : ")
    print(empDfObj)

    print('*** Drop Rows which contains missing value / NaN in any column ***')

    # Drop rows which contain any NaN or missing value
    modDf = empDfObj.dropna()

    print("Contents of the Modified Dataframe : ")
    print(modDf)

    # Same as above, with the 'how' argument passed explicitly
    modDf = empDfObj.dropna(how='any')

    print("Contents of the Modified Dataframe : ")
    print(modDf)

    print('*** Drop Rows which contains missing value / NaN in all columns ***')

    # Drop rows with NaN or missing value in all columns
    modDf = empDfObj.dropna(how='all')

    print("Contents of the Modified Dataframe : ")
    print(modDf)

    print('*** Drop Rows with any missing value or any NaN in the selected columns only ***')

    # Drop rows with any NaN in the selected columns only
    modDf = empDfObj.dropna(how='any', subset=['Name', 'Age'])

    print("Contents of the Modified Dataframe : ")
    print(modDf)

    print('*** Drop Rows with missing values in selected columns only ***')

    # Drop rows only if all the selected columns contain NaN
    modDf = empDfObj.dropna(how='all', subset=['Name', 'Age'])

    print("Contents of the Modified Dataframe : ")
    print(modDf)

    print('*** Drop Rows with NaN greater than a threashold  ***')

    # Delete rows which contain fewer than 2 non-NaN values
    modObj = empDfObj.dropna(thresh=2)

    print("Contents of the Dataframe : ")
    print(modObj)

    print('*** Drop Rows with any NaN, changes are in place ***')

    empDfObj.dropna(how='any', inplace=True)

    print("Contents of the Dataframe : ")
    print(empDfObj)

if __name__ == '__main__':
    main()

Output:
Contents of the Dataframe : 
      Name   Age     City  Experience
0     jack  34.0   Sydney         5.0
1     Riti  31.0    Delhi         7.0
2     Aadi  16.0      NaN        11.0
3      NaN   NaN    Delhi         NaN
4    Veena  33.0    Delhi         4.0
5  Shaunak  35.0   Mumbai         5.0
6      Sam  35.0  Colombo        11.0
7      NaN   NaN      NaN         NaN
*** Drop Rows which contains missing value / NaN in any column ***
Contents of the Modified Dataframe : 
      Name   Age     City  Experience
0     jack  34.0   Sydney         5.0
1     Riti  31.0    Delhi         7.0
4    Veena  33.0    Delhi         4.0
5  Shaunak  35.0   Mumbai         5.0
6      Sam  35.0  Colombo        11.0
Contents of the Modified Dataframe : 
      Name   Age     City  Experience
0     jack  34.0   Sydney         5.0
1     Riti  31.0    Delhi         7.0
4    Veena  33.0    Delhi         4.0
5  Shaunak  35.0   Mumbai         5.0
6      Sam  35.0  Colombo        11.0
*** Drop Rows which contains missing value / NaN in all columns ***
Contents of the Modified Dataframe : 
      Name   Age     City  Experience
0     jack  34.0   Sydney         5.0
1     Riti  31.0    Delhi         7.0
2     Aadi  16.0      NaN        11.0
3      NaN   NaN    Delhi         NaN
4    Veena  33.0    Delhi         4.0
5  Shaunak  35.0   Mumbai         5.0
6      Sam  35.0  Colombo        11.0
*** Drop Rows with any missing value or any NaN in the selected columns only ***
Contents of the Modified Dataframe : 
      Name   Age     City  Experience
0     jack  34.0   Sydney         5.0
1     Riti  31.0    Delhi         7.0
2     Aadi  16.0      NaN        11.0
4    Veena  33.0    Delhi         4.0
5  Shaunak  35.0   Mumbai         5.0
6      Sam  35.0  Colombo        11.0
*** Drop Rows with missing values in selected columns only ***
Contents of the Modified Dataframe : 
      Name   Age     City  Experience
0     jack  34.0   Sydney         5.0
1     Riti  31.0    Delhi         7.0
2     Aadi  16.0      NaN        11.0
4    Veena  33.0    Delhi         4.0
5  Shaunak  35.0   Mumbai         5.0
6      Sam  35.0  Colombo        11.0
*** Drop Rows with NaN greater than a threashold  ***
Contents of the Dataframe : 
      Name   Age     City  Experience
0     jack  34.0   Sydney         5.0
1     Riti  31.0    Delhi         7.0
2     Aadi  16.0      NaN        11.0
4    Veena  33.0    Delhi         4.0
5  Shaunak  35.0   Mumbai         5.0
6      Sam  35.0  Colombo        11.0
*** Drop Rows with any NaN, changes are in place ***
Contents of the Dataframe : 
      Name   Age     City  Experience
0     jack  34.0   Sydney         5.0
1     Riti  31.0    Delhi         7.0
4    Veena  33.0    Delhi         4.0
5  Shaunak  35.0   Mumbai         5.0
6      Sam  35.0  Colombo        11.0
