Pandas: Drop dataframe columns if any NaN / Missing value

In this article, we will discuss how to delete the columns of a dataframe that contain at least one NaN value. In other words, we are going to delete those dataframe columns which contain one or more missing values.


We are going to use the pandas dropna() function. So, first let’s have a quick overview of it,

Overview of dataframe.dropna() function

Pandas provides a function to delete rows or columns from a dataframe based on the NaN values they contain.

DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)

Arguments:

  • axis: Default – 0
    • 0, or ‘index’ : Drop rows which contain NaN values.
    • 1, or ‘columns’ : Drop columns which contain NaN values.
  • how: Default – ‘any’
    • ‘any’ : Drop rows / columns which contain any NaN values.
    • ‘all’ : Drop rows / columns in which all values are NaN.
  • thresh (int): Optional
    • Delete rows / columns which contain fewer than thresh non-NaN values.
  • subset: Optional
    • Labels along the other axis to consider when looking for NaN values. For example, when dropping columns (axis=1), these are the row labels to check.
  • inplace (bool): Default – False
    • If True, modifies the calling dataframe object in place.
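To see how these arguments behave, here is a minimal sketch on a small, made-up dataframe. The column names x, y, z and their values are purely illustrative and assume a reasonably recent pandas version. Since some pandas versions refuse to combine how and thresh in one call, each option is tried in a separate call here.

```python
import numpy as np
import pandas as pd

# Illustrative data (not from the article): one complete column,
# one column with a single NaN, and one column that is entirely NaN
df = pd.DataFrame({
    'x': [1, 2, 3],
    'y': [np.nan, 5, 6],
    'z': [np.nan, np.nan, np.nan],
})

# how='any': drop every column with at least one NaN
any_cols = df.dropna(axis=1, how='any').columns.tolist()
# how='all': drop only the columns whose values are all NaN
all_cols = df.dropna(axis=1, how='all').columns.tolist()
# thresh=2: keep only the columns with at least 2 non-NaN values
thresh_cols = df.dropna(axis=1, thresh=2).columns.tolist()
# subset=[1]: with axis=1, only the row labelled 1 is checked for NaN
subset_cols = df.dropna(axis=1, subset=[1]).columns.tolist()
# axis=0 (the default) drops rows instead; every row has a NaN in 'z'
remaining_rows = len(df.dropna())

print(any_cols)        # ['x']
print(all_cols)        # ['x', 'y']
print(thresh_cols)     # ['x', 'y']
print(subset_cols)     # ['x', 'y']
print(remaining_rows)  # 0
```

The subset case shows the "other axis" rule: when dropping columns, subset holds row labels. In this example, row 1 has a NaN only in column 'z'.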

Returns

  • If inplace=True, returns None; otherwise returns a new dataframe with the rows / columns deleted based on their NaN values.
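The difference between the two return modes can be checked directly. The following is a small sketch on a hypothetical two-column dataframe, in which column 'b' holds a NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical dataframe: column 'b' contains a NaN
df = pd.DataFrame({'a': [1, 2], 'b': [np.nan, 4]})

# inplace=False (default): a new dataframe is returned, df is untouched
new_df = df.dropna(axis=1)
print(new_df.columns.tolist())  # ['a']
print(df.columns.tolist())      # ['a', 'b']

# inplace=True: df itself is modified and the call returns None
result = df.dropna(axis=1, inplace=True)
print(result)                   # None
print(df.columns.tolist())      # ['a']
```

This is why code like `df = df.dropna(axis=1, inplace=True)` is a common mistake. It leaves df set to None.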

Let’s use this to perform our task of deleting columns that contain any NaN value.

Pandas: Delete dataframe columns containing any NaN value

Suppose we have a dataframe in which a few columns contain one or more NaN values,

      A   B   C       D   E   F   G     H    I
0  Jack NaN  34  Sydney NaN   5 NaN   NaN  NaN
1  Riti NaN  31   Delhi NaN   7 NaN   NaN  NaN
2  Aadi NaN  16  London NaN  11 NaN   3.0  NaN
3  Mark NaN  41   Delhi NaN  12 NaN  11.0  1.0

Now we want to delete those dataframe columns which contain any NaN values (columns ‘B’, ‘E’, ‘G’, ‘H’ and ‘I’). So, the new dataframe should look like this,

      A   C       D   F
0  Jack  34  Sydney   5
1  Riti  31   Delhi   7
2  Aadi  16  London  11
3  Mark  41   Delhi  12
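Before dropping anything, it can help to confirm which columns actually contain a NaN. Here is an optional sketch of such a check (not part of the original walkthrough) that rebuilds the same example data and combines isna() with any().

```python
import numpy as np
import pandas as pd

# Same data as the example dataframe in this article
employees = [('Jack', np.nan, 34, 'Sydney', np.nan, 5,  np.nan, np.nan, np.nan),
             ('Riti', np.nan, 31, 'Delhi',  np.nan, 7,  np.nan, np.nan, np.nan),
             ('Aadi', np.nan, 16, 'London', np.nan, 11, np.nan, 3, np.nan),
             ('Mark', np.nan, 41, 'Delhi',  np.nan, 12, np.nan, 11, 1)]
df = pd.DataFrame(employees, columns=list('ABCDEFGHI'))

# isna() marks every missing cell with True; any() then reduces each
# column to True if it holds at least one NaN
has_nan = df.isna().any()
nan_columns = has_nan[has_nan].index.tolist()

print(nan_columns)  # ['B', 'E', 'G', 'H', 'I']
```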

For this, we can use the pandas dropna() function. It can delete the rows or columns of a dataframe that contain all or some NaN values. As we want to delete the columns that contain at least one NaN value, we will pass the following arguments to it,

# Drop columns which contain one or more NaN values
df = df.dropna(axis=1, how='any')
  • axis=1 : Drop columns which contain missing values.
  • how='any' : If any value in a column is NaN, drop that column (columns, because axis=1).
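Because ‘columns’ is an alias for axis=1 and dropping on any NaN is the default behaviour, a shorter call should give the same result. Here is a quick sketch on a hypothetical dataframe that checks this.

```python
import numpy as np
import pandas as pd

# Hypothetical data: columns 'b' and 'c' each contain one NaN
df = pd.DataFrame({'a': [1, 2], 'b': [np.nan, 4], 'c': [5, np.nan]})

explicit = df.dropna(axis=1, how='any')
# 'columns' is an alias for axis=1, and dropping on any NaN is the default
shorthand = df.dropna(axis='columns')

print(explicit.equals(shorthand))  # True
print(shorthand.columns.tolist())  # ['a']
```

Spelling out how='any' is still a reasonable choice for readability, since it states the intent explicitly.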

dropna() returns a new dataframe with the columns containing one or more NaN values removed, and we assign that dataframe back to the same variable.

Check out the complete example below,

import pandas as pd
import numpy as np

# List of tuples (np.nan is used because np.NaN was removed in NumPy 2.0)
employees = [('Jack', np.nan, 34, 'Sydney', np.nan, 5,  np.nan, np.nan, np.nan),
             ('Riti', np.nan, 31, 'Delhi' , np.nan, 7,  np.nan, np.nan, np.nan),
             ('Aadi', np.nan, 16, 'London', np.nan, 11, np.nan, 3, np.nan),
             ('Mark', np.nan, 41, 'Delhi' , np.nan, 12, np.nan, 11, 1)]

# Create a DataFrame object
df = pd.DataFrame(employees,
                  columns=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'])

print("Contents of the Dataframe : ")
print(df)

# Drop columns containing any NaN value
df = df.dropna(axis=1, how='any')

print("Modified Dataframe : ")
print(df)

Output:

Contents of the Dataframe : 
      A   B   C       D   E   F   G     H    I
0  Jack NaN  34  Sydney NaN   5 NaN   NaN  NaN
1  Riti NaN  31   Delhi NaN   7 NaN   NaN  NaN
2  Aadi NaN  16  London NaN  11 NaN   3.0  NaN
3  Mark NaN  41   Delhi NaN  12 NaN  11.0  1.0
Modified Dataframe : 
      A   C       D   F
0  Jack  34  Sydney   5
1  Riti  31   Delhi   7
2  Aadi  16  London  11
3  Mark  41   Delhi  12

It deleted columns ‘B’, ‘E’, ‘G’, ‘H’ and ‘I’ from the dataframe because each of them contained at least one NaN value.
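The same columns can also be removed without dropna() by using boolean column selection. This is a different technique, shown only as a sketch for comparison. It keeps the columns in which every value is non-NaN.

```python
import numpy as np
import pandas as pd

# Same data as the complete example above
employees = [('Jack', np.nan, 34, 'Sydney', np.nan, 5,  np.nan, np.nan, np.nan),
             ('Riti', np.nan, 31, 'Delhi',  np.nan, 7,  np.nan, np.nan, np.nan),
             ('Aadi', np.nan, 16, 'London', np.nan, 11, np.nan, 3, np.nan),
             ('Mark', np.nan, 41, 'Delhi',  np.nan, 12, np.nan, 11, 1)]
df = pd.DataFrame(employees, columns=list('ABCDEFGHI'))

# notna().all() is True only for columns without a single NaN;
# .loc[:, mask] keeps just those columns
complete_columns = df.notna().all()
result = df.loc[:, complete_columns]

print(result.columns.tolist())  # ['A', 'C', 'D', 'F']
```

With the mask available as a separate object, it can be inspected or combined with other conditions before the columns are selected.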
