In this article, we will discuss how to delete the columns of a dataframe in which every value is NaN.

We are going to use the pandas dropna() function. So, first let’s have a short overview of it,

Overview of dataframe.dropna() function

Pandas provides a function to delete rows or columns from a dataframe based on the NaN values they contain.

DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)

Arguments:

  • axis: Default – 0
    • 0, or ‘index’ : Drop rows which contain NaN values.
    • 1, or ‘columns’ : Drop columns which contain NaN values.
  • how: Default – ‘any’
    • ‘any’ : Drop rows / columns which contain at least one NaN value.
    • ‘all’ : Drop rows / columns in which all values are NaN.
  • thresh (int): Optional
    • Delete rows / columns which contain fewer than thresh non-NaN values.
  • subset: Optional
    • Labels along the other axis to check for NaN values, e.g. when dropping rows, the list of columns to look at.
  • inplace (bool): Default – False
    • If True, modifies the calling dataframe object.

Returns

  • If inplace==True, it returns None; otherwise it returns a new dataframe with the rows / columns deleted based on NaN values.
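
To make the difference between the how and thresh arguments concrete, here is a minimal sketch on a toy dataframe. The dataframe demo and its column names are made up for this illustration and are not part of the example used later in the article.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, used only to compare the arguments
demo = pd.DataFrame({
    "full":    [1.0, 2.0, 3.0],           # no NaN values
    "partial": [1.0, np.nan, np.nan],     # one non-NaN value
    "empty":   [np.nan, np.nan, np.nan],  # only NaN values
})

# how='any' drops every column that has at least one NaN
any_cols = demo.dropna(axis=1, how="any")

# how='all' drops only the columns that are entirely NaN
all_cols = demo.dropna(axis=1, how="all")

# thresh=2 keeps only the columns with at least 2 non-NaN values
thresh_cols = demo.dropna(axis=1, thresh=2)

print(list(any_cols.columns))     # ['full']
print(list(all_cols.columns))     # ['full', 'partial']
print(list(thresh_cols.columns))  # ['full']
```

Note that how and thresh are passed in separate calls here; recent pandas versions do not accept both in the same call.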

Let’s use this to perform our task of deleting columns with all NaN values.

Pandas: Delete columns of dataframe if all NaN values

Suppose we have a dataframe that contains a few columns with only NaN values,

      A   B   C       D   E   F   G     H    I
0  Jack NaN  34  Sydney NaN   5 NaN   NaN  NaN
1  Riti NaN  31   Delhi NaN   7 NaN   NaN  NaN
2  Aadi NaN  16  London NaN  11 NaN   3.0  NaN
3  Mark NaN  41   Delhi NaN  12 NaN  11.0  1.0

Now we want to delete the columns of this dataframe which contain only NaN values (columns ‘B’, ‘E’ and ‘G’). So, the new dataframe should be like this,

      A   C       D   F     H    I
0  Jack  34  Sydney   5   NaN  NaN
1  Riti  31   Delhi   7   NaN  NaN
2  Aadi  16  London  11   3.0  NaN
3  Mark  41   Delhi  12  11.0  1.0

For this we can use the pandas dropna() function. It can delete the rows or columns of a dataframe that contain some or only NaN values. As we want to delete the columns that contain only NaN values, we will pass the following arguments to it,

# Drop columns which contain all NaN values
df = df.dropna(axis=1, how='all')
  • axis=1 : Work on columns instead of rows.
  • how=’all’ : Drop a column only if all of its values are NaN.

dropna() returns a new dataframe without the all-NaN columns, and we assign that dataframe back to the same variable.

Check out the complete example as follows,
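
As a small sketch of why the assignment matters, the snippet below compares a plain call with inplace=True. The dataframe small and its columns are hypothetical and exist only for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column dataframe, used only for this illustration
small = pd.DataFrame({"x": [1, 2], "y": [np.nan, np.nan]})

# Without inplace, the original dataframe is left untouched
result = small.dropna(axis=1, how="all")
print(list(small.columns))   # ['x', 'y']
print(list(result.columns))  # ['x']

# With inplace=True, the call returns None and modifies 'small' itself
returned = small.dropna(axis=1, how="all", inplace=True)
print(returned)              # None
print(list(small.columns))   # ['x']
```

Assigning the result back, as done above with df, avoids accidentally storing None in a variable.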

import pandas as pd
import numpy as np

# List of Tuples
# (np.nan is used because the np.NaN alias was removed in NumPy 2.0)
employees = [('Jack', np.nan, 34, 'Sydney', np.nan, 5,  np.nan, np.nan, np.nan),
             ('Riti', np.nan, 31, 'Delhi' , np.nan, 7,  np.nan, np.nan, np.nan),
             ('Aadi', np.nan, 16, 'London', np.nan, 11, np.nan, 3, np.nan),
             ('Mark', np.nan, 41, 'Delhi' , np.nan, 12, np.nan, 11, 1)]

# Create a DataFrame object
df = pd.DataFrame(employees,
                  columns=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'])

print("Contents of the Dataframe : ")
print(df)

# Drop columns which contain all NaN values
df = df.dropna(axis=1, how='all')

print("Modified Dataframe : ")
print(df)

Output:

Contents of the Dataframe :
      A   B   C       D   E   F   G     H    I
0  Jack NaN  34  Sydney NaN   5 NaN   NaN  NaN
1  Riti NaN  31   Delhi NaN   7 NaN   NaN  NaN
2  Aadi NaN  16  London NaN  11 NaN   3.0  NaN
3  Mark NaN  41   Delhi NaN  12 NaN  11.0  1.0
Modified Dataframe :
      A   C       D   F     H    I
0  Jack  34  Sydney   5   NaN  NaN
1  Riti  31   Delhi   7   NaN  NaN
2  Aadi  16  London  11   3.0  NaN
3  Mark  41   Delhi  12  11.0  1.0

It deleted columns ‘B’, ‘E’ and ‘G’ of the dataframe, because they had only NaN values.
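
If you want to see which columns will be removed before dropping them, one possible approach is a boolean mask built with isna(). This is an alternative sketch, not part of the dropna() call itself; the names all_nan and kept are chosen only for this illustration.

```python
import numpy as np
import pandas as pd

# Same data as in the complete example above
employees = [('Jack', np.nan, 34, 'Sydney', np.nan, 5,  np.nan, np.nan, np.nan),
             ('Riti', np.nan, 31, 'Delhi' , np.nan, 7,  np.nan, np.nan, np.nan),
             ('Aadi', np.nan, 16, 'London', np.nan, 11, np.nan, 3, np.nan),
             ('Mark', np.nan, 41, 'Delhi' , np.nan, 12, np.nan, 11, 1)]
df = pd.DataFrame(employees,
                  columns=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'])

# Boolean Series: True for each column whose values are all NaN
all_nan = df.isna().all(axis=0)
print(list(all_nan[all_nan].index))  # ['B', 'E', 'G']

# Keep the other columns; this matches dropna(axis=1, how='all')
kept = df.loc[:, ~all_nan]
print(kept.equals(df.dropna(axis=1, how='all')))  # True
```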