In this article, we will discuss how to select dataframe columns which contains the NaN values (any, all or None).

Table of contents:

Let’s first create a dataframe and then we will see how to select columns from it based on the NaN values,

import pandas as pd
import numpy as np

# List of Tuples
empoyees = [('Jack', np.NaN, 34, 'Sydney', np.NaN, 5,  np.NaN, np.NaN, np.NaN),
            ('Riti', np.NaN, 31, 'Delhi' , np.NaN, 7,  np.NaN, np.NaN, np.NaN),
            ('Aadi', np.NaN, 16, 'London', np.NaN, 11, np.NaN, 3, np.NaN),
            ('Mark', np.NaN, 41, 'Delhi' , np.NaN, 12, np.NaN, 11, 1)]

# Create a DataFrame object
df = pd.DataFrame(  empoyees,
                    columns=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'])

print("Contents of the Dataframe : ")
print(df)

Output:

Contents of the Dataframe : 
      A   B   C       D   E   F   G     H    I
0  Jack NaN  34  Sydney NaN   5 NaN   NaN  NaN
1  Riti NaN  31   Delhi NaN   7 NaN   NaN  NaN
2  Aadi NaN  16  London NaN  11 NaN   3.0  NaN
3  Mark NaN  41   Delhi NaN  12 NaN  11.0  1.0

Now let’s see how to select columns from the above dataframe containing NaN values,

Pandas: Select dataframe columns with any NaN values.

To select the columns with any NaN value, use the loc[] attribute of the dataframe i.e.

loc[row_section, column_section]
  • row_section: In the row_section pass ‘:’ to include all rows.
  • column_section: In the column section pass a bool series, which should be of same size as number of columns of the dataframe. If any value in bool series is True then it means that column contains any NaN value.

For example,

# Select dataframe columns with any NaN value
subset_df = df.loc[:, df.isnull().any()]

print(subset_df)

Output

    B   E   G     H    I
0 NaN NaN NaN   NaN  NaN
1 NaN NaN NaN   NaN  NaN
2 NaN NaN NaN   3.0  NaN
3 NaN NaN NaN  11.0  1.0

It returned a dataframe with only those columns from the original dataframe, which contains any NaN value.
This one-liner solution seems a little complex. So, let’s break this down a little to understand how it is works.

Why did we passed df.isnull().any() in the column section of loc[]?

df.isnull().any() returns a bool series and each value in the bool series represents a column. If any value in bool series is True then it means that corresponding column has any NaN value in it. By passing this in the column_section of loc[], we made sure that loc[] selects only the columns containing any NaN value.

How df.isnull().any() works?

First of all, call isNull() function on the dataframe object, it returns a same sized dataframe containing only bool values. In this bool dataframe, a cell contains True if the corresponding cell in the original dataframe is NaN, otherwise it contains False. Then call any() function on this Boolean dataframe object. It looks over the column axis and returns a bool series. Each value in the bool series represents a column and if value is True then it means that column has any NaN value.

Then pass that bool series to the column section of loc[], it selects only those dataframe columns which has any NaN value.

Pandas: Select dataframe columns with all NaN values.

To select the columns with all NaN value, use the loc[] attribute of the dataframe i.e.

loc[row_section, column_section]
  • row_section: In the row_section pass ‘:’ to include all rows.
  • column_section: In the column section pass a bool series, which should be of same size as number of columns of the dataframe. If all value in bool series is True then it means, that respective column contains all NaN values.

For example,

# Select dataframe columns with all NaN values
subset_df = df.loc[:, df.isnull().all()]

print(subset_df)

Output

    B   E   G
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN

It returned a dataframe with only those columns from the original dataframe, which contains only NaN values.
This one-liner solution seems a little complex. So, let’s break this down a little to understand how it is works.

Why did we passed df.isnull().all() in the column section of loc[]?

df.isnull().all() returns a bool series and each value in the bool series represents a column. If all value in bool series is True then it means that corresponding column has all NaN value in it. By passing this in the column_section of loc[], we made sure that loc[] selects only the columns containing only NaN values.

How df.isnull().all() works?

First of all, call isNull() function on the dataframe object, it returns a same sized dataframe containing only bool values. In this bool dataframe, a cell contains True if the corresponding cell in the original dataframe is NaN, otherwise it contains False. Then call all() function on this Boolean dataframe object. It looks over the column axis and returns a bool series. Each value in the bool series represents a column and if value is True then it means that column has only NaN values in it.

Then pass that bool series to the column section of loc[], it selects only those dataframe columns which has all NaN values.

Pandas: Select all columns without NaN

To select only those columns from dataframe which do not contain any NaN value, use the loc[] attribute of the dataframe i.e.

loc[row_section, column_section]
  • row_section: In the row_section pass ‘:’ to include all rows.
  • column_section: In the column section pass a bool series, which should be of same size as number of columns of the dataframe. If all value in bool series is True then it means that column has only non-NaN values.

For example,

# Select columns without a single NaN value
subset_df = df.loc[:, ~df.isnull().any()]

print(subset_df)

Output

      A   C       D   F
0  Jack  34  Sydney   5
1  Riti  31   Delhi   7
2  Aadi  16  London  11
3  Mark  41   Delhi  12

It returned a dataframe with only those columns from the original dataframe, which contains all non-NaN values.
This one-liner solution seems a little complex. So, let’s break this down a little to understand how it is works.

Why did we passed ~df.isnull().any() in the column section of loc[]?

~df.isnull().any() returns a bool series and each value in the bool series represents a column. If any value in bool series is True then it means that corresponding column has no NaN value in it. By passing this in the column_section of loc[], we made sure that loc[] selects only the columns containing all non NaN values.

How ~df.isnull().any() works?

First of all, call isNull() function on the dataframe object, it returns a same sized dataframe containing only bool values. In this bool dataframe, a cell contains True if the corresponding cell in the original dataframe is NaN, otherwise it contains False. Then call any() function on this Boolean dataframe object. It looks over the column axis and returns a bool series. Each value in the bool series represents a column and if value is True then it means that column has atleast NaN value. Then using the ~ operator, we inverted the bool series and if any value is True in series, then it means that column has no NaN value in it.

Then pass that bool series to the column section of loc[], it selects only those dataframe columns which has all non NaN values.

The complete example is as follows,

import pandas as pd
import numpy as np

# List of Tuples
empoyees = [('Jack', np.NaN, 34, 'Sydney', np.NaN, 5,  np.NaN, np.NaN, np.NaN),
            ('Riti', np.NaN, 31, 'Delhi' , np.NaN, 7,  np.NaN, np.NaN, np.NaN),
            ('Aadi', np.NaN, 16, 'London', np.NaN, 11, np.NaN, 3, np.NaN),
            ('Mark', np.NaN, 41, 'Delhi' , np.NaN, 12, np.NaN, 11, 1)]

# Create a DataFrame object
df = pd.DataFrame(  empoyees,
                    columns=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'])

print("Contents of the Dataframe : ")
print(df)

# Select dataframe columns with any NaN value
subset_df = df.loc[:, df.isnull().any()]

print(subset_df)


# Select dataframe columns with all NaN values
subset_df = df.loc[:, df.isnull().all()]

print(subset_df)


# Select columns without a single NaN value
subset_df = df.loc[:, ~df.isnull().any()]

print(subset_df)

Output:

Contents of the Dataframe : 
      A   B   C       D   E   F   G     H    I
0  Jack NaN  34  Sydney NaN   5 NaN   NaN  NaN
1  Riti NaN  31   Delhi NaN   7 NaN   NaN  NaN
2  Aadi NaN  16  London NaN  11 NaN   3.0  NaN
3  Mark NaN  41   Delhi NaN  12 NaN  11.0  1.0
    B   E   G     H    I
0 NaN NaN NaN   NaN  NaN
1 NaN NaN NaN   NaN  NaN
2 NaN NaN NaN   3.0  NaN
3 NaN NaN NaN  11.0  1.0
    B   E   G
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN
      A   C       D   F
0  Jack  34  Sydney   5
1  Riti  31   Delhi   7
2  Aadi  16  London  11
3  Mark  41   Delhi  12