In this article, we will discuss how to select dataframe rows which contains atleast one NaN value.

Suppose we have a dataframe like this,

        A     B   C       D     E   F     G     H     I
0    Jack   NaN  34  Sydney   NaN   5   NaN   NaN   NaN
1    Riti   NaN  31   Delhi   NaN   7   NaN   NaN   NaN
2    Aadi   NaN  16  London   NaN  11   NaN   3.0   NaN
3  Smriti  12.0  16  London  10.0  11   9.0   3.0  11.0
4   Shaun  23.0  18  London  11.0  12  13.0  13.0  14.0
5    Aadi   NaN  16  London   NaN  11   NaN   3.0   NaN
6    Mark   NaN  41   Delhi   NaN  12   NaN  11.0   1.0


From this dataframe, we want to select only those rows which contain one or more NaN values. Like this,

      A   B   C       D   E   F   G     H    I
0  Jack NaN  34  Sydney NaN   5 NaN   NaN  NaN
1  Riti NaN  31   Delhi NaN   7 NaN   NaN  NaN
2  Aadi NaN  16  London NaN  11 NaN   3.0  NaN
5  Aadi NaN  16  London NaN  11 NaN   3.0  NaN
6  Mark NaN  41   Delhi NaN  12 NaN  11.0  1.0

In pandas, using the isnull() and any() functions of the dataframe, we can do this in a single line i.e.

selected_rows = df[df.isnull().any(axis=1)]

It will return a dataframe containing only those rows, which contain any NaN value.

How did it work?

Although it is one line solution, but it is little hard to understand. So, let’s simplify this code into simple steps. It will help us understand what exactly is happening behind the scene.

Steps to select only those dataframe rows, which contain any NaN value

  • Step 1: Call the isnull() function on dataframe like df.isnull(). It will return a same sized bool dataframe containing only True or False values. True value indicates that there is a NaN at the corresponding position in the original dataframe and False indicates a Non-NaN value.
  • Step 2: Then call the any(axis=1) function on the bool dataframe like, df.isnull().any(axis=1). The any() function looks for any True value along the given axis. If axis==1, then it will look along the columns for each row. It means, for each row it will check all the column values and reduce it to a single value. For a row, if any column contains the NaN, then the reduced value for that row will be True. Therefore, it returns a bool Series, where each value represents a row of dataframe. If value is True, then it indicates that there is one or more NaN values in that row.
  • Step 3: Then pass this bool Series to [] operator of dataframe i.e. df[df.isnull().any(axis=1)]. It returns only those rows where bool Series has True value. It means it returns only those rows which contain any NaN value.

Let’s see a complete example,

import pandas as pd
import numpy as np

# List of Tuples
empoyees = [('Jack', np.NaN, 34, 'Sydney', np.NaN, 5,  np.NaN, np.NaN, np.NaN),
            ('Riti', np.NaN, 31, 'Delhi' , np.NaN, 7,  np.NaN, np.NaN, np.NaN),
            ('Aadi', np.NaN, 16, 'London', np.NaN, 11, np.NaN, 3, np.NaN),
            ('Smriti', 12 , 16, 'London', 10, 11, 9, 3, 11),
            ('Shaun', 23 , 18, 'London', 11, 12, 13, 13, 14),
            ('Aadi', np.NaN, 16, 'London', np.NaN, 11, np.NaN, 3, np.NaN),
            ('Mark', np.NaN, 41, 'Delhi' , np.NaN, 12, np.NaN, 11, 1)]

# Create a DataFrame object
df = pd.DataFrame(  empoyees,
                    columns=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'])

print("Contents of the Dataframe : ")
print(df)

# Select rows with atleast a NaN value
selected_rows = df[df.isnull().any(axis=1)]

print('Selected rows')
print(selected_rows)

Output:

Contents of the Dataframe : 
        A     B   C       D     E   F     G     H     I
0    Jack   NaN  34  Sydney   NaN   5   NaN   NaN   NaN
1    Riti   NaN  31   Delhi   NaN   7   NaN   NaN   NaN
2    Aadi   NaN  16  London   NaN  11   NaN   3.0   NaN
3  Smriti  12.0  16  London  10.0  11   9.0   3.0  11.0
4   Shaun  23.0  18  London  11.0  12  13.0  13.0  14.0
5    Aadi   NaN  16  London   NaN  11   NaN   3.0   NaN
6    Mark   NaN  41   Delhi   NaN  12   NaN  11.0   1.0

Selected rows
      A   B   C       D   E   F   G     H    I
0  Jack NaN  34  Sydney NaN   5 NaN   NaN  NaN
1  Riti NaN  31   Delhi NaN   7 NaN   NaN  NaN
2  Aadi NaN  16  London NaN  11 NaN   3.0  NaN
5  Aadi NaN  16  London NaN  11 NaN   3.0  NaN
6  Mark NaN  41   Delhi NaN  12 NaN  11.0  1.0

Here we selected only those dataframe rows which contain one or more NaN / missing value.

Using isna() instead of isnull()

We can achieve same things using isna() function of the dataframe. It is an alias of isnull(), so we can use the same logic i.e.

# Select rows with atleast a NaN value
selected_rows = df[df.isna().any(axis=1)]

print('Selected rows')
print(selected_rows)

Output:

Selected rows
      A   B   C       D   E   F   G     H    I
0  Jack NaN  34  Sydney NaN   5 NaN   NaN  NaN
1  Riti NaN  31   Delhi NaN   7 NaN   NaN  NaN
2  Aadi NaN  16  London NaN  11 NaN   3.0  NaN
5  Aadi NaN  16  London NaN  11 NaN   3.0  NaN
6  Mark NaN  41   Delhi NaN  12 NaN  11.0  1.0

Summary:

We learned different ways to select only those rows from a dataframe which contains any NaN value in it.