Find Rows with NaN in Pandas

In this article, we will discuss different ways to find rows with NaN in columns of a Pandas Dataframe.

Pandas – Find Rows with NaN in a specified column

Suppose we have a dataframe like this,

        A     B   C       D     E   F     G     H     I
0    Jack   NaN  34  Sydney   NaN   5   NaN   NaN   NaN
1    Riti   NaN  31   Delhi   NaN   7   NaN   NaN   NaN
2    Aadi   NaN  16  London   NaN  11   NaN   3.0   NaN
3  Smriti  12.0  16  London  10.0  11   9.0   NaN  11.0
4   Shaun  23.0  18  London  11.0  12  13.0  13.0  14.0
5    Aadi   NaN  16  London   NaN  11   NaN   3.0   NaN
6    Mark   NaN  41   Delhi   NaN  12   NaN  11.0   1.0

We want to select only those rows from this dataframe, where column ‘H’ has NaN value i.e.

        A     B   C       D     E   F    G   H     I
0    Jack   NaN  34  Sydney   NaN   5  NaN NaN   NaN
1    Riti   NaN  31   Delhi   NaN   7  NaN NaN   NaN
3  Smriti  12.0  16  London  10.0  11  9.0 NaN  11.0

To do that we need only a single line of code i.e.

selected_rows = df[df['H'].isnull()]

It will return a dataframe containing only those rows where column ‘H’ contains the NaN values.

How did it work?

Although it is one line code but it can be little tricky to understand. So, let’s break this code into simple steps. It will help us understand what is actually happening behind the scenes.

Steps to select only those rows from a dataframe, where a given column contains the NaN values are as follows,

  • Step 1: Select the dataframe column ‘H’ as a Series using the [] operator i.e. df[‘H’].
  • Step 2: Then Call the isnull() function of Series object like df[‘H’].isnull(). It returns a same sized bool series containing True or False. A True value in bool Series indicates that corresponding value in df[‘H’] is NaN, whereas False indicates the non NaN value.
  • Step 3: Pass this bool series to [] operator of dataframe like df[df[‘H’].isnull()]. It will return only those rows from dataframe where the corresponding value in bool series is True.

Let’s see a complete example,

import pandas as pd
import numpy as np

# List of Tuples
empoyees = [('Jack', np.NaN, 34, 'Sydney', np.NaN, 5,  np.NaN, np.NaN, np.NaN),
            ('Riti', np.NaN, 31, 'Delhi' , np.NaN, 7,  np.NaN, np.NaN, np.NaN),
            ('Aadi', np.NaN, 16, 'London', np.NaN, 11, np.NaN, 3, np.NaN),
            ('Smriti', 12 , 16, 'London', 10, 11, 9, np.NaN, 11),
            ('Shaun', 23 , 18, 'London', 11, 12, 13, 13, 14),
            ('Aadi', np.NaN, 16, 'London', np.NaN, 11, np.NaN, 3, np.NaN),
            ('Mark', np.NaN, 41, 'Delhi' , np.NaN, 12, np.NaN, 11, 1)]

# Create a DataFrame object
df = pd.DataFrame(  empoyees,
                    columns=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'])

print("Contents of the Dataframe : ")
print(df)


# Select rows where column 'H' has NaN value
selected_rows = df[df['H'].isnull()]

print('Selected rows')
print(selected_rows)

Output:

Contents of the Dataframe : 
        A     B   C       D     E   F     G     H     I
0    Jack   NaN  34  Sydney   NaN   5   NaN   NaN   NaN
1    Riti   NaN  31   Delhi   NaN   7   NaN   NaN   NaN
2    Aadi   NaN  16  London   NaN  11   NaN   3.0   NaN
3  Smriti  12.0  16  London  10.0  11   9.0   NaN  11.0
4   Shaun  23.0  18  London  11.0  12  13.0  13.0  14.0
5    Aadi   NaN  16  London   NaN  11   NaN   3.0   NaN
6    Mark   NaN  41   Delhi   NaN  12   NaN  11.0   1.0

Selected rows
        A     B   C       D     E   F    G   H     I
0    Jack   NaN  34  Sydney   NaN   5  NaN NaN   NaN
1    Riti   NaN  31   Delhi   NaN   7  NaN NaN   NaN
3  Smriti  12.0  16  London  10.0  11  9.0 NaN  11.0

Select dataframe rows with NaN in a specified column using isna()

In pandas isna() function of Series is an alias of isnull(). So, you can use this also to select the rows with NaN in a specified column i.e.

# Select rows where column 'H' has NaN value
selected_rows = df[df['H'].isna()]

print('Selected rows')
print(selected_rows)

It is exactly similar to previous solution becuase isna() is an alias of isnull().

Related Posts:

Select Dataframe Rows with NaN in multiple columns

Suppose we have a dataframe like this,

        A     B   C       D     E   F     G     H     I
0    Jack   NaN  34  Sydney   NaN   5   NaN   NaN   NaN
1    Riti   NaN  31   Delhi   NaN   7   NaN   NaN   NaN
2    Aadi   NaN  16  London   NaN  11   NaN   3.0   NaN
3  Smriti  12.0  16  London  10.0  11   9.0   NaN  11.0
4   Shaun  23.0  18  London  11.0  12  13.0  13.0  14.0
5    Aadi   NaN  16  London   NaN  11   NaN   3.0   NaN
6    Mark   NaN  41   Delhi   NaN  12   NaN  11.0   1.0

We want to select only those rows from this dataframe, where columns ‘H’ and ‘I’ have NaN values i.e.

      A   B   C       D   E  F   G   H   I
0  Jack NaN  34  Sydney NaN  5 NaN NaN NaN
1  Riti NaN  31   Delhi NaN  7 NaN NaN NaN

This can be done in a single line of code i.e.

# Select rows where column 'H' & 'I' have NaN value
selected_rows = df[df['H'].isnull() & df['I'].isnull()]

It returns a dataframe containing only those rows where both the columns ‘H’ & ‘I’ contain the NaN values.

How did it work?

Although it is single line code and quite similar to first solution, but it is little hard to understand. So, let’s break this code into simple steps. It will help us clear some more concepts.

Steps to select only those rows from a dataframe, where a specific columns contains the NaN values are as follows,

  • Step 1: Select the dataframe column ‘H’ as a Series using the [] operator i.e. df[‘H’].
  • Step 2: Then Call the isnull() function of Series object like df[‘H’].isnull(). It returns a same sized bool series containing True or False. The True value in bool Series indicates that corresponding value in df[‘H’] is NaN, whereas False indicates the non NaN value.
  • Step 3: Select the dataframe column ‘I’ as a Series using the [] operator i.e. df[‘I’].
  • Step 4 Then Call the isnull() function of Series object like df[‘I’].isnull(). It returns a same sized bool series containing True or False. The True value in bool Series indicates that corresponding value in df[‘I’] is NaN, whereas False indicates the non NaN value.
  • Step 5: Apply the AND operator on both the bool series created in step 2 and step 4 i.e. (df[‘H’].isnull() & df[‘I’].isnull() ). It returns a final bool series. True value in this final bool Series indicates that the corresponding values in column ‘H’ and ‘I’ are NaN, where as False indicates the non-NaN value in any one of the two columns for that row.
  • Pass this final bool series to [] operator of dataframe like df[df[‘H’].isnull() & df[‘I’].isnull()]. It will return only those rows where the values in both the columns ‘H’ and ‘I’ are NaN.

Let’s see a complete example,

import pandas as pd
import numpy as np

# List of Tuples
empoyees = [('Jack', np.NaN, 34, 'Sydney', np.NaN, 5,  np.NaN, np.NaN, np.NaN),
            ('Riti', np.NaN, 31, 'Delhi' , np.NaN, 7,  np.NaN, np.NaN, np.NaN),
            ('Aadi', np.NaN, 16, 'London', np.NaN, 11, np.NaN, 3, np.NaN),
            ('Smriti', 12 , 16, 'London', 10, 11, 9, np.NaN, 11),
            ('Shaun', 23 , 18, 'London', 11, 12, 13, 13, 14),
            ('Aadi', np.NaN, 16, 'London', np.NaN, 11, np.NaN, 3, np.NaN),
            ('Mark', np.NaN, 41, 'Delhi' , np.NaN, 12, np.NaN, 11, 1)]

# Create a DataFrame object
df = pd.DataFrame(  empoyees,
                    columns=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'])

print("Contents of the Dataframe : ")
print(df)

# Select rows where column 'H' & 'I' have NaN value
selected_rows = df[df['H'].isnull() & df['I'].isnull()]

print('Selected rows')
print(selected_rows)

Output:

      A   B   C       D   E  F   G   H   I
0  Jack NaN  34  Sydney NaN  5 NaN NaN NaN
1  Riti NaN  31   Delhi NaN  7 NaN NaN NaN

Summary:

We learned about the different ways to select rows from a dataframe where a single or multiple columns contain NaN or missing values.

Advertisements

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Scroll to Top