In this article, we will discuss different ways to select the dataframe which do not contain any NaN value either in a specified column or in any column

Select dataframe rows without NaN value in a column

Suppose we have a dataframe like this,

     Name   Age    City   Exp
0    Jack  34.0  Sydney   5.0
1    Riti  31.0   Delhi   7.0
2    Aadi   NaN  London  11.0
3  Smriti   NaN  London   NaN
4   Shaun   NaN   Tokyo   4.0
5    Aadi  16.0     NaN  11.0
6    Mark  41.0   Delhi   NaN

We want to select only those dataframe rows, where column ‘Age’ do not has the NaN value i.e.

   Name   Age    City   Exp
0  Jack  34.0  Sydney   5.0
1  Riti  31.0   Delhi   7.0
5  Aadi  16.0     NaN  11.0
6  Mark  41.0   Delhi   NaN

To do that we need only a single line of code i.e.

# Select rows which do not have NaN value in column 'Age'
selected_rows = df[~df['Age'].isnull()]

It will return a dataframe containing only those rows where column ‘Age’ do not have the NaN value.

How did it work?

Although it is one line code but it is a little tricky one. So, let’s break this code into smaller steps. It will help us understand what is actually happening here.

Steps to select only those rows from a dataframe, where a given column do not have the NaN value:

  • Step 1: Select the dataframe column ‘Age’ as a Series using the [] operator i.e. df[‘Age’].
  • Step 2 Then Call the isnull() function of Series object like df[‘Age’].isnull(). It returns a same sized bool series containing True or False. A True value in bool Series indicates that corresponding value in df[‘Age’] is NaN, whereas False indicates the non NaN value.
  • Step 3: Apply the negate operator on bool series i.e. ~df[‘Age’].isnull() . It will invert the bool series. Now the True value in bool Series indicates that corresponding value in df[‘Age’] is non-NaN, whereas False indicates the value is a NaN value.
  • Step 4: Pass this inverted bool series to [] operator of dataframe like df[~df[‘Age’].isnull()]. It will return only those rows from dataframe where the corresponding value in bool series is True i.e. only those dataframe rows which do not have NaN value in column ‘Age’.

Let’s see a complete example,

import pandas as pd
import numpy as np

# List of Tuples
empoyees = [('Jack',   34,      'Sydney',   5),
            ('Riti',   31,      'Delhi' ,   7),
            ('Aadi',   np.NaN,  'London',   11),
            ('Smriti', np.NaN,  'London',   np.NaN),
            ('Shaun',  np.NaN,  'Tokyo',    4),
            ('Aadi',   16,       np.NaN,    11),
            ('Mark',   41,      'Delhi' ,   np.NaN)]

# Create a DataFrame object
df = pd.DataFrame(  empoyees,
                    columns=['Name', 'Age', 'City', 'Exp'])

print("Contents of the Dataframe : ")
print(df)

# Select rows which do not have NaN value in column 'Age'
selected_rows = df[~df['Age'].isnull()]

print('Selected rows'

Output:

Contents of the Dataframe : 
     Name   Age    City   Exp
0    Jack  34.0  Sydney   5.0
1    Riti  31.0   Delhi   7.0
2    Aadi   NaN  London  11.0
3  Smriti   NaN  London   NaN
4   Shaun   NaN   Tokyo   4.0
5    Aadi  16.0     NaN  11.0
6    Mark  41.0   Delhi   NaN

Selected rows
   Name   Age    City   Exp
0  Jack  34.0  Sydney   5.0
1  Riti  31.0   Delhi   7.0
5  Aadi  16.0     NaN  11.0
6  Mark  41.0   Delhi   NaN

Select dataframe rows without NaN in a specified column using isna()

In pandas isna() function of Series is an alias of isnull(). So, you can use this also to select the rows with NaN in a specified column i.e.

# Select rows which do not have NaN value in column 'Age'
selected_rows = df[~df['Age'].isna()]

print('Selected rows')
print(selected_rows)

Output:

Selected rows
   Name   Age    City   Exp
0  Jack  34.0  Sydney   5.0
1  Riti  31.0   Delhi   7.0
5  Aadi  16.0     NaN  11.0
6  Mark  41.0   Delhi   NaN

Select dataframe rows without any NaN value

Suppose we have a dataframe like this,

     Name   Age    City   Exp
0    Jack  34.0  Sydney   5.0
1    Riti  31.0   Delhi   7.0
2    Aadi   NaN  London  11.0
3  Smriti   NaN  London   NaN
4   Shaun   NaN   Tokyo   4.0
5    Aadi  16.0     NaN  11.0
6    Mark  41.0   Delhi   NaN


We want to select only those rows from this dataframe which do not contain any NaN value in any of the column

   Name   Age    City  Exp
0  Jack  34.0  Sydney  5.0
1  Riti  31.0   Delhi  7.0

This can be done in a single line of code i.e.

# Select rows which do not contain  any NaN value in any of the column
selected_rows = df[~df.isnull().any(axis=1)]

It returns a dataframe containing only those rows which do not have any NaN value.

Steps to select only those dataframe rows, which do not have any NaN values in any column:

  • Step 1: Use the dataframe.isnull() function like df.isnull(). It returns a same sized bool dataframe, which contains only True and False values. Where, each True value indicates that there is a NaN at the corresponding position in the calling dataframe object and False indicates a non-NaN value.
  • Step 2: Then call the any(axis=1) function on the bool datframe like, df.isnull().any(axis=1). The any() function looks for any True values along the given axis. If axis==1, then it will look along the columns for each row. It means, for each row it will check all the column values and reduce it to a single value. For a row, if any of the column contains the NaN values, then the reduced value for that row will be True. It returns a bool Series, where each value represents a row of the dataframe. If a value in this Series is True, then it indicates that corresponding row has one or more NaN values.
  • Step 3: Apply the negate operator on bool series i.e. ~df.isnull().any(axis=1) . It will invert the bool series. Now the True value in bool Series indicates that corresponding row in dataframe has no NaN value.
  • Step 4: Then pass this inverted bool Series to the [] operator of the dataframe i.e. df[~df.isnull().any(axis=1)]. It returns only those rows from dataframe, which do not contain any NaN value

Let’s see a complete example,

import pandas as pd
import numpy as np

# List of Tuples
empoyees = [('Jack',   34,      'Sydney',   5),
            ('Riti',   31,      'Delhi' ,   7),
            ('Aadi',   np.NaN,  'London',   11),
            ('Smriti', np.NaN,  'London',   np.NaN),
            ('Shaun',  np.NaN,  'Tokyo',    4),
            ('Aadi',   16,       np.NaN,    11),
            ('Mark',   41,      'Delhi' ,   np.NaN)]

# Create a DataFrame object
df = pd.DataFrame(  empoyees,
                    columns=['Name', 'Age', 'City', 'Exp'])

print("Contents of the Dataframe : ")
print(df)

# Select rows which do not contain  any NaN value in any of the column
selected_rows = df[~df.isnull().any(axis=1)]

print('Selected rows')
print(selected_rows)

Output:

Contents of the Dataframe :
     Name   Age    City   Exp
0    Jack  34.0  Sydney   5.0
1    Riti  31.0   Delhi   7.0
2    Aadi   NaN  London  11.0
3  Smriti   NaN  London   NaN
4   Shaun   NaN   Tokyo   4.0
5    Aadi  16.0     NaN  11.0
6    Mark  41.0   Delhi   NaN

Selected rows
   Name   Age    City  Exp
0  Jack  34.0  Sydney  5.0
1  Riti  31.0   Delhi  7.0

Summary:

We learned how to select only those dataframe rows, which do not have any NaN value, either in a specified column or in any column.