Select Rows where a column is null in Pandas

This tutorial discusses how to select DataFrame rows where a column is null in pandas.

Preparing DataSet

Let’s create a DataFrame with some hardcoded data.

import pandas as pd
import numpy as np


data = {'Col_A': [33, 12, 33, 14, 35, 36, 17],
        'Col_B': [21, 22, 23, 24, 25, 26, 27],
        'Col_C': [33, np.nan, 33, np.nan, 35, None, 35]}

index=["X1", "X2", "X3", "X4", "X5", "X6", "X7"]

# Create a DataFrame from a dictionary
df = pd.DataFrame.from_dict(data)

# Set the index list as Index of DataFrame
df.set_index(pd.Index(index), inplace=True)

print(df)

Output

    Col_A  Col_B  Col_C
X1     33     21   33.0
X2     12     22    NaN
X3     33     23   33.0
X4     14     24    NaN
X5     35     25   35.0
X6     36     26    NaN
X7     17     27   35.0

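This DataFrame has NaN values in column ‘Col_C’. Note that the None entry in row X6 is also displayed as NaN, because pandas stores None as NaN in a numeric column. Now we will operate on this DataFrame and see how to select the rows where a column is null or NaN in Pandas.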

Select DataFrame Rows where a column has NaN or None value

We are going to use the loc[] indexer of the DataFrame to select only those rows where a specified column contains either NaN or None.

For that, we will select the column as a Series object and call the isin() method on it, passing a list containing NaN and None. It returns a boolean Series in which each True value indicates that the corresponding value in the column is either None or NaN.

Then we will pass this boolean Series to the loc[] indexer of the DataFrame. It returns a DataFrame containing only those rows for which the boolean Series is True, that is, only those rows that have None or NaN in the specified column.
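To make the intermediate step concrete, here is a minimal sketch that prints the boolean Series before it is passed to loc[]. The DataFrame is rebuilt so the snippet runs on its own, and the variable name mask is just an illustrative choice.

```python
import numpy as np
import pandas as pd

# Rebuild the sample DataFrame used in this tutorial
df = pd.DataFrame(
    {'Col_A': [33, 12, 33, 14, 35, 36, 17],
     'Col_B': [21, 22, 23, 24, 25, 26, 27],
     'Col_C': [33, np.nan, 33, np.nan, 35, None, 35]},
    index=["X1", "X2", "X3", "X4", "X5", "X6", "X7"])

# Boolean Series: True where Col_C is NaN or None
# ("mask" is a hypothetical name used only for this sketch)
mask = df['Col_C'].isin([np.nan, None])

print(mask)
```

Each True entry in mask lines up, by index label, with a row that loc[] will keep.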

In the example below, we select only those rows from the DataFrame where column Col_C has either NaN or None.

# Select rows where column "Col_C" has either NaN or None value
subDf = df.loc[df['Col_C'].isin([np.nan, None])]

print(subDf)

Output

    Col_A  Col_B  Col_C
X2     12     22    NaN
X4     14     24    NaN
X6     36     26    NaN
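As an alternative to the isin() approach used above, pandas also provides the isna() method, which builds the same kind of boolean Series directly, and notna() for the opposite case. The following is a minimal sketch under the assumption that the same sample data is used. The names null_rows and non_null_rows are illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild the sample DataFrame used in this tutorial
df = pd.DataFrame(
    {'Col_A': [33, 12, 33, 14, 35, 36, 17],
     'Col_B': [21, 22, 23, 24, 25, 26, 27],
     'Col_C': [33, np.nan, 33, np.nan, 35, None, 35]},
    index=["X1", "X2", "X3", "X4", "X5", "X6", "X7"])

# isna() marks both NaN and None as missing
null_rows = df.loc[df['Col_C'].isna()]

# notna() is the inverse: rows where Col_C has a value
non_null_rows = df.loc[df['Col_C'].notna()]

print(null_rows)
print(non_null_rows)
```

With this data, null_rows should hold the same three rows (X2, X4, X6) as the isin() example, while non_null_rows holds the remaining four.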

Summary

We learned a way to select only those rows from a DataFrame that contain either NaN or None in a specified column. Thanks.
