In this article, we will discuss how to select dataframe rows which contains atleast one NaN value.
Suppose we have a dataframe like this,
A B C D E F G H I 0 Jack NaN 34 Sydney NaN 5 NaN NaN NaN 1 Riti NaN 31 Delhi NaN 7 NaN NaN NaN 2 Aadi NaN 16 London NaN 11 NaN 3.0 NaN 3 Smriti 12.0 16 London 10.0 11 9.0 3.0 11.0 4 Shaun 23.0 18 London 11.0 12 13.0 13.0 14.0 5 Aadi NaN 16 London NaN 11 NaN 3.0 NaN 6 Mark NaN 41 Delhi NaN 12 NaN 11.0 1.0
From this dataframe, we want to select only those rows which contain one or more NaN values. Like this,
A B C D E F G H I 0 Jack NaN 34 Sydney NaN 5 NaN NaN NaN 1 Riti NaN 31 Delhi NaN 7 NaN NaN NaN 2 Aadi NaN 16 London NaN 11 NaN 3.0 NaN 5 Aadi NaN 16 London NaN 11 NaN 3.0 NaN 6 Mark NaN 41 Delhi NaN 12 NaN 11.0 1.0
In pandas, using the isnull() and any() functions of the dataframe, we can do this in a single line i.e.
selected_rows = df[df.isnull().any(axis=1)]
It will return a dataframe containing only those rows, which contain any NaN value.
Frequently Asked:
How did it work?
Although it is one line solution, but it is little hard to understand. So, let’s simplify this code into simple steps. It will help us understand what exactly is happening behind the scene.
Steps to select only those dataframe rows, which contain any NaN value
- Step 1: Call the isnull() function on dataframe like df.isnull(). It will return a same sized bool dataframe containing only True or False values. True value indicates that there is a NaN at the corresponding position in the original dataframe and False indicates a Non-NaN value.
- Step 2: Then call the any(axis=1) function on the bool dataframe like, df.isnull().any(axis=1). The any() function looks for any True value along the given axis. If axis==1, then it will look along the columns for each row. It means, for each row it will check all the column values and reduce it to a single value. For a row, if any column contains the NaN, then the reduced value for that row will be True. Therefore, it returns a bool Series, where each value represents a row of dataframe. If value is True, then it indicates that there is one or more NaN values in that row.
- Step 3: Then pass this bool Series to [] operator of dataframe i.e. df[df.isnull().any(axis=1)]. It returns only those rows where bool Series has True value. It means it returns only those rows which contain any NaN value.
Let’s see a complete example,
Latest Python - Video Tutorial
import pandas as pd import numpy as np # List of Tuples empoyees = [('Jack', np.NaN, 34, 'Sydney', np.NaN, 5, np.NaN, np.NaN, np.NaN), ('Riti', np.NaN, 31, 'Delhi' , np.NaN, 7, np.NaN, np.NaN, np.NaN), ('Aadi', np.NaN, 16, 'London', np.NaN, 11, np.NaN, 3, np.NaN), ('Smriti', 12 , 16, 'London', 10, 11, 9, 3, 11), ('Shaun', 23 , 18, 'London', 11, 12, 13, 13, 14), ('Aadi', np.NaN, 16, 'London', np.NaN, 11, np.NaN, 3, np.NaN), ('Mark', np.NaN, 41, 'Delhi' , np.NaN, 12, np.NaN, 11, 1)] # Create a DataFrame object df = pd.DataFrame( empoyees, columns=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I']) print("Contents of the Dataframe : ") print(df) # Select rows with atleast a NaN value selected_rows = df[df.isnull().any(axis=1)] print('Selected rows') print(selected_rows)
Output:
Contents of the Dataframe : A B C D E F G H I 0 Jack NaN 34 Sydney NaN 5 NaN NaN NaN 1 Riti NaN 31 Delhi NaN 7 NaN NaN NaN 2 Aadi NaN 16 London NaN 11 NaN 3.0 NaN 3 Smriti 12.0 16 London 10.0 11 9.0 3.0 11.0 4 Shaun 23.0 18 London 11.0 12 13.0 13.0 14.0 5 Aadi NaN 16 London NaN 11 NaN 3.0 NaN 6 Mark NaN 41 Delhi NaN 12 NaN 11.0 1.0 Selected rows A B C D E F G H I 0 Jack NaN 34 Sydney NaN 5 NaN NaN NaN 1 Riti NaN 31 Delhi NaN 7 NaN NaN NaN 2 Aadi NaN 16 London NaN 11 NaN 3.0 NaN 5 Aadi NaN 16 London NaN 11 NaN 3.0 NaN 6 Mark NaN 41 Delhi NaN 12 NaN 11.0 1.0
Here we selected only those dataframe rows which contain one or more NaN / missing value.
Using isna() instead of isnull()
We can achieve same things using isna() function of the dataframe. It is an alias of isnull(), so we can use the same logic i.e.
# Select rows with atleast a NaN value selected_rows = df[df.isna().any(axis=1)] print('Selected rows') print(selected_rows)
Output:
Selected rows A B C D E F G H I 0 Jack NaN 34 Sydney NaN 5 NaN NaN NaN 1 Riti NaN 31 Delhi NaN 7 NaN NaN NaN 2 Aadi NaN 16 London NaN 11 NaN 3.0 NaN 5 Aadi NaN 16 London NaN 11 NaN 3.0 NaN 6 Mark NaN 41 Delhi NaN 12 NaN 11.0 1.0
Summary:
We learned different ways to select only those rows from a dataframe which contains any NaN value in it.
Latest Video Tutorials