In this article, we will discuss different ways to find rows with NaN in columns of a Pandas Dataframe.
Pandas – Find Rows with NaN in a specified column
Suppose we have a dataframe like this,
A B C D E F G H I 0 Jack NaN 34 Sydney NaN 5 NaN NaN NaN 1 Riti NaN 31 Delhi NaN 7 NaN NaN NaN 2 Aadi NaN 16 London NaN 11 NaN 3.0 NaN 3 Smriti 12.0 16 London 10.0 11 9.0 NaN 11.0 4 Shaun 23.0 18 London 11.0 12 13.0 13.0 14.0 5 Aadi NaN 16 London NaN 11 NaN 3.0 NaN 6 Mark NaN 41 Delhi NaN 12 NaN 11.0 1.0
We want to select only those rows from this dataframe, where column ‘H’ has NaN value i.e.
A B C D E F G H I 0 Jack NaN 34 Sydney NaN 5 NaN NaN NaN 1 Riti NaN 31 Delhi NaN 7 NaN NaN NaN 3 Smriti 12.0 16 London 10.0 11 9.0 NaN 11.0
To do that we need only a single line of code i.e.
selected_rows = df[df['H'].isnull()]
It will return a dataframe containing only those rows where column ‘H’ contains the NaN values.
Frequently Asked:
How did it work?
Although it is one line code but it can be little tricky to understand. So, let’s break this code into simple steps. It will help us understand what is actually happening behind the scenes.
Steps to select only those rows from a dataframe, where a given column contains the NaN values are as follows,
- Step 1: Select the dataframe column ‘H’ as a Series using the [] operator i.e. df[‘H’].
- Step 2: Then Call the isnull() function of Series object like df[‘H’].isnull(). It returns a same sized bool series containing True or False. A True value in bool Series indicates that corresponding value in df[‘H’] is NaN, whereas False indicates the non NaN value.
- Step 3: Pass this bool series to [] operator of dataframe like df[df[‘H’].isnull()]. It will return only those rows from dataframe where the corresponding value in bool series is True.
Let’s see a complete example,
Latest Python - Video Tutorial
import pandas as pd import numpy as np # List of Tuples empoyees = [('Jack', np.NaN, 34, 'Sydney', np.NaN, 5, np.NaN, np.NaN, np.NaN), ('Riti', np.NaN, 31, 'Delhi' , np.NaN, 7, np.NaN, np.NaN, np.NaN), ('Aadi', np.NaN, 16, 'London', np.NaN, 11, np.NaN, 3, np.NaN), ('Smriti', 12 , 16, 'London', 10, 11, 9, np.NaN, 11), ('Shaun', 23 , 18, 'London', 11, 12, 13, 13, 14), ('Aadi', np.NaN, 16, 'London', np.NaN, 11, np.NaN, 3, np.NaN), ('Mark', np.NaN, 41, 'Delhi' , np.NaN, 12, np.NaN, 11, 1)] # Create a DataFrame object df = pd.DataFrame( empoyees, columns=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I']) print("Contents of the Dataframe : ") print(df) # Select rows where column 'H' has NaN value selected_rows = df[df['H'].isnull()] print('Selected rows') print(selected_rows)
Output:
Contents of the Dataframe : A B C D E F G H I 0 Jack NaN 34 Sydney NaN 5 NaN NaN NaN 1 Riti NaN 31 Delhi NaN 7 NaN NaN NaN 2 Aadi NaN 16 London NaN 11 NaN 3.0 NaN 3 Smriti 12.0 16 London 10.0 11 9.0 NaN 11.0 4 Shaun 23.0 18 London 11.0 12 13.0 13.0 14.0 5 Aadi NaN 16 London NaN 11 NaN 3.0 NaN 6 Mark NaN 41 Delhi NaN 12 NaN 11.0 1.0 Selected rows A B C D E F G H I 0 Jack NaN 34 Sydney NaN 5 NaN NaN NaN 1 Riti NaN 31 Delhi NaN 7 NaN NaN NaN 3 Smriti 12.0 16 London 10.0 11 9.0 NaN 11.0
Select dataframe rows with NaN in a specified column using isna()
In pandas isna() function of Series is an alias of isnull(). So, you can use this also to select the rows with NaN in a specified column i.e.
# Select rows where column 'H' has NaN value selected_rows = df[df['H'].isna()] print('Selected rows') print(selected_rows)
It is exactly similar to previous solution becuase isna() is an alias of isnull().
Related Posts:
Select Dataframe Rows with NaN in multiple columns
Suppose we have a dataframe like this,
A B C D E F G H I 0 Jack NaN 34 Sydney NaN 5 NaN NaN NaN 1 Riti NaN 31 Delhi NaN 7 NaN NaN NaN 2 Aadi NaN 16 London NaN 11 NaN 3.0 NaN 3 Smriti 12.0 16 London 10.0 11 9.0 NaN 11.0 4 Shaun 23.0 18 London 11.0 12 13.0 13.0 14.0 5 Aadi NaN 16 London NaN 11 NaN 3.0 NaN 6 Mark NaN 41 Delhi NaN 12 NaN 11.0 1.0
We want to select only those rows from this dataframe, where columns ‘H’ and ‘I’ have NaN values i.e.
A B C D E F G H I 0 Jack NaN 34 Sydney NaN 5 NaN NaN NaN 1 Riti NaN 31 Delhi NaN 7 NaN NaN NaN
This can be done in a single line of code i.e.
# Select rows where column 'H' & 'I' have NaN value selected_rows = df[df['H'].isnull() & df['I'].isnull()]
It returns a dataframe containing only those rows where both the columns ‘H’ & ‘I’ contain the NaN values.
How did it work?
Although it is single line code and quite similar to first solution, but it is little hard to understand. So, let’s break this code into simple steps. It will help us clear some more concepts.
Steps to select only those rows from a dataframe, where a specific columns contains the NaN values are as follows,
- Step 1: Select the dataframe column ‘H’ as a Series using the [] operator i.e. df[‘H’].
- Step 2: Then Call the isnull() function of Series object like df[‘H’].isnull(). It returns a same sized bool series containing True or False. The True value in bool Series indicates that corresponding value in df[‘H’] is NaN, whereas False indicates the non NaN value.
- Step 3: Select the dataframe column ‘I’ as a Series using the [] operator i.e. df[‘I’].
- Step 4 Then Call the isnull() function of Series object like df[‘I’].isnull(). It returns a same sized bool series containing True or False. The True value in bool Series indicates that corresponding value in df[‘I’] is NaN, whereas False indicates the non NaN value.
- Step 5: Apply the AND operator on both the bool series created in step 2 and step 4 i.e. (df[‘H’].isnull() & df[‘I’].isnull() ). It returns a final bool series. True value in this final bool Series indicates that the corresponding values in column ‘H’ and ‘I’ are NaN, where as False indicates the non-NaN value in any one of the two columns for that row.
- Pass this final bool series to [] operator of dataframe like df[df[‘H’].isnull() & df[‘I’].isnull()]. It will return only those rows where the values in both the columns ‘H’ and ‘I’ are NaN.
Let’s see a complete example,
import pandas as pd import numpy as np # List of Tuples empoyees = [('Jack', np.NaN, 34, 'Sydney', np.NaN, 5, np.NaN, np.NaN, np.NaN), ('Riti', np.NaN, 31, 'Delhi' , np.NaN, 7, np.NaN, np.NaN, np.NaN), ('Aadi', np.NaN, 16, 'London', np.NaN, 11, np.NaN, 3, np.NaN), ('Smriti', 12 , 16, 'London', 10, 11, 9, np.NaN, 11), ('Shaun', 23 , 18, 'London', 11, 12, 13, 13, 14), ('Aadi', np.NaN, 16, 'London', np.NaN, 11, np.NaN, 3, np.NaN), ('Mark', np.NaN, 41, 'Delhi' , np.NaN, 12, np.NaN, 11, 1)] # Create a DataFrame object df = pd.DataFrame( empoyees, columns=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I']) print("Contents of the Dataframe : ") print(df) # Select rows where column 'H' & 'I' have NaN value selected_rows = df[df['H'].isnull() & df['I'].isnull()] print('Selected rows') print(selected_rows)
Output:
A B C D E F G H I 0 Jack NaN 34 Sydney NaN 5 NaN NaN NaN 1 Riti NaN 31 Delhi NaN 7 NaN NaN NaN
Summary:
We learned about the different ways to select rows from a dataframe where a single or multiple columns contain NaN or missing values.
Latest Video Tutorials