In this article, we will discuss how to select the rows of a dataframe that contain only NaN values.
Suppose we have a dataframe like this,
        A     B     C       D     E     F     G     H     I
0    Jack   NaN  34.0  Sydney   NaN   5.0   NaN   NaN   NaN
1    Riti   NaN  31.0   Delhi   NaN   7.0   NaN   NaN   NaN
2     NaN   NaN   NaN     NaN   NaN   NaN   NaN   NaN   NaN
3  Smriti  12.0  16.0  London  10.0  11.0   9.0   3.0  11.0
4  Atharv  23.0  18.0  London  11.0  12.0  13.0  13.0  14.0
5     NaN   NaN   NaN     NaN   NaN   NaN   NaN   NaN   NaN
6  Avisha   NaN  16.0  London   NaN  11.0   NaN   3.0   NaN
7     NaN   NaN   NaN     NaN   NaN   NaN   NaN   NaN   NaN
From this dataframe, we want to select only the rows in which every value is NaN, like this:
     A    B    C    D    E    F    G    H    I
2  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
5  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
7  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
In pandas, using the isnull() and all() methods of the dataframe, we can do this in a single line:
# Select rows which contain only NaN values
selected_rows = df[df.isnull().all(axis=1)]
It returns a dataframe containing only the rows in which every value is NaN.
How did it work?
Although it is a one-line solution, it is a little hard to understand. So, let's break this code into simple steps. It will help us understand exactly what is happening behind the scenes.
Steps to select only those dataframe rows, which contain only NaN values:
- Step 1: Call the dataframe's isnull() method, i.e. df.isnull(). It returns a bool dataframe of the same size, containing only True and False values. Each True value indicates a NaN at the corresponding position in the calling dataframe, and each False indicates a non-NaN value.
- Step 2: Call all(axis=1) on the bool dataframe, i.e. df.isnull().all(axis=1). The all() method checks whether all values along the given axis are True. With axis=1, it reduces across the columns, producing one value per row. If every column in a row holds NaN, the reduced value for that row is True; otherwise it is False. The result is a bool Series in which each value represents one row of the dataframe, and a True value means that all values in the corresponding row are NaN.
- Step 3: Pass this bool Series to the [] operator of the dataframe, i.e. df[df.isnull().all(axis=1)]. It returns only the rows for which the corresponding value in the bool Series is True, that is, only the rows that contain nothing but NaN values.
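The three steps above can be sketched with each intermediate result stored in its own variable. This is a minimal illustration on a small made-up dataframe; the data and the variable names nan_mask and all_nan_rows are assumptions chosen for this sketch and are not part of the complete example.

```python
import numpy as np
import pandas as pd

# Small illustrative dataframe (hypothetical data, not the article's example)
df = pd.DataFrame({
    "A": ["Jack", np.nan, "Smriti"],
    "B": [np.nan, np.nan, 12.0],
    "C": [34.0, np.nan, 16.0],
})

# Step 1: bool dataframe of the same shape, True where the value is NaN
nan_mask = df.isnull()

# Step 2: reduce across the columns, one bool per row,
# True only if every column in that row is NaN
all_nan_rows = nan_mask.all(axis=1)

# Step 3: boolean indexing keeps the rows whose mask value is True
selected_rows = df[all_nan_rows]

print(nan_mask)
print(all_nan_rows)
print(selected_rows)
```

Printing the intermediate objects makes it easier to see that only the row at index 1 survives, because it is the only row whose mask values are all True.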
Let’s see a complete example,
import pandas as pd
import numpy as np

# List of Tuples
# (np.nan is used because the np.NaN alias was removed in NumPy 2.0)
employees = [('Jack',   np.nan, 34,     'Sydney', np.nan, 5,      np.nan, np.nan, np.nan),
             ('Riti',   np.nan, 31,     'Delhi',  np.nan, 7,      np.nan, np.nan, np.nan),
             (np.nan,   np.nan, np.nan, np.nan,   np.nan, np.nan, np.nan, np.nan, np.nan),
             ('Smriti', 12,     16,     'London', 10,     11,     9,      3,      11),
             ('Atharv', 23,     18,     'London', 11,     12,     13,     13,     14),
             (np.nan,   np.nan, np.nan, np.nan,   np.nan, np.nan, np.nan, np.nan, np.nan),
             ('Avisha', np.nan, 16,     'London', np.nan, 11,     np.nan, 3,      np.nan),
             (np.nan,   np.nan, np.nan, np.nan,   np.nan, np.nan, np.nan, np.nan, np.nan)]

# Create a DataFrame object
df = pd.DataFrame(employees,
                  columns=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'])

print("Contents of the Dataframe : ")
print(df)

# Select rows which contain only NaN values
selected_rows = df[df.isnull().all(axis=1)]

print('Selected rows')
print(selected_rows)
Output:
Contents of the Dataframe : 
        A     B     C       D     E     F     G     H     I
0    Jack   NaN  34.0  Sydney   NaN   5.0   NaN   NaN   NaN
1    Riti   NaN  31.0   Delhi   NaN   7.0   NaN   NaN   NaN
2     NaN   NaN   NaN     NaN   NaN   NaN   NaN   NaN   NaN
3  Smriti  12.0  16.0  London  10.0  11.0   9.0   3.0  11.0
4  Atharv  23.0  18.0  London  11.0  12.0  13.0  13.0  14.0
5     NaN   NaN   NaN     NaN   NaN   NaN   NaN   NaN   NaN
6  Avisha   NaN  16.0  London   NaN  11.0   NaN   3.0   NaN
7     NaN   NaN   NaN     NaN   NaN   NaN   NaN   NaN   NaN
Selected rows
     A    B    C    D    E    F    G    H    I
2  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
5  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
7  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
Here we selected only the dataframe rows in which every value is NaN.
Select rows with only NaN values using isna() and all()
We can achieve the same result using the isna() method of the dataframe. isnull() is an alias of isna(), so the same logic applies:
# Select rows which contain only NaN values
selected_rows = df[df.isna().all(axis=1)]

print('Selected rows')
print(selected_rows)
Output:
Selected rows
     A    B    C    D    E    F    G    H    I
2  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
5  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
7  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
It selected only those dataframe rows which contain only NaN values.
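As a quick sanity check, the equivalence of the two methods can be confirmed by comparing the masks they produce. The sketch below uses a small made-up dataframe; the data and the names same_mask, rows_isna and rows_isnull are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Small illustrative dataframe (hypothetical data)
df = pd.DataFrame({
    "A": ["Jack", np.nan, "Riti", np.nan],
    "B": [np.nan, np.nan, 31.0, np.nan],
})

# Both methods should produce identical bool dataframes
same_mask = df.isna().equals(df.isnull())

# Row selection with either method
rows_isna = df[df.isna().all(axis=1)]
rows_isnull = df[df.isnull().all(axis=1)]

print(same_mask)
print(rows_isna.index.tolist())
print(rows_isnull.index.tolist())
```

Since both calls produce the same mask, which one to use is a matter of naming preference.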
Summary:
We learned how to select only the rows of a dataframe that contain nothing but NaN values, using either isnull() or isna() together with all().