In this article, we will discuss how to select dataframe columns which contains the NaN values (any, all or None).
Table of contents:
 Select dataframe columns with any NaN values.
 Select dataframe columns with all NaN values.
 Select dataframe columns without a NaN value.
Let’s first create a dataframe and then we will see how to select columns from it based on the NaN values,
import pandas as pd import numpy as np # List of Tuples empoyees = [('Jack', np.NaN, 34, 'Sydney', np.NaN, 5, np.NaN, np.NaN, np.NaN), ('Riti', np.NaN, 31, 'Delhi' , np.NaN, 7, np.NaN, np.NaN, np.NaN), ('Aadi', np.NaN, 16, 'London', np.NaN, 11, np.NaN, 3, np.NaN), ('Mark', np.NaN, 41, 'Delhi' , np.NaN, 12, np.NaN, 11, 1)] # Create a DataFrame object df = pd.DataFrame( empoyees, columns=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I']) print("Contents of the Dataframe : ") print(df)
Output:
Contents of the Dataframe : A B C D E F G H I 0 Jack NaN 34 Sydney NaN 5 NaN NaN NaN 1 Riti NaN 31 Delhi NaN 7 NaN NaN NaN 2 Aadi NaN 16 London NaN 11 NaN 3.0 NaN 3 Mark NaN 41 Delhi NaN 12 NaN 11.0 1.0
Now let’s see how to select columns from the above dataframe containing NaN values,
Pandas: Find Dataframe columns with any NaN value
To select the columns with any NaN value, use the loc[] attribute of the dataframe i.e.
loc[row_section, column_section]
 row_section: In the row_section pass ‘:’ to include all rows.
 column_section: In the column section pass a bool series, which should be of same size as number of columns of the dataframe. If any value in bool series is True then it means that column contains any NaN value.
For example,
# Select dataframe columns with any NaN value subset_df = df.loc[:, df.isnull().any()] print(subset_df)
Output
B E G H I 0 NaN NaN NaN NaN NaN 1 NaN NaN NaN NaN NaN 2 NaN NaN NaN 3.0 NaN 3 NaN NaN NaN 11.0 1.0
It returned a dataframe with only those columns from the original dataframe, which contains any NaN value.
This oneliner solution seems a little complex. So, let’s break this down a little to understand how it is works.
Why did we passed df.isnull().any() in the column section of loc[]?
df.isnull().any() returns a bool series and each value in the bool series represents a column. If any value in bool series is True then it means that corresponding column has any NaN value in it. By passing this in the column_section of loc[], we made sure that loc[] selects only the columns containing any NaN value.
How df.isnull().any() works?
First of all, call isNull() function on the dataframe object, it returns a same sized dataframe containing only bool values. In this bool dataframe, a cell contains True if the corresponding cell in the original dataframe is NaN, otherwise it contains False. Then call any() function on this Boolean dataframe object. It looks over the column axis and returns a bool series. Each value in the bool series represents a column and if value is True then it means that column has any NaN value.
Then pass that bool series to the column section of loc[], it selects only those dataframe columns which has any NaN value.
Pandas: Find Dataframe columns with all NaN values.
To select the columns with all NaN value, use the loc[] attribute of the dataframe i.e.
loc[row_section, column_section]
 row_section: In the row_section pass ‘:’ to include all rows.
 column_section: In the column section pass a bool series, which should be of same size as number of columns of the dataframe. If all value in bool series is True then it means, that respective column contains all NaN values.
For example,
# Select dataframe columns with all NaN values subset_df = df.loc[:, df.isnull().all()] print(subset_df)
Output
B E G 0 NaN NaN NaN 1 NaN NaN NaN 2 NaN NaN NaN 3 NaN NaN NaN
It returned a dataframe with only those columns from the original dataframe, which contains only NaN values.
This oneliner solution seems a little complex. So, let’s break this down a little to understand how it is works.
Why did we passed df.isnull().all() in the column section of loc[]?
df.isnull().all() returns a bool series and each value in the bool series represents a column. If all value in bool series is True then it means that corresponding column has all NaN value in it. By passing this in the column_section of loc[], we made sure that loc[] selects only the columns containing only NaN values.
How df.isnull().all() works?
First of all, call isNull() function on the dataframe object, it returns a same sized dataframe containing only bool values. In this bool dataframe, a cell contains True if the corresponding cell in the original dataframe is NaN, otherwise it contains False. Then call all() function on this Boolean dataframe object. It looks over the column axis and returns a bool series. Each value in the bool series represents a column and if value is True then it means that column has only NaN values in it.
Then pass that bool series to the column section of loc[], it selects only those dataframe columns which has all NaN values.
Pandas – Select Columns Without NaN
To select only those columns from dataframe which do not contain any NaN value, use the loc[] attribute of the dataframe i.e.
loc[row_section, column_section]
 row_section: In the row_section pass ‘:’ to include all rows.
 column_section: In the column section pass a bool series, which should be of same size as number of columns of the dataframe. If all value in bool series is True then it means that column has only nonNaN values.
For example,
# Select columns without a single NaN value subset_df = df.loc[:, ~df.isnull().any()] print(subset_df)
Output
A C D F 0 Jack 34 Sydney 5 1 Riti 31 Delhi 7 2 Aadi 16 London 11 3 Mark 41 Delhi 12
It returned a dataframe with only those columns from the original dataframe, which contains all nonNaN values.
This oneliner solution seems a little complex. So, let’s break this down a little to understand how it is works.
Why did we passed ~df.isnull().any() in the column section of loc[]?
~df.isnull().any() returns a bool series and each value in the bool series represents a column. If any value in bool series is True then it means that corresponding column has no NaN value in it. By passing this in the column_section of loc[], we made sure that loc[] selects only the columns containing all non NaN values.
How ~df.isnull().any() works?
First of all, call isNull() function on the dataframe object, it returns a same sized dataframe containing only bool values. In this bool dataframe, a cell contains True if the corresponding cell in the original dataframe is NaN, otherwise it contains False. Then call any() function on this Boolean dataframe object. It looks over the column axis and returns a bool series. Each value in the bool series represents a column and if value is True then it means that column has atleast NaN value. Then using the ~ operator, we inverted the bool series and if any value is True in series, then it means that column has no NaN value in it.
Then pass that bool series to the column section of loc[], it selects only those dataframe columns which has all non NaN values.
The complete example is as follows,
import pandas as pd import numpy as np # List of Tuples empoyees = [('Jack', np.NaN, 34, 'Sydney', np.NaN, 5, np.NaN, np.NaN, np.NaN), ('Riti', np.NaN, 31, 'Delhi' , np.NaN, 7, np.NaN, np.NaN, np.NaN), ('Aadi', np.NaN, 16, 'London', np.NaN, 11, np.NaN, 3, np.NaN), ('Mark', np.NaN, 41, 'Delhi' , np.NaN, 12, np.NaN, 11, 1)] # Create a DataFrame object df = pd.DataFrame( empoyees, columns=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I']) print("Contents of the Dataframe : ") print(df) # Select dataframe columns with any NaN value subset_df = df.loc[:, df.isnull().any()] print(subset_df) # Select dataframe columns with all NaN values subset_df = df.loc[:, df.isnull().all()] print(subset_df) # Select columns without a single NaN value subset_df = df.loc[:, ~df.isnull().any()] print(subset_df)
Output:
Contents of the Dataframe : A B C D E F G H I 0 Jack NaN 34 Sydney NaN 5 NaN NaN NaN 1 Riti NaN 31 Delhi NaN 7 NaN NaN NaN 2 Aadi NaN 16 London NaN 11 NaN 3.0 NaN 3 Mark NaN 41 Delhi NaN 12 NaN 11.0 1.0 B E G H I 0 NaN NaN NaN NaN NaN 1 NaN NaN NaN NaN NaN 2 NaN NaN NaN 3.0 NaN 3 NaN NaN NaN 11.0 1.0 B E G 0 NaN NaN NaN 1 NaN NaN NaN 2 NaN NaN NaN 3 NaN NaN NaN A C D F 0 Jack 34 Sydney 5 1 Riti 31 Delhi 7 2 Aadi 16 London 11 3 Mark 41 Delhi 12
Pandas Tutorials Learn Data Analysis with Python

Pandas Tutorial Part #1  Introduction to Data Analysis with Python

Pandas Tutorial Part #2  Basics of Pandas Series

Pandas Tutorial Part #3  Get & Set Series values

Pandas Tutorial Part #4  Attributes & methods of Pandas Series

Pandas Tutorial Part #5  Add or Remove Pandas Series elements

Pandas Tutorial Part #6  Introduction to DataFrame

Pandas Tutorial Part #7  DataFrame.loc[]  Select Rows / Columns by Indexing

Pandas Tutorial Part #8  DataFrame.iloc[]  Select Rows / Columns by Label Names

Pandas Tutorial Part #9  Filter DataFrame Rows

Pandas Tutorial Part #10  Add/Remove DataFrame Rows & Columns

Pandas Tutorial Part #11  DataFrame attributes & methods

Pandas Tutorial Part #12  Handling Missing Data or NaN values

Pandas Tutorial Part #13  Iterate over Rows & Columns of DataFrame

Pandas Tutorial Part #14  Sorting DataFrame by Rows or Columns

Pandas Tutorial Part #15  Merging or Concatenating DataFrames

Pandas Tutorial Part #16  DataFrame GroupBy explained with examples
Are you looking to make a career in Data Science with Python?
Data Science is the future, and the future is here now. Data Scientists are now the most soughtafter professionals today. To become a good Data Scientist or to make a career switch in Data Science one must possess the right skill set. We have curated a list of Best Professional Certificate in Data Science with Python. These courses will teach you the programming tools for Data Science like Pandas, NumPy, Matplotlib, Seaborn and how to use these libraries to implement Machine learning models.
Checkout the Detailed Review of Best Professional Certificate in Data Science with Python.
Remember, Data Science requires a lot of patience, persistence, and practice. So, start learning today.