In this article, we will discuss how to delete the columns of a dataframe which contain all NaN values.
We are going to use the pandas dropna() function. So, first let’s have a brief overview of it.
Overview of dataframe.dropna() function
Pandas provides a function to delete rows or columns from a dataframe based on the NaN values they contain.
DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
Arguments:
- axis: Default – 0
  - 0, or ‘index’ : Drop rows which contain NaN values.
  - 1, or ‘columns’ : Drop columns which contain NaN values.
- how: Default – ‘any’
  - ‘any’ : Drop rows / columns which contain at least one NaN value.
  - ‘all’ : Drop rows / columns in which all values are NaN.
- thresh (int): Optional
  - Drop rows / columns which have fewer than thresh non-NaN values.
- subset: Optional
  - Labels along the other axis to check for NaN values, e.g. when dropping rows, the list of columns to look at.
- inplace (bool): Default – False
  - If True, modifies the calling dataframe object.
Returns
- If inplace==True, it returns None; otherwise it returns a new dataframe with the rows / columns dropped based on NaN values.
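To make the arguments above more concrete, here is a minimal sketch of how they change the result. The dataframe `demo` and its column names are made up for illustration and are not part of the example used later in this article.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, made up for illustration
demo = pd.DataFrame({
    "x": [1.0, np.nan, 3.0],
    "y": [np.nan, np.nan, np.nan],
    "z": [7.0, np.nan, np.nan],
})

# how='any' with axis=1: drop every column that has at least one NaN
any_cols = demo.dropna(axis=1, how="any")

# how='all' with axis=1: drop only the columns that are entirely NaN
all_cols = demo.dropna(axis=1, how="all")

# thresh=2 with axis=1: keep only columns with at least 2 non-NaN values
thresh_cols = demo.dropna(axis=1, thresh=2)

# subset with the default axis=0: drop rows that have NaN in column 'x'
subset_rows = demo.dropna(subset=["x"])

print(list(any_cols.columns))     # []
print(list(all_cols.columns))     # ['x', 'z']
print(list(thresh_cols.columns))  # ['x']
print(list(subset_rows.index))    # [0, 2]
```

Note that every column in `demo` has at least one NaN, so how='any' removes all of them, while how='all' removes only column 'y'.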
Let’s use this to perform our task of deleting columns with all NaN values.
Pandas: Delete columns of dataframe if all NaN values
Suppose we have a dataframe in which a few columns contain only NaN values,
      A   B   C       D   E   F   G     H    I
0  Jack NaN  34  Sydney NaN   5 NaN   NaN  NaN
1  Riti NaN  31   Delhi NaN   7 NaN   NaN  NaN
2  Aadi NaN  16  London NaN  11 NaN   3.0  NaN
3  Mark NaN  41   Delhi NaN  12 NaN  11.0  1.0
Now we want to delete the columns of this dataframe that contain only NaN values (columns ‘B’, ‘E’ and ‘G’). So, the new dataframe should look like this,
      A   C       D   F     H    I
0  Jack  34  Sydney   5   NaN  NaN
1  Riti  31   Delhi   7   NaN  NaN
2  Aadi  16  London  11   3.0  NaN
3  Mark  41   Delhi  12  11.0  1.0
For this we can use the pandas dropna() function. It can delete the rows or columns of a dataframe that contain any or all NaN values. As we want to delete the columns that contain only NaN values, we will pass the following arguments to it,
# Drop columns which contain all NaN values
df = df.dropna(axis=1, how='all')
- axis=1 : Look at columns instead of rows when dropping.
- how=’all’ : Drop a column only if all of its values are NaN (columns, because axis==1).
It returned a dataframe after deleting the columns with all NaN values and then we assigned that dataframe to the same variable.
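As an alternative to reassigning the result, the inplace argument described earlier can be used. The following is a small sketch on a made-up dataframe called `frame`; it is only meant to show the difference in the return value.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame; column 'q' holds only NaN values
frame = pd.DataFrame({"p": [1, 2], "q": [np.nan, np.nan]})

# With inplace=True the call returns None and changes 'frame' itself
result = frame.dropna(axis=1, how="all", inplace=True)

print(result)               # None
print(list(frame.columns))  # ['p']
```

Because the call returns None in this mode, writing `frame = frame.dropna(..., inplace=True)` would replace the dataframe with None, so the two styles should not be mixed.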
Check out the complete example below,
import pandas as pd
import numpy as np

# List of Tuples (np.nan is used because the np.NaN alias was removed in NumPy 2.0)
employees = [('Jack', np.nan, 34, 'Sydney', np.nan, 5,  np.nan, np.nan, np.nan),
             ('Riti', np.nan, 31, 'Delhi',  np.nan, 7,  np.nan, np.nan, np.nan),
             ('Aadi', np.nan, 16, 'London', np.nan, 11, np.nan, 3,      np.nan),
             ('Mark', np.nan, 41, 'Delhi',  np.nan, 12, np.nan, 11,     1)]

# Create a DataFrame object
df = pd.DataFrame(employees,
                  columns=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'])

print("Contents of the Dataframe : ")
print(df)

# Drop columns which contain all NaN values
df = df.dropna(axis=1, how='all')

print("Modified Dataframe : ")
print(df)
Output:
Contents of the Dataframe : 
      A   B   C       D   E   F   G     H    I
0  Jack NaN  34  Sydney NaN   5 NaN   NaN  NaN
1  Riti NaN  31   Delhi NaN   7 NaN   NaN  NaN
2  Aadi NaN  16  London NaN  11 NaN   3.0  NaN
3  Mark NaN  41   Delhi NaN  12 NaN  11.0  1.0
Modified Dataframe : 
      A   C       D   F     H    I
0  Jack  34  Sydney   5   NaN  NaN
1  Riti  31   Delhi   7   NaN  NaN
2  Aadi  16  London  11   3.0  NaN
3  Mark  41   Delhi  12  11.0  1.0
It deleted columns ‘B’, ‘E’ and ‘G’ of the dataframe, because they had only NaN values. Columns ‘H’ and ‘I’ were kept, because each of them has at least one non-NaN value.
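Before dropping anything, it can be useful to see which columns would be removed. One possible way, sketched here on a small made-up dataframe called `sample`, is to combine isna() with all(); this is an illustration and not the only way to do it.

```python
import numpy as np
import pandas as pd

# Hypothetical frame for illustration
sample = pd.DataFrame({
    "name": ["Jack", "Riti"],
    "empty": [np.nan, np.nan],
    "partial": [np.nan, 3.0],
})

# isna() marks each NaN cell; all() then checks each column from top to bottom
all_nan_columns = sample.columns[sample.isna().all()].tolist()
print(all_nan_columns)  # ['empty']

# Dropping those labels gives the same result as dropna(axis=1, how='all')
dropped = sample.drop(columns=all_nan_columns)
print(dropped.equals(sample.dropna(axis=1, how="all")))  # True
```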