In this article, we will discuss how to delete the columns of a dataframe that contain at least one NaN value. In other words, we are going to delete those dataframe columns which contain one or more missing values.
We are going to use the pandas dropna() function. So, first let’s have a quick overview of it.
Overview of the dataframe.dropna() function
Pandas provides a function to delete rows or columns from a dataframe based on the NaN values they contain.
DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
Arguments:
- axis: Default – 0
  - 0, or ‘index’ : Drop rows which contain NaN values.
  - 1, or ‘columns’ : Drop columns which contain NaN values.
- how: Default – ‘any’
  - ‘any’ : Drop rows / columns which contain at least one NaN value.
  - ‘all’ : Drop rows / columns in which every value is NaN.
- thresh (int): Optional
  - Keep only the rows / columns that have at least thresh non-NaN values; those with fewer are deleted.
- subset: Optional
  - Labels along the other axis to consider when looking for NaN values (column labels when dropping rows, row labels when dropping columns).
- inplace (bool): Default – False
  - If True, modifies the calling dataframe object.
Returns
- If inplace==True, it returns None; otherwise it returns a new dataframe with the rows/columns removed based on their NaN values.
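To make the difference between the how and thresh arguments concrete, here is a minimal sketch. The dataframe demo and its column names 'x', 'y' and 'z' are made up for illustration and are not part of the example used later in this article.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 'x' has no NaN, 'y' has one NaN, 'z' is entirely NaN
demo = pd.DataFrame({
    'x': [1.0, 2.0, 3.0],
    'y': [4.0, np.nan, 6.0],
    'z': [np.nan, np.nan, np.nan],
})

# how='any': drop every column that holds at least one NaN
any_cols = list(demo.dropna(axis=1, how='any').columns)

# how='all': drop only the columns in which every value is NaN
all_cols = list(demo.dropna(axis=1, how='all').columns)

# thresh=3: keep only the columns with at least 3 non-NaN values
thresh_cols = list(demo.dropna(axis=1, thresh=3).columns)

print(any_cols)     # ['x']
print(all_cols)     # ['x', 'y']
print(thresh_cols)  # ['x']
```

None of these calls use inplace=True, so demo itself stays unchanged and each call returns a new dataframe.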
Let’s use this to perform our task of deleting the columns that contain any NaN values.
Pandas: Delete dataframe columns containing any NaN value
Suppose we have a dataframe in which a few columns contain one or more NaN values,
      A   B   C       D   E   F   G     H    I
0  Jack NaN  34  Sydney NaN   5 NaN   NaN  NaN
1  Riti NaN  31   Delhi NaN   7 NaN   NaN  NaN
2  Aadi NaN  16  London NaN  11 NaN   3.0  NaN
3  Mark NaN  41   Delhi NaN  12 NaN  11.0  1.0
Now we want to delete those dataframe columns which contain any NaN values (columns ‘B’, ‘E’, ‘G’, ‘H’ and ‘I’). So, the new dataframe should look like this,
      A   C       D   F
0  Jack  34  Sydney   5
1  Riti  31   Delhi   7
2  Aadi  16  London  11
3  Mark  41   Delhi  12
For this we can use the pandas dropna() function. It can delete the columns or rows of a dataframe that contain all or some NaN values. As we want to delete the columns that contain at least one NaN value, we will pass the following arguments to it,
# Drop columns which contain one or more NaN values
df = df.dropna(axis=1, how='any')
- axis=1 : Drop columns (rather than rows) which contain missing values.
- how=’any’ : Drop a column if any of its values is NaN (columns, because axis=1).
It returns a new dataframe without the columns that had one or more NaN values, and we assign that dataframe back to the same variable.
Check out the complete example below,
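Before dropping anything, it can be useful to see which columns would be removed. The following is a minimal sketch of one way to do that with isna() and any(). The dataframe sample is a hypothetical, smaller frame and not the one from this article.

```python
import numpy as np
import pandas as pd

# Hypothetical frame, smaller than the article's example
sample = pd.DataFrame({
    'A': ['Jack', 'Riti'],
    'B': [np.nan, np.nan],
    'C': [34, 31],
    'H': [np.nan, 3.0],
})

# Boolean Series indexed by column name: True if the column has any NaN
has_nan = sample.isna().any()

# Names of the columns that dropna(axis=1, how='any') would remove
to_drop = list(has_nan[has_nan].index)

# Selecting the columns without NaN by mask gives the same frame as dropna
kept_by_mask = sample.loc[:, ~has_nan]
kept_by_dropna = sample.dropna(axis=1, how='any')

print(to_drop)                              # ['B', 'H']
print(kept_by_mask.equals(kept_by_dropna))  # True
```

The mask-based selection is only shown for comparison; dropna() remains the more direct way to do the same thing.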
import pandas as pd
import numpy as np

# List of Tuples
employees = [('Jack', np.nan, 34, 'Sydney', np.nan, 5,  np.nan, np.nan, np.nan),
             ('Riti', np.nan, 31, 'Delhi',  np.nan, 7,  np.nan, np.nan, np.nan),
             ('Aadi', np.nan, 16, 'London', np.nan, 11, np.nan, 3,      np.nan),
             ('Mark', np.nan, 41, 'Delhi',  np.nan, 12, np.nan, 11,     1)]

# Create a DataFrame object
df = pd.DataFrame(employees,
                  columns=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'])

print("Contents of the Dataframe : ")
print(df)

# Drop columns containing any NaN value
df = df.dropna(axis=1, how='any')

print("Modified Dataframe : ")
print(df)
Output:
Contents of the Dataframe : 
      A   B   C       D   E   F   G     H    I
0  Jack NaN  34  Sydney NaN   5 NaN   NaN  NaN
1  Riti NaN  31   Delhi NaN   7 NaN   NaN  NaN
2  Aadi NaN  16  London NaN  11 NaN   3.0  NaN
3  Mark NaN  41   Delhi NaN  12 NaN  11.0  1.0
Modified Dataframe : 
      A   C       D   F
0  Jack  34  Sydney   5
1  Riti  31   Delhi   7
2  Aadi  16  London  11
3  Mark  41   Delhi  12
It deleted columns ‘B’, ‘E’, ‘G’, ‘H’ and ‘I’ of the dataframe, because each of them had at least one NaN value.
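For contrast, the sketch below rebuilds the same example data and shows what happens with two other argument choices. With how='all', only the completely empty columns are dropped. With subset=[3], only the row labelled 3 is checked for NaN. The variable names only_empty_cols and row3_cols are chosen here just for illustration.

```python
import numpy as np
import pandas as pd

# Same data as the example above
employees = [('Jack', np.nan, 34, 'Sydney', np.nan, 5,  np.nan, np.nan, np.nan),
             ('Riti', np.nan, 31, 'Delhi',  np.nan, 7,  np.nan, np.nan, np.nan),
             ('Aadi', np.nan, 16, 'London', np.nan, 11, np.nan, 3,      np.nan),
             ('Mark', np.nan, 41, 'Delhi',  np.nan, 12, np.nan, 11,     1)]
df = pd.DataFrame(employees, columns=list('ABCDEFGHI'))

# how='all': drop only the columns where every value is NaN ('B', 'E', 'G')
only_empty_cols = list(df.dropna(axis=1, how='all').columns)

# subset=[3]: with axis=1, subset holds row labels, so only row 3 ('Mark')
# is inspected; 'H' and 'I' survive because they are not NaN in that row
row3_cols = list(df.dropna(axis=1, subset=[3]).columns)

print(only_empty_cols)  # ['A', 'C', 'D', 'F', 'H', 'I']
print(row3_cols)        # ['A', 'C', 'D', 'F', 'H', 'I']
```

Both calls keep ‘H’ and ‘I’, which how='any' removed, so the choice of arguments depends on how strictly missing values should be treated.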