In this article, we will discuss how to delete the rows of a dataframe based on NaN percentage, it means by the percentage of missing values the rows contains.

For example, deleting dataframe rows where NaN value are either 25% or more than 25%. Similarly we will build a solution to drop rows which contain more than N% of NaN / missing values.

**Table of Contents**

We are going to use the pandas dropna() function. So, first let’s have a little overview of it,

## Overview of dataframe.dropna()function

Pandas provide a function to delete rows or columns from a dataframe based on NaN values it contains.

**Advertisements**

DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)

**Arguments:**

**axis**: Default – 0- 0, or â€˜indexâ€™ : Drop rows which contain NaN values.
- 1, or â€˜columnsâ€™ : Drop columns which contain NaN value.

**how**: Default – ‘any’- â€˜anyâ€™ : Drop rows / columns which contain any NaN values.
- â€˜allâ€™ : Drop rows / columns which contain all NaN values.

**thresh**(int): Optional- Delete rows/columns which contains less than minimun thresh number of non-NaN values.

**inplace**(bool): Default- False- If True, modifies the calling dataframe object

**Returns**

- If inplace==True, the return None, else returns a new dataframe by deleting the rows/columns based on NaN values.

Let’s use this to perform our task of deleting rows based on percentage of missing values.

## Pandas: Delete dataframe rows based on NaN percentage

Suppose we have a dataframe that contains few rows which has one or more NaN values,

0 1 2 3 0 Jack 34.0 Sydney 5.0 1 Riti 31.0 Delhi NaN 2 NaN NaN NaN NaN 3 Aadi 16.0 London 11.0 4 John 31.0 NaN NaN 5 Kate 31.0 NaN NaN 6 Mark NaN NaN NaN 7 NaN NaN NaN NaN 8 NaN NaN NaN NaN

Percentage of NaN values in each row is as follows,

**Row ‘0’:**0% of NaN values.**Row**‘1: 25% of NaN values.**Row**‘2’: 100% of NaN values.**Row**‘3’: 0% of NaN values.**Row**‘4’: 50% of NaN values.**Row**‘5’: 50% of NaN values.**Row**‘6’: 75% of NaN values.**Row**‘7’: 100% of NaN values.**Row**‘8’: 100% of NaN values.

To delete rows based on percentage of NaN values in rows, we can use a pandas dropna() function. It can delete the columns or rows of a dataframe that contains all or few NaN values. As we want to delete the rows that contains either N% or more than N% of NaN values, so we will pass following arguments in it,

# Delete rows containing either 75% or more than 75% NaN Values perc = 75.0 # Here N is 75 min_count = int(((100-perc)/100)*df.shape[1] + 1) mod_df = df.dropna( axis=0, thresh=min_count)

**axis=0 :**Drop rows which contain missing value.**thresh=min_count**: Delete rows which contains less than**min_count**number of non-NaN values.- We calculated this min_count based on percentage of NaN values i.e. N%.

It returned a dataframe after deleting the rows containing either N% or more than N% of NaN values and then we assigned that dataframe to the same variable.

**Let’s checkout some examples,**

## Drop Dataframe rows containing either 75% or more than 75% NaN values

import pandas as pd import numpy as np # List of Tuples empoyees = [('Jack', 34, 'Sydney', 5) , ('Riti', 31, 'Delhi' , np.NaN) , (np.NaN, np.NaN, np.NaN , np.NaN), ('Aadi', 16, 'London', 11) , ('John', 31, np.NaN , np.NaN) , ('Kate', 31, np.NaN , np.NaN) , ('Mark', np.NaN, np.NaN , np.NaN), (np.NaN, np.NaN, np.NaN , np.NaN), (np.NaN, np.NaN, np.NaN , np.NaN)] # Create a DataFrame object df = pd.DataFrame( empoyees) print("Contents of the Dataframe : ") print(df) # Delete rows containing either 75% or more than 75% NaN Values perc = 75.0 min_count = int(((100-perc)/100)*df.shape[1] + 1) mod_df = df.dropna( axis=0, thresh=min_count) print("Modified Dataframe : ") print(mod_df)

**Output:**

0 1 2 3 0 Jack 34.0 Sydney 5.0 1 Riti 31.0 Delhi NaN 2 NaN NaN NaN NaN 3 Aadi 16.0 London 11.0 4 John 31.0 NaN NaN 5 Kate 31.0 NaN NaN 6 Mark NaN NaN NaN 7 NaN NaN NaN NaN 8 NaN NaN NaN NaN Modified Dataframe : 0 1 2 3 0 Jack 34.0 Sydney 5.0 1 Riti 31.0 Delhi NaN 3 Aadi 16.0 London 11.0 4 John 31.0 NaN NaN 5 Kate 31.0 NaN NaN

It deleted rows with index value ‘2’, ‘6’, ‘7’, ‘8’, because they had either 75% or more than 75% NaN values.

## Drop Dataframe rows containing either 90% or more than 90% NaN values

print("Contents of the Dataframe : ") print(df) # Delete rows containing either 90% or more than 90% NaN Values perc = 90.0 min_count = int(((100-perc)/100)*df.shape[1] + 1) mod_df = df.dropna( axis=0, thresh=min_count) print("Modified Dataframe : ") print(mod_df)

**Output:**

Contents of the Dataframe : 0 1 2 3 0 Jack 34.0 Sydney 5.0 1 Riti 31.0 Delhi NaN 2 NaN NaN NaN NaN 3 Aadi 16.0 London 11.0 4 John 31.0 NaN NaN 5 Kate 31.0 NaN NaN 6 Mark NaN NaN NaN 7 NaN NaN NaN NaN 8 NaN NaN NaN NaN Modified Dataframe : 0 1 2 3 0 Jack 34.0 Sydney 5.0 1 Riti 31.0 Delhi NaN 3 Aadi 16.0 London 11.0 4 John 31.0 NaN NaN 5 Kate 31.0 NaN NaN 6 Mark NaN NaN NaN

It deleted rows with index value ‘2’, ‘7’ and ‘8’, because they had more than 90% NaN values.

## Drop Dataframe rows containing either 25% or more than 25% NaN values

print("Contents of the Dataframe : ") print(df) # Delete rows containing either 25% or more than 25% NaN Values perc = 25.0 min_count = int(((100-perc)/100)*df.shape[1] + 1) mod_df = df.dropna( axis=0, thresh=min_count) print("Modified Dataframe : ") print(mod_df)

**Output**

Contents of the Dataframe : 0 1 2 3 0 Jack 34.0 Sydney 5.0 1 Riti 31.0 Delhi NaN 2 NaN NaN NaN NaN 3 Aadi 16.0 London 11.0 4 John 31.0 NaN NaN 5 Kate 31.0 NaN NaN 6 Mark NaN NaN NaN 7 NaN NaN NaN NaN 8 NaN NaN NaN NaN Modified Dataframe : 0 1 2 3 0 Jack 34.0 Sydney 5.0 3 Aadi 16.0 London 11.0

It deleted rows with index value ‘1’, ‘2’, ‘4’, ‘5’, ‘6’, ‘7’ and ‘8’, because they had more either 25% or more than 25% NaN values.

**The complete example is as follows,**

import pandas as pd import numpy as np # List of Tuples empoyees = [('Jack', 34, 'Sydney', 5) , ('Riti', 31, 'Delhi' , np.NaN) , (np.NaN, np.NaN, np.NaN , np.NaN), ('Aadi', 16, 'London', 11) , ('John', 31, np.NaN , np.NaN) , ('Kate', 31, np.NaN , np.NaN) , ('Mark', np.NaN, np.NaN , np.NaN), (np.NaN, np.NaN, np.NaN , np.NaN), (np.NaN, np.NaN, np.NaN , np.NaN)] # Create a DataFrame object df = pd.DataFrame( empoyees) print("Contents of the Dataframe : ") print(df) # Delete rows containing either 75% or more than 75% NaN Values perc = 75.0 min_count = int(((100-perc)/100)*df.shape[1] + 1) mod_df = df.dropna( axis=0, thresh=min_count) print("Modified Dataframe : ") print(mod_df) print("Contents of the Dataframe : ") print(df) # Delete rows containing either 90% or more than 90% NaN Values perc = 90.0 min_count = int(((100-perc)/100)*df.shape[1] + 1) mod_df = df.dropna( axis=0, thresh=min_count) print("Modified Dataframe : ") print(mod_df) print("Contents of the Dataframe : ") print(df) # Delete rows containing either 25% or more than 25% NaN Values perc = 25.0 min_count = int(((100-perc)/100)*df.shape[1] + 1) mod_df = df.dropna( axis=0, thresh=min_count) print("Modified Dataframe : ") print(mod_df)

**Output:**

Contents of the Dataframe : 0 1 2 3 0 Jack 34.0 Sydney 5.0 1 Riti 31.0 Delhi NaN 2 NaN NaN NaN NaN 3 Aadi 16.0 London 11.0 4 John 31.0 NaN NaN 5 Kate 31.0 NaN NaN 6 Mark NaN NaN NaN 7 NaN NaN NaN NaN 8 NaN NaN NaN NaN Modified Dataframe : 0 1 2 3 0 Jack 34.0 Sydney 5.0 1 Riti 31.0 Delhi NaN 3 Aadi 16.0 London 11.0 4 John 31.0 NaN NaN 5 Kate 31.0 NaN NaN Contents of the Dataframe : 0 1 2 3 0 Jack 34.0 Sydney 5.0 1 Riti 31.0 Delhi NaN 2 NaN NaN NaN NaN 3 Aadi 16.0 London 11.0 4 John 31.0 NaN NaN 5 Kate 31.0 NaN NaN 6 Mark NaN NaN NaN 7 NaN NaN NaN NaN 8 NaN NaN NaN NaN Modified Dataframe : 0 1 2 3 0 Jack 34.0 Sydney 5.0 1 Riti 31.0 Delhi NaN 3 Aadi 16.0 London 11.0 4 John 31.0 NaN NaN 5 Kate 31.0 NaN NaN 6 Mark NaN NaN NaN Contents of the Dataframe : 0 1 2 3 0 Jack 34.0 Sydney 5.0 1 Riti 31.0 Delhi NaN 2 NaN NaN NaN NaN 3 Aadi 16.0 London 11.0 4 John 31.0 NaN NaN 5 Kate 31.0 NaN NaN 6 Mark NaN NaN NaN 7 NaN NaN NaN NaN 8 NaN NaN NaN NaN Modified Dataframe : 0 1 2 3 0 Jack 34.0 Sydney 5.0 3 Aadi 16.0 London 11.0

**Advertisements**

Thanks for reading.