Python Pandas : Count NaN or missing values in DataFrame ( also row & column wise)

In this article we will discuss how to find NaN or missing values in a Dataframe.

Manytimes we create a DataFrame from an exsisting dataset and it might contain some missing values in any column or row.  For every missing value Pandas add NaN at it’s place.

Let’s create a dataframe with missing values i.e.

# List of Tuples
    students = [ ('jack', np.NaN, 'Sydeny' , 'Australia') ,
                 ('Riti', np.NaN, 'Delhi' , 'India' ) ,
                 ('Vikas', 31, np.NaN , 'India' ) ,
                 ('Neelu', 32, 'Bangalore' , 'India' ) ,
                 ('John', 16, 'New York' , 'US') ,
                 ('John' , 11, np.NaN, np.NaN ) ,
                (np.NaN , np.NaN, np.NaN, np.NaN ) 
                 ]
    
    #Create a DataFrame object
    dfObj = pd.DataFrame(students, columns = ['Name' , 'Age', 'City' , 'Country'])

Contents of the dataframe are,

    Name  Age       City    Country
0   jack  NaN     Sydeny  Australia
1   Riti  NaN      Delhi      India
2  Vikas   31        NaN      India
3  Neelu   32  Bangalore      India
4   John   16   New York         US
5   John   11        NaN        NaN
6    NaN  NaN        NaN        NaN

dataframe.isnull()

Now let’s count the number of NaN in this dataframe using dataframe.isnull()

Pandas Dataframe provides a function isnull(), it returns a new dataframe of same size as calling dataframe, it contains only True & False only. With True at the place NaN in original dataframe and False at other places. Let’s call this function on above dataframe dfObj i.e.

dfObj.isnull()

It will return a new DataFrame with True & False data i.e.

    Name    Age   City Country
0  False   True  False   False
1  False   True  False   False
2  False  False   True   False
3  False  False  False   False
4  False  False  False   False
5  False  False   True    True
6   True   True   True    True

This contains True at the place NaN in dfObj and False at other places. We are going to use this dataframe to calculate total NaN in original dataframe dfObj.

Count all NaN in a DataFrame (both columns & Rows)

dfObj.isnull().sum().sum()

Calling sum() of the DataFrame returned by isnull() will give the count of total NaN in dataframe i.e.

9

Now suppose we want to count the NaN in each column individually, let’s do that.

Count total NaN at each column in DataFrame

dfObj.isnull().sum()

Calling sum() of the DataFrame returned by isnull() will give a series containing data about count of NaN in each column i.e.

Name       1
Age        3
City       3
Country    2
dtype: int64

Count total NaN at each row in DataFrame

To count the total NaN in each row in dataframe, we need to iterate over each row in dataframe and call sum() on it i.e.

for i in range(len(dfObj.index)) :
    print("Nan in row ", i , " : " ,  dfObj.iloc[i].isnull().sum())

It’s output will be,

Nan in row  0  :  1
Nan in row  1  :  1
Nan in row  2  :  1
Nan in row  3  :  0
Nan in row  4  :  0
Nan in row  5  :  2
Nan in row  6  :  4

Complete example is as follows,

import pandas as pd
import numpy as np

def main():
    
    # List of Tuples
    students = [ ('jack', np.NaN, 'Sydeny' , 'Australia') ,
                 ('Riti', np.NaN, 'Delhi' , 'India' ) ,
                 ('Vikas', 31, np.NaN , 'India' ) ,
                 ('Neelu', 32, 'Bangalore' , 'India' ) ,
                 ('John', 16, 'New York' , 'US') ,
                 ('John' , 11, np.NaN, np.NaN ) ,
                (np.NaN , np.NaN, np.NaN, np.NaN ) 
                 ]
    
    #Create a DataFrame object
    dfObj = pd.DataFrame(students, columns = ['Name' , 'Age', 'City' , 'Country'])

    print("Original Dataframe" , dfObj, sep='\n')

    print("Check NaN in Dataframe" , dfObj.isnull(), sep='\n')
    
    print("***Count all NaN in a DataFrame (both columns & Rows)***")
    
    print("Total NaN in Dataframe" , dfObj.isnull().sum().sum(), sep='\n')
    
    print("***Count NaN in each column of a DataFrame***")
    
    print("Nan in each columns" , dfObj.isnull().sum(), sep='\n')
    
    print("***Count NaN in each row of a DataFrame***")
    
    for i in range(len(dfObj.index)) :
        print("Nan in row ", i , " : " ,  dfObj.iloc[i].isnull().sum())
    
        
if __name__ == '__main__':
    main()


Output:

Original Dataframe
    Name  Age       City    Country
0   jack  NaN     Sydeny  Australia
1   Riti  NaN      Delhi      India
2  Vikas   31        NaN      India
3  Neelu   32  Bangalore      India
4   John   16   New York         US
5   John   11        NaN        NaN
6    NaN  NaN        NaN        NaN
Check NaN in Dataframe
    Name    Age   City Country
0  False   True  False   False
1  False   True  False   False
2  False  False   True   False
3  False  False  False   False
4  False  False  False   False
5  False  False   True    True
6   True   True   True    True
***Count all NaN in a DataFrame (both columns & Rows)***
Total NaN in Dataframe
9
***Count NaN in each column of a DataFrame***
Nan in each columns
Name       1
Age        3
City       3
Country    2
dtype: int64
***Count NaN in each row of a DataFrame***
Nan in row  0  :  1
Nan in row  1  :  1
Nan in row  2  :  1
Nan in row  3  :  0
Nan in row  4  :  0
Nan in row  5  :  2
Nan in row  6  :  4

 

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Scroll to Top