In this article we will discuss how to find NaN or missing values in a Dataframe.
Manytimes we create a DataFrame from an exsisting dataset and it might contain some missing values in any column or row.  For every missing value Pandas add NaN at it’s place.
Let’s create a dataframe with missing values i.e.
# List of Tuples students = [ ('jack', np.NaN, 'Sydeny' , 'Australia') , ('Riti', np.NaN, 'Delhi' , 'India' ) , ('Vikas', 31, np.NaN , 'India' ) , ('Neelu', 32, 'Bangalore' , 'India' ) , ('John', 16, 'New York' , 'US') , ('John' , 11, np.NaN, np.NaN ) , (np.NaN , np.NaN, np.NaN, np.NaN ) ] #Create a DataFrame object dfObj = pd.DataFrame(students, columns = ['Name' , 'Age', 'City' , 'Country'])
Contents of the dataframe are,
Name Age City Country 0 jack NaN Sydeny Australia 1 Riti NaN Delhi India 2 Vikas 31 NaN India 3 Neelu 32 Bangalore India 4 John 16 New York US 5 John 11 NaN NaN 6 NaN NaN NaN NaN
dataframe.isnull()
Now let’s count the number of NaN in this dataframe using dataframe.isnull()
Pandas Dataframe provides a function isnull(), it returns a new dataframe of same size as calling dataframe, it contains only True & False only. With True at the place NaN in original dataframe and False at other places. Let’s call this function on above dataframe dfObj i.e.
Frequently Asked:
- Python Pandas : Count NaN or missing values in DataFrame ( also row & column wise)
- Python Pandas : Drop columns in DataFrame by label Names or by Index Positions
- Add Row to Dataframe in Pandas
- Python Pandas : How to Drop rows in DataFrame by conditions on column values
dfObj.isnull()
It will return a new DataFrame with True & False data i.e.
Name Age City Country 0 False True False False 1 False True False False 2 False False True False 3 False False False False 4 False False False False 5 False False True True 6 True True True True
This contains True at the place NaN in dfObj and False at other places. We are going to use this dataframe to calculate total NaN in original dataframe dfObj.
Count all NaN in a DataFrame (both columns & Rows)
dfObj.isnull().sum().sum()
Calling sum() of the DataFrame returned by isnull() will give the count of total NaN in dataframe i.e.
9
Now suppose we want to count the NaN in each column individually, let’s do that.
Count total NaN at each column in DataFrame
dfObj.isnull().sum()
Calling sum() of the DataFrame returned by isnull() will give a series containing data about count of NaN in each column i.e.
Name 1 Age 3 City 3 Country 2 dtype: int64
Count total NaN at each row in DataFrame
To count the total NaN in each row in dataframe, we need to iterate over each row in dataframe and call sum() on it i.e.
for i in range(len(dfObj.index)) : print("Nan in row ", i , " : " , dfObj.iloc[i].isnull().sum())
It’s output will be,
Nan in row 0 : 1 Nan in row 1 : 1 Nan in row 2 : 1 Nan in row 3 : 0 Nan in row 4 : 0 Nan in row 5 : 2 Nan in row 6 : 4
Complete example is as follows,
import pandas as pd import numpy as np def main(): # List of Tuples students = [ ('jack', np.NaN, 'Sydeny' , 'Australia') , ('Riti', np.NaN, 'Delhi' , 'India' ) , ('Vikas', 31, np.NaN , 'India' ) , ('Neelu', 32, 'Bangalore' , 'India' ) , ('John', 16, 'New York' , 'US') , ('John' , 11, np.NaN, np.NaN ) , (np.NaN , np.NaN, np.NaN, np.NaN ) ] #Create a DataFrame object dfObj = pd.DataFrame(students, columns = ['Name' , 'Age', 'City' , 'Country']) print("Original Dataframe" , dfObj, sep='\n') print("Check NaN in Dataframe" , dfObj.isnull(), sep='\n') print("***Count all NaN in a DataFrame (both columns & Rows)***") print("Total NaN in Dataframe" , dfObj.isnull().sum().sum(), sep='\n') print("***Count NaN in each column of a DataFrame***") print("Nan in each columns" , dfObj.isnull().sum(), sep='\n') print("***Count NaN in each row of a DataFrame***") for i in range(len(dfObj.index)) : print("Nan in row ", i , " : " , dfObj.iloc[i].isnull().sum()) if __name__ == '__main__': main()
Output:
Original Dataframe Name Age City Country 0 jack NaN Sydeny Australia 1 Riti NaN Delhi India 2 Vikas 31 NaN India 3 Neelu 32 Bangalore India 4 John 16 New York US 5 John 11 NaN NaN 6 NaN NaN NaN NaN Check NaN in Dataframe Name Age City Country 0 False True False False 1 False True False False 2 False False True False 3 False False False False 4 False False False False 5 False False True True 6 True True True True ***Count all NaN in a DataFrame (both columns & Rows)*** Total NaN in Dataframe 9 ***Count NaN in each column of a DataFrame*** Nan in each columns Name 1 Age 3 City 3 Country 2 dtype: int64 ***Count NaN in each row of a DataFrame*** Nan in row 0 : 1 Nan in row 1 : 1 Nan in row 2 : 1 Nan in row 3 : 0 Nan in row 4 : 0 Nan in row 5 : 2 Nan in row 6 : 4