In this article we will discuss how to find NaN or missing values in a Dataframe.
Manytimes we create a DataFrame from an exsisting dataset and it might contain some missing values in any column or row. For every missing value Pandas add NaN at it’s place.
Let’s create a dataframe with missing values i.e.
# List of Tuples students = [ ('jack', np.NaN, 'Sydeny' , 'Australia') , ('Riti', np.NaN, 'Delhi' , 'India' ) , ('Vikas', 31, np.NaN , 'India' ) , ('Neelu', 32, 'Bangalore' , 'India' ) , ('John', 16, 'New York' , 'US') , ('John' , 11, np.NaN, np.NaN ) , (np.NaN , np.NaN, np.NaN, np.NaN ) ] #Create a DataFrame object dfObj = pd.DataFrame(students, columns = ['Name' , 'Age', 'City' , 'Country'])
Contents of the dataframe are,
Name Age City Country 0 jack NaN Sydeny Australia 1 Riti NaN Delhi India 2 Vikas 31 NaN India 3 Neelu 32 Bangalore India 4 John 16 New York US 5 John 11 NaN NaN 6 NaN NaN NaN NaN
dataframe.isnull()
Now let’s count the number of NaN in this dataframe using dataframe.isnull()
Pandas Dataframe provides a function isnull(), it returns a new dataframe of same size as calling dataframe, it contains only True & False only. With True at the place NaN in original dataframe and False at other places. Let’s call this function on above dataframe dfObj i.e.
dfObj.isnull()
It will return a new DataFrame with True & False data i.e.
Name Age City Country 0 False True False False 1 False True False False 2 False False True False 3 False False False False 4 False False False False 5 False False True True 6 True True True True
This contains True at the place NaN in dfObj and False at other places. We are going to use this dataframe to calculate total NaN in original dataframe dfObj.
Count all NaN in a DataFrame (both columns & Rows)
dfObj.isnull().sum().sum()
Calling sum() of the DataFrame returned by isnull() will give the count of total NaN in dataframe i.e.
9
Now suppose we want to count the NaN in each column individually, let’s do that.
Count total NaN at each column in DataFrame
dfObj.isnull().sum()
Calling sum() of the DataFrame returned by isnull() will give a series containing data about count of NaN in each column i.e.
Name 1 Age 3 City 3 Country 2 dtype: int64
Count total NaN at each row in DataFrame
To count the total NaN in each row in dataframe, we need to iterate over each row in dataframe and call sum() on it i.e.
for i in range(len(dfObj.index)) : print("Nan in row ", i , " : " , dfObj.iloc[i].isnull().sum())
It’s output will be,
Nan in row 0 : 1 Nan in row 1 : 1 Nan in row 2 : 1 Nan in row 3 : 0 Nan in row 4 : 0 Nan in row 5 : 2 Nan in row 6 : 4
Complete example is as follows,
import pandas as pd import numpy as np def main(): # List of Tuples students = [ ('jack', np.NaN, 'Sydeny' , 'Australia') , ('Riti', np.NaN, 'Delhi' , 'India' ) , ('Vikas', 31, np.NaN , 'India' ) , ('Neelu', 32, 'Bangalore' , 'India' ) , ('John', 16, 'New York' , 'US') , ('John' , 11, np.NaN, np.NaN ) , (np.NaN , np.NaN, np.NaN, np.NaN ) ] #Create a DataFrame object dfObj = pd.DataFrame(students, columns = ['Name' , 'Age', 'City' , 'Country']) print("Original Dataframe" , dfObj, sep='\n') print("Check NaN in Dataframe" , dfObj.isnull(), sep='\n') print("***Count all NaN in a DataFrame (both columns & Rows)***") print("Total NaN in Dataframe" , dfObj.isnull().sum().sum(), sep='\n') print("***Count NaN in each column of a DataFrame***") print("Nan in each columns" , dfObj.isnull().sum(), sep='\n') print("***Count NaN in each row of a DataFrame***") for i in range(len(dfObj.index)) : print("Nan in row ", i , " : " , dfObj.iloc[i].isnull().sum()) if __name__ == '__main__': main()
Output:
Original Dataframe Name Age City Country 0 jack NaN Sydeny Australia 1 Riti NaN Delhi India 2 Vikas 31 NaN India 3 Neelu 32 Bangalore India 4 John 16 New York US 5 John 11 NaN NaN 6 NaN NaN NaN NaN Check NaN in Dataframe Name Age City Country 0 False True False False 1 False True False False 2 False False True False 3 False False False False 4 False False False False 5 False False True True 6 True True True True ***Count all NaN in a DataFrame (both columns & Rows)*** Total NaN in Dataframe 9 ***Count NaN in each column of a DataFrame*** Nan in each columns Name 1 Age 3 City 3 Country 2 dtype: int64 ***Count NaN in each row of a DataFrame*** Nan in row 0 : 1 Nan in row 1 : 1 Nan in row 2 : 1 Nan in row 3 : 0 Nan in row 4 : 0 Nan in row 5 : 2 Nan in row 6 : 4
Pandas Tutorials -Learn Data Analysis with Python
-
Pandas Tutorial Part #1 - Introduction to Data Analysis with Python
-
Pandas Tutorial Part #2 - Basics of Pandas Series
-
Pandas Tutorial Part #3 - Get & Set Series values
-
Pandas Tutorial Part #4 - Attributes & methods of Pandas Series
-
Pandas Tutorial Part #5 - Add or Remove Pandas Series elements
-
Pandas Tutorial Part #6 - Introduction to DataFrame
-
Pandas Tutorial Part #7 - DataFrame.loc[] - Select Rows / Columns by Indexing
-
Pandas Tutorial Part #8 - DataFrame.iloc[] - Select Rows / Columns by Label Names
-
Pandas Tutorial Part #9 - Filter DataFrame Rows
-
Pandas Tutorial Part #10 - Add/Remove DataFrame Rows & Columns
-
Pandas Tutorial Part #11 - DataFrame attributes & methods
-
Pandas Tutorial Part #12 - Handling Missing Data or NaN values
-
Pandas Tutorial Part #13 - Iterate over Rows & Columns of DataFrame
-
Pandas Tutorial Part #14 - Sorting DataFrame by Rows or Columns
-
Pandas Tutorial Part #15 - Merging or Concatenating DataFrames
-
Pandas Tutorial Part #16 - DataFrame GroupBy explained with examples
Are you looking to make a career in Data Science with Python?
Data Science is the future, and the future is here now. Data Scientists are now the most sought-after professionals today. To become a good Data Scientist or to make a career switch in Data Science one must possess the right skill set. We have curated a list of Best Professional Certificate in Data Science with Python. These courses will teach you the programming tools for Data Science like Pandas, NumPy, Matplotlib, Seaborn and how to use these libraries to implement Machine learning models.
Checkout the Detailed Review of Best Professional Certificate in Data Science with Python.
Remember, Data Science requires a lot of patience, persistence, and practice. So, start learning today.