Check if any value is NaN in a Pandas DataFrame

Handling missing values or NaN (Not A Number) is one of the most important steps in any data analysis project. In this article, we will discuss how to check and count if there is any missing value in a pandas DataFrame.

Table of Contents

Introuction

To quickly get started, let’s create a sample dataframe with missing values for the experimentation. We’ll use the pandas and numpy library with some random data.

import pandas as pd
import numpy as np

# List of Tuples
employees = [('Shubham', 'Data Scientist', 'Sydney',   5),
            ('Riti', 'Data Analyst', np.NaN ,   7),
            ('Shanky', 'Program Manager', 'Delhi' ,   2),
            ('Shreya', 'Graphic Designer', np.NaN ,   2),
            ('Aadi', 'Data Engineering', 'New York', np.NaN)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'City', 'Experience'],
                  index=[0, 1, 2, 3, 4])
print(df)

Contents of the created dataframe are,

      Name       Designation      City  Experience
0  Shubham    Data Scientist    Sydney         5.0
1     Riti      Data Analyst       NaN         7.0
2   Shanky   Program Manager     Delhi         2.0
3   Shreya  Graphic Designer       NaN         2.0
4     Aadi  Data Engineering  New York         NaN

Now, let’s look at different ways in which we could check for missing values.

Check for missing values in Complete DataFrame

There are multiple ways to check if the DataFrame contains any missing values. Let’s quickly explore some of those methods.

# using isnull method
print (df.isnull().any().any())

Output

True

Another method,

# using isna method
print (df.isna().any(axis = None))

Output

True

Both the methods “isnull()” and “isna()” search for any np.NaN value in the entire DataFrame. Please make sure that you don’t cast your column into a string, as it would convert the missing values into “NaN” string which can’t be tracked as NaN value.

The complete example is as follows,

import pandas as pd
import numpy as np

# List of Tuples
employees = [('Shubham', 'Data Scientist', 'Sydney',   5),
            ('Riti', 'Data Analyst', np.NaN ,   7),
            ('Shanky', 'Program Manager', 'Delhi' ,   2),
            ('Shreya', 'Graphic Designer', np.NaN ,   2),
            ('Aadi', 'Data Engineering', 'New York', np.NaN)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'City', 'Experience'],
                  index=[0, 1, 2, 3, 4])
print(df)

# using isnull method
print (df.isnull().any().any())

# using isna method
print (df.isna().any(axis = None))

Output

      Name       Designation      City  Experience
0  Shubham    Data Scientist    Sydney         5.0
1     Riti      Data Analyst       NaN         7.0
2   Shanky   Program Manager     Delhi         2.0
3   Shreya  Graphic Designer       NaN         2.0
4     Aadi  Data Engineering  New York         NaN

True
True

Check for missing values in a DataFrame column

In case we need to check the missing values at column level or for any specific column, we could again use the same methods.

# using isnull method
print (df.isnull().any())

# or you can use the isna method
print (df.isna().any())

Output

Name           False
Designation    False
City            True
Experience      True
dtype: bool

Name           False
Designation    False
City            True
Experience      True
dtype: bool

Both the methods would generate similar outputs. Now, for instance, we need to check the missing values for a specific column, here is something we can do.

# feel free to use isna as well
print (df['City'].isnull().any())

# using hasnans method
print (df['City'].hasnans)

Output

True
True

Now, within the same column, we need to check exactly which rows contains a missing value.

# feel free to use isna as well
print (df['City'].isnull())

Output

0    False
1     True
2    False
3     True
4    False
Name: City, dtype: bool

We observe that row indices 1 and 3 contain missing values.

The complete example is as follows,

import pandas as pd
import numpy as np

# List of Tuples
employees = [('Shubham', 'Data Scientist', 'Sydney',   5),
            ('Riti', 'Data Analyst', np.NaN ,   7),
            ('Shanky', 'Program Manager', 'Delhi' ,   2),
            ('Shreya', 'Graphic Designer', np.NaN ,   2),
            ('Aadi', 'Data Engineering', 'New York', np.NaN)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'City', 'Experience'],
                  index=[0, 1, 2, 3, 4])
print(df)

# using isnull method
print (df.isnull().any())

# or you can use the isna method
print (df.isna().any())


# feel free to use isna as well
print (df['City'].isnull().any())

# using hasnans method
print (df['City'].hasnans)

# feel free to use isna as well
print (df['City'].isnull())

Output:

      Name       Designation      City  Experience
0  Shubham    Data Scientist    Sydney         5.0
1     Riti      Data Analyst       NaN         7.0
2   Shanky   Program Manager     Delhi         2.0
3   Shreya  Graphic Designer       NaN         2.0
4     Aadi  Data Engineering  New York         NaN

Name           False
Designation    False
City            True
Experience      True
dtype: bool

Name           False
Designation    False
City            True
Experience      True
dtype: bool

True

True

0    False
1     True
2    False
3     True
4    False
Name: City, dtype: bool

Count missing values in a Pandas DataFrame

For the complete data analysis, knowing whether the DataFrame contains any missing value is just part one of the problems. The second part is to understand how much data contains missing values. Let’s try to count the number of missing values in the DataFrame.

# total missing values in the DataFrame
print (df.isnull().sum().sum())

Output

3

As noticed, there are three NaN values in the DataFrame. Again, we can use the isna method as well here.

Count missing values in a DataFrame column

We can break down the missing values count at the column level, let’s quickly take a look at the code below.

# total missing values (column-wise)
print (df.isnull().sum())

Output

Name           0
Designation    0
City           2
Experience     1
dtype: int64

So the column “City” contains 2 missing values and the column “Experience” contains 1 missing value. We can also convert this to percentages for better interpretation.

# % missing values (column-wise)
print (df.isnull().sum()/df.shape[0])

Output

Name           0.0
Designation    0.0
City           0.4
Experience     0.2
dtype: float64

This gives a better interpretation that the column “City” contains 40% missing records and 20% missing for the “Experience” column. We could similarly execute this for any specific column as shown below.

# % missing values (column-wise)
print (df['City'].isnull().sum()/df.shape[0])

Output

0.4

The complete example of counting missing values is as follows,

import pandas as pd
import numpy as np

# List of Tuples
employees = [('Shubham', 'Data Scientist', 'Sydney',   5),
            ('Riti', 'Data Analyst', np.NaN ,   7),
            ('Shanky', 'Program Manager', 'Delhi' ,   2),
            ('Shreya', 'Graphic Designer', np.NaN ,   2),
            ('Aadi', 'Data Engineering', 'New York', np.NaN)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'City', 'Experience'],
                  index=[0, 1, 2, 3, 4])
print(df)

# total missing values in the DataFrame
print (df.isnull().sum().sum())

# total missing values (column-wise)
print (df.isnull().sum())

# % missing values (column-wise)
print (df.isnull().sum()/df.shape[0])

# % missing values (column-wise)
print (df['City'].isnull().sum()/df.shape[0])

Output:

      Name       Designation      City  Experience
0  Shubham    Data Scientist    Sydney         5.0
1     Riti      Data Analyst       NaN         7.0
2   Shanky   Program Manager     Delhi         2.0
3   Shreya  Graphic Designer       NaN         2.0
4     Aadi  Data Engineering  New York         NaN

3

Name           0
Designation    0
City           2
Experience     1
dtype: int64

Name           0.0
Designation    0.0
City           0.4
Experience     0.2
dtype: float64

0.4

Summary

Great, you made it! In this article, we have discussed multiple ways to check the missing values (NaN) in a pandas DataFrame. Thanks.

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Scroll to Top