Check if any value is NaN in a Pandas DataFrame

Handling missing values, or NaN (Not a Number), is one of the most important steps in any data analysis project. In this article, we will discuss how to check whether a pandas DataFrame contains missing values and how to count them.

Introduction

To get started quickly, let’s create a sample DataFrame with missing values to experiment with. We’ll use the pandas and NumPy libraries with some sample data.

import pandas as pd
import numpy as np

# List of Tuples
# np.nan marks a missing value (the np.NaN alias was removed in NumPy 2.0)
employees = [('Shubham', 'Data Scientist', 'Sydney', 5),
             ('Riti', 'Data Analyst', np.nan, 7),
             ('Shanky', 'Program Manager', 'Delhi', 2),
             ('Shreya', 'Graphic Designer', np.nan, 2),
             ('Aadi', 'Data Engineering', 'New York', np.nan)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'City', 'Experience'],
                  index=[0, 1, 2, 3, 4])
print(df)

The contents of the created DataFrame are:

      Name       Designation      City  Experience
0  Shubham    Data Scientist    Sydney         5.0
1     Riti      Data Analyst       NaN         7.0
2   Shanky   Program Manager     Delhi         2.0
3   Shreya  Graphic Designer       NaN         2.0
4     Aadi  Data Engineering  New York         NaN

Now, let’s look at different ways in which we could check for missing values.

Check for missing values in Complete DataFrame

There are multiple ways to check if the DataFrame contains any missing values. Let’s quickly explore some of those methods.

# using isnull method
print(df.isnull().any().any())

Output

True

Another method,

# using isna method
print(df.isna().any(axis=None))

Output

True

The methods “isnull()” and “isna()” are aliases of each other and flag every missing value in the DataFrame, including np.nan and None. Take care when casting a column to strings: the conversion can turn a missing value into the literal text “nan”, which these methods no longer recognize as missing.
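The string-casting pitfall can be illustrated with a minimal sketch. It assumes the conversion is done element by element with Python's built-in str; the variable names are hypothetical and the values only mirror the "Experience" column from the sample data.

```python
import numpy as np
import pandas as pd

# Hypothetical one-column example mirroring the "Experience" column
experience = pd.Series([5, 7, 2, 2, np.nan])

# Element-wise conversion with Python's str turns NaN into the text 'nan'
as_text = pd.Series([str(value) for value in experience])

missing_before = bool(experience.isna().any())
missing_after = bool(as_text.isna().any())

print(missing_before)    # True: the NaN is detected
print(missing_after)     # False: 'nan' is now an ordinary string
print(as_text.tolist())  # ['5.0', '7.0', '2.0', '2.0', 'nan']
```

If a column has to be stored as text, it is usually safer to run the missing-value check before the conversion.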

The complete example is as follows,

import pandas as pd
import numpy as np

# List of Tuples
# np.nan marks a missing value (the np.NaN alias was removed in NumPy 2.0)
employees = [('Shubham', 'Data Scientist', 'Sydney', 5),
             ('Riti', 'Data Analyst', np.nan, 7),
             ('Shanky', 'Program Manager', 'Delhi', 2),
             ('Shreya', 'Graphic Designer', np.nan, 2),
             ('Aadi', 'Data Engineering', 'New York', np.nan)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'City', 'Experience'],
                  index=[0, 1, 2, 3, 4])
print(df)

# using isnull method
print(df.isnull().any().any())

# using isna method
print(df.isna().any(axis=None))

Output

      Name       Designation      City  Experience
0  Shubham    Data Scientist    Sydney         5.0
1     Riti      Data Analyst       NaN         7.0
2   Shanky   Program Manager     Delhi         2.0
3   Shreya  Graphic Designer       NaN         2.0
4     Aadi  Data Engineering  New York         NaN

True
True

Check for missing values in a DataFrame column

In case we need to check the missing values at column level or for any specific column, we could again use the same methods.

# using isnull method
print(df.isnull().any())

# or you can use the isna method
print(df.isna().any())

Output

Name           False
Designation    False
City            True
Experience      True
dtype: bool

Name           False
Designation    False
City            True
Experience      True
dtype: bool

Because isnull() is an alias of isna(), both produce identical output. To check for missing values in one specific column, select that column first.

# feel free to use isna as well
print(df['City'].isnull().any())

# using the hasnans property (an attribute, so no parentheses)
print(df['City'].hasnans)

Output

True
True

Next, within the same column, let’s check exactly which rows contain a missing value.

# feel free to use isna as well
print(df['City'].isnull())

Output

0    False
1     True
2    False
3     True
4    False
Name: City, dtype: bool

We observe that row indices 1 and 3 contain missing values.
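The boolean Series above can also be used as a mask to pull out the affected rows themselves. The following is a minimal sketch on a reduced copy of the sample data; the names city_missing and any_missing are chosen here for illustration only.

```python
import numpy as np
import pandas as pd

# Reduced copy of the sample data used in this article
df = pd.DataFrame({
    'Name': ['Shubham', 'Riti', 'Shanky', 'Shreya', 'Aadi'],
    'City': ['Sydney', np.nan, 'Delhi', np.nan, 'New York'],
    'Experience': [5, 7, 2, 2, np.nan],
})

# Rows where the "City" value is missing
city_missing = df[df['City'].isna()]

# Rows where at least one column is missing
any_missing = df[df.isna().any(axis=1)]

print(city_missing.index.tolist())  # [1, 3]
print(any_missing.index.tolist())   # [1, 3, 4]
```

Passing axis=1 to any() evaluates each row across its columns, which is why row 4 (missing "Experience") appears in the second result.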

The complete example is as follows,

import pandas as pd
import numpy as np

# List of Tuples
# np.nan marks a missing value (the np.NaN alias was removed in NumPy 2.0)
employees = [('Shubham', 'Data Scientist', 'Sydney', 5),
             ('Riti', 'Data Analyst', np.nan, 7),
             ('Shanky', 'Program Manager', 'Delhi', 2),
             ('Shreya', 'Graphic Designer', np.nan, 2),
             ('Aadi', 'Data Engineering', 'New York', np.nan)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'City', 'Experience'],
                  index=[0, 1, 2, 3, 4])
print(df)

# using isnull method
print(df.isnull().any())

# or you can use the isna method
print(df.isna().any())

# feel free to use isna as well
print(df['City'].isnull().any())

# using the hasnans property
print(df['City'].hasnans)

# rows of the 'City' column that are missing
print(df['City'].isnull())

Output:

      Name       Designation      City  Experience
0  Shubham    Data Scientist    Sydney         5.0
1     Riti      Data Analyst       NaN         7.0
2   Shanky   Program Manager     Delhi         2.0
3   Shreya  Graphic Designer       NaN         2.0
4     Aadi  Data Engineering  New York         NaN

Name           False
Designation    False
City            True
Experience      True
dtype: bool

Name           False
Designation    False
City            True
Experience      True
dtype: bool

True

True

0    False
1     True
2    False
3     True
4    False
Name: City, dtype: bool

Count missing values in a Pandas DataFrame

For a complete data analysis, knowing whether the DataFrame contains any missing values is only the first part of the problem. The second part is understanding how much of the data is missing. Let’s count the number of missing values in the DataFrame.

# total missing values in the DataFrame
print(df.isnull().sum().sum())

Output

3

As noticed, there are three NaN values in the DataFrame. Again, we can use the isna method as well here.
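As a quick sanity check, the same total can be obtained with isna, and cross-checked by subtracting the number of non-missing cells from the total number of cells. This is only a sketch on a copy of the sample data; the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Copy of the sample data used in this article
df = pd.DataFrame({
    'Name': ['Shubham', 'Riti', 'Shanky', 'Shreya', 'Aadi'],
    'Designation': ['Data Scientist', 'Data Analyst', 'Program Manager',
                    'Graphic Designer', 'Data Engineering'],
    'City': ['Sydney', np.nan, 'Delhi', np.nan, 'New York'],
    'Experience': [5, 7, 2, 2, np.nan],
})

# Total missing values, using isna instead of isnull
total_missing = int(df.isna().sum().sum())

# Cross-check: all cells minus the non-missing cells reported by count()
cross_check = int(df.size - df.count().sum())

print(total_missing)  # 3
print(cross_check)    # 3
```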

Count missing values in a DataFrame column

We can break down the missing-value count at the column level. Let’s take a look at the code below.

# total missing values (column-wise)
print(df.isnull().sum())

Output

Name           0
Designation    0
City           2
Experience     1
dtype: int64

So the column “City” contains 2 missing values and the column “Experience” contains 1 missing value. We can also divide these counts by the number of rows to get the fraction of missing values per column, which is easier to interpret.

# fraction of missing values (column-wise)
print(df.isnull().sum() / df.shape[0])

Output

Name           0.0
Designation    0.0
City           0.4
Experience     0.2
dtype: float64

This shows that 40% of the records in the “City” column are missing, and 20% in the “Experience” column. We can run the same calculation for any specific column, as shown below.

# fraction of missing values in the 'City' column
print(df['City'].isnull().sum() / df.shape[0])

Output

0.4
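An equivalent shortcut, sketched here on a reduced copy of the sample data, is to take the mean of the boolean mask: True counts as 1 and False as 0, so the mean is the fraction of missing values. Multiplying by 100 and rounding (one decimal place is an arbitrary choice here) gives percentages directly.

```python
import numpy as np
import pandas as pd

# Reduced copy of the sample data used in this article
df = pd.DataFrame({
    'Name': ['Shubham', 'Riti', 'Shanky', 'Shreya', 'Aadi'],
    'City': ['Sydney', np.nan, 'Delhi', np.nan, 'New York'],
    'Experience': [5, 7, 2, 2, np.nan],
})

# Mean of the boolean mask = fraction of missing values per column
fraction_missing = df.isna().mean()

# Percentage of missing values per column, rounded to one decimal place
percent_missing = (fraction_missing * 100).round(1)

print(percent_missing)
```

Rounding also hides the small floating-point error that multiplying by 100 can introduce.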

The complete example of counting missing values is as follows,

import pandas as pd
import numpy as np

# List of Tuples
# np.nan marks a missing value (the np.NaN alias was removed in NumPy 2.0)
employees = [('Shubham', 'Data Scientist', 'Sydney', 5),
             ('Riti', 'Data Analyst', np.nan, 7),
             ('Shanky', 'Program Manager', 'Delhi', 2),
             ('Shreya', 'Graphic Designer', np.nan, 2),
             ('Aadi', 'Data Engineering', 'New York', np.nan)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'City', 'Experience'],
                  index=[0, 1, 2, 3, 4])
print(df)

# total missing values in the DataFrame
print(df.isnull().sum().sum())

# total missing values (column-wise)
print(df.isnull().sum())

# fraction of missing values (column-wise)
print(df.isnull().sum() / df.shape[0])

# fraction of missing values in the 'City' column
print(df['City'].isnull().sum() / df.shape[0])

Output:

      Name       Designation      City  Experience
0  Shubham    Data Scientist    Sydney         5.0
1     Riti      Data Analyst       NaN         7.0
2   Shanky   Program Manager     Delhi         2.0
3   Shreya  Graphic Designer       NaN         2.0
4     Aadi  Data Engineering  New York         NaN

3

Name           0
Designation    0
City           2
Experience     1
dtype: int64

Name           0.0
Designation    0.0
City           0.4
Experience     0.2
dtype: float64

0.4

Summary

Great, you made it! In this article, we discussed multiple ways to check for and count missing values (NaN) in a pandas DataFrame. Thanks.
