Remove Columns with NaN values from a NumPy Array

In this article, we will learn how to remove the columns that contain NaN values from a NumPy array.

What is a NaN value?

NaN stands for Not a Number. It is a special floating-point value (not a separate data type) that represents a result that is undefined or unrepresentable. NaN values are commonly used to mark missing data in a DataFrame or a NumPy array.
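As a quick illustration of how NaN behaves, here is a minimal sketch. The variable names are only for illustration.

```python
import numpy as np

value = np.nan

# NaN is an ordinary Python float, and it never compares equal to anything,
# not even to itself.
print(type(value))       # <class 'float'>
print(value == value)    # False

# Use np.isnan() to test for NaN instead of the == operator.
print(np.isnan(value))   # True

# An array that holds NaN is stored with a floating-point dtype.
arr = np.array([1, np.nan, 3])
print(arr.dtype)         # float64
```

This is why the examples in this article rely on isnan() rather than comparing values with np.nan directly.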

Given a 2D NumPy array, we need to remove the columns that contain NaN values.

Example:

Given array :
               [[1,   2, 3,   4, 5],
                [nan, 4, nan, 2, 1],
                [nan, 2, 4,   1, 5],
                [3,   4, 3,   2, 1]]

After removing columns with nan values :
               [[2. 4. 5.]
                [4. 2. 1.]
                [2. 1. 5.]
                [4. 2. 1.]]

There are multiple ways to remove columns with NaN values from a NumPy array. Let's discuss them one by one, each with an explanation and a working code example.

Delete columns containing at least one NaN value using delete(), isnan() and any()

The delete() function is part of the numpy library. It deletes elements from a given array. It takes an array and an index (or an array of indices) as parameters, and returns a copy of the array without the elements at the given indices.

Syntax of delete()

numpy.delete(arr, obj, axis=None)
  • Parameters:
    • arr = The array from which elements are deleted.
    • obj = Index (or array of indices) of the columns to be deleted.
    • axis = Axis along which elements are deleted. For columns, axis = 1. If axis is omitted, the array is flattened first.
  • Returns:
    • A copy of the array with the selected columns removed.
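To make the syntax concrete, here is a minimal sketch of delete() with plain integer indices. The small array and the variable names are illustrative only.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

# Delete the column at index 1 (axis=1 means "work on columns").
without_middle = np.delete(arr, 1, axis=1)
print(without_middle)
# [[1 3]
#  [4 6]]

# Delete several columns at once by passing a list of indices.
only_middle = np.delete(arr, [0, 2], axis=1)
print(only_middle)
# [[2]
#  [5]]

# delete() returns a copy, so the original array is unchanged.
print(arr.shape)   # (2, 3)
```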

To delete the columns containing at least one NaN value, we use the isnan() and any() functions. First, we pass the 2D NumPy array to isnan(). It returns a 2D boolean array of the same shape, in which each True value indicates that the corresponding value in the original array is NaN.

Then we call any() on this boolean array with axis=0. It returns a 1D boolean array whose length equals the number of columns in the original array. Each True value in it indicates that the corresponding column contains at least one NaN value. Finally, we pass this boolean array to delete() along with the original array and axis=1. Every column whose entry in the boolean array is True is deleted.
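The intermediate results described above can be inspected with a short sketch like the following. The names nan_mask and columns_with_nan are just illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5],
                [np.nan, 4, np.nan, 2, 1],
                [np.nan, 2, 4, 1, 5],
                [3, 4, 3, 2, 1]])

# Step 1: element-wise NaN test, same shape as arr.
nan_mask = np.isnan(arr)
print(nan_mask.shape)      # (4, 5)

# Step 2: collapse the rows (axis=0) so that one flag per column remains.
columns_with_nan = nan_mask.any(axis=0)
print(columns_with_nan)    # [ True False  True False False]
```

Columns 0 and 2 are flagged, which are exactly the columns that get deleted.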

Source Code

import numpy as np

# creating numpy array
arr = np.array([[1, 2, 3, 4, 5],
                [np.nan, 4, np.nan, 2, 1],
                [np.nan, 2, 4, 1, 5],
                [3, 4, 3, 2, 1]])

# Boolean mask: True for each column that contains at least one NaN value
index = np.isnan(arr).any(axis=0)

# Delete the masked columns from the 2D NumPy array
arr = np.delete(arr, index, axis=1)

print(arr)

Output:

[[2. 4. 5.]
 [4. 2. 1.]
 [2. 1. 5.]
 [4. 2. 1.]]
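Note that delete() treats a boolean array as a mask only in NumPy 1.19 and later. Older versions cast the booleans to the integers 0 and 1 instead. As a hedged alternative that does not depend on this behavior, the mask can be converted to integer column indices first. This sketch uses np.flatnonzero() for the conversion, and the variable names are illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5],
                [np.nan, 4, np.nan, 2, 1],
                [np.nan, 2, 4, 1, 5],
                [3, 4, 3, 2, 1]])

# Convert the boolean column mask into integer column indices.
nan_columns = np.flatnonzero(np.isnan(arr).any(axis=0))
print(nan_columns)   # [0 2]

# Delete those columns by index.
result = np.delete(arr, nan_columns, axis=1)
print(result)
# [[2. 4. 5.]
#  [4. 2. 1.]
#  [2. 1. 5.]
#  [4. 2. 1.]]
```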

Delete columns containing all NaN values using delete(), isnan() and all()

This is very similar to the approach above, except that we use the all() function instead of any().

To delete the columns in which every value is NaN, we use the isnan() and all() functions. First, we pass the 2D NumPy array to isnan(). It returns a 2D boolean array of the same shape, in which each True value indicates that the corresponding value in the original array is NaN.

Then we call all() on this boolean array with axis=0. It returns a 1D boolean array with one element per column of the original array. Each True value in it indicates that the corresponding column consists entirely of NaN values. Finally, we pass this boolean array to delete() along with the original array and axis=1. Every column whose entry in the boolean array is True is deleted.
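The difference between any() and all() is easiest to see on an array that has both a fully NaN column and a partially NaN column. The small array below is made up for illustration.

```python
import numpy as np

arr = np.array([[np.nan, 2, 3],
                [np.nan, np.nan, 1],
                [np.nan, 4, 5]])

nan_mask = np.isnan(arr)

# any(): True if the column contains at least one NaN.
print(nan_mask.any(axis=0))   # [ True  True False]

# all(): True only if every value in the column is NaN.
print(nan_mask.all(axis=0))   # [ True False False]
```

With all(), the middle column is kept because it still holds some valid values.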

Source Code

import numpy as np

# Creating numpy array
arr = np.array([[np.nan, 2, 3, 4, 5],
                [np.nan, 4, 3, 2, 1],
                [np.nan, 2, 4, 1, 5],
                [np.nan, 4, 3, 2, 1]])

# Boolean mask: True for each column in which every value is NaN
index = np.isnan(arr).all(axis=0)

# Delete the masked columns from the 2D NumPy array
arr = np.delete(arr, index, axis=1)

print(arr)

Output:

[[2. 3. 4. 5.]
 [4. 3. 2. 1.]
 [2. 4. 1. 5.]
 [4. 3. 2. 1.]]

Using boolean index to delete columns with any NaN value

This approach is very similar to the previous ones. Instead of calling delete(), we use the boolean array directly as an index. Columns of a NumPy array can be selected by passing a boolean array as the column index.

Example

Given array :
               [[1, 2, 3, 4, 5],
                [5, 4, 3, 2, 1],
                [1, 2, 4, 1, 5],
                [3, 4, 3, 2, 1]]

boolArray = [False, True, False, True, True]

arr[:, boolArray] will be:
               [[2 4 5]
                [4 2 1]
                [2 1 5]
                [4 2 1]]

It selects all the columns for which the index is True.
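The example above can be reproduced with a short sketch. The name bool_array is illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5],
                [5, 4, 3, 2, 1],
                [1, 2, 4, 1, 5],
                [3, 4, 3, 2, 1]])

bool_array = np.array([False, True, False, True, True])

# Keep every row (:) and only the columns where bool_array is True.
selected = arr[:, bool_array]
print(selected)
# [[2 4 5]
#  [4 2 1]
#  [2 1 5]
#  [4 2 1]]

# Negating the mask with ~ selects the remaining columns instead.
print(arr[:, ~bool_array].shape)   # (4, 2)
```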

Steps to remove columns with any NaN value:

  1. Import the numpy library and create a NumPy array.
  2. Create a boolean array using isnan() and any(), then negate it with ~. A True value indicates that the corresponding column has no NaN value.
  3. Pass the boolean array as the column index to the array.
  4. This returns a new array without the columns that contain NaN values.
  5. Print the array.

Source Code

import numpy as np

# creating numpy array
arr = np.array([[1, 2, 3, 4, 5],
                [np.nan, 4, np.nan, 2, 1],
                [np.nan, 2, 4, 1, 5],
                [3, 4, 3, 2, 1]])

# Boolean mask: True for each column that has no NaN value
booleanIndex = ~np.isnan(arr).any(axis=0)

# Select only the columns that have no NaN value
arr = arr[:, booleanIndex]

print(arr)

Output:

[[2. 4. 5.]
 [4. 2. 1.]
 [2. 1. 5.]
 [4. 2. 1.]]

Using boolean index to delete columns with all NaN values

This is very similar to the previous approach, but we use the all() function instead of any(). As before, columns of a NumPy array can be selected by passing a boolean array as the column index.

Example:

Given array :
               [[1, 2, 3, 4, 5],
                [5, 4, 3, 2, 1],
                [1, 2, 4, 1, 5],
                [3, 4, 3, 2, 1]]

boolArray = [False, True, False, True, True]

arr[:, boolArray] will be:
               [[2 4 5]
                [4 2 1]
                [2 1 5]
                [4 2 1]]

It selects all the columns for which the index is True.

Steps to remove columns with all NaN values:

  1. Import the numpy library and create a NumPy array.
  2. Create a boolean array using isnan() and all(), then negate it with ~. A False value indicates that the corresponding column consists entirely of NaN values.
  3. Pass the boolean array as the column index to the array.
  4. This returns a new array without the columns in which all values are NaN.
  5. Print the array.

Source Code

import numpy as np

# creating numpy array
arr = np.array([[np.nan, 2, 3, 4, 5],
                [np.nan, 4, np.nan, 2, 1],
                [np.nan, 2, 4, 1, 5],
                [np.nan, 4, 3, 2, 1]])

# Boolean mask: True for each column that is not entirely NaN
booleanIndex = ~np.isnan(arr).all(axis=0)

# Select only the columns that are not entirely NaN
arr = arr[:, booleanIndex]

print(arr)

Output:

[[ 2.  3.  4.  5.]
 [ 4. nan  2.  1.]
 [ 2.  4.  1.  5.]
 [ 4.  3.  2.  1.]]
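Note that the third column of the original array is kept even though it contains one NaN, because all() removes only the columns that are entirely NaN. As a possible way to combine both variants, here is a minimal sketch of a helper function. The function drop_nan_columns() and its how parameter are hypothetical names introduced for this example and are not part of NumPy.

```python
import numpy as np

def drop_nan_columns(arr, how="any"):
    """Return a copy of a 2D array without its NaN columns.

    how="any" drops columns that contain at least one NaN.
    how="all" drops only the columns that are entirely NaN.
    """
    nan_mask = np.isnan(arr)
    if how == "any":
        drop = nan_mask.any(axis=0)
    elif how == "all":
        drop = nan_mask.all(axis=0)
    else:
        raise ValueError("how must be 'any' or 'all'")
    # Boolean indexing keeps the columns that are not flagged.
    return arr[:, ~drop]

data = np.array([[np.nan, 2, 3],
                 [np.nan, np.nan, 1],
                 [np.nan, 4, 5]])

print(drop_nan_columns(data, how="any").shape)   # (3, 1)
print(drop_nan_columns(data, how="all").shape)   # (3, 2)
```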

Summary

We discussed four ways to remove columns with NaN values from a NumPy array: delete() combined with isnan() and any() or all(), and boolean indexing with the same masks. Happy learning!
