In this article, we will learn how to remove rows with NaN values from a NumPy Array.
NaN stands for Not a Number. It is a special floating-point value, not a data type of its own, and it represents a result that is undefined or unrepresentable. NaN values are commonly used to mark missing data in a DataFrame or a NumPy array. Given a NumPy array, we need to delete every row that contains a NaN value.
Example:
Given array:
[[ 1  2  3  4  5]
 [ 5 nan  3  2  1]
 [ 1  2 nan  1  5]
 [ 3  4  3  2  1]]
After removing rows with any NaN value:
[[ 1  2  3  4  5]
 [ 3  4  3  2  1]]
There are multiple ways to remove rows with NaN values from a NumPy array. Let's discuss them one by one, each with its approach and a working code example.
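One property of NaN explains why the rest of this article relies on isnan(): NaN never compares equal to anything, including itself, so an equality test cannot find it. A minimal sketch, with illustrative variable names:

```python
import numpy as np

values = np.array([1.0, np.nan, 3.0])

# NaN is not equal to itself, so == cannot be used to locate it
equal_to_nan = values == np.nan

# np.isnan() flags NaN entries element by element
nan_mask = np.isnan(values)

print(equal_to_nan)  # [False False False]
print(nan_mask)      # [False  True False]
```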
Use delete() and a boolean index to delete rows containing at least one NaN value
The delete() function is built into the NumPy library. It takes an array and an index (or an array of indexes) and returns a new array with the elements at those indexes removed. The original array is not modified.
Syntax of delete()
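numpy.delete(arr, obj, axis=None)
Parameters:
arr = The input array.
obj = Index (or array of indexes) of the rows to be deleted.
axis = The axis along which to delete. Use axis=0 to delete rows. If axis is omitted, the array is flattened before deletion.
Returns:
A new array with the specified rows removed.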
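To make the role of axis concrete, here is a small sketch of delete() with plain integer indexes. The array and variable names are made up for illustration:

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])

# axis=0 removes whole rows; here rows 0 and 2 are dropped
rows_removed = np.delete(grid, [0, 2], axis=0)

# Without axis, the array is flattened first and single elements are removed
flat_removed = np.delete(grid, [0, 2])

print(rows_removed)  # [[4 5 6]]
print(flat_removed)  # [2 4 5 6 7 8 9]
print(grid.shape)    # (3, 3), the input array is left unchanged
```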
To delete the rows containing at least one NaN value, we use the isnan() and any() functions. First, pass the array to isnan(). It returns a 2D boolean array of the same shape, with True at every NaN position and False everywhere else. Then iterate over the rows of this boolean array, call any() on each row, and store the results in a list.
This list has one element per row. For each row that contains a NaN value, the corresponding element is True. Pass this boolean list to delete() along with the array and axis=0. It returns a new array without the rows that contain a NaN value. Note that delete() treats a boolean index as a mask only in NumPy 1.19 and later; earlier versions cast the booleans to the integers 0 and 1.
For example
import numpy as np

# creating numpy array
arr = np.array([[1, 2, 3, 4, 5],
                [np.nan, 4, np.nan, 2, 1],
                [np.nan, 2, 4, 1, 5],
                [3, 4, 3, 2, 1]])

# Get boolean index list with True for the rows
# that have any NaN value
indexList = [np.any(i) for i in np.isnan(arr)]

# delete all the rows with any NaN value
arr = np.delete(arr, indexList, axis=0)

print(arr)
Output
[[1. 2. 3. 4. 5.]
 [3. 4. 3. 2. 1.]]
It deleted all the rows from NumPy Array which had any NaN value.
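The per-row loop can also be written without a Python list comprehension. The sketch below is an alternative, not the method used above: it assumes the same sample array, lets any(axis=1) reduce each row in one call, and converts the mask to integer row positions so the result does not depend on how delete() interprets booleans. The names rows_with_nan, row_positions and cleaned are illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5],
                [np.nan, 4, np.nan, 2, 1],
                [np.nan, 2, 4, 1, 5],
                [3, 4, 3, 2, 1]])

# True for every row that contains at least one NaN
rows_with_nan = np.isnan(arr).any(axis=1)

# Convert the mask to integer row positions before calling delete()
row_positions = np.flatnonzero(rows_with_nan)

cleaned = np.delete(arr, row_positions, axis=0)

print(row_positions)  # [1 2]
print(cleaned)
```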
Use delete() and a boolean index to delete rows in which every value is NaN
This is very similar to the approach above, except that we use all() instead of any(). A row is deleted only if every value in it is NaN.
First, pass the array to isnan(). It returns a 2D boolean array of the same shape, with True at every NaN position and False everywhere else. Then iterate over the rows of this boolean array, call all() on each row, and store the results in a list.
This list has one element per row. For each row in which every value is NaN, the corresponding element is True. Pass this boolean list to delete() along with the array and axis=0. It returns a 2D NumPy array without the rows that consist entirely of NaN values.
For Example
import numpy as np

# creating numpy array
arr = np.array([[1, 2, 3, 4, 5],
                [np.nan, np.nan, np.nan, np.nan, np.nan],
                [np.nan, 2, 4, 1, 5],
                [3, 4, 3, 2, 1]])

# Get boolean index list with True for the rows
# that have all NaN values
indexList = [np.all(i) for i in np.isnan(arr)]

# delete all the rows with all NaN values
arr = np.delete(arr, indexList, axis=0)

print(arr)
Output:
[[ 1.  2.  3.  4.  5.]
 [nan  2.  4.  1.  5.]
 [ 3.  4.  3.  2.  1.]]
Use a boolean index to delete rows that have any NaN value
This approach is similar to the first one, but instead of calling delete() we pass a boolean index to the [] operator of the NumPy array and select only the rows that do not have a NaN value. Rows of a NumPy array can be selected by indexing the array with a boolean array or list that has one entry per row.
Example:
arr = np.array([[1, 2, 3, 4, 5],
                [5, 4, 3, 2, 1],
                [8, 2, 4, 1, 5],
                [3, 4, 3, 2, 1],
                [7, 6, 3, 4, 5]])
boolArray = [True, True, False, False, False]
arr[boolArray]  ===> this will give [[1, 2, 3, 4, 5], [5, 4, 3, 2, 1]]
First, pass the array to isnan(). It returns a 2D boolean array of the same shape, with True at every NaN position and False everywhere else. Then iterate over the rows of this boolean array, call any() on each row, negate the result with the not operator, and store the values in a list.
This list has one element per row. For each row that does not have a NaN value, the corresponding element is True. Pass this boolean list to the [] operator of the array. It returns a 2D NumPy array that contains only the rows without NaN values.
For example
import numpy as np

# creating numpy array
arr = np.array([[1, 2, 3, 4, 5],
                [np.nan, 4, np.nan, 2, 1],
                [np.nan, 2, 4, 1, 5],
                [3, 4, 3, 2, 1]])

# Delete all rows with any NaN value
booleanIndex = [not np.any(i) for i in np.isnan(arr)]
arr = arr[booleanIndex]

print(arr)
Output:
[[1. 2. 3. 4. 5.]
 [3. 4. 3. 2. 1.]]
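The same selection can be sketched without an explicit loop. This variant is an alternative to the list comprehension above, assuming the same sample array: any(axis=1) reduces each row, and the ~ operator inverts the resulting boolean array. The names keep_rows and cleaned are illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5],
                [np.nan, 4, np.nan, 2, 1],
                [np.nan, 2, 4, 1, 5],
                [3, 4, 3, 2, 1]])

# ~ inverts the mask: True marks rows that contain no NaN
keep_rows = ~np.isnan(arr).any(axis=1)

# Boolean indexing keeps only the rows marked True
cleaned = arr[keep_rows]

print(keep_rows)  # [ True False False  True]
print(cleaned)
```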
Use a boolean index to delete rows in which every value is NaN
This is very similar to the previous approach, but instead of any() we use all(), so a row is dropped only when every value in it is NaN.
For example
import numpy as np

# creating numpy array
arr = np.array([[1, 2, 3, 4, 5],
                [np.nan, np.nan, np.nan, np.nan, np.nan],
                [np.nan, 2, 4, 1, 5],
                [3, 4, 3, 2, 1]])

# Delete all rows with all NaN values
booleanIndex = [not np.all(i) for i in np.isnan(arr)]
arr = arr[booleanIndex]

print(arr)
Output:
[[ 1.  2.  3.  4.  5.]
 [nan  2.  4.  1.  5.]
 [ 3.  4.  3.  2.  1.]]
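Since the any and all variants differ in a single call, they can be wrapped in one function. The helper below, drop_nan_rows with its how parameter, is hypothetical and not part of NumPy; it is a sketch that assumes a 2D floating-point array as input.

```python
import numpy as np

def drop_nan_rows(arr, how="any"):
    """Return a copy of a 2D array without its NaN rows.

    how="any" drops rows with at least one NaN,
    how="all" drops rows made up entirely of NaN.
    Hypothetical helper, not part of NumPy.
    """
    nan_mask = np.isnan(arr)
    if how == "any":
        drop = nan_mask.any(axis=1)
    elif how == "all":
        drop = nan_mask.all(axis=1)
    else:
        raise ValueError("how must be 'any' or 'all'")
    # Boolean indexing returns a new array; the input is not modified
    return arr[~drop]

arr = np.array([[1, 2, 3, 4, 5],
                [np.nan, np.nan, np.nan, np.nan, np.nan],
                [np.nan, 2, 4, 1, 5],
                [3, 4, 3, 2, 1]])

print(drop_nan_rows(arr, how="any").shape)  # (2, 5)
print(drop_nan_rows(arr, how="all").shape)  # (3, 5)
```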
Summary
Great, you made it! We have discussed several ways to delete rows with NaN values from a NumPy array. Happy learning.