In this article, we will learn how to remove duplicate elements, rows, or columns from a NumPy array in Python.
Table Of Contents
 Remove duplicates from NumPy Array using unique() method
 Remove duplicates from NumPy Array using set() method
 Using unique() method along with the return_index parameter
 Removing duplicates from a 1D NumPy Array by iterating
 Removing duplicates from a 2D array by iterating array
 Using numpy.lexsort() and np.diff() methods
Given a NumPy array, we need to remove the duplicates, i.e. the elements that occur more than once, so that each value appears only once. For example, if our input NumPy array is,
Input Array : [1,2,3,4,4,5,6,7]
Then after deleting the duplicate elements from this NumPy array, its contents should be,
Output Array : [1,2,3,4,5,6,7]
There are multiple ways of removing duplicates from a NumPy array. Let’s discuss them one by one, each with its approach and a working code example.
Remove duplicates from NumPy Array using unique() method
The unique() method is a built-in NumPy function that takes an array as input and returns its sorted unique elements, i.e. the array with all duplicate elements removed. To remove duplicates, we pass the given NumPy array to unique() and it returns the unique array.
Syntax:
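numpy.unique(ar, return_index=False, return_inverse=False, return_counts=False, axis=None)
Parameters:
ar = The input array.
return_index = If True, also returns the indices of the first occurrences of the unique values in the input array.
return_inverse = If True, also returns the indices into the unique array that reconstruct the input array.
return_counts = If True, also returns the number of times each unique value occurs.
axis = The axis to operate on. Axis 0 compares rows and axis 1 compares columns; if no axis is provided, the input array is flattened, i.e. treated as a 1D array.
Returns: The sorted unique elements, followed by any of the optional arrays that were requested.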
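As a rough illustration of the optional return values, the sketch below requests all of them at once. The array contents and variable names are made up for this example.

```python
import numpy as np

# Small illustrative array with repeated values
data = np.array([4, 1, 4, 3, 1, 4])

# Request every optional output in a single call
values, first_index, inverse, counts = np.unique(
    data, return_index=True, return_inverse=True, return_counts=True
)

print(values)       # sorted unique values: 1, 3, 4
print(first_index)  # index of the first occurrence of each unique value
print(inverse)      # indices into `values` that rebuild `data`
print(counts)       # number of occurrences of each unique value

# The inverse indices reconstruct the original array
rebuilt = values[inverse]
print(np.array_equal(rebuilt, data))
```

Here `values[inverse]` gives back the original array, which is a quick way to check what return_inverse means.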
Delete duplicate elements from 1D NumPy Array
Approach :
 Import the numpy library and create a NumPy array.
 Pass the array to the unique() method without the axis parameter.
 The function will return the unique array.
 Print the resultant array.
import numpy as np

# Create a NumPy array
data = np.array([1,2,3,4,4,5,6,7])

# Pass the array to the unique() function.
# It will remove the duplicates.
data = np.unique(data)

print(data)
Output:
[1 2 3 4 5 6 7]
This deletes all the duplicate elements from the NumPy array.
Delete duplicate rows from 2D NumPy Array
To remove the duplicate rows from a 2D NumPy array use the following steps,
 Import the numpy library and create a NumPy array.
 Pass the array to the unique() method with the axis=0 parameter.
 The function will return the unique array.
 Print the resultant array.
Source code
import numpy as np

# create numpy arrays
data = np.array([[1,2,3],
                 [3,2,1],
                 [7,8,9],
                 [9,8,9],
                 [7,8,9]])

# Delete duplicate rows from 2D NumPy Array
data = np.unique(data, axis=0)

print(data)
OUTPUT:
[[1 2 3]
 [3 2 1]
 [7 8 9]
 [9 8 9]]
This deleted all the duplicate rows from the 2D NumPy array. Note that unique() returns the remaining rows in sorted order.
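To see which rows were actually duplicated, the return_counts parameter can be combined with axis=0. This is a small sketch on the same data; the variable names are only illustrative.

```python
import numpy as np

data = np.array([[1, 2, 3],
                 [3, 2, 1],
                 [7, 8, 9],
                 [9, 8, 9],
                 [7, 8, 9]])

# Unique rows plus the number of times each one occurs in the input
rows, counts = np.unique(data, axis=0, return_counts=True)

# Keep only the rows that occurred more than once
duplicated = rows[counts > 1]

print(counts)      # one count per unique row
print(duplicated)  # the row [7 8 9] appears twice in the input
```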
Delete duplicate columns from 2D NumPy Array
To remove the duplicate columns from a 2D NumPy array use the following steps,
 Import the numpy library and create a NumPy array.
 Pass the array to the unique() method with the axis=1 parameter.
 The function will return the unique array.
Source code
import numpy as np

# create numpy arrays
data = np.array([[1, 14, 3, 14, 14],
                 [3, 13, 1, 13, 13],
                 [7, 12, 9, 12, 12],
                 [9, 11, 9, 11, 11],
                 [7, 10, 9, 10, 10]])

# Remove Duplicate columns from 2D NumPy Array
data = np.unique(data, axis=1)

print(data)
Output:
[[ 1  3 14]
 [ 3  1 13]
 [ 7  9 12]
 [ 9  9 11]
 [ 7  9 10]]
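One way to think about axis=1 is that removing duplicate columns is the same as removing duplicate rows of the transposed array. The sketch below checks this on a shortened version of the data above; the variable names are illustrative.

```python
import numpy as np

data = np.array([[1, 14, 3, 14, 14],
                 [3, 13, 1, 13, 13],
                 [7, 12, 9, 12, 12]])

# Remove duplicate columns directly
by_axis = np.unique(data, axis=1)

# Remove duplicate rows of the transpose, then transpose back
by_transpose = np.unique(data.T, axis=0).T

print(by_axis.shape)                           # three unique columns remain
print(np.array_equal(by_axis, by_transpose))   # both routes agree
```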
Remove duplicates from NumPy Array using set() method
set() is a built-in Python constructor that takes an iterable as input and returns a set containing only its distinct elements.
Syntax:
set(iterable)
Parameters: Any iterable of hashable elements, such as a list of tuples.
Returns: A set containing only the unique elements.
Let’s use this function to delete duplicate rows from 2D NumPy Array.
Approach :
 Import the numpy library and create a NumPy array.
 Iterate over each row of the 2D array and convert it to a tuple, because a NumPy array is unhashable and cannot be stored in a set directly.
 Pass the hashable row tuples to set().
 set() will return a set containing only the unique tuples. A set is unordered, so the original row order is not preserved.
 Use numpy.vstack() to stack the unique rows vertically into a 2D array.
 Print the resultant array.
Source code
import numpy as np

# create numpy arrays
data = np.array([[1,2,3],
                 [3,2,1],
                 [7,8,9],
                 [9,8,9],
                 [7,8,9]])

# Delete duplicate rows from 2D NumPy Array
data = np.vstack(list(set(tuple(row) for row in data)))

print(data)
OUTPUT:
[[9 8 9]
 [7 8 9]
 [3 2 1]
 [1 2 3]]
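Since a set does not keep the rows in their input order, a variation worth considering is dict.fromkeys(), which also drops repeated keys but remembers insertion order (guaranteed from Python 3.7 onwards). This is a swapped-in alternative to set(), sketched on the same data.

```python
import numpy as np

data = np.array([[1, 2, 3],
                 [3, 2, 1],
                 [7, 8, 9],
                 [9, 8, 9],
                 [7, 8, 9]])

# Dictionary keys are unique and keep the order in which they were first seen
unique_rows = list(dict.fromkeys(tuple(row) for row in data.tolist()))

# Rebuild a 2D array with the original dtype
result = np.array(unique_rows, dtype=data.dtype)

print(result)
```

The rows come back in first-seen order: [1 2 3], [3 2 1], [7 8 9], [9 8 9].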
Using unique() method along with the return_index parameter
Delete duplicate rows from 2D NumPy Array using unique() function
As described earlier, unique() returns the sorted unique elements of an array. When return_index=True is passed, it also returns the index of the first occurrence of each unique value.
To remove duplicate rows with it, we first reduce each row to a single number: we create a random array whose length equals the number of columns in the original array and take its dot product with the given array. Identical rows map to the same number, and distinct rows almost always map to different numbers. The resulting 1D array is passed to unique() with return_index=True, and the returned indices are then used to select the unique rows from the original array.
Syntax:
numpy.unique(ar, return_index=False, return_inverse=False, return_counts=False, axis=None)
Parameters:
ar = The input array.
return_index = If True, also returns the indices of the first occurrences of the unique values in the input array.
return_inverse = If True, also returns the indices into the unique array that reconstruct the input array.
return_counts = If True, also returns the number of times each unique value occurs.
axis = The axis to operate on. Axis 0 compares rows and axis 1 compares columns; if no axis is provided, the input array is flattened, i.e. treated as a 1D array.
Approach :
 Import the numpy library and create a NumPy array.
 Create a random array whose length equals the number of columns in the array.
 Multiply the given array by the random array using the np.dot() method, i.e. a matrix-vector product that yields one number per row.
 Pass the resultant array to the unique() method with the return_index parameter set to True.
 The method will return the unique values together with the index of the first occurrence of each.
 Use the index to select the unique rows of the given array.
Source code
import numpy as np

# create numpy arrays
data = np.array([[1,2,3],
                 [3,2,1],
                 [7,8,9],
                 [9,8,9],
                 [7,8,9]])

# creating a random array
a = np.random.rand(data.shape[1])

# multiply the given array and random array.
b = data.dot(a)

# pass the resultant array to the unique()
unique, index = np.unique(b, return_index=True)

# use the index to print the unique array from given array
data = data[index]

print(data)
OUTPUT:
[[3 2 1]
 [1 2 3]
 [7 8 9]
 [9 8 9]]
The rows come back ordered by their random projection values, so the order of the rows in the output can change from run to run.
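If a deterministic result is preferred, one possible variation (not the random-projection method above) is to let unique() compare whole rows with axis=0 and still ask for return_index. Sorting the returned indices then keeps the surviving rows in their original order. The data below is made up so that the input is not already sorted.

```python
import numpy as np

# Illustrative data with duplicates in a non-sorted order
data = np.array([[7, 8, 9],
                 [1, 2, 3],
                 [7, 8, 9],
                 [3, 2, 1],
                 [1, 2, 3]])

# Index of the first occurrence of each unique row
_, index = np.unique(data, axis=0, return_index=True)

# Sorting the indices restores the original row order
result = data[np.sort(index)]

print(index)
print(result)
```

For this input the first occurrences are at positions 1, 3 and 0, so the result is the rows [7 8 9], [1 2 3], [3 2 1] in that order.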
Removing duplicates from a 1D NumPy Array by iterating
Given a 1D array, we check each element to see whether we have already encountered it. If we have, we skip it; otherwise, we keep it.
Approach :
 Import the numpy library and create a NumPy array.
 Initialise an empty list and name it unique.
 Iterate over the NumPy array and, for each element, check whether it is already present in the unique list.
 If the element is not present in the unique list, add it to the list; otherwise, continue.
 Finally, make a NumPy array from the unique list.
Source code
import numpy as np

# create a numpy array
data = np.array([1, 2, 3, 4, 4, 6, 5, 6, 7])

# creating an empty list
unique = []

# iterating each element of array
for i in data:
    # if element is not present in the list
    # add the element to list.
    if i not in unique:
        unique.append(i)

data = np.array(unique)

print(data)
OUTPUT:
[1 2 3 4 6 5 7]
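The membership test against a list scans the whole list for every element, which can become slow on large arrays. A possible refinement is to track the values already seen in a set, whose lookups are much cheaper. The helper name unique_in_order below is hypothetical and only used for this sketch.

```python
import numpy as np

def unique_in_order(values):
    """Return the unique elements of a 1D array in first-seen order."""
    seen = set()
    kept = []
    # tolist() converts the elements to plain Python scalars
    for value in values.tolist():
        if value not in seen:
            seen.add(value)
            kept.append(value)
    return np.array(kept, dtype=values.dtype)

data = np.array([1, 2, 3, 4, 4, 6, 5, 6, 7])
result = unique_in_order(data)

print(result)
```

Like the loop above, this keeps the first occurrence of each value, so the order 1, 2, 3, 4, 6, 5, 7 is preserved.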
Removing duplicates from a 2D array by iterating array
Given a 2D array, we check each row to see whether we have already encountered it. If we have, we skip it; otherwise, we keep it.
Approach :
 Import the numpy library and create a NumPy array.
 Initialise an empty list and name it unique.
 Iterate over the NumPy array and, for each row, check whether it is already present in the unique list.
 If the row is not present in the unique list, add it to the list; otherwise, continue.
 Finally, make a NumPy array from the unique list.
Source code
import numpy as np

# create 2D NumPy Array
data = np.array([[1,2,3],
                 [5,6,7],
                 [7,8,9],
                 [9,8,9],
                 [7,8,9]])

unique = []

# iterating each array of array
for i in data:
    # if array is not present in the list
    # add the array to list.
    if list(i) not in unique:
        unique.append(list(i))

data = np.array(unique)

print(data)
OUTPUT:
[[1 2 3]
 [5 6 7]
 [7 8 9]
 [9 8 9]]
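The conversion with list(i) in the loop above matters: testing a NumPy row directly against a list of NumPy rows compares the arrays element-wise and then fails when Python needs a single True or False. The short sketch below illustrates this with made-up rows.

```python
import numpy as np

seen = [np.array([1, 2, 3])]
candidate = np.array([7, 8, 9])

try:
    # The comparison yields an array of booleans, which has no single truth value
    candidate in seen
    raised = False
except ValueError:
    raised = True

# Converting the rows to plain lists gives an ordinary True/False answer
seen_as_lists = [row.tolist() for row in seen]
found = candidate.tolist() in seen_as_lists

print(raised)  # True: the direct membership test raised ValueError
print(found)   # False: [7, 8, 9] has not been seen yet
```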
Using numpy.lexsort() and np.diff() methods
lexsort()
lexsort() performs an indirect stable sort using a sequence of keys. It takes a sequence of sorting keys, which can be interpreted as the columns of a table, and returns an array of integer indices that describes the sort order by multiple keys. The last key in the sequence is the primary sort key.
Syntax:
numpy.lexsort(keys, axis=-1)
Parameters:
keys : The sorting keys. The last key is the primary sort key.
axis : The axis to be indirectly sorted. By default, the last axis is used.
Returns: Array of indices that sort the keys along the specified axis.
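A small sketch may help with the key order, which is easy to get backwards. The two key arrays below are made up for illustration: the result is sorted by the last key first, and ties are broken by the key before it.

```python
import numpy as np

first = np.array([2, 1, 2, 1])
second = np.array([5, 5, 3, 3])

# `second` is the last key, so it is the primary sort key;
# `first` only breaks ties between equal values of `second`
order = np.lexsort((first, second))

print(order)
print(second[order])  # primary key in ascending order
print(first[order])   # ascending within each group of equal primary keys
```

For these keys the returned order is 3, 2, 1, 0.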
numpy.diff()
The diff() method calculates the difference between consecutive elements along the given axis.
Syntax:
numpy.diff(a, n=1, axis=-1)
Parameters:
a : [array_like] Input array.
n : The number of times values are differenced. The default is 1.
axis : The axis along which the difference is taken. By default, the last axis is used.
Returns: The differences along the axis. The result has one element fewer along that axis for each time the values are differenced.
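As a quick illustration on made-up values, diff() on a 1D array yields one difference per neighbouring pair, and with axis=0 on a 2D array it subtracts each row from the row after it. A row of zeros in the result signals that two neighbouring rows are equal.

```python
import numpy as np

values = np.array([1, 4, 4, 9])
step = np.diff(values)
print(step)  # differences 3, 0, 5 (one fewer element than the input)

rows = np.array([[1, 2],
                 [1, 2],
                 [3, 2]])

# Difference between each row and the row before it
row_diff = np.diff(rows, axis=0)
print(row_diff)

# True where a row differs from its predecessor in at least one column
changed = np.any(row_diff, axis=1)
print(changed)
```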
To remove the duplicates from the array, we sort the given NumPy array using lexsort(); after sorting, any duplicate rows are adjacent. The sorted array is then passed to diff() with axis=0, which computes the difference between each row and the row before it. Where a row duplicates its predecessor, the difference is all zeros. We use the any() method along each row to find the rows whose difference contains a nonzero value. This boolean mask, with True prepended for the first row, is used to select the unique rows from the sorted array.
Approach :
 Import the numpy library and create a NumPy array.
 Pass the transpose of the given array as the sorting keys to the lexsort() method.
 Sort the given array using the sorting index returned by lexsort().
 Pass the sorted array to the numpy diff() method, which will find the differences between consecutive rows.
 Use the any() method to find the rows whose difference is nonzero.
 Use this nonzero-row information to select the unique rows from the sorted array.
NOTE : You can better understand this approach once you take a look at the code.
Source code
import numpy as np

# create 2D NumPy Array
arr = np.array([[1,2,3],
                [5,6,7],
                [7,8,9],
                [9,8,9],
                [7,8,9]])

# passing transpose of array as sorting key
sorted_index = np.lexsort(arr.T)

# creating sorted array using sorting index
sorted_arr = arr[sorted_index,:]

# unique row info
unique_row = np.append( [True], np.any(np.diff(sorted_arr, axis=0),1))

arr = np.array(sorted_arr[unique_row])

print(arr)
OUTPUT:
[[1 2 3]
 [5 6 7]
 [7 8 9]
 [9 8 9]]
This removed all the duplicate rows from the 2D NumPy array.
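The same steps can be wrapped in a reusable function and checked against np.unique(). This is only a sketch: the function name unique_rows_lexsort and the guard for empty input are additions made for this example.

```python
import numpy as np

def unique_rows_lexsort(arr):
    """Remove duplicate rows of a 2D array via lexsort() and diff()."""
    if arr.shape[0] == 0:
        return arr  # nothing to deduplicate
    # Columns are the keys, so the last column is the primary sort key
    order = np.lexsort(arr.T)
    sorted_arr = arr[order]
    # A row is kept if it differs from the row before it; the first row is always kept
    differs = np.any(np.diff(sorted_arr, axis=0) != 0, axis=1)
    keep = np.concatenate(([True], differs))
    return sorted_arr[keep]

arr = np.array([[1, 2, 3],
                [5, 6, 7],
                [7, 8, 9],
                [9, 8, 9],
                [7, 8, 9]])

result = unique_rows_lexsort(arr)
reference = np.unique(arr, axis=0)

print(result)
# Both approaches find the same set of rows
print({tuple(r) for r in result.tolist()} == {tuple(r) for r in reference.tolist()})
```

Note that the row order can differ from np.unique(arr, axis=0): with arr.T as the keys, lexsort() sorts primarily by the last column, whereas unique() sorts primarily by the first.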
Summary
Great, you made it! We have discussed several methods of removing duplicates from a given NumPy array. Now you know how to deal with duplicates in 1D or 2D NumPy arrays. Keep learning!