Remove duplicates from NumPy Array in Python

In this article, we will learn how to remove duplicate elements, rows, or columns from a NumPy Array in Python.

Given a NumPy array, we need to remove the duplicates, i.e. the elements that occur more than once in the array. For example, if our input NumPy array is,

Input Array  : [1,2,3,4,4,5,6,7]  

Then after deleting the duplicate elements from this NumPy Array, its contents must be,

Output Array : [1,2,3,4,5,6,7] 

There are multiple ways of removing duplicates from a NumPy Array. Let’s discuss the methods one by one, each with its approach and a working code example.

Remove duplicates from NumPy Array using unique() method

The unique() function is built into NumPy. It takes an array as input and returns a new array containing only the distinct elements, in sorted order. To remove duplicates, we pass the given NumPy array to unique() and use the array it returns.

Syntax:

numpy.unique(arr, return_index=False, return_inverse=False, return_counts=False, axis=None) 

Parameters:
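arr            = The input array.
return_index   = If True, also returns the indices of the first occurrence of each unique value in the input array.
return_inverse = If True, also returns the indices that rebuild the input array from the unique array.
return_counts  = If True, also returns the number of times each unique value occurs.
axis           = The axis to operate on. Axis 0 compares rows and axis 1 compares columns. If no axis is provided, the input array is flattened, i.e. treated as a 1D array.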
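
To make the optional return values concrete, here is a minimal sketch on a small 1D array. The array contents and the variable names (values, first_index, inverse, counts, rebuilt) are chosen only for illustration.

```python
import numpy as np

arr = np.array([4, 1, 4, 2, 1])

# Ask unique() for all three optional outputs at once
values, first_index, inverse, counts = np.unique(
    arr, return_index=True, return_inverse=True, return_counts=True
)

print(values)       # [1 2 4]
print(first_index)  # [1 3 0]
print(inverse)      # [2 0 2 1 0]
print(counts)       # [2 1 2]

# The inverse indices rebuild the original array from the unique values
rebuilt = values[inverse]
print(rebuilt)      # [4 1 4 2 1]
```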

Delete duplicate elements from 1D NumPy Array

Approach :

  • Import the numpy library and create a NumPy array.
  • Pass the array to the unique() method without the axis parameter.
  • The function will return the unique array.
  • Print the resultant array.

Source code

import numpy as np

# Create a NumPy Array
data = np.array([1,2,3,4,4,5,6,7])

# Pass array to the unique function
# It will remove the duplicates.
data = np.unique(data)

print(data)

Output:

[1 2 3 4 5 6 7]

It deleted all the duplicate elements from the NumPy Array.
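
One detail worth knowing is that unique() returns the values in sorted order, not in their original order. If the original order matters, one possible sketch is to ask for the first-occurrence indices and sort those. The sample array and the names sorted_unique, first_index and ordered_unique are only illustrative.

```python
import numpy as np

data = np.array([3, 1, 3, 2, 1])

# Plain unique() sorts the result
sorted_unique = np.unique(data)
print(sorted_unique)   # [1 2 3]

# Keep the order of first appearance instead:
# take the first-occurrence indices and sort them
_, first_index = np.unique(data, return_index=True)
ordered_unique = data[np.sort(first_index)]
print(ordered_unique)  # [3 1 2]
```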

Delete duplicate rows from 2D NumPy Array

To remove the duplicate rows from a 2D NumPy array use the following steps,

  • Import the numpy library and create a NumPy array.
  • Pass the array to the unique() method with the axis=0 parameter.
  • The function will return the unique array.
  • Print the resultant array.

Source code

import numpy as np

# create numpy arrays
data = np.array([[1,2,3],
                 [3,2,1],
                 [7,8,9],
                 [9,8,9],
                 [7,8,9]])

# Delete duplicate rows from 2D NumPy Array
data = np.unique(data, axis=0)

print(data)

OUTPUT:

[[1 2 3]
 [3 2 1]
 [7 8 9]
 [9 8 9]]

It deleted all the duplicate rows from the 2d NumPy Array.
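
If you also want to know which rows were duplicated, the return_counts parameter can be combined with axis=0. The following is a small sketch on the same array; the names rows, counts and repeated_rows are only illustrative.

```python
import numpy as np

data = np.array([[1, 2, 3],
                 [3, 2, 1],
                 [7, 8, 9],
                 [9, 8, 9],
                 [7, 8, 9]])

# Unique rows together with how often each one occurs
rows, counts = np.unique(data, axis=0, return_counts=True)
print(counts)         # [1 1 2 1]

# Rows that appeared more than once in the input
repeated_rows = rows[counts > 1]
print(repeated_rows)  # [[7 8 9]]
```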

Delete duplicate columns from 2D NumPy Array

To remove the duplicate columns from a 2D NumPy array use the following steps,

  • Import numpy library and create a numpy array
  • Pass the array to the unique() method with the axis=1 parameter
  • The function will return the unique array

Source code

import numpy as np

# create numpy arrays
data = np.array([[1, 14, 3, 14, 14],
                 [3, 13, 1, 13, 13],
                 [7, 12, 9, 12, 12],
                 [9, 11, 9, 11, 11],
                 [7, 10, 9, 10, 10]])

# Remove Duplicate columns from 2D NumPy Array
data = np.unique(data, axis=1)

print(data)

Output:

[[ 1  3 14]
 [ 3  1 13]
 [ 7  9 12]
 [ 9  9 11]
 [ 7  9 10]]
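
It deleted all the duplicate columns from the 2D NumPy Array. As a quick sanity check, removing duplicate columns should give the same result as removing duplicate rows from the transposed array and transposing back. The sketch below uses a shortened two-row version of the array; the names by_axis and by_transpose are only illustrative.

```python
import numpy as np

data = np.array([[1, 14, 3, 14, 14],
                 [3, 13, 1, 13, 13]])

# Remove duplicate columns directly
by_axis = np.unique(data, axis=1)

# Remove duplicate rows of the transpose, then transpose back
by_transpose = np.unique(data.T, axis=0).T

print(by_axis)
print(np.array_equal(by_axis, by_transpose))  # True
```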

Remove duplicates from NumPy Array using set() method

The set() function is built into Python. It takes an iterable as input and returns a set, which holds only the distinct elements of that iterable.

Syntax:

set(iterable)

Parameters:
 iterable : Any iterable with hashable elements, such as a list of tuples.

Returns:
 A set containing the unique elements. A set is unordered.

Let’s use this function to delete duplicate rows from 2D NumPy Array.

Approach :

  • Import the numpy library and create a NumPy array.
  • Iterate over the rows of the 2D array and convert each row to a tuple, because a NumPy array is not hashable and cannot be stored in a set.
  • Pass the row tuples to set().
  • set() returns a set containing only the unique tuples.
  • Convert the set to a list and join the rows vertically with numpy.vstack().
  • Print the resultant array.

Source code

import numpy as np

# create numpy arrays
data = np.array([[1,2,3],
                 [3,2,1],
                 [7,8,9],
                 [9,8,9],
                 [7,8,9]])


# Delete duplicate rows from 2D NumPy Array
data = np.vstack(list(set(tuple(row) for row in data)))

print(data)

OUTPUT (the row order is arbitrary, because a set is unordered):

[[9 8 9]
 [7 8 9]
 [3 2 1]
 [1 2 3]]
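
Because a set does not remember the order of its elements, the rows above come back in an arbitrary order. If the original row order should be kept, one alternative sketch is to use dictionary keys instead of a set, since a dict keeps insertion order in Python 3.7 and later. This swaps set() for dict.fromkeys(), and the names unique_rows and result are only illustrative.

```python
import numpy as np

data = np.array([[1, 2, 3],
                 [3, 2, 1],
                 [7, 8, 9],
                 [9, 8, 9],
                 [7, 8, 9]])

# dict keys are unique and keep insertion order (Python 3.7+),
# so the first occurrence of every row tuple is kept in place
unique_rows = list(dict.fromkeys(tuple(row) for row in data))

result = np.array(unique_rows)
print(result)
# [[1 2 3]
#  [3 2 1]
#  [7 8 9]
#  [9 8 9]]
```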

Using unique() method along with the return_index parameter

Delete duplicate rows from 2D NumPy Array using unique() function

The unique() function takes an array as input and returns its distinct elements. With return_index=True, it also returns the index of the first occurrence of each distinct element.

Here we reduce each row to a single number. We create a random array whose length equals the number of columns in the original array, and take the dot product of the original array with it. Identical rows produce the same number (up to floating-point rounding), while different rows almost always produce different numbers. We pass the resulting 1D array to unique() with return_index=True, and use the returned indices to select the unique rows from the original array. Because this approach relies on random floating-point values, it is probabilistic: in rare cases two different rows can produce the same number.

Syntax:

numpy.unique(arr, return_index=False, return_inverse=False, return_counts=False, axis=None) 

Parameters:
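arr            = The input array.
return_index   = If True, also returns the indices of the first occurrence of each unique value in the input array.
return_inverse = If True, also returns the indices that rebuild the input array from the unique array.
return_counts  = If True, also returns the number of times each unique value occurs.
axis           = The axis to operate on. Axis 0 compares rows and axis 1 compares columns. If no axis is provided, the input array is flattened, i.e. treated as a 1D array.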

Approach :

  1. Import the numpy library and create a NumPy array.
  2. Create a random array whose length equals the number of columns in the array.
  3. Multiply the given array by the random array using the dot() method, i.e. a matrix-vector product that yields one number per row.
  4. Pass the resulting array to the unique() method with the return_index parameter set to True.
  5. The method will return the unique values and the index of the first occurrence of each one.
  6. Use these indices to select the unique rows from the given array.

Source code

import numpy as np

# create numpy arrays
data = np.array([[1,2,3],
                 [3,2,1],
                 [7,8,9],
                 [9,8,9],
                 [7,8,9]])


# creating a random array
a = np.random.rand(data.shape[1])

# multiply the given array and random array.
b = data.dot(a)

# pass the resultant array to the unique()
unique, index = np.unique(b, return_index=True)

# use the index to print the unique array from given array
data = data[index]

print(data)


OUTPUT (the row order depends on the random values, so it may differ between runs):

[[3 2 1]
 [1 2 3]
 [7 8 9]
 [9 8 9]]
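
The row order above changes from run to run because the weights are random. The following variant is a sketch under a few assumptions that are not part of the code above: it uses a seeded generator (np.random.default_rng(0)) for repeatability, rounds the projected values to guard against tiny floating-point differences, and sorts the returned indices so that the rows keep their original order. The names rng, weights, projection and result are only illustrative.

```python
import numpy as np

data = np.array([[1, 2, 3],
                 [3, 2, 1],
                 [7, 8, 9],
                 [9, 8, 9],
                 [7, 8, 9]])

# Assumption: a fixed seed, used only to make the run repeatable
rng = np.random.default_rng(0)
weights = rng.random(data.shape[1])

# One number per row; rounding is an extra safeguard against
# floating-point noise between identical rows
projection = np.round(data.dot(weights), 8)

# Indices of the first occurrence of every distinct projected value
_, index = np.unique(projection, return_index=True)

# Sorting the indices keeps the rows in their original order
result = data[np.sort(index)]
print(result)
```

This is still a probabilistic shortcut. When exact results matter, np.unique(data, axis=0) compares the rows directly.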

Removing duplicates from a 1D NumPy Array by iterating

Given a 1D array, we check each element in turn. If we have already seen the element, we skip it; otherwise we keep it.

Approach :

  1. Import the numpy library and create a NumPy array.
  2. Initialise an empty list named unique.
  3. Iterate over the NumPy array and, for each element, check whether it is already present in the unique list.
  4. If the element is not present in the unique list, add it to the list; otherwise continue.
  5. Finally, make a NumPy array from the unique list.

Source code

import numpy as np

# create a numpy array
data = np.array([1, 2, 3, 4, 4, 6, 5, 6, 7])

# creating an empty list
unique = []

# iterating over each element of the array
for i in data:
    # if the element is not present in the list,
    # add the element to the list
    if i not in unique:
        unique.append(i)

data = np.array(unique)

print(data)

OUTPUT:

[1 2 3 4 6 5 7]
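
Checking membership in a list gets slower as the list grows. As a possible refinement, the same loop can track the values it has already seen in a set, which makes each membership check fast. The helper name unique_in_order is hypothetical and used only for this sketch.

```python
import numpy as np

def unique_in_order(values):
    """Return the distinct values of a 1D array in order of first appearance."""
    seen = set()   # values encountered so far (fast membership test)
    kept = []      # first occurrences, in their original order
    for value in values.tolist():
        if value not in seen:
            seen.add(value)
            kept.append(value)
    return np.array(kept, dtype=values.dtype)

data = np.array([1, 2, 3, 4, 4, 6, 5, 6, 7])
result = unique_in_order(data)
print(result)  # [1 2 3 4 6 5 7]
```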

Removing duplicates from a 2D array by iterating array

Given a 2D array, we check each row in turn. If we have already seen the row, we skip it; otherwise we keep it.

Approach :

  1. Import the numpy library and create a NumPy array.
  2. Initialise an empty list named unique.
  3. Iterate over the NumPy array and, for each row, check whether the row is already present in the unique list.
  4. If the row is not present in the unique list, add it to the list; otherwise continue.
  5. Finally, make a NumPy array from the unique list.

Source code

import numpy as np

# create 2D NumPy Array
data=np.array([ [1,2,3],
                [5,6,7],
                [7,8,9],
                [9,8,9],
                [7,8,9]])

# creating an empty list
unique = []

# iterating over each row of the array
for i in data:
    # if the row is not present in the list,
    # add the row to the list
    if list(i) not in unique:
        unique.append(list(i))

data = np.array(unique)

print(data)

OUTPUT:

[[1 2 3]
 [5 6 7]
 [7 8 9]
 [9 8 9]]

Using numpy.lexsort() and np.diff() methods

lexsort()

lexsort() performs an indirect stable sort using a sequence of keys. Each key can be interpreted as a column of a NumPy array. It returns an array of integer indices that describes the sort order by multiple columns. The last key in the sequence is the primary sort key.

Syntax:

numpy.lexsort(keys, axis=-1)

Parameters:
 keys : The sorting keys. The last key is the primary sort key.
 axis : Axis to be indirectly sorted. By default, the last axis.

Returns:
  Array of indices that sort the keys along the specified axis.
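
As a small illustration of how the keys are ordered, the sketch below sorts by one key and breaks ties with another. The arrays primary and secondary are made up for this example.

```python
import numpy as np

primary = np.array([2, 1, 2, 1])
secondary = np.array([9, 8, 7, 6])

# The LAST key in the tuple is the primary sort key;
# earlier keys are only used to break ties
order = np.lexsort((secondary, primary))

print(order)             # [3 1 2 0]
print(primary[order])    # [1 1 2 2]
print(secondary[order])  # [6 8 7 9]
```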

numpy.diff()

The diff() method calculates the difference between consecutive elements along the given axis.

Syntax:

numpy.diff(arr, n=1, axis=-1)

Parameters:
 arr : [array_like] Input array.
 n :  The number of times values are differenced. The default is 1.
 axis : The axis along which the difference is taken. By default, the last axis.

Returns:
  The differences along the axis. With n=1, the result has one element fewer along that axis than the input array.
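
The following sketch shows what diff() returns for rows that are already sorted, and how a row of zeros marks a duplicate of the previous row. The array rows and the names step and changed are only illustrative.

```python
import numpy as np

rows = np.array([[1, 2, 3],
                 [7, 8, 9],
                 [7, 8, 9],
                 [9, 8, 9]])

# Difference between each row and the row before it
step = np.diff(rows, axis=0)
print(step)
# [[6 6 6]
#  [0 0 0]
#  [2 0 0]]

# True where a row differs from the previous row in at least one column
changed = np.any(step != 0, axis=1)
print(changed)  # [ True False  True]
```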

To remove the duplicates from the array, we sort the given NumPy array using lexsort(), so that any duplicate rows end up next to each other. The sorted array is then passed to diff(), which computes the difference between each row and the row before it. If a row is a duplicate of the previous row, the difference is zero in every column. We use the any() method to find the rows whose difference has at least one non-zero value, and we always keep the first sorted row. This information is used to select the unique rows from the sorted array.

Approach :

  1. Import the numpy library and create a NumPy array.
  2. Pass the transpose of the given array as the sorting keys to the lexsort() method.
  3. Sort the given array using the sorting index returned by the lexsort() method.
  4. Pass the sorted array to the numpy diff() method, which will find the differences between consecutive rows.
  5. Use the any() method to find the rows with a non-zero difference.
  6. Use this row information to build the unique array from the sorted array.

    NOTE : This approach is easier to understand once you take a look at the code.

Source code

import numpy as np

# create 2D NumPy Array
arr = np.array([[1,2,3],
                [5,6,7],
                [7,8,9],
                [9,8,9],
                [7,8,9]])

# passing transpose of array as sorting key
sorted_index = np.lexsort(arr.T)

# creating sorted array using sorting index
sorted_arr =  arr[sorted_index,:]

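# unique row info: the first row is always kept, and any other row is kept
# only if it differs from the previous row in at least one column
unique_row = np.append([True], np.any(np.diff(sorted_arr, axis=0), 1))

# select the unique rows from the sorted array
arr = sorted_arr[unique_row]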

print(arr)

OUTPUT:

[[1 2 3]
 [5 6 7]
 [7 8 9]
 [9 8 9]]

It removed all the duplicate rows from the 2D NumPy Array.
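
The steps above can be wrapped in a small helper and checked against np.unique(). This is only a sketch, and the function name unique_rows_lexsort is hypothetical. Note that lexsort(arr.T) treats the last column as the primary key, so in general the row order can differ from the one returned by np.unique(arr, axis=0); the check below therefore compares the two results as sets of rows.

```python
import numpy as np

def unique_rows_lexsort(arr):
    """Remove duplicate rows of a 2D array by sorting and comparing neighbours."""
    # Sort the rows; the columns of arr act as the sorting keys
    order = np.lexsort(arr.T)
    sorted_arr = arr[order]

    # Keep the first row, and every row that differs from the previous one
    keep = np.ones(len(sorted_arr), dtype=bool)
    keep[1:] = np.any(np.diff(sorted_arr, axis=0) != 0, axis=1)
    return sorted_arr[keep]

arr = np.array([[1, 2, 3],
                [5, 6, 7],
                [7, 8, 9],
                [9, 8, 9],
                [7, 8, 9]])

result = unique_rows_lexsort(arr)
expected = np.unique(arr, axis=0)

# Compare as sets of rows, because the two methods may order rows differently
same_rows = ({tuple(r) for r in result.tolist()}
             == {tuple(r) for r in expected.tolist()})
print(result)
print(same_rows)  # True
```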

Summary

Great, you made it! We have discussed several methods of removing duplicates from a NumPy array. Now you know how to deal with duplicates in 1D or 2D NumPy arrays. Keep learning!
