Python: Find duplicates in a list with frequency count & index positions

In this article, we will discuss how to find duplicates in a list along with their frequency count and their index positions in the list.

Let’s do this step by step.

Step 1: Get duplicate elements in a list with a frequency count

Suppose we have a list of strings:

# List of strings
listOfElems = ['Hello', 'Ok', 'is', 'Ok', 'test', 'this', 'is', 'a', 'test']

We have created a function that accepts a list and returns a dictionary of the duplicate elements in that list, along with their frequency count:

def getDuplicatesWithCount(listOfElems):
    ''' Get frequency count of duplicate elements in the given list '''
    dictOfElems = dict()
    # Iterate over each element in list
    for elem in listOfElems:
        # If element exists in dict then increment its value else add it in dict
        if elem in dictOfElems:
            dictOfElems[elem] += 1
        else:
            dictOfElems[elem] = 1

    # Filter key-value pairs in dictionary. Keep pairs whose value is greater than 1,
    # i.e. only the duplicate elements from the list.
    dictOfElems = {key: value for key, value in dictOfElems.items() if value > 1}
    # Return a dict of duplicate elements and their frequency count
    return dictOfElems

Let’s call this function to find the duplicate elements in the list along with their frequency:

# List of strings
listOfElems = ['Hello', 'Ok', 'is', 'Ok', 'test', 'this', 'is', 'a', 'test']

# Get a dictionary containing duplicate elements in list and their frequency count
dictOfElems = getDuplicatesWithCount(listOfElems)

for key, value in dictOfElems.items():
    print(key, ' :: ', value)

Output

Ok  ::  2
is  ::  2
test  ::  2

What is this function doing?

When called, this function creates a new dictionary. It then iterates over the elements of the given list one by one and, for each element, checks whether it already exists as a key in the dictionary:

  • If the element does not exist in the dictionary keys, it adds the element as a key with value 1.
  • If the element exists in the dictionary keys, it increments the value of that key by 1.

Once the iteration over the list ends, this dictionary holds the frequency count of each element in the list. Since we are interested only in duplicates, i.e. elements with a frequency count greater than 1, it keeps only the entries whose value is greater than 1 and discards those whose value is 1. Finally, it returns a dictionary containing the duplicate elements as keys and their frequency counts as values.
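The same counting loop can be written more compactly with dict.get(), which returns a default of 0 for an element that is not yet in the dictionary, so the if/else branch disappears. This is a minimal sketch for comparison rather than the article's own function; the snake_case name get_duplicates_with_count is hypothetical, and the printed order assumes Python 3.7+, where dictionaries preserve insertion order.

```python
def get_duplicates_with_count(elems):
    """Hypothetical helper: same counting idea as getDuplicatesWithCount, using dict.get()."""
    counts = {}
    for elem in elems:
        # dict.get() returns 0 for a key not seen yet, so no if/else is needed
        counts[elem] = counts.get(elem, 0) + 1
    # Keep only elements that occur more than once
    return {elem: n for elem, n in counts.items() if n > 1}


list_of_elems = ['Hello', 'Ok', 'is', 'Ok', 'test', 'this', 'is', 'a', 'test']
print(get_duplicates_with_count(list_of_elems))
# {'Ok': 2, 'is': 2, 'test': 2}
```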

We can achieve the same using collections.Counter() too.

Use collections.Counter() to find duplicates in a list with frequency count

class collections.Counter([iterable-or-mapping])

We can create an object of the Counter class from an iterable or from any dict-like mapping. This Counter object keeps the count of each element in the iterable. Let’s use it to find the duplicates in a list and their counts:

from collections import Counter

# List of strings
listOfElems = ['Hello', 'Ok', 'is', 'Ok', 'test', 'this', 'is', 'a', 'test']

# Create a dictionary of elements & their frequency count
dictOfElems = dict(Counter(listOfElems))

# Remove elements from dictionary whose value is 1, i.e. non-duplicate items
dictOfElems = {key: value for key, value in dictOfElems.items() if value > 1}

for key, value in dictOfElems.items():
    print('Element = ', key, ' :: Repeated Count = ', value)

Output:

Element =  Ok  :: Repeated Count =  2
Element =  is  :: Repeated Count =  2
Element =  test  :: Repeated Count =  2
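Converting the Counter with dict() is optional: Counter is a subclass of dict, so its items() can be filtered directly. The sketch below also shows the mapping form of the constructor from the class signature above, and the fact that indexing a Counter with a missing key returns 0 instead of raising KeyError. The variable names are illustrative, and the printed order assumes Python 3.7+.

```python
from collections import Counter

list_of_elems = ['Hello', 'Ok', 'is', 'Ok', 'test', 'this', 'is', 'a', 'test']

counts = Counter(list_of_elems)
# Counter is a subclass of dict, so items() works without converting it first
duplicates = {elem: n for elem, n in counts.items() if n > 1}
print(duplicates)  # {'Ok': 2, 'is': 2, 'test': 2}

# A Counter can also be built from a mapping of element -> count
from_mapping = Counter({'Ok': 2, 'Hello': 1})
print(from_mapping['Ok'])       # 2
print(from_mapping['missing'])  # 0: missing keys count as zero instead of raising KeyError
```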

Now we know the frequency count of each duplicate element in the list. But what if we also want to know the index positions of these duplicate elements? Let’s see how to do that.

Step 2: Get indices of each duplicate element in a list along with frequency count

Suppose we have a list:

# List of strings
listOfElems = ['Hello', 'Ok', 'is', 'Ok', 'test', 'this', 'is', 'a', 'test']

Now we want to know the index positions of each duplicate element in the list, along with its frequency count. Something like this:

Element =  Ok  :: Repeated Count =  2  :: Index Positions =   [1, 3]
Element =  is  :: Repeated Count =  2  :: Index Positions =   [2, 6]
Element =  test  :: Repeated Count =  2  :: Index Positions =   [4, 8]

To achieve that, we have created a function:

def getDuplicatesWithInfo(listOfElems):
    ''' Get duplicate elements in a list along with their indices in the list
     and frequency count'''
    dictOfElems = dict()
    # Iterate over each element in list and keep track of its index
    for index, elem in enumerate(listOfElems):
        # If element exists in dict then record its index in the list & increment its frequency
        if elem in dictOfElems:
            dictOfElems[elem][0] += 1
            dictOfElems[elem][1].append(index)
        else:
            # Add a new entry in dictionary: [frequency count, [index positions]]
            dictOfElems[elem] = [1, [index]]

    # Keep only the elements whose frequency count is greater than 1
    dictOfElems = {key: value for key, value in dictOfElems.items() if value[0] > 1}
    return dictOfElems

This function accepts a list of items and iterates over them one by one to build a dictionary. In this dictionary, the key is the element and the value is a list of:

  • Frequency count
  • List of index positions at which the element occurs

Let’s call this function to find the duplicate elements in a list, their index positions, and their frequency:

# List of strings
listOfElems = ['Hello', 'Ok', 'is', 'Ok', 'test', 'this', 'is', 'a', 'test']

dictOfElems = getDuplicatesWithInfo(listOfElems)

for key, value in dictOfElems.items():
    print('Element = ', key, ' :: Repeated Count = ', value[0], ' :: Index Positions =  ', value[1])

Output

Element =  Ok  :: Repeated Count =  2  :: Index Positions =   [1, 3]
Element =  is  :: Repeated Count =  2  :: Index Positions =   [2, 6]
Element =  test  :: Repeated Count =  2  :: Index Positions =   [4, 8]

What is this function doing?

When we call this function with a list argument, it performs the following steps:

  • First, it creates a new dictionary.
  • It then iterates over the elements of the list one by one, keeping track of their index positions.
  • For each element, it checks whether the element exists in the dictionary keys:
    • If the element does not exist in the dictionary keys, it adds a new key-value pair, where the key is the element and the value is a list of two items:
      • Frequency count 1
      • A list containing the current index position
    • If the element exists in the dictionary keys, it increments the frequency count in the value and appends the current index position to the index list.
  • Once the iteration over the list finishes, the dictionary holds the frequency count and index positions of every element in the list.
  • Since we are interested only in duplicates, i.e. elements with a frequency count greater than 1, it keeps only the entries whose frequency count is greater than 1 and discards those whose count is 1.
  • Finally, it returns a dictionary with the duplicate elements as keys and, as values, their frequency count and the index positions of their occurrences.
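To make the filtering step concrete, the sketch below builds the same [count, [indices]] structure that getDuplicatesWithInfo builds, but prints it before filtering, so the non-duplicate entries that get discarded are visible. It inlines the loop instead of calling the function, and the names all_info and duplicates are illustrative only. The printed order assumes Python 3.7+.

```python
list_of_elems = ['Hello', 'Ok', 'is', 'Ok', 'test', 'this', 'is', 'a', 'test']

# Build the same [count, [indices]] structure as getDuplicatesWithInfo, but skip the filter
all_info = {}
for index, elem in enumerate(list_of_elems):
    if elem in all_info:
        all_info[elem][0] += 1
        all_info[elem][1].append(index)
    else:
        all_info[elem] = [1, [index]]

for elem, (count, indices) in all_info.items():
    print(elem, count, indices)
# Hello 1 [0]
# Ok 2 [1, 3]
# is 2 [2, 6]
# test 2 [4, 8]
# this 1 [5]
# a 1 [7]

# The final filtering step keeps only entries whose count is greater than 1
duplicates = {elem: info for elem, info in all_info.items() if info[0] > 1}
print(list(duplicates))  # ['Ok', 'is', 'test']
```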

The complete example is as follows:

from collections import Counter
 
def getDuplicatesWithCount(listOfElems):
    ''' Get frequency count of duplicate elements in the given list '''
    dictOfElems = dict()
    # Iterate over each element in list
    for elem in listOfElems:
        # If element exists in dict then increment its value else add it in dict
        if elem in dictOfElems:
            dictOfElems[elem] += 1
        else:
            dictOfElems[elem] = 1

    # Filter key-value pairs in dictionary. Keep pairs whose value is greater than 1,
    # i.e. only the duplicate elements from the list.
    dictOfElems = {key: value for key, value in dictOfElems.items() if value > 1}
    # Return a dict of duplicate elements and their frequency count
    return dictOfElems
 
def getDuplicatesWithInfo(listOfElems):
    ''' Get duplicate elements in a list along with their indices in the list
     and frequency count'''
    dictOfElems = dict()
    # Iterate over each element in list and keep track of its index
    for index, elem in enumerate(listOfElems):
        # If element exists in dict then record its index in the list & increment its frequency
        if elem in dictOfElems:
            dictOfElems[elem][0] += 1
            dictOfElems[elem][1].append(index)
        else:
            # Add a new entry in dictionary: [frequency count, [index positions]]
            dictOfElems[elem] = [1, [index]]

    # Keep only the elements whose frequency count is greater than 1
    dictOfElems = {key: value for key, value in dictOfElems.items() if value[0] > 1}
    return dictOfElems
 
def main():

    # List of strings
    listOfElems = ['Hello', 'Ok', 'is', 'Ok', 'test', 'this', 'is', 'a', 'test']

    print('**** Get duplicate elements with repeated count ****')

    # Get a dictionary containing duplicate elements in list and their frequency count
    dictOfElems = getDuplicatesWithCount(listOfElems)

    for key, value in dictOfElems.items():
        print(key, ' :: ', value)

    print('** Use Counter to get the frequency of duplicate items in list **')

    # Create a dictionary of elements & their frequency count
    dictOfElems = dict(Counter(listOfElems))

    # Remove elements from dictionary whose value is 1, i.e. non-duplicate items
    dictOfElems = {key: value for key, value in dictOfElems.items() if value > 1}

    for key, value in dictOfElems.items():
        print('Element = ', key, ' :: Repeated Count = ', value)

    print('Get duplicate elements with repeated count and index position of duplicates')

    dictOfElems = getDuplicatesWithInfo(listOfElems)

    for key, value in dictOfElems.items():
        print('Element = ', key, ' :: Repeated Count = ', value[0], ' :: Index Positions =  ', value[1])
 
if __name__ == '__main__':
    main()

Output:

**** Get duplicate elements with repeated count ****
Ok  ::  2
is  ::  2
test  ::  2
** Use Counter to get the frequency of duplicate items in list **
Element =  Ok  :: Repeated Count =  2
Element =  is  :: Repeated Count =  2
Element =  test  :: Repeated Count =  2
Get duplicate elements with repeated count and index position of duplicates
Element =  Ok  :: Repeated Count =  2  :: Index Positions =   [1, 3]
Element =  is  :: Repeated Count =  2  :: Index Positions =   [2, 6]
Element =  test  :: Repeated Count =  2  :: Index Positions =   [4, 8]
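In the dictionary returned by getDuplicatesWithInfo, the frequency count in value[0] always equals the length of the index list in value[1], so a variant could store only the indices and derive the count with len(). The sketch below does that with collections.defaultdict(list) and enumerate(); it is an alternative formulation, not the article's function, and the name get_duplicate_positions is hypothetical.

```python
from collections import defaultdict


def get_duplicate_positions(elems):
    """Hypothetical helper: map each duplicate element to the indices where it occurs."""
    positions = defaultdict(list)
    # enumerate() yields (index, element) pairs, so no manual counter is needed
    for index, elem in enumerate(elems):
        positions[elem].append(index)
    # The frequency count is simply the number of recorded positions
    return {elem: idx for elem, idx in positions.items() if len(idx) > 1}


list_of_elems = ['Hello', 'Ok', 'is', 'Ok', 'test', 'this', 'is', 'a', 'test']
for elem, idx in get_duplicate_positions(list_of_elems).items():
    print(elem, len(idx), idx)
# Ok 2 [1, 3]
# is 2 [2, 6]
# test 2 [4, 8]
```

The trade-off is that the count is recomputed with len() on demand instead of being stored, which removes the risk of the two fields getting out of sync.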
