Find the frequency of each character in a string and its indices | Find duplicate characters in a string

In this article, we will discuss different ways to get the frequency (occurrence count) of each character in a string, along with its index positions in the string, using collections.Counter() and regex.

Get Frequency of each character in string using collections.Counter()

collections.Counter([iterable-or-mapping])

Counter is a dict subclass. collections.Counter() accepts an iterable (or a mapping) as an argument and stores its elements as keys and their counts as values. A string is an iterable of characters, so if we pass a string to collections.Counter(), it returns a Counter object whose keys are the characters of the string and whose values are their frequencies. Let’s use that to find the frequency of every character in a string:

from collections import Counter

mainStr = 'This is a sample string and a sample code. It is a very short string. 001122'

# Counter is a dict subclass that keeps the characters in the string as keys and their frequencies as values
frequency = Counter(mainStr)

print("Occurrence Count of all characters :")
# Iterate over the Counter and print the frequency of each character
for (key, value) in frequency.items():
    print("Occurrence Count of ", key, " is : ", value)

Output:

Occurrence Count of all characters :
Occurrence Count of  T  is :  1
Occurrence Count of  h  is :  2
Occurrence Count of  i  is :  5
Occurrence Count of  s  is :  8
Occurrence Count of     is :  15
Occurrence Count of  a  is :  6
Occurrence Count of  m  is :  2
Occurrence Count of  p  is :  2
Occurrence Count of  l  is :  2
Occurrence Count of  e  is :  4
Occurrence Count of  t  is :  4
Occurrence Count of  r  is :  4
Occurrence Count of  n  is :  3
Occurrence Count of  g  is :  2
Occurrence Count of  d  is :  2
Occurrence Count of  c  is :  1
Occurrence Count of  o  is :  2
Occurrence Count of  .  is :  2
Occurrence Count of  I  is :  1
Occurrence Count of  v  is :  1
Occurrence Count of  y  is :  1
Occurrence Count of  0  is :  2
Occurrence Count of  1  is :  2
Occurrence Count of  2  is :  2
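Because Counter is a dict subclass, it offers a few conveniences beyond plain iteration. The minimal sketch below reuses the same sample string to show that looking up a character that never occurs returns 0 instead of raising KeyError, that most_common(n) returns the n most frequent characters in descending order of count, and that Counter also accepts a mapping of element to count.

```python
from collections import Counter

mainStr = 'This is a sample string and a sample code. It is a very short string. 001122'
frequency = Counter(mainStr)

# A missing key returns 0 instead of raising KeyError (and is not added to the Counter)
print(frequency['z'])  # 0

# The three most frequent characters, highest count first
print(frequency.most_common(3))  # [(' ', 15), ('s', 8), ('a', 6)]

# Counter can also be built from a mapping of element -> count
print(Counter({'a': 2, 'b': 1})['a'])  # 2
```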

This way, we got the occurrence count of every character in the string, including ' ' and '.'. What if we want the frequency of only letters and digits, along with their index positions in a list? Let’s see how to do that.

Python Regex : Get frequency of each character in string

We will create a regex pattern that matches all the alphanumeric characters in the string:

import re

# Create a regex pattern to match ASCII alphanumeric characters (letters and digits)
regexPattern = re.compile('[a-zA-Z0-9]')
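To see what finditer() yields before running it on the full string, here is a small sketch on a short made-up input ('Hi, 42!' is only an illustration). Each match object's group() returns the matched character and start() returns its index in the input, which are exactly the two values needed to build the frequency and index-position dictionaries.

```python
import re

# Same pattern as above: one ASCII letter or digit per match
regexPattern = re.compile('[a-zA-Z0-9]')

# group() gives the matched text, start() gives its starting index in the input;
# the comma, space and '!' are skipped because they do not match the pattern
matches = [(m.group(), m.start()) for m in regexPattern.finditer('Hi, 42!')]
print(matches)  # [('H', 0), ('i', 1), ('4', 4), ('2', 5)]
```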

Now iterate over all the matches of the above pattern in the string using pattern.finditer(), and build two dictionaries: one holding the frequency count of each character and one holding the list of its index positions in the string (matchObj.start() gives the index of each match):

mainStr = 'This is a sample string and a sample code. It is a very short string. 001122'

# Iterate over all the alphanumeric characters in the string (those that match the regex pattern)
# and keep updating each character's frequency count and index positions in dictionaries
frequencyOfChars = {}
indexPositions = {}

for matchObj in regexPattern.finditer(mainStr):
    char = matchObj.group()
    frequencyOfChars[char] = frequencyOfChars.get(char, 0) + 1
    indexPositions.setdefault(char, []).append(matchObj.start())

# Print the frequency and index positions of each character
for (key, value) in frequencyOfChars.items():
    print("Occurrence Count of ", key, " is : ", value, ' & Index Positions : ', indexPositions[key])

Output:

Occurrence Count of  T  is :  1  & Index Positions :  [0]
Occurrence Count of  h  is :  2  & Index Positions :  [1, 57]
Occurrence Count of  i  is :  5  & Index Positions :  [2, 5, 20, 46, 65]
Occurrence Count of  s  is :  8  & Index Positions :  [3, 6, 10, 17, 30, 47, 56, 62]
Occurrence Count of  a  is :  6  & Index Positions :  [8, 11, 24, 28, 31, 49]
Occurrence Count of  m  is :  2  & Index Positions :  [12, 32]
Occurrence Count of  p  is :  2  & Index Positions :  [13, 33]
Occurrence Count of  l  is :  2  & Index Positions :  [14, 34]
Occurrence Count of  e  is :  4  & Index Positions :  [15, 35, 40, 52]
Occurrence Count of  t  is :  4  & Index Positions :  [18, 44, 60, 63]
Occurrence Count of  r  is :  4  & Index Positions :  [19, 53, 59, 64]
Occurrence Count of  n  is :  3  & Index Positions :  [21, 25, 66]
Occurrence Count of  g  is :  2  & Index Positions :  [22, 67]
Occurrence Count of  d  is :  2  & Index Positions :  [26, 39]
Occurrence Count of  c  is :  1  & Index Positions :  [37]
Occurrence Count of  o  is :  2  & Index Positions :  [38, 58]
Occurrence Count of  I  is :  1  & Index Positions :  [43]
Occurrence Count of  v  is :  1  & Index Positions :  [51]
Occurrence Count of  y  is :  1  & Index Positions :  [54]
Occurrence Count of  0  is :  2  & Index Positions :  [70, 71]
Occurrence Count of  1  is :  2  & Index Positions :  [72, 73]
Occurrence Count of  2  is :  2  & Index Positions :  [74, 75]
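If you prefer not to use regex, the same counts and positions can be collected in a single pass with enumerate() and str.isalnum(). This is an alternative sketch, not the approach used above: isalnum() also accepts non-ASCII letters and digits, so it only behaves like [a-zA-Z0-9] for ASCII input such as this sample string. Storing the positions in a collections.defaultdict(list) means the count does not have to be tracked separately; it is just the length of each list.

```python
from collections import defaultdict

mainStr = 'This is a sample string and a sample code. It is a very short string. 001122'

# Map each letter/digit to the list of indices where it occurs
indexPositions = defaultdict(list)
for index, ch in enumerate(mainStr):
    # isalnum() skips spaces and punctuation; unlike [a-zA-Z0-9] it also accepts non-ASCII letters/digits
    if ch.isalnum():
        indexPositions[ch].append(index)

# The frequency of a character is the length of its position list
for ch, positions in indexPositions.items():
    print("Occurrence Count of ", ch, " is : ", len(positions), ' & Index Positions : ', positions)
```

For this sample string, it prints the same lines as the regex version above.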

Find Duplicate characters in a String using collections.Counter()

Suppose we have a string:

mainStr = 'This is a sample string and a sample code. It is a very short string. 001122'

To find all the duplicate characters in this string, use collections.Counter() to get the frequency of each character. Any character whose frequency is greater than 1 occurs more than once and is therefore a duplicate:

from collections import Counter

listOfDupChars = []
# Counter is a dict subclass that keeps the characters in the string as keys and their frequencies as values
frequency = Counter(mainStr)

# Collect every character that occurs more than once
for (key, value) in frequency.items():
    if value > 1:
        listOfDupChars.append(key)
print('Duplicate characters : ', listOfDupChars)

Output:

Duplicate characters :  ['h', 'i', 's', ' ', 'a', 'm', 'p', 'l', 'e', 't', 'r', 'n', 'g', 'd', 'o', '.', '0', '1', '2']
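The same filter also fits in a single list comprehension over Counter.items(). This sketch treats a character as a duplicate when its count is greater than 1, so every character that appears at least twice is reported, including the space and '.'.

```python
from collections import Counter

mainStr = 'This is a sample string and a sample code. It is a very short string. 001122'

# Keep only the characters that occur more than once, in first-occurrence order
listOfDupChars = [ch for ch, count in Counter(mainStr).items() if count > 1]
print('Duplicate characters :', listOfDupChars)
```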

The complete example is as follows:

from collections import Counter
import re


def main():

    print('**** Get Frequency each character in String using collections.Counter()****')

    mainStr = 'This is a sample string and a sample code. It is a very short string. 001122'

    # Counter is a dict subclass that keeps the characters in the string as keys and their frequencies as values
    frequency = Counter(mainStr)

    print("Occurrence Count of all characters :")
    # Iterate over the Counter and print the frequency of each character
    for (key, value) in frequency.items():
        print("Occurrence Count of ", key, " is : ", value)

    print('**** Get frequency of each character in String using Regex****')

    mainStr = 'This is a sample string and a sample code. It is a very short string. 001122'

    # Create a regex pattern to match ASCII alphanumeric characters (letters and digits)
    regexPattern = re.compile('[a-zA-Z0-9]')

    # Iterate over all the alphanumeric characters in the string (those that match the regex pattern)
    # and keep updating each character's frequency count and index positions in dictionaries
    frequencyOfChars = {}
    indexPositions = {}

    for matchObj in regexPattern.finditer(mainStr):
        char = matchObj.group()
        frequencyOfChars[char] = frequencyOfChars.get(char, 0) + 1
        indexPositions.setdefault(char, []).append(matchObj.start())

    # Print the frequency and index positions of each character
    for (key, value) in frequencyOfChars.items():
        print("Occurrence Count of ", key, " is : ", value, ' & Index Positions : ', indexPositions[key])

    print('**** Find Duplicate characters in a String using collections.Counter()****')

    mainStr = 'This is a sample string and a sample code. It is a very short string. 001122'

    listOfDupChars = []
    # Counter is a dict subclass that keeps the characters in the string as keys and their frequencies as values
    frequency = Counter(mainStr)

    # Collect every character that occurs more than once
    for (key, value) in frequency.items():
        if value > 1:
            listOfDupChars.append(key)
    print('Duplicate characters : ', listOfDupChars)


if __name__ == '__main__':
    main()

Output:

**** Get Frequency each character in String using collections.Counter()****
Occurrence Count of all characters :
Occurrence Count of  T  is :  1
Occurrence Count of  h  is :  2
Occurrence Count of  i  is :  5
Occurrence Count of  s  is :  8
Occurrence Count of     is :  15
Occurrence Count of  a  is :  6
Occurrence Count of  m  is :  2
Occurrence Count of  p  is :  2
Occurrence Count of  l  is :  2
Occurrence Count of  e  is :  4
Occurrence Count of  t  is :  4
Occurrence Count of  r  is :  4
Occurrence Count of  n  is :  3
Occurrence Count of  g  is :  2
Occurrence Count of  d  is :  2
Occurrence Count of  c  is :  1
Occurrence Count of  o  is :  2
Occurrence Count of  .  is :  2
Occurrence Count of  I  is :  1
Occurrence Count of  v  is :  1
Occurrence Count of  y  is :  1
Occurrence Count of  0  is :  2
Occurrence Count of  1  is :  2
Occurrence Count of  2  is :  2
**** Get frequency of each character in String using Regex****
Occurrence Count of  T  is :  1  & Index Positions :  [0]
Occurrence Count of  h  is :  2  & Index Positions :  [1, 57]
Occurrence Count of  i  is :  5  & Index Positions :  [2, 5, 20, 46, 65]
Occurrence Count of  s  is :  8  & Index Positions :  [3, 6, 10, 17, 30, 47, 56, 62]
Occurrence Count of  a  is :  6  & Index Positions :  [8, 11, 24, 28, 31, 49]
Occurrence Count of  m  is :  2  & Index Positions :  [12, 32]
Occurrence Count of  p  is :  2  & Index Positions :  [13, 33]
Occurrence Count of  l  is :  2  & Index Positions :  [14, 34]
Occurrence Count of  e  is :  4  & Index Positions :  [15, 35, 40, 52]
Occurrence Count of  t  is :  4  & Index Positions :  [18, 44, 60, 63]
Occurrence Count of  r  is :  4  & Index Positions :  [19, 53, 59, 64]
Occurrence Count of  n  is :  3  & Index Positions :  [21, 25, 66]
Occurrence Count of  g  is :  2  & Index Positions :  [22, 67]
Occurrence Count of  d  is :  2  & Index Positions :  [26, 39]
Occurrence Count of  c  is :  1  & Index Positions :  [37]
Occurrence Count of  o  is :  2  & Index Positions :  [38, 58]
Occurrence Count of  I  is :  1  & Index Positions :  [43]
Occurrence Count of  v  is :  1  & Index Positions :  [51]
Occurrence Count of  y  is :  1  & Index Positions :  [54]
Occurrence Count of  0  is :  2  & Index Positions :  [70, 71]
Occurrence Count of  1  is :  2  & Index Positions :  [72, 73]
Occurrence Count of  2  is :  2  & Index Positions :  [74, 75]
**** Find Duplicate characters in a String using collections.Counter()****
Duplicate characters :  ['h', 'i', 's', ' ', 'a', 'm', 'p', 'l', 'e', 't', 'r', 'n', 'g', 'd', 'o', '.', '0', '1', '2']

 
