In this article we will discuss different ways to find the frequency (occurrence count) of each character in a string, and the index positions of those characters, using collections.Counter() and regex.
Get frequency of each character in a string using collections.Counter()
collections.Counter([iterable-or-mapping])
Counter is a dict subclass. collections.Counter() accepts an iterable (or a mapping) as its argument and stores each element as a key, with the number of times it occurs as the value. So, if we pass a string to collections.Counter(), it returns a Counter object whose keys are the characters of the string and whose values are their frequencies in the string. Let’s use that to find the frequency of all characters in a string:
from collections import Counter

mainStr = 'This is a sample string and a sample code. It is a very short string. 001122'

# Counter is a dict subclass that keeps the characters in the string as keys and their frequency as values
frequency = Counter(mainStr)

print("Occurrence Count of all characters :")
# Iterate over the dictionary and print the frequency of each character
for (key, value) in frequency.items():
    print("Occurrence Count of", key, "is :", value)

Output:
Occurrence Count of all characters :
Occurrence Count of T is : 1
Occurrence Count of h is : 2
Occurrence Count of i is : 5
Occurrence Count of s is : 8
Occurrence Count of   is : 15
Occurrence Count of a is : 6
Occurrence Count of m is : 2
Occurrence Count of p is : 2
Occurrence Count of l is : 2
Occurrence Count of e is : 4
Occurrence Count of t is : 4
Occurrence Count of r is : 4
Occurrence Count of n is : 3
Occurrence Count of g is : 2
Occurrence Count of d is : 2
Occurrence Count of c is : 1
Occurrence Count of o is : 2
Occurrence Count of . is : 2
Occurrence Count of I is : 1
Occurrence Count of v is : 1
Occurrence Count of y is : 1
Occurrence Count of 0 is : 2
Occurrence Count of 1 is : 2
Occurrence Count of 2 is : 2
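Because Counter is a dict subclass, it offers a few conveniences beyond plain iteration. This short sketch uses a hypothetical sample word instead of mainStr and shows that looking up a character that never occurs returns 0 rather than raising KeyError, that most_common(n) returns the n highest counts (ties keep first-seen order), and that a Counter can also be built from a mapping, as the constructor signature allows:

```python
from collections import Counter

# A string is iterated character by character, so each character is counted
frequency = Counter('mississippi')

# Missing keys return 0 instead of raising KeyError
print(frequency['z'])            # 0

# most_common(n) returns (element, count) pairs, highest counts first;
# elements with equal counts stay in the order they were first seen
print(frequency.most_common(2))  # [('i', 4), ('s', 4)]

# A Counter can also be initialised from a mapping of element -> count
fromMapping = Counter({'a': 3, 'b': 1})
print(fromMapping['a'])          # 3
```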
This way we got the occurrence count of all the characters in the string, including ‘ ’ and ‘.’. What if we want the frequency of only letters and digits, along with their index positions in a list? Let’s see how to do that.
Python Regex : Get frequency of each character in a string
We will create a regex pattern to match all the alphanumeric characters in the string:
import re

# Create a regex pattern to match alphanumeric characters (ASCII letters and digits)
regexPattern = re.compile(r'[a-zA-Z0-9]')
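If only the counts are needed and not the index positions, one alternative worth considering (not the approach this article takes) is to collect the matches with re.findall() and pass that list straight to Counter. A minimal sketch, reusing the same sample string and character class:

```python
import re
from collections import Counter

mainStr = 'This is a sample string and a sample code. It is a very short string. 001122'

# re.findall returns every match of the pattern as a list of one-character strings
alphanumericChars = re.findall(r'[a-zA-Z0-9]', mainStr)

# Counting only the matched characters leaves out spaces and punctuation
frequencyOfChars = Counter(alphanumericChars)

print(frequencyOfChars['s'])    # 8
print(' ' in frequencyOfChars)  # False
```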
Now iterate over all the matches of the above pattern in the string using pattern.finditer(), and build two dictionaries: one holding the frequency count of each character and one holding its index positions in the string:
mainStr = 'This is a sample string and a sample code. It is a very short string. 001122'

# Iterate over all the alphanumeric characters in the string (those that match the regex pattern)
# While iterating, keep updating the frequency count and index positions of each character
iteratorOfMatchObs = regexPattern.finditer(mainStr)
frequencyOfChars = {}
indexPositions = {}
for matchObj in iteratorOfMatchObs:
    frequencyOfChars[matchObj.group()] = frequencyOfChars.get(matchObj.group(), 0) + 1
    indexPositions[matchObj.group()] = indexPositions.get(matchObj.group(), []) + [matchObj.start()]

# Iterate over the dictionary and print the frequency and index positions of each character
for (key, value) in frequencyOfChars.items():
    print("Occurrence Count of", key, "is :", value, '& Index Positions :', indexPositions[key])

Output:
Occurrence Count of T is : 1 & Index Positions : [0]
Occurrence Count of h is : 2 & Index Positions : [1, 57]
Occurrence Count of i is : 5 & Index Positions : [2, 5, 20, 46, 65]
Occurrence Count of s is : 8 & Index Positions : [3, 6, 10, 17, 30, 47, 56, 62]
Occurrence Count of a is : 6 & Index Positions : [8, 11, 24, 28, 31, 49]
Occurrence Count of m is : 2 & Index Positions : [12, 32]
Occurrence Count of p is : 2 & Index Positions : [13, 33]
Occurrence Count of l is : 2 & Index Positions : [14, 34]
Occurrence Count of e is : 4 & Index Positions : [15, 35, 40, 52]
Occurrence Count of t is : 4 & Index Positions : [18, 44, 60, 63]
Occurrence Count of r is : 4 & Index Positions : [19, 53, 59, 64]
Occurrence Count of n is : 3 & Index Positions : [21, 25, 66]
Occurrence Count of g is : 2 & Index Positions : [22, 67]
Occurrence Count of d is : 2 & Index Positions : [26, 39]
Occurrence Count of c is : 1 & Index Positions : [37]
Occurrence Count of o is : 2 & Index Positions : [38, 58]
Occurrence Count of I is : 1 & Index Positions : [43]
Occurrence Count of v is : 1 & Index Positions : [51]
Occurrence Count of y is : 1 & Index Positions : [54]
Occurrence Count of 0 is : 2 & Index Positions : [70, 71]
Occurrence Count of 1 is : 2 & Index Positions : [72, 73]
Occurrence Count of 2 is : 2 & Index Positions : [74, 75]
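The two dictionaries can also be collapsed into one. In this sketch (an alternative, not the article's original code), collections.defaultdict(list) maps each character to its list of positions, and the occurrence count is simply the length of that list. Appending in place also avoids building a brand-new list on every match, which the get(..., []) + [...] expression does:

```python
import re
from collections import defaultdict

mainStr = 'This is a sample string and a sample code. It is a very short string. 001122'
regexPattern = re.compile(r'[a-zA-Z0-9]')

# defaultdict(list) creates an empty list the first time a character is seen
indexPositions = defaultdict(list)
for matchObj in regexPattern.finditer(mainStr):
    indexPositions[matchObj.group()].append(matchObj.start())

# The occurrence count is the number of recorded positions
for char, positions in indexPositions.items():
    print('Occurrence Count of', char, 'is :', len(positions), '& Index Positions :', positions)
```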
Find Duplicate characters in a String using collections.Counter()
Suppose we have a string:
mainStr = 'This is a sample string and a sample code. It is a very short string. 001122'
Now, to find all the duplicate characters in this string, use collections.Counter() to get the frequency of each character; the characters whose frequency is greater than 1 are the duplicates:
from collections import Counter

listOfDupChars = []
# Counter is a dict subclass that keeps the characters in the string as keys and their frequency as values
frequency = Counter(mainStr)
# Iterate over the dictionary and collect every character that occurs more than once
for (key, value) in frequency.items():
    if value > 1:
        listOfDupChars.append(key)

print('Duplicate characters :', listOfDupChars)

Output:
Duplicate characters : ['h', 'i', 's', ' ', 'a', 'm', 'p', 'l', 'e', 't', 'r', 'n', 'g', 'd', 'o', '.', '0', '1', '2']
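The same duplicate check can be written as a single list comprehension over the Counter. This is a sketch; the second comprehension additionally assumes you want to skip spaces and punctuation, and it uses str.isalnum(), which (unlike the [a-zA-Z0-9] pattern) also accepts non-ASCII letters and digits:

```python
from collections import Counter

mainStr = 'This is a sample string and a sample code. It is a very short string. 001122'

# A character is a duplicate if it occurs more than once
listOfDupChars = [char for char, count in Counter(mainStr).items() if count > 1]

# Same check, keeping only letters and digits
listOfDupAlnum = [char for char, count in Counter(mainStr).items()
                  if count > 1 and char.isalnum()]

print(listOfDupChars)
print(listOfDupAlnum)
```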
The complete example is as follows:
from collections import Counter
import re


def main():
    print('**** Get frequency of each character in a string using collections.Counter() ****')
    mainStr = 'This is a sample string and a sample code. It is a very short string. 001122'
    # Counter is a dict subclass that keeps the characters in the string as keys and their frequency as values
    frequency = Counter(mainStr)
    print("Occurrence Count of all characters :")
    # Iterate over the dictionary and print the frequency of each character
    for (key, value) in frequency.items():
        print("Occurrence Count of", key, "is :", value)

    print('**** Get frequency of each character in a string using regex ****')
    # Create a regex pattern to match alphanumeric characters
    regexPattern = re.compile(r'[a-zA-Z0-9]')
    # Iterate over all the alphanumeric characters in the string (those that match the regex pattern)
    # While iterating, keep updating the frequency count and index positions of each character
    iteratorOfMatchObs = regexPattern.finditer(mainStr)
    frequencyOfChars = {}
    indexPositions = {}
    for matchObj in iteratorOfMatchObs:
        frequencyOfChars[matchObj.group()] = frequencyOfChars.get(matchObj.group(), 0) + 1
        indexPositions[matchObj.group()] = indexPositions.get(matchObj.group(), []) + [matchObj.start()]
    # Iterate over the dictionary and print the frequency and index positions of each character
    for (key, value) in frequencyOfChars.items():
        print("Occurrence Count of", key, "is :", value, '& Index Positions :', indexPositions[key])

    print('**** Find duplicate characters in a string using collections.Counter() ****')
    listOfDupChars = []
    # Collect every character that occurs more than once
    for (key, value) in frequency.items():
        if value > 1:
            listOfDupChars.append(key)
    print('Duplicate characters :', listOfDupChars)


if __name__ == '__main__':
    main()

Output:
**** Get frequency of each character in a string using collections.Counter() ****
Occurrence Count of all characters :
Occurrence Count of T is : 1
Occurrence Count of h is : 2
Occurrence Count of i is : 5
Occurrence Count of s is : 8
Occurrence Count of   is : 15
Occurrence Count of a is : 6
Occurrence Count of m is : 2
Occurrence Count of p is : 2
Occurrence Count of l is : 2
Occurrence Count of e is : 4
Occurrence Count of t is : 4
Occurrence Count of r is : 4
Occurrence Count of n is : 3
Occurrence Count of g is : 2
Occurrence Count of d is : 2
Occurrence Count of c is : 1
Occurrence Count of o is : 2
Occurrence Count of . is : 2
Occurrence Count of I is : 1
Occurrence Count of v is : 1
Occurrence Count of y is : 1
Occurrence Count of 0 is : 2
Occurrence Count of 1 is : 2
Occurrence Count of 2 is : 2
**** Get frequency of each character in a string using regex ****
Occurrence Count of T is : 1 & Index Positions : [0]
Occurrence Count of h is : 2 & Index Positions : [1, 57]
Occurrence Count of i is : 5 & Index Positions : [2, 5, 20, 46, 65]
Occurrence Count of s is : 8 & Index Positions : [3, 6, 10, 17, 30, 47, 56, 62]
Occurrence Count of a is : 6 & Index Positions : [8, 11, 24, 28, 31, 49]
Occurrence Count of m is : 2 & Index Positions : [12, 32]
Occurrence Count of p is : 2 & Index Positions : [13, 33]
Occurrence Count of l is : 2 & Index Positions : [14, 34]
Occurrence Count of e is : 4 & Index Positions : [15, 35, 40, 52]
Occurrence Count of t is : 4 & Index Positions : [18, 44, 60, 63]
Occurrence Count of r is : 4 & Index Positions : [19, 53, 59, 64]
Occurrence Count of n is : 3 & Index Positions : [21, 25, 66]
Occurrence Count of g is : 2 & Index Positions : [22, 67]
Occurrence Count of d is : 2 & Index Positions : [26, 39]
Occurrence Count of c is : 1 & Index Positions : [37]
Occurrence Count of o is : 2 & Index Positions : [38, 58]
Occurrence Count of I is : 1 & Index Positions : [43]
Occurrence Count of v is : 1 & Index Positions : [51]
Occurrence Count of y is : 1 & Index Positions : [54]
Occurrence Count of 0 is : 2 & Index Positions : [70, 71]
Occurrence Count of 1 is : 2 & Index Positions : [72, 73]
Occurrence Count of 2 is : 2 & Index Positions : [74, 75]
**** Find duplicate characters in a string using collections.Counter() ****
Duplicate characters : ['h', 'i', 's', ' ', 'a', 'm', 'p', 'l', 'e', 't', 'r', 'n', 'g', 'd', 'o', '.', '0', '1', '2']