Python : Find occurrence count & all indices of a sub-string in another string | including overlapping sub-strings

In this article we will discuss different ways to count occurrences of a sub-string in another string and also their index positions.

Count occurrences of a sub-string in another string using string.count()

Python’s String class contains a method to count the non overlapping occurrences of a sub-string in the string object i.e.

string.count(s, sub[, start[, end]])

It looks for the sub-string s in range start to end and returns it’s occurrence count. If start & end is not provided then it will look in complete string and returns the occurrence count of sub-string  in the string. For example,

mainStr = 'This is a sample string and a sample code. It is very short.'

# Get the occurrence count of sub-string in main string.
count = mainStr.count('sample')

print("'sample' sub string frequency / occurrence count : " , count)

Output:

'sample' sub string frequency / occurrence count :  2

As ‘sample’ string exists at 2 places in the another string, so it returned 2.

Using Python Regex : Count occurrences of a sub string in string

We can easily get the occurrence count using python regex too. For that we will create a regex pattern with sub-string and then find all matches of that regex pattern in another string i.e.

# Create a Regex pattern to match the substring
regexPattern = re.compile("sample")

# Get a list of strings that matches the given pattern i.e. substring
listOfMatches = regexPattern.findall(mainStr)

print("'sample' sub string frequency / occurrence count : ", len(listOfMatches))

As ‘sample’ string exists at 2 places in the another string, so regex pattern is matched at 2 places and a list of those matches is returned. Length of the list returned will tell the total occurrence count of sub-string in main string.

'sample' sub string frequency / occurrence count :  2

Count Overlapping occurrences of a sub-string in another string

The ways we have seen till now are not able to count the overlapping sub-strings. Let’s understand by example,

Suppose we have a string which has overlapping occurrence of sub-string ‘that’ i.e.,

mainStr = 'thathatthat'

Now if we count the occurrence of a sub-string ‘that’ in this string using string.count(),

# string.count() will not be able to count occurrences of overlapping sub-strings
count = mainStr.count('that')

string.count() will return 2, where as there are 3 overlapping occurrence of ‘that’ in main string.

As, string.count() can not find the overlapping occurrences of a sub-string.  So, let’s create a function to do this,

''''
Find occurrence count of overlapping substrings.
Start from left and start searching for the substring when found increment the counter
and keep on search from next index position. 
'''
def frequencyCount(mainStr, subStr):
   counter = pos = 0
   while(True):
       pos = mainStr.find(subStr , pos)
       if pos > -1:
           counter = counter + 1
           pos = pos + 1
       else:
           break
   return counter

Now let’s use this function of find occurrence count of a overlapping sub-string ‘that’ in the main string,

# count occurrences of overlapping substrings
count = frequencyCount(mainStr, 'that')

print("'that' sub string frequency count : ", count)

Output:

'that' sub string frequency count :  3

Find occurrence count and index positions of a sub-string in another string

Find indices of non-overlapping sub-string in string using Python regex finditer()

Using Regex find all the matches of a sub-string in another main string and iterate over all those matches to find their index positions i.e.

# Create a Regex pattern to match the substring
regexPattern = re.compile('sample')

# Iterate over all the matches of substring using iterator of matchObjects returnes by finditer()
iteratorOfMatchObs = regexPattern.finditer(mainStr)
indexPositions = []
count = 0
for matchObj in iteratorOfMatchObs:
   indexPositions.append(matchObj.start())
   count = count + 1

print("Occurrence Count of substring 'sample' : ", count)
print("Index Positions of 'sample' are : ", indexPositions)

Output:

Occurrence Count of substring 'sample' :  2
Index Positions of 'sample' are :  [10, 30]

It returns the count & indices of non-overlapping sub-strings only. To find the occurrence count & indices of overlapping sub-strings let’s modify the above create function

Find indices of overlapping sub-string in string using Python

''''
Find occurrence count of overlapping substrings and get their count and index positions.
Start from left and start searching for the substring when found increment the counter
and keep on search from next index position. 
'''
def frequencyCountAndPositions(mainStr, subStr):
   counter = pos = 0
   indexpos = []
   while(True):
       pos = mainStr.find(subStr , pos)
       if pos > -1:
           indexpos.append(pos)
           counter = counter + 1
           pos = pos + 1
       else:
           break
   return (counter, indexpos)

Let’s use this function to find indices of overlapping sub-strings in main string,

mainStr = 'thathatthat'

result = frequencyCountAndPositions(mainStr, 'that')

print("Occurrence Count of overlapping sub-strings 'that' : ", result[0])
print("Index Positions of 'that' are : ", result[1])

Output:

Occurrence Count of overlapping sub-strings 'that' :  3
Index Positions of 'that' are :  [0, 3, 7]

Find nth occurrence of a sub-string in another string

Let’s use the same function frequencyCountAndPositions()  to find the nth occurrence of a sub-string in another string i.e.

mainStr = 'This is a sample string and a sample code. It is very Short.'

result = frequencyCountAndPositions(mainStr, 'is')
if result[0] >= 2:
   print("Index Positions of 2nd Occurrence of sub-string 'is'  : ", result[1][1])

Output:

Index Positions of 2nd Occurrence of sub-string 'is'  :  5

Complete example is as follows,

import re

''''
Find occurrence count of overlapping substrings.
Start from left and start searching for the substring when found increment the counter
and keep on search from next index position. 
'''
def frequencyCount(mainStr, subStr):
   counter = pos = 0
   while(True):
       pos = mainStr.find(subStr , pos)
       if pos > -1:
           counter = counter + 1
           pos = pos + 1
       else:
           break
   return counter

''''
Find occurrence count of overlapping substrings and get their count and index positions.
Start from left and start searching for the substring when found increment the counter
and keep on search from next index position. 
'''
def frequencyCountAndPositions(mainStr, subStr):
   counter = pos = 0
   indexpos = []
   while(True):
       pos = mainStr.find(subStr , pos)
       if pos > -1:
           indexpos.append(pos)
           counter = counter + 1
           pos = pos + 1
       else:
           break
   return (counter, indexpos)



def main():

    print(' **** Get occurrence count of a sub string in string using string.count() ****')

    mainStr = 'This is a sample string and a sample code. It is very short.'

    # Get the occurrence count of sub-string in main string.
    count = mainStr.count('sample')

    print("'sample' sub string frequency / occurrence count : " , count)

    print(' **** Get occurrence count of a sub string in string using Python Regex ****')

    # Create a Regex pattern to match the substring
    regexPattern = re.compile("sample")

    # Get a list of strings that matches the given pattern i.e. substring
    listOfMatches = regexPattern.findall(mainStr)

    print("'sample' sub string frequency / occurrence count : ", len(listOfMatches))

    print(' **** Count overlapping sub-strings in the main string ****')

    mainStr = 'thathatthat'

    # string.count() will not be able to count occurrences of overlapping substrings
    count = mainStr.count('that')
    print("'that' sub string frequency count : ", count)

    # count occurrences of overlapping substrings
    count = frequencyCount(mainStr, 'that')

    print("'that' sub string frequency count : ", count)

    print('**** Find Occurrence count and all index position of a sub-string in a String **** ')

    mainStr = 'This is a sample string and a sample code. It is very Short.'

    # Create a Regex pattern to match the substring
    regexPattern = re.compile('sample')

    # Iterate over all the matches of substring using iterator of matchObjects returnes by finditer()
    iteratorOfMatchObs = regexPattern.finditer(mainStr)
    indexPositions = []
    count = 0
    for matchObj in iteratorOfMatchObs:
       indexPositions.append(matchObj.start())
       count = count + 1

    print("Occurrence Count of substring 'sample' : ", count)
    print("Index Positions of 'sample' are : ", indexPositions)

    mainStr = 'thathatthat'

    result = frequencyCountAndPositions(mainStr, 'that')
    print("Occurrence Count of sub string 'that' : ", result[0])
    print("Index Positions of 'that' are : ", result[1])

    print('*** Find the nth occurrence of sub-string in a string ****')

    mainStr = 'This is a sample string and a sample code. It is very Short.'

    result = frequencyCountAndPositions(mainStr, 'is')
    if result[0] >= 2:
       print("Index Positions of 2nd Occurrence of sub-string 'is'  : ", result[1][1])


if __name__ == '__main__':
  main()

Output:

 **** Get occurrence count of a sub string in string using string.count() ****
'sample' sub string frequency / occurrence count :  2
 **** Get occurrence count of a sub string in string using Python Regex ****
'sample' sub string frequency / occurrence count :  2
 **** Count overlapping sub-strings in the main string ****
'that' sub string frequency count :  2
'that' sub string frequency count :  3
**** Find Occurrence count and all index position of a sub-string in a String **** 
Occurrence Count of sub-string 'sample' :  2
Index Positions of 'sample' are :  [10, 30]
Occurrence Count of sub string 'that' :  3
Index Positions of 'that' are :  [0, 3, 7]
*** Find the nth occurrence of sub-string in a string ****
Index Positions of 2nd Occurrence of sub-string 'is'  :  5

 

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Scroll to Top