Python: Find indexes of an element in pandas dataframe

In this article, we will discuss how to find index positions of a given value in the dataframe i.e. row & column numbers.

Let’s create a dataframe i.e.

# List of Tuples
empoyees = [('jack', 34, 'Sydney', 155),
            ('Riti', 31, 'Delhi', 177),
            ('Aadi', 16, 'Mumbai', 81),
            ('Mohit', 31, 'Delhi', 167),
            ('Veena', 81, 'Delhi', 144),
            ('Shaunak', 35, 'Mumbai', 135),
            ('Shaun', 35, 'Colombo', 111)
            ]

# Create a DataFrame object
empDfObj = pd.DataFrame(empoyees, columns=['Name', 'Age', 'City', 'Marks'])

print(empDfObj)

Content of the dataframe is,

      Name  Age     City  Marks
0     jack   34   Sydney    155
1     Riti   31    Delhi    177
2     Aadi   16   Mumbai     81
3    Mohit   31    Delhi    167
4    Veena   81    Delhi    144
5  Shaunak   35   Mumbai    135
6    Shaun   35  Colombo    111

Now we want to find the exact locations in the dataframe where value ’81’ exists. Like,

(4, 'Age')
(2, 'Marks')

It shows that value 81 exists at 2 places in the above dataframe i.e.

  1. At row index 4 & column “Age”
  2. At row index 2 & column “Marks”

Now let’s see how to get this kind of results.

Find all indexes of an item in pandas dataframe

We have created a function that accepts a dataframe object and a value as argument. It returns a list of index positions ( i.e. row,column) of all occurrences of the given value in the dataframe i.e.

def getIndexes(dfObj, value):
    ''' Get index positions of value in dataframe i.e. dfObj.'''

    listOfPos = list()
    # Get bool dataframe with True at positions where the given value exists
    result = dfObj.isin([value])
    # Get list of columns that contains the value
    seriesObj = result.any()
    columnNames = list(seriesObj[seriesObj == True].index)
    # Iterate over list of columns and fetch the rows indexes where value exists
    for col in columnNames:
        rows = list(result[col][result[col] == True].index)
        for row in rows:
            listOfPos.append((row, col))
    # Return a list of tuples indicating the positions of value in the dataframe
    return listOfPos

Contents of our dataframe empDfObj are,

      Name  Age     City  Marks
0     jack   34   Sydney    155
1     Riti   31    Delhi    177
2     Aadi   16   Mumbai     81
3    Mohit   31    Delhi    167
4    Veena   81    Delhi    144
5  Shaunak   35   Mumbai    135
6    Shaun   35  Colombo    111

Let’s use above created function getIndexes() to find the all index positions of 81 in the dataframe i.e.

# Get list of index positions i.e. row & column of all occurrences of 81 in the dataframe
listOfPositions = getIndexes(empDfObj, 81)

print('Index positions of 81 in Dataframe : ')
for i in range(len(listOfPositions)):
    print('Position ', i, ' (Row index , Column Name) : ', listOfPositions[i])

Output:

Index positions of 81 in Dataframe : 
Position  0  (Row index , Column Name) :  (4, 'Age')
Position  1  (Row index , Column Name) :  (2, 'Marks')

We got our result i.e. exact row indexes & column names of all locations where 81 exists in the dataframe. But what just happened inside the getIndexes() function?

How did it work?

Let’s understand step by step that what happened in the function getIndexes() ,

Step 1: Get bool dataframe with True at positions where value is 81 in the dataframe using pandas.DataFrame.isin()

DataFrame.isin(self, values)

Dataframe provides a function isin(), which accepts values and returns a bool dataframe. This bool dataframe is of the same size as the original dataframe, it contains True at places where given values exist in the dataframe, at other places it contains False.

Let’s get a bool dataframe with True at positions where value is 81 i.e.

# Get bool dataframe with True at positions where value is 81
result = empDfObj.isin([81])
print('Bool Dataframe representing existence of value 81 as True')
print(result)

Output

Bool Dataframe representing existence of value 81 as True
    Name    Age   City  Marks
0  False  False  False  False
1  False  False  False  False
2  False  False  False   True
3  False  False  False  False
4  False   True  False  False
5  False  False  False  False
6  False  False  False  False

It is of the same size as empDfObj. As 81 exists at 2 places inside the dataframe, so this bool dataframe contains True at only those two places. At all other places it contains False.

Step 2 : Get list of columns that contains the value

Now we are going to fetch the names of columns that contain value 81. We can achieve this by fetching names of columns in bool dataframe which contains any True i.e.

# Get list of columns that contains the value i.e. 81
seriesObj = result.any()
columnNames = list(seriesObj[seriesObj == True].index)

print('Names of columns which contains 81:', columnNames)

Output

Names of columns which contains 81: ['Age', 'Marks']

Step 3 : Iterate over selected columns and fetch the indexes of the rows which contains the value

Now in the bool dataframe iterate over each of the selected columns and for each column find rows which contains True. Now these combinations of column names and row indexes where True exists are the index positions of 81 in the dataframe i.e.

# Iterate over each column and fetch the rows number where
for col in columnNames:
    rows = list(result[col][result[col] == True].index)
    for row in rows:
        print('Index : ', row, ' Col : ', col)

Output

Index :  4  Col :  Age
Index :  2  Col :  Marks

This is how getIndexes() founds the exact index positions of the given value & store each position as (row, column) tuple. In the end it returns a list of tuples representing its index positions in the dataframe.

Find index positions of multiple elements in the DataFrame

Suppose we have multiple elements i.e.

[81, 'Delhi', 'abc']

Now we want to find index positions of all these elements in our dataframe empDfObj, like this,

81  :  [(4, 'Age'), (2, 'Marks')]
Delhi  :  [(1, 'City'), (3, 'City'), (4, 'City')]
abc  :  []

Let’s use the getIndexes() and dictionary comprehension to find the indexes of all occurrences of multiple elements in the dataframe empDfObj,

listOfElems = [81, 'Delhi', 'abc']

# Use dict comprhension to club index positions of multiple elements in dataframe
dictOfPos = {elem: getIndexes(empDfObj, elem) for elem in listOfElems}

print('Position of given elements in Dataframe are : ')
for key, value in dictOfPos.items():
    print(key, ' : ', value)

Output

Position of given elements in Dataframe are : 
81  :  [(4, 'Age'), (2, 'Marks')]
Delhi  :  [(1, 'City'), (3, 'City'), (4, 'City')]
abc  :  []

dictOfPos is a dictionary of elements and their index positions in the dataframe. As ‘abc‘ doesn’t exist in the dataframe, therefore, its list is empty in dictionary dictOfPos.

The complete example is as follows,

import pandas as pd


def getIndexes(dfObj, value):
    ''' Get index positions of value in dataframe i.e. dfObj.'''

    listOfPos = list()
    # Get bool dataframe with True at positions where the given value exists
    result = dfObj.isin([value])
    # Get list of columns that contains the value
    seriesObj = result.any()
    columnNames = list(seriesObj[seriesObj == True].index)
    # Iterate over list of columns and fetch the rows indexes where value exists
    for col in columnNames:
        rows = list(result[col][result[col] == True].index)
        for row in rows:
            listOfPos.append((row, col))
    # Return a list of tuples indicating the positions of value in the dataframe
    return listOfPos


def main():
    # List of Tuples
    empoyees = [('jack', 34, 'Sydney', 155),
                ('Riti', 31, 'Delhi', 177),
                ('Aadi', 16, 'Mumbai', 81),
                ('Mohit', 31, 'Delhi', 167),
                ('Veena', 81, 'Delhi', 144),
                ('Shaunak', 35, 'Mumbai', 135),
                ('Shaun', 35, 'Colombo', 111)
                ]

    # Create a DataFrame object
    empDfObj = pd.DataFrame(empoyees, columns=['Name', 'Age', 'City', 'Marks'])

    print('Original Dataframe : ')
    print(empDfObj)

    print('**** Find all indexes of an item in pandas dataframe ****')

    # Get list of index positions i.e. row & column of all occurrences of 81 in the dataframe
    listOfPositions = getIndexes(empDfObj, 81)

    print('Index positions of 81 in Dataframe : ')
    for i in range(len(listOfPositions)):
        print('Position ', i, ' (Row index , Column Name) : ', listOfPositions[i])

    print('How did it worked ??')
    print('Break down of steps...')

    # Get bool dataframe with True at positions where value is 81
    result = empDfObj.isin([81])
    print('Bool Dataframe representing existance of value 81 as True')
    print(result)

    # Get list of columns that contains the value i.e. 81
    seriesObj = result.any()
    columnNames = list(seriesObj[seriesObj == True].index)
    print('Names of columns which contains 81:', columnNames)

    # Iterate over each column and fetch the rows number where
    for col in columnNames:
        rows = list(result[col][result[col] == True].index)
        for row in rows:
            print('Index : ', row, ' Col : ', col)

    print('-- Find the position of multiple elements in DataFrame')

    listOfElems = [81, 'Delhi', 'abc']

    # Use dict comprhension to club index positions of multiple elements in dataframe
    dictOfPos = {elem: getIndexes(empDfObj, elem) for elem in listOfElems}

    print('Position of given elements in Dataframe are : ')
    for key, value in dictOfPos.items():
        print(key, ' : ', value)


if __name__ == '__main__':
    main()

Output:

Original Dataframe : 
      Name  Age     City  Marks
0     jack   34   Sydney    155
1     Riti   31    Delhi    177
2     Aadi   16   Mumbai     81
3    Mohit   31    Delhi    167
4    Veena   81    Delhi    144
5  Shaunak   35   Mumbai    135
6    Shaun   35  Colombo    111
**** Find all indexes of an item in pandas dataframe ****
Index positions of 81 in Dataframe : 
Position  0  (Row index , Column Name) :  (4, 'Age')
Position  1  (Row index , Column Name) :  (2, 'Marks')
How did it worked ??
Break down of steps...
Bool Dataframe representing existance of value 81 as True
    Name    Age   City  Marks
0  False  False  False  False
1  False  False  False  False
2  False  False  False   True
3  False  False  False  False
4  False   True  False  False
5  False  False  False  False
6  False  False  False  False
Names of columns which contains 81: ['Age', 'Marks']
Index :  4  Col :  Age
Index :  2  Col :  Marks
-- Find the position of multiple elements in DataFrame
Position of given elements in Dataframe are : 
81  :  [(4, 'Age'), (2, 'Marks')]
Delhi  :  [(1, 'City'), (3, 'City'), (4, 'City')]
abc  :  []

5 thoughts on “Python: Find indexes of an element in pandas dataframe”

  1. Can you help me understand this part?

    list(seriesObj[seriesObj == True].index)

    The .index is a little confusing to me. The column names are the index of the seriesObj?

    thanks!

    1. Here seriesObj contains the bool values and index of this series object contains the column names of the original dataframe.

      seriesObj[seriesObj == True]
      # It selects only True values from the series object

      seriesObj[seriesObj == True].index
      # It returns the indexes of the entries in seriesObj whose values are true. As the indexes in seriesObj contains the column names.
      # So, it returns the column name which contains the original value.

      I hope it helps.

      Thanks,
      Varun

Leave a Reply to Varun Cancel Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Scroll to Top