In this article, we will discuss how to find index positions of a given value in the dataframe i.e. row & column numbers.
Let’s create a dataframe i.e.
# List of Tuples empoyees = [('jack', 34, 'Sydney', 155), ('Riti', 31, 'Delhi', 177), ('Aadi', 16, 'Mumbai', 81), ('Mohit', 31, 'Delhi', 167), ('Veena', 81, 'Delhi', 144), ('Shaunak', 35, 'Mumbai', 135), ('Shaun', 35, 'Colombo', 111) ] # Create a DataFrame object empDfObj = pd.DataFrame(empoyees, columns=['Name', 'Age', 'City', 'Marks']) print(empDfObj)
Content of the dataframe is,
Name Age City Marks 0 jack 34 Sydney 155 1 Riti 31 Delhi 177 2 Aadi 16 Mumbai 81 3 Mohit 31 Delhi 167 4 Veena 81 Delhi 144 5 Shaunak 35 Mumbai 135 6 Shaun 35 Colombo 111
Now we want to find the exact locations in the dataframe where value ’81’ exists. Like,
(4, 'Age') (2, 'Marks')
It shows that value 81 exists at 2 places in the above dataframe i.e.
- At row index 4 & column “Age”
- At row index 2 & column “Marks”
Now let’s see how to get this kind of results.
Frequently Asked:
Find all indexes of an item in pandas dataframe
We have created a function that accepts a dataframe object and a value as argument. It returns a list of index positions ( i.e. row,column) of all occurrences of the given value in the dataframe i.e.
def getIndexes(dfObj, value): ''' Get index positions of value in dataframe i.e. dfObj.''' listOfPos = list() # Get bool dataframe with True at positions where the given value exists result = dfObj.isin([value]) # Get list of columns that contains the value seriesObj = result.any() columnNames = list(seriesObj[seriesObj == True].index) # Iterate over list of columns and fetch the rows indexes where value exists for col in columnNames: rows = list(result[col][result[col] == True].index) for row in rows: listOfPos.append((row, col)) # Return a list of tuples indicating the positions of value in the dataframe return listOfPos
Contents of our dataframe empDfObj are,
Name Age City Marks 0 jack 34 Sydney 155 1 Riti 31 Delhi 177 2 Aadi 16 Mumbai 81 3 Mohit 31 Delhi 167 4 Veena 81 Delhi 144 5 Shaunak 35 Mumbai 135 6 Shaun 35 Colombo 111
Let’s use above created function getIndexes() to find the all index positions of 81 in the dataframe i.e.
# Get list of index positions i.e. row & column of all occurrences of 81 in the dataframe listOfPositions = getIndexes(empDfObj, 81) print('Index positions of 81 in Dataframe : ') for i in range(len(listOfPositions)): print('Position ', i, ' (Row index , Column Name) : ', listOfPositions[i])
Output:
Index positions of 81 in Dataframe : Position 0 (Row index , Column Name) : (4, 'Age') Position 1 (Row index , Column Name) : (2, 'Marks')
We got our result i.e. exact row indexes & column names of all locations where 81 exists in the dataframe. But what just happened inside the getIndexes() function?
How did it work?
Let’s understand step by step that what happened in the function getIndexes() ,
Step 1: Get bool dataframe with True at positions where value is 81 in the dataframe using pandas.DataFrame.isin()
DataFrame.isin(self, values)
Dataframe provides a function isin(), which accepts values and returns a bool dataframe. This bool dataframe is of the same size as the original dataframe, it contains True at places where given values exist in the dataframe, at other places it contains False.
Let’s get a bool dataframe with True at positions where value is 81 i.e.
# Get bool dataframe with True at positions where value is 81 result = empDfObj.isin([81]) print('Bool Dataframe representing existence of value 81 as True') print(result)
Output
Bool Dataframe representing existence of value 81 as True Name Age City Marks 0 False False False False 1 False False False False 2 False False False True 3 False False False False 4 False True False False 5 False False False False 6 False False False False
It is of the same size as empDfObj. As 81 exists at 2 places inside the dataframe, so this bool dataframe contains True at only those two places. At all other places it contains False.
Step 2 : Get list of columns that contains the value
Now we are going to fetch the names of columns that contain value 81. We can achieve this by fetching names of columns in bool dataframe which contains any True i.e.
# Get list of columns that contains the value i.e. 81 seriesObj = result.any() columnNames = list(seriesObj[seriesObj == True].index) print('Names of columns which contains 81:', columnNames)
Output
Names of columns which contains 81: ['Age', 'Marks']
Step 3 : Iterate over selected columns and fetch the indexes of the rows which contains the value
Now in the bool dataframe iterate over each of the selected columns and for each column find rows which contains True. Now these combinations of column names and row indexes where True exists are the index positions of 81 in the dataframe i.e.
# Iterate over each column and fetch the rows number where for col in columnNames: rows = list(result[col][result[col] == True].index) for row in rows: print('Index : ', row, ' Col : ', col)
Output
Index : 4 Col : Age Index : 2 Col : Marks
This is how getIndexes() founds the exact index positions of the given value & store each position as (row, column) tuple. In the end it returns a list of tuples representing its index positions in the dataframe.
Find index positions of multiple elements in the DataFrame
Suppose we have multiple elements i.e.
[81, 'Delhi', 'abc']
Now we want to find index positions of all these elements in our dataframe empDfObj, like this,
81 : [(4, 'Age'), (2, 'Marks')] Delhi : [(1, 'City'), (3, 'City'), (4, 'City')] abc : []
Let’s use the getIndexes() and dictionary comprehension to find the indexes of all occurrences of multiple elements in the dataframe empDfObj,
listOfElems = [81, 'Delhi', 'abc'] # Use dict comprhension to club index positions of multiple elements in dataframe dictOfPos = {elem: getIndexes(empDfObj, elem) for elem in listOfElems} print('Position of given elements in Dataframe are : ') for key, value in dictOfPos.items(): print(key, ' : ', value)
Output
Position of given elements in Dataframe are : 81 : [(4, 'Age'), (2, 'Marks')] Delhi : [(1, 'City'), (3, 'City'), (4, 'City')] abc : []
dictOfPos is a dictionary of elements and their index positions in the dataframe. As ‘abc‘ doesn’t exist in the dataframe, therefore, its list is empty in dictionary dictOfPos.
The complete example is as follows,
import pandas as pd def getIndexes(dfObj, value): ''' Get index positions of value in dataframe i.e. dfObj.''' listOfPos = list() # Get bool dataframe with True at positions where the given value exists result = dfObj.isin([value]) # Get list of columns that contains the value seriesObj = result.any() columnNames = list(seriesObj[seriesObj == True].index) # Iterate over list of columns and fetch the rows indexes where value exists for col in columnNames: rows = list(result[col][result[col] == True].index) for row in rows: listOfPos.append((row, col)) # Return a list of tuples indicating the positions of value in the dataframe return listOfPos def main(): # List of Tuples empoyees = [('jack', 34, 'Sydney', 155), ('Riti', 31, 'Delhi', 177), ('Aadi', 16, 'Mumbai', 81), ('Mohit', 31, 'Delhi', 167), ('Veena', 81, 'Delhi', 144), ('Shaunak', 35, 'Mumbai', 135), ('Shaun', 35, 'Colombo', 111) ] # Create a DataFrame object empDfObj = pd.DataFrame(empoyees, columns=['Name', 'Age', 'City', 'Marks']) print('Original Dataframe : ') print(empDfObj) print('**** Find all indexes of an item in pandas dataframe ****') # Get list of index positions i.e. row & column of all occurrences of 81 in the dataframe listOfPositions = getIndexes(empDfObj, 81) print('Index positions of 81 in Dataframe : ') for i in range(len(listOfPositions)): print('Position ', i, ' (Row index , Column Name) : ', listOfPositions[i]) print('How did it worked ??') print('Break down of steps...') # Get bool dataframe with True at positions where value is 81 result = empDfObj.isin([81]) print('Bool Dataframe representing existance of value 81 as True') print(result) # Get list of columns that contains the value i.e. 81 seriesObj = result.any() columnNames = list(seriesObj[seriesObj == True].index) print('Names of columns which contains 81:', columnNames) # Iterate over each column and fetch the rows number where for col in columnNames: rows = list(result[col][result[col] == True].index) for row in rows: print('Index : ', row, ' Col : ', col) print('-- Find the position of multiple elements in DataFrame') listOfElems = [81, 'Delhi', 'abc'] # Use dict comprhension to club index positions of multiple elements in dataframe dictOfPos = {elem: getIndexes(empDfObj, elem) for elem in listOfElems} print('Position of given elements in Dataframe are : ') for key, value in dictOfPos.items(): print(key, ' : ', value) if __name__ == '__main__': main()
Output:
Original Dataframe : Name Age City Marks 0 jack 34 Sydney 155 1 Riti 31 Delhi 177 2 Aadi 16 Mumbai 81 3 Mohit 31 Delhi 167 4 Veena 81 Delhi 144 5 Shaunak 35 Mumbai 135 6 Shaun 35 Colombo 111 **** Find all indexes of an item in pandas dataframe **** Index positions of 81 in Dataframe : Position 0 (Row index , Column Name) : (4, 'Age') Position 1 (Row index , Column Name) : (2, 'Marks') How did it worked ?? Break down of steps... Bool Dataframe representing existance of value 81 as True Name Age City Marks 0 False False False False 1 False False False False 2 False False False True 3 False False False False 4 False True False False 5 False False False False 6 False False False False Names of columns which contains 81: ['Age', 'Marks'] Index : 4 Col : Age Index : 2 Col : Marks -- Find the position of multiple elements in DataFrame Position of given elements in Dataframe are : 81 : [(4, 'Age'), (2, 'Marks')] Delhi : [(1, 'City'), (3, 'City'), (4, 'City')] abc : []
Thanks, this post is super helpful !!! 🙂
Very Nice
Thank you!
Can you help me understand this part?
list(seriesObj[seriesObj == True].index)
The .index is a little confusing to me. The column names are the index of the seriesObj?
thanks!
Here seriesObj contains the bool values and index of this series object contains the column names of the original dataframe.
seriesObj[seriesObj == True]
# It selects only True values from the series object
seriesObj[seriesObj == True].index
# It returns the indexes of the entries in seriesObj whose values are true. As the indexes in seriesObj contains the column names.
# So, it returns the column name which contains the original value.
I hope it helps.
Thanks,
Varun