Several times, we need to extract out the rows where the columns match certain values. In this article, we will discuss different ways to achieve that.
Table of Contents
Introduction
To quickly get started, let’s create a sample dataframe to experiment. We’ll use the pandas library with some random data.
import pandas as pd import numpy as np # List of Tuples employees = [('Shubham', 'Data Scientist', 'Sydney', 5), ('Riti', 'Data Analyst', 'Delhi' , 7), ('Shanky', 'Program Manager', 'Delhi' , 2), ('Shreya', 'Graphic Designer', 'Mumbai' , 2), ('Aadi', 'Data Engineering', 'New York', 11)] # Create a DataFrame object from list of tuples df = pd.DataFrame(employees, columns=['Name', 'Designation', 'City', 'Experience'], index=[0, 1, 2, 3, 4]) print(df)
Contents of the created dataframe are,
Name Designation City Experience 0 Shubham Data Scientist Sydney 5 1 Riti Data Analyst Delhi 7 2 Shanky Program Manager Delhi 2 3 Shreya Graphic Designer Mumbai 2 4 Aadi Data Engineering New York 11
Now, let’s look at different ways in which we could add a new column in this DataFrame.
Get rows whose column matches a value Using loc[]
This is the most common method to extract out the row indices where the column matches a certain value. Let’s try to extract the row indices where the City is equal to “Delhi”.
As the first step, let’s filter the DataFrame based on a certain column value.
Frequently Asked:
# slice the DataFrame where column City is Delhi print (df.loc[df['City'] == 'Delhi'])
Output
Name Designation City Experience 1 Riti Data Analyst Delhi 7 2 Shanky Program Manager Delhi 2
In the second step, let’s extract the indices of this sliced DataFrame using the “index” method from DataFrame.
# extracting indices where column City is Delhi indices = df.loc[df['City'] == 'Delhi'].index print (indices)
Output
Int64Index([1, 2], dtype='int64')
The output is an Index object. Therefore, to extract the actual indices, we can convert them to a list that can be utilized in further analysis.
# converting to list indices = df.loc[df['City'] == 'Delhi'].index.tolist() print (indices)
Output
[1,2]
The output gives the row indices 1 and 2 satisfy the following column criteria. Here, we could play around with different conditions based on the requirements.
The complete example is as follows,
import pandas as pd import numpy as np # List of Tuples employees = [('Shubham', 'Data Scientist', 'Sydney', 5), ('Riti', 'Data Analyst', 'Delhi' , 7), ('Shanky', 'Program Manager', 'Delhi' , 2), ('Shreya', 'Graphic Designer', 'Mumbai' , 2), ('Aadi', 'Data Engineering', 'New York', 11)] # Create a DataFrame object from list of tuples df = pd.DataFrame(employees, columns=['Name', 'Designation', 'City', 'Experience'], index=[0, 1, 2, 3, 4]) print(df) # slice the DataFrame where column City is Delhi print (df.loc[df['City'] == 'Delhi']) # extracting indices where column City is Delhi indices = df.loc[df['City'] == 'Delhi'].index print (indices) # converting to list indices = df.loc[df['City'] == 'Delhi'].index.tolist() print (indices)
Output:
Name Designation City Experience 0 Shubham Data Scientist Sydney 5 1 Riti Data Analyst Delhi 7 2 Shanky Program Manager Delhi 2 3 Shreya Graphic Designer Mumbai 2 4 Aadi Data Engineering New York 11 Name Designation City Experience 1 Riti Data Analyst Delhi 7 2 Shanky Program Manager Delhi 2 Int64Index([1, 2], dtype='int64') [1, 2]
Get rows whose column matches a value Using where() method from Numpy
The second method is to use the where() function from Numpy to get the row indices where the column conditions match the required criteria. Let’s take a look at the example below.
# using the where method print (np.where(df['City'] == 'Delhi'))
Output
(array([1, 2]),)
Here, the output would be an ndarray containing the indices where the column values are matched.
Get rows whose column matches a value Using query() method
An alternate method for the SQL enthusiast to utilize the “query” method to extract the indices of the rows based on a certain column value. Let’s try to extract the same condition as above using the query method.
# using the query method print (df.query("City == 'Delhi'").index)
Output
Int64Index([1, 2], dtype='int64')
This also returns an Index object, which can be converted to a list by doing “tolist” on the output.
The complete example is as follows,
import pandas as pd import numpy as np # List of Tuples employees = [('Shubham', 'Data Scientist', 'Sydney', 5), ('Riti', 'Data Analyst', 'Delhi' , 7), ('Shanky', 'Program Manager', 'Delhi' , 2), ('Shreya', 'Graphic Designer', 'Mumbai' , 2), ('Aadi', 'Data Engineering', 'New York', 11)] # Create a DataFrame object from list of tuples df = pd.DataFrame(employees, columns=['Name', 'Designation', 'City', 'Experience'], index=[0, 1, 2, 3, 4]) print(df) # using the where method print (np.where(df['City'] == 'Delhi')) # using the query method print (df.query("City == 'Delhi'").index)
Output:
Name Designation City Experience 0 Shubham Data Scientist Sydney 5 1 Riti Data Analyst Delhi 7 2 Shanky Program Manager Delhi 2 3 Shreya Graphic Designer Mumbai 2 4 Aadi Data Engineering New York 11 (array([1, 2]),) Int64Index([1, 2], dtype='int64')
Summary
Great, you made it! In this article, we have discussed multiple ways to get the index of rows where the column matches. Thanks.