Get DataFrame Rows whose column has a certain value

Several times, we need to extract out the rows where the columns match certain values. In this article, we will discuss different ways to achieve that.

Table of Contents

Introduction

To quickly get started, let’s create a sample dataframe to experiment. We’ll use the pandas library with some random data.

import pandas as pd
import numpy as np

# List of Tuples
employees = [('Shubham', 'Data Scientist', 'Sydney',   5),
            ('Riti', 'Data Analyst', 'Delhi' ,   7),
            ('Shanky', 'Program Manager', 'Delhi' ,   2),
            ('Shreya', 'Graphic Designer', 'Mumbai' ,   2),
            ('Aadi', 'Data Engineering', 'New York', 11)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'City', 'Experience'],
                  index=[0, 1, 2, 3, 4])
print(df)

Contents of the created dataframe are,

      Name       Designation      City  Experience
0  Shubham    Data Scientist    Sydney           5
1     Riti      Data Analyst     Delhi           7
2   Shanky   Program Manager     Delhi           2
3   Shreya  Graphic Designer    Mumbai           2
4     Aadi  Data Engineering  New York          11

Now, let’s look at different ways in which we could add a new column in this DataFrame.

Advertisements

Get rows whose column matches a value Using loc[]

This is the most common method to extract out the row indices where the column matches a certain value. Let’s try to extract the row indices where the City is equal to “Delhi”.

As the first step, let’s filter the DataFrame based on a certain column value.

# slice the DataFrame where column City is Delhi
print (df.loc[df['City'] == 'Delhi'])

Output

     Name      Designation   City  Experience
1    Riti     Data Analyst  Delhi           7
2  Shanky  Program Manager  Delhi           2

In the second step, let’s extract the indices of this sliced DataFrame using the “index” method from DataFrame.

# extracting indices where column City is Delhi
indices = df.loc[df['City'] == 'Delhi'].index

print (indices)

Output

Int64Index([1, 2], dtype='int64')

The output is an Index object. Therefore, to extract the actual indices, we can convert them to a list that can be utilized in further analysis.

# converting to list
indices = df.loc[df['City'] == 'Delhi'].index.tolist()

print (indices)

Output

[1,2]

The output gives the row indices 1 and 2 satisfy the following column criteria. Here, we could play around with different conditions based on the requirements.

The complete example is as follows,

import pandas as pd
import numpy as np

# List of Tuples
employees = [('Shubham', 'Data Scientist', 'Sydney',   5),
            ('Riti', 'Data Analyst', 'Delhi' ,   7),
            ('Shanky', 'Program Manager', 'Delhi' ,   2),
            ('Shreya', 'Graphic Designer', 'Mumbai' ,   2),
            ('Aadi', 'Data Engineering', 'New York', 11)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'City', 'Experience'],
                  index=[0, 1, 2, 3, 4])
print(df)

# slice the DataFrame where column City is Delhi
print (df.loc[df['City'] == 'Delhi'])

# extracting indices where column City is Delhi
indices = df.loc[df['City'] == 'Delhi'].index

print (indices)

# converting to list
indices = df.loc[df['City'] == 'Delhi'].index.tolist()

print (indices)

Output:

      Name       Designation      City  Experience
0  Shubham    Data Scientist    Sydney           5
1     Riti      Data Analyst     Delhi           7
2   Shanky   Program Manager     Delhi           2
3   Shreya  Graphic Designer    Mumbai           2
4     Aadi  Data Engineering  New York          11

     Name      Designation   City  Experience
1    Riti     Data Analyst  Delhi           7
2  Shanky  Program Manager  Delhi           2

Int64Index([1, 2], dtype='int64')

[1, 2]

Get rows whose column matches a value Using where() method from Numpy

The second method is to use the where() function from Numpy to get the row indices where the column conditions match the required criteria. Let’s take a look at the example below.

# using the where method
print (np.where(df['City'] == 'Delhi'))

Output

(array([1, 2]),)

Here, the output would be an ndarray containing the indices where the column values are matched.

Get rows whose column matches a value Using query() method

An alternate method for the SQL enthusiast to utilize the “query” method to extract the indices of the rows based on a certain column value. Let’s try to extract the same condition as above using the query method.

# using the query method
print (df.query("City == 'Delhi'").index)

Output

Int64Index([1, 2], dtype='int64')

This also returns an Index object, which can be converted to a list by doing “tolist” on the output.

The complete example is as follows,

import pandas as pd
import numpy as np

# List of Tuples
employees = [('Shubham', 'Data Scientist', 'Sydney',   5),
            ('Riti', 'Data Analyst', 'Delhi' ,   7),
            ('Shanky', 'Program Manager', 'Delhi' ,   2),
            ('Shreya', 'Graphic Designer', 'Mumbai' ,   2),
            ('Aadi', 'Data Engineering', 'New York', 11)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'City', 'Experience'],
                  index=[0, 1, 2, 3, 4])
print(df)

# using the where method
print (np.where(df['City'] == 'Delhi'))

# using the query method
print (df.query("City == 'Delhi'").index)

Output:

      Name       Designation      City  Experience
0  Shubham    Data Scientist    Sydney           5
1     Riti      Data Analyst     Delhi           7
2   Shanky   Program Manager     Delhi           2
3   Shreya  Graphic Designer    Mumbai           2
4     Aadi  Data Engineering  New York          11

(array([1, 2]),)

Int64Index([1, 2], dtype='int64')

Summary

Great, you made it! In this article, we have discussed multiple ways to get the index of rows where the column matches. Thanks.

Pandas Tutorials -Learn Data Analysis with Python

   

Are you looking to make a career in Data Science with Python?

Data Science is the future, and the future is here now. Data Scientists are now the most sought-after professionals today. To become a good Data Scientist or to make a career switch in Data Science one must possess the right skill set. We have curated a list of Best Professional Certificate in Data Science with Python. These courses will teach you the programming tools for Data Science like Pandas, NumPy, Matplotlib, Seaborn and how to use these libraries to implement Machine learning models.

Checkout the Detailed Review of Best Professional Certificate in Data Science with Python.

Remember, Data Science requires a lot of patience, persistence, and practice. So, start learning today.

Join a LinkedIn Community of Python Developers

Leave a Comment

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Scroll to Top