Filter DataFrame Rows with operator chaining in Pandas

For every typical data-related project, we always need to filter our data based on certain conditions. Sometimes, including these conditions makes our code flow a little messy. Here, operator chaining is an important technique to reduce the number of code lines by chaining or combining multiple statements together. In this article, let’s look at multiple ways to do operator chaining in pandas.

Table of Contents

Preparing DataSet

To quickly get started, let’s create a sample dataframe to experiment. We’ll use the pandas library with some random data.

import pandas as pd
import numpy as np

# List of Tuples
employees = [('Shubham', 'Data Scientist', 'Tech',   5),
            ('Riti', 'Data Engineer', 'Tech' ,   7),
            ('Shanky', 'Program Manager', 'PMO' ,   2),
            ('Shreya', 'Graphic Designer', 'Design' ,   2),
            ('Aadi', 'Backend Developer', 'Tech', 11),
            ('Sim', 'Data Engineer', 'Tech', 4)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'Team', 'Experience'])
print(df)

Contents of the created dataframe are,

      Name        Designation    Team  Experience
0  Shubham     Data Scientist    Tech           5
1     Riti      Data Engineer    Tech           7
2   Shanky    Program Manager     PMO           2
3   Shreya   Graphic Designer  Design           2
4     Aadi  Backend Developer    Tech          11
5      Sim      Data Engineer    Tech           4

Filter DataFrame Rows Using the loc method

One of the simplest ways to chain multiple operators is to use these conditions within the loc method. Let’s try to understand with an example, say, we need to filter the above DataFrame to include all the employees under the Tech team who have more than 3 years of experience. Now, instead of doing it in multiple steps, let’s look at achieving this in a single line of code.

Advertisements
# Select employees in Tech team with experience greater than 3 years
# filtering using the AND operator chaining
print (df.loc[(df['Team'] == 'Tech') & (df['Experience'] > 3)])

Output

        Name        Designation  Team  Experience
10   Shubham     Data Scientist  Tech           5
123     Riti      Data Engineer  Tech           7
20      Aadi  Backend Developer  Tech          11
11       Sim      Data Engineer  Tech           4

As observed, we have filtered the DataFrame for both the above-mentioned conditions by chaining the “AND” operator.

Filter DataFrame Rows Using the query function

With the introduction of the query function in pandas, it has things very familiar to SQL users. We can chain multiple statements easily, so let’s try doing the same task using the query function.

# Select employees in Tech team with experience greater than 3 years
# filtering using the AND operator chaining
print (df.query("Team == 'Tech' and Experience > 3"))

Output

        Name        Designation  Team  Experience
10   Shubham     Data Scientist  Tech           5
123     Riti      Data Engineer  Tech           7
20      Aadi  Backend Developer  Tech          11
11       Sim      Data Engineer  Tech           4

We can include other conditions as well or maybe use a different operator to filter the pandas DataFrame.

Filter DataFrame Rows Using the pipe function

The pipe function is a great alternative to improve code readability. There are a lot of advantages to using that, but for now, we will only discuss how to use it for operator chaining. Let’s try to implement it again for understanding.

# Select employees in Tech team with experience greater than 3 years
# filtering using the AND operator chaining
print(df[df.pipe(lambda x: (x['Team'] == 'Tech') & (x['Experience'] > 3))])

Output

        Name        Designation  Team  Experience
10   Shubham     Data Scientist  Tech           5
123     Riti      Data Engineer  Tech           7
20      Aadi  Backend Developer  Tech          11
11       Sim      Data Engineer  Tech           4

The syntax here is also very similar to the above methods. But it comes in very handy whenever you want to pipe all the data processing into a single step, making the code very readable and clean.

The Complete example is as follows,

import pandas as pd

# List of Tuples
employees = [('Shubham', 'Data Scientist', 'Tech',   5),
            ('Riti', 'Data Engineer', 'Tech' ,   7),
            ('Shanky', 'Program Manager', 'PMO' ,   2),
            ('Shreya', 'Graphic Designer', 'Design' ,   2),
            ('Aadi', 'Backend Developer', 'Tech', 11),
            ('Sim', 'Data Engineer', 'Tech', 4)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'Team', 'Experience'])
print(df)

# Select employees in Tech team with experience greater than 3 years
# filtering using the AND operator chaining
print (df.query("Team == 'Tech' and Experience > 3"))

# Select employees in Tech team with experience greater than 3 years
# filtering using the AND operator chaining
print (df.query("Team == 'Tech' and Experience > 3"))

# Select employees in Tech team with experience greater than 3 years
# filtering using the AND operator chaining
print(df[df.pipe(lambda x: (x['Team'] == 'Tech') & (x['Experience'] > 3))])

Output :

      Name        Designation    Team  Experience
0  Shubham     Data Scientist    Tech           5
1     Riti      Data Engineer    Tech           7
2   Shanky    Program Manager     PMO           2
3   Shreya   Graphic Designer  Design           2
4     Aadi  Backend Developer    Tech          11
5      Sim      Data Engineer    Tech           4

      Name        Designation  Team  Experience
0  Shubham     Data Scientist  Tech           5
1     Riti      Data Engineer  Tech           7
4     Aadi  Backend Developer  Tech          11
5      Sim      Data Engineer  Tech           4

      Name        Designation  Team  Experience
0  Shubham     Data Scientist  Tech           5
1     Riti      Data Engineer  Tech           7
4     Aadi  Backend Developer  Tech          11
5      Sim      Data Engineer  Tech           4

      Name        Designation  Team  Experience
0  Shubham     Data Scientist  Tech           5
1     Riti      Data Engineer  Tech           7
4     Aadi  Backend Developer  Tech          11
5      Sim      Data Engineer  Tech           4

Summary

In this article, we have discussed multiple ways to filter rows using operator chaining in pandas. Thanks.

Pandas Tutorials -Learn Data Analysis with Python

   

Are you looking to make a career in Data Science with Python?

Data Science is the future, and the future is here now. Data Scientists are now the most sought-after professionals today. To become a good Data Scientist or to make a career switch in Data Science one must possess the right skill set. We have curated a list of Best Professional Certificate in Data Science with Python. These courses will teach you the programming tools for Data Science like Pandas, NumPy, Matplotlib, Seaborn and how to use these libraries to implement Machine learning models.

Checkout the Detailed Review of Best Professional Certificate in Data Science with Python.

Remember, Data Science requires a lot of patience, persistence, and practice. So, start learning today.

Join a LinkedIn Community of Python Developers

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Scroll to Top