For every typical data-related project, we always need to filter our data based on certain conditions. Sometimes, including these conditions makes our code flow a little messy. Here, operator chaining is an important technique to reduce the number of code lines by chaining or combining multiple statements together. In this article, let’s look at multiple ways to do operator chaining in pandas.
Table of Contents
Preparing DataSet
To quickly get started, let’s create a sample dataframe to experiment. We’ll use the pandas library with some random data.
import pandas as pd import numpy as np # List of Tuples employees = [('Shubham', 'Data Scientist', 'Tech', 5), ('Riti', 'Data Engineer', 'Tech' , 7), ('Shanky', 'Program Manager', 'PMO' , 2), ('Shreya', 'Graphic Designer', 'Design' , 2), ('Aadi', 'Backend Developer', 'Tech', 11), ('Sim', 'Data Engineer', 'Tech', 4)] # Create a DataFrame object from list of tuples df = pd.DataFrame(employees, columns=['Name', 'Designation', 'Team', 'Experience']) print(df)
Contents of the created dataframe are,
Name Designation Team Experience 0 Shubham Data Scientist Tech 5 1 Riti Data Engineer Tech 7 2 Shanky Program Manager PMO 2 3 Shreya Graphic Designer Design 2 4 Aadi Backend Developer Tech 11 5 Sim Data Engineer Tech 4
Filter DataFrame Rows Using the loc method
One of the simplest ways to chain multiple operators is to use these conditions within the loc method. Let’s try to understand with an example, say, we need to filter the above DataFrame to include all the employees under the Tech team who have more than 3 years of experience. Now, instead of doing it in multiple steps, let’s look at achieving this in a single line of code.
# Select employees in Tech team with experience greater than 3 years # filtering using the AND operator chaining print (df.loc[(df['Team'] == 'Tech') & (df['Experience'] > 3)])
Output
Name Designation Team Experience 10 Shubham Data Scientist Tech 5 123 Riti Data Engineer Tech 7 20 Aadi Backend Developer Tech 11 11 Sim Data Engineer Tech 4
As observed, we have filtered the DataFrame for both the above-mentioned conditions by chaining the “AND” operator.
Filter DataFrame Rows Using the query function
With the introduction of the query function in pandas, it has things very familiar to SQL users. We can chain multiple statements easily, so let’s try doing the same task using the query function.
# Select employees in Tech team with experience greater than 3 years # filtering using the AND operator chaining print (df.query("Team == 'Tech' and Experience > 3"))
Output
Name Designation Team Experience 10 Shubham Data Scientist Tech 5 123 Riti Data Engineer Tech 7 20 Aadi Backend Developer Tech 11 11 Sim Data Engineer Tech 4
We can include other conditions as well or maybe use a different operator to filter the pandas DataFrame.
Filter DataFrame Rows Using the pipe function
The pipe function is a great alternative to improve code readability. There are a lot of advantages to using that, but for now, we will only discuss how to use it for operator chaining. Let’s try to implement it again for understanding.
# Select employees in Tech team with experience greater than 3 years # filtering using the AND operator chaining print(df[df.pipe(lambda x: (x['Team'] == 'Tech') & (x['Experience'] > 3))])
Output
Name Designation Team Experience 10 Shubham Data Scientist Tech 5 123 Riti Data Engineer Tech 7 20 Aadi Backend Developer Tech 11 11 Sim Data Engineer Tech 4
The syntax here is also very similar to the above methods. But it comes in very handy whenever you want to pipe all the data processing into a single step, making the code very readable and clean.
The Complete example is as follows,
import pandas as pd # List of Tuples employees = [('Shubham', 'Data Scientist', 'Tech', 5), ('Riti', 'Data Engineer', 'Tech' , 7), ('Shanky', 'Program Manager', 'PMO' , 2), ('Shreya', 'Graphic Designer', 'Design' , 2), ('Aadi', 'Backend Developer', 'Tech', 11), ('Sim', 'Data Engineer', 'Tech', 4)] # Create a DataFrame object from list of tuples df = pd.DataFrame(employees, columns=['Name', 'Designation', 'Team', 'Experience']) print(df) # Select employees in Tech team with experience greater than 3 years # filtering using the AND operator chaining print (df.query("Team == 'Tech' and Experience > 3")) # Select employees in Tech team with experience greater than 3 years # filtering using the AND operator chaining print (df.query("Team == 'Tech' and Experience > 3")) # Select employees in Tech team with experience greater than 3 years # filtering using the AND operator chaining print(df[df.pipe(lambda x: (x['Team'] == 'Tech') & (x['Experience'] > 3))])
Output :
Name Designation Team Experience 0 Shubham Data Scientist Tech 5 1 Riti Data Engineer Tech 7 2 Shanky Program Manager PMO 2 3 Shreya Graphic Designer Design 2 4 Aadi Backend Developer Tech 11 5 Sim Data Engineer Tech 4 Name Designation Team Experience 0 Shubham Data Scientist Tech 5 1 Riti Data Engineer Tech 7 4 Aadi Backend Developer Tech 11 5 Sim Data Engineer Tech 4 Name Designation Team Experience 0 Shubham Data Scientist Tech 5 1 Riti Data Engineer Tech 7 4 Aadi Backend Developer Tech 11 5 Sim Data Engineer Tech 4 Name Designation Team Experience 0 Shubham Data Scientist Tech 5 1 Riti Data Engineer Tech 7 4 Aadi Backend Developer Tech 11 5 Sim Data Engineer Tech 4
Summary
In this article, we have discussed multiple ways to filter rows using operator chaining in pandas. Thanks.