In this article, we will discuss different scenarios to select rows from a Pandas DataFrame based on the column values.
Table of Contents
Introduction
Let’s create a sample dataframe to experiment with different scenarios. We’ll use the pandas library with some random data.
import pandas as pd # List of Tuples employees = [('Shubham', 'Data Scientist', 'Sydney', 5), ('Riti', 'Data Analyst', 'Delhi' , 7), ('Shanky', 'Program Manager', 'Delhi' , 2), ('Shreya', 'Graphic Designer', 'Mumbai' , 2), ('Aadi', 'Data Engineering', 'New York', 11)] # Create a DataFrame object from list of tuples df = pd.DataFrame(employees, columns=['Name', 'Designation', 'City', 'Experience'], index=['a', 'b', 'c', 'd', 'e']) print(df)
Contents of the created dataframe are,
Name Designation City Experience a Shubham Data Scientist Sydney 5 b Riti Data Analyst Delhi 7 c Shanky Program Manager Delhi 2 d Shreya Graphic Designer Mumbai 2 e Aadi Data Engineering New York 11
Now, let’s look at different scenarios in which we could select rows based on the column values.
Select DaraFrame Rows based on a specific value(s)
In this scenario, let’s say, we need to select rows from the dataframe based on some specific column values. For example, we need to select rows of the employees having experience of more than 4 years.
# filter employees having Experience > 4 filterDf = df[df['Experience'] > 4] print (filterDf)
Output
Name Designation City Experience a Shubham Data Scientist Sydney 5 b Riti Data Analyst Delhi 7 e Aadi Data Engineering New York 11
We have filtered all the three employees having experience more than 4 years. Now, in case, if we need to select the employees having experience of more than 4 years and less than or equal to 10 years, we would need to include multiple conditions as shown below.
# filter employees having Experience > 4 and Experience < 10 filterDf = df[(df['Experience'] > 4) & (df['Experience'] <= 10)] print (filterDf)
Output
Name Designation City Experience a Shubham Data Scientist Sydney 5 b Riti Data Analyst Delhi 7
Note: We could use pipe operator “|” in case of “OR” conditions.
The complete example is as follows,
import pandas as pd # List of Tuples employees = [('Shubham', 'Data Scientist', 'Sydney', 5), ('Riti', 'Data Analyst', 'Delhi' , 7), ('Shanky', 'Program Manager', 'Delhi' , 2), ('Shreya', 'Graphic Designer', 'Mumbai' , 2), ('Aadi', 'Data Engineering', 'New York', 11)] # Create a DataFrame object from list of tuples df = pd.DataFrame(employees, columns=['Name', 'Designation', 'City', 'Experience'], index=['a', 'b', 'c', 'd', 'e']) print(df) # filter employees having Experience > 4 filterDf = df[df['Experience'] > 4] print (filterDf) # filter employees having Experience > 4 and Experience < 10 filterDf = df[(df['Experience'] > 4) & (df['Experience'] <= 10)] print (filterDf)
Output:
Name Designation City Experience a Shubham Data Scientist Sydney 5 b Riti Data Analyst Delhi 7 c Shanky Program Manager Delhi 2 d Shreya Graphic Designer Mumbai 2 e Aadi Data Engineering New York 11 Name Designation City Experience a Shubham Data Scientist Sydney 5 b Riti Data Analyst Delhi 7 e Aadi Data Engineering New York 11 Name Designation City Experience a Shubham Data Scientist Sydney 5 b Riti Data Analyst Delhi 7
Select DataFrame Rows based on a multiple values
In this scenario, we need to select the rows in dataframe based on a list of column values. For example, we need to select the employees in “Delhi” and “Mumbai” cities. One solution could be to use multiple AND/OR operators, but it would get convoluted with long list of values; which is where “isin” operator comes in handy.
# filter employees in "Delhi" and "Mumbai" cities filterDf = df[df['City'].isin(['Mumbai', 'Delhi'])] print (filterDf)
Output
Name Designation City Experience b Riti Data Analyst Delhi 7 c Shanky Program Manager Delhi 2 d Shreya Graphic Designer Mumbai 2
We can again include multiple “isin” conditions using the AND/OR operators.
Select DataFrame Rows containing partial string or substring
In this scenario, we need to filter the dataframe based on partial string matching. For example, we need to select the employees working in “Data” field (information from the Designation).
# filter employees having "Data" in their designation filterDf = df[df['Designation'].str.contains("Data")] print (filterDf)
Output
Name Designation City Experience a Shubham Data Scientist Sydney 5 b Riti Data Analyst Delhi 7 e Aadi Data Engineering New York 11
Select DataFrame Rows using the query method
The newly introduced query method for pandas dataframe is a bonus for all SQL lovers. The syntax is quite similar to standard SQL query, and it is very efficient for the large dataframes. Let’s take a quick example – select all the employees from “Delhi”.
# filter employees from "Delhi" filterDf = df.query("City == 'Delhi'") print (filterDf)
Output
Name Designation City Experience b Riti Data Analyst Delhi 7 c Shanky Program Manager Delhi 2
Important Note: Please make sure you are using pandas version > 0.25.0 for using the query method.
The complete example is as follows,
import pandas as pd # List of Tuples employees = [('Shubham', 'Data Scientist', 'Sydney', 5), ('Riti', 'Data Analyst', 'Delhi' , 7), ('Shanky', 'Program Manager', 'Delhi' , 2), ('Shreya', 'Graphic Designer', 'Mumbai' , 2), ('Aadi', 'Data Engineering', 'New York', 11)] # Create a DataFrame object from list of tuples df = pd.DataFrame(employees, columns=['Name', 'Designation', 'City', 'Experience'], index=['a', 'b', 'c', 'd', 'e']) print(df) # filter employees in "Delhi" and "Mumbai" cities filterDf = df[df['City'].isin(['Mumbai', 'Delhi'])] print (filterDf) # filter employees having "Data" in their designation filterDf = df[df['Designation'].str.contains("Data")] print (filterDf) # filter employees from "Delhi" filterDf = df.query("City == 'Delhi'") print (filterDf)
Output
Name Designation City Experience a Shubham Data Scientist Sydney 5 b Riti Data Analyst Delhi 7 c Shanky Program Manager Delhi 2 d Shreya Graphic Designer Mumbai 2 e Aadi Data Engineering New York 11 Name Designation City Experience b Riti Data Analyst Delhi 7 c Shanky Program Manager Delhi 2 d Shreya Graphic Designer Mumbai 2 Name Designation City Experience a Shubham Data Scientist Sydney 5 b Riti Data Analyst Delhi 7 e Aadi Data Engineering New York 11 Name Designation City Experience b Riti Data Analyst Delhi 7 c Shanky Program Manager Delhi 2
Summary
Great, you made it! In this article, we have discussed multiple scenarios to select rows from a pandas dataframe based on the column values. Thanks.