In this article we will discuss different ways to select rows in DataFrame based on condition on single or multiple columns.
Following Items will be discussed,
- Select Rows based on value in column
- Select Rows based on any of the multiple values in column
- Select Rows based on any of the multiple conditions on column
First let’s create a DataFrame,
# List of Tuples students = [ ('jack', 'Apples' , 34) , ('Riti', 'Mangos' , 31) , ('Aadi', 'Grapes' , 30) , ('Sonia', 'Apples', 32) , ('Lucy', 'Mangos' , 33) , ('Mike', 'Apples' , 35) ] #Create a DataFrame object dfObj = pd.DataFrame(students, columns = ['Name' , 'Product', 'Sale'])
Contents of DataFrame will be,
Frequently Asked:
Name Product Sale 0 jack Apples 34 1 Riti Mangos 31 2 Aadi Grapes 30 3 Sonia Apples 32 4 Lucy Mangos 33 5 Mike Apples 35
Now let’s select rows from this DataFrame based on conditions,
Select Rows based on value in column
Select rows in above DataFrame for which ‘Product’ column contains the value ‘Apples’,
subsetDataFrame = dfObj[dfObj['Product'] == 'Apples']
It will return a DataFrame in which Column ‘Product‘ contains ‘Apples‘ only i.e.
Name Product Sale 0 jack Apples 34 3 Sonia Apples 32 5 Mike Apples 35
How does that work internally ?
Latest Python - Video Tutorial
Applying condition on a DataFrame like this,
dfObj['Product'] == 'Apples'
Will return a Series object of True & False i.e.
0 True 1 False 2 False 3 True 4 False 5 True Name: Product, dtype: bool
Series will contain True when condition is passed and False in other cases.
If we pass this series object to [] operator of DataFrame, then it will return a new DataFrame with only those rows that has True in the passed Series object i.e.
dfObj[dfObj['Product'] == 'Apples']
It will return a DataFrame in which Column passed series object had True entry i.e.
DataFrame with Product : Apples Name Product Sale 0 jack Apples 34 3 Sonia Apples 32 5 Mike Apples 35
Select Rows based on any of the multiple values in column
Select rows in above DataFrame for which ‘Product‘ column contains either ‘Grapes‘ or ‘Mangos‘ i.e
subsetDataFrame = dfObj[dfObj['Product'].isin(['Mangos', 'Grapes']) ]
We have passed a list of product names in isin() function of DataFrame that will return True for each entry in ‘Product‘ column that matches with any entry in given list.
Therefore, it will return a DataFrame in which Column ‘Product‘ contains either ‘Grapes‘ or ‘Mangos‘ only i.e.
Name Product Sale 1 Riti Mangos 31 2 Aadi Grapes 30 4 Lucy Mangos 33
Select DataFrame Rows Based on multiple conditions on columns
Select rows in above DataFrame for which ‘Sale’ column contains Values greater than 30 & less than 33 i.e.
filterinfDataframe = dfObj[(dfObj['Sale'] > 30) & (dfObj['Sale'] < 33) ]
It will return following DataFrame object in which Sales column contains value between 31 to 32,
Name Product Sale 1 Riti Mangos 31 3 Sonia Apples 32
Complete example is as follows,
import pandas as pd def main(): # List of Tuples students = [ ('jack', 'Apples' , 34) , ('Riti', 'Mangos' , 31) , ('Aadi', 'Grapes' , 30) , ('Sonia', 'Apples', 32) , ('Lucy', 'Mangos' , 33) , ('Mike', 'Apples' , 35) ] #Create a DataFrame object dfObj = pd.DataFrame(students, columns = ['Name' , 'Product', 'Sale']) print("Original Dataframe" , dfObj, sep='\n') ''' Select Rows based on value in a column ''' subsetDataFrame = dfObj[dfObj['Product'] == 'Apples'] print("DataFrame with Product : Apples" , subsetDataFrame, sep='\n') filteringSeries = dfObj['Product'] == 'Apples' print("Filtering Series" , filteringSeries, sep='\n') subsetDataFrame = dfObj[filteringSeries] print("DataFrame with Product : Apples" , subsetDataFrame, sep='\n') ''' Select Rows based on any of the multiple values in a column ''' subsetDataFrame = dfObj[dfObj['Product'].isin(['Mangos', 'Grapes']) ] print("DataFrame with Product : Mangos or Grapes" , subsetDataFrame, sep='\n') ''' Select DataFrame Rows Based on multiple conditions on columns ''' filterinfDataframe = dfObj[(dfObj['Sale'] > 30) & (dfObj['Sale'] < 33) ] print("DataFrame with Sales between 31 to 32" , filterinfDataframe, sep='\n') if __name__ == '__main__': main()
Output:
Original Dataframe Name Product Sale 0 jack Apples 34 1 Riti Mangos 31 2 Aadi Grapes 30 3 Sonia Apples 32 4 Lucy Mangos 33 5 Mike Apples 35 DataFrame with Product : Apples Name Product Sale 0 jack Apples 34 3 Sonia Apples 32 5 Mike Apples 35 Filtering Series 0 True 1 False 2 False 3 True 4 False 5 True Name: Product, dtype: bool DataFrame with Product : Apples Name Product Sale 0 jack Apples 34 3 Sonia Apples 32 5 Mike Apples 35 DataFrame with Product : Mangos or Grapes Name Product Sale 1 Riti Mangos 31 2 Aadi Grapes 30 4 Lucy Mangos 33 DataFrame with Sales between 31 to 32 Name Product Sale 1 Riti Mangos 31 3 Sonia Apples 32
Latest Video Tutorials