Pandas – Select Rows by conditions on multiple columns


In this article we will discuss different ways to select rows in DataFrame based on condition on single or multiple columns.

Following Items will be discussed,

First let’s create a DataFrame,

# List of Tuples
students = [ ('jack', 'Apples' , 34) ,
             ('Riti', 'Mangos'  , 31) ,
             ('Aadi', 'Grapes' , 30) ,
             ('Sonia', 'Apples', 32) ,
             ('Lucy', 'Mangos'  , 33) ,
             ('Mike', 'Apples' , 35)
              ]


#Create a DataFrame object
dfObj = pd.DataFrame(students, columns = ['Name' , 'Product', 'Sale']) 

Contents of DataFrame will be,

    Name Product  Sale
0   jack  Apples    34
1   Riti  Mangos    31
2   Aadi  Grapes    30
3  Sonia  Apples    32
4   Lucy  Mangos    33
5   Mike  Apples    35

Now let’s select rows from this DataFrame based on conditions,

Select Rows based on value in column

Select rows in above DataFrame for which ‘Product’ column contains the value ‘Apples’,

subsetDataFrame = dfObj[dfObj['Product'] == 'Apples']

It will return a DataFrame in which Column ‘Product‘ contains ‘Apples‘ only i.e.

    Name Product  Sale
0   jack  Apples    34
3  Sonia  Apples    32
5   Mike  Apples    35

How does that work internally ?

Applying condition on a DataFrame like this,

dfObj['Product'] == 'Apples'

Will return a Series object of True & False i.e.

0     True
1    False
2    False
3     True
4    False
5     True
Name: Product, dtype: bool

Series will contain True when condition is passed and False in other cases.

If we pass this series object to [] operator of DataFrame, then it will return a new DataFrame with only those rows that has True in the passed Series object i.e.

dfObj[dfObj['Product'] == 'Apples']

It will return a DataFrame in which Column passed series object had True entry i.e.

DataFrame with Product : Apples
    Name Product  Sale
0   jack  Apples    34
3  Sonia  Apples    32
5   Mike  Apples    35

Select Rows based on any of the multiple values in column

Select rows in above DataFrame for which ‘Product‘ column contains either ‘GrapesorMangos‘ i.e

subsetDataFrame = dfObj[dfObj['Product'].isin(['Mangos', 'Grapes']) ]

We have passed a list of product names in isin() function of DataFrame that will return True for each entry in ‘Product‘ column that matches with any entry in given list.
Therefore, it will return a DataFrame in which Column ‘Product‘ contains either ‘Grapes‘ or ‘Mangos‘ only i.e.

   Name Product  Sale
1  Riti  Mangos    31
2  Aadi  Grapes    30
4  Lucy  Mangos    33

Select DataFrame Rows Based on multiple conditions on columns

Select rows in above DataFrame for which ‘Sale’ column contains Values greater than 30 & less than 33 i.e.

filterinfDataframe = dfObj[(dfObj['Sale'] > 30) & (dfObj['Sale'] < 33) ]

It will return following DataFrame object in which Sales column  contains value between 31 to 32,

    Name Product  Sale
1   Riti  Mangos    31
3  Sonia  Apples    32

Complete example is as follows,

import pandas as pd

def main():
    
    # List of Tuples
    students = [ ('jack', 'Apples' , 34) ,
                 ('Riti', 'Mangos'  , 31) ,
                 ('Aadi', 'Grapes' , 30) ,
                 ('Sonia', 'Apples', 32) ,
                 ('Lucy', 'Mangos'  , 33) ,
                 ('Mike', 'Apples' , 35)
                  ]
    
    
    #Create a DataFrame object
    dfObj = pd.DataFrame(students, columns = ['Name' , 'Product', 'Sale']) 
    
    print("Original Dataframe" , dfObj, sep='\n')
    
    '''
    Select Rows based on value in a column
    '''
    subsetDataFrame = dfObj[dfObj['Product'] == 'Apples']
    
    print("DataFrame with Product : Apples" , subsetDataFrame, sep='\n')
    
    filteringSeries = dfObj['Product'] == 'Apples'
    print("Filtering Series" , filteringSeries, sep='\n')
    
    subsetDataFrame = dfObj[filteringSeries]
    
    print("DataFrame with Product : Apples" , subsetDataFrame, sep='\n')
    
    '''
    Select Rows based on any of the multiple values in a column
    '''
    subsetDataFrame = dfObj[dfObj['Product'].isin(['Mangos', 'Grapes']) ]
    
    print("DataFrame with Product : Mangos or Grapes" , subsetDataFrame, sep='\n')

    '''
    Select DataFrame Rows Based on multiple conditions on columns
    '''
        
    filterinfDataframe = dfObj[(dfObj['Sale'] > 30) & (dfObj['Sale'] < 33) ]
    
    print("DataFrame with Sales between 31 to 32" , filterinfDataframe, sep='\n')
    
if __name__ == '__main__':
    main()

Output:

Original Dataframe
    Name Product  Sale
0   jack  Apples    34
1   Riti  Mangos    31
2   Aadi  Grapes    30
3  Sonia  Apples    32
4   Lucy  Mangos    33
5   Mike  Apples    35
DataFrame with Product : Apples
    Name Product  Sale
0   jack  Apples    34
3  Sonia  Apples    32
5   Mike  Apples    35
Filtering Series
0     True
1    False
2    False
3     True
4    False
5     True
Name: Product, dtype: bool
DataFrame with Product : Apples
    Name Product  Sale
0   jack  Apples    34
3  Sonia  Apples    32
5   Mike  Apples    35
DataFrame with Product : Mangos or Grapes
   Name Product  Sale
1  Riti  Mangos    31
2  Aadi  Grapes    30
4  Lucy  Mangos    33
DataFrame with Sales between 31 to 32
    Name Product  Sale
1   Riti  Mangos    31
3  Sonia  Apples    32

 

 

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Scroll to Top