Python Pandas : How to Drop rows in DataFrame by conditions on column values

In this article we will discuss how to delete rows based in DataFrame by checking multiple conditions on column values.

DataFrame provides a member function drop() i.e.

DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')

It accepts a single or list of label names and deletes the corresponding rows or columns (based on value of axis parameter i.e. 0 for rows or 1 for columns).

Let’s use this do delete multiple rows by conditions.

Let’s create a dataframe object from dictionary

# List of Tuples
students = [ ('jack', 34, 'Sydeny' , 'Australia') ,
             ('Riti', 30, 'Delhi' , 'India' ) ,
             ('Vikas', 31, 'Mumbai' , 'India' ) ,
             ('Neelu', 32, 'Bangalore' , 'India' ) ,
             ('John', 16, 'New York' , 'US') ,
             ('Mike', 17, 'las vegas' , 'US')  ]


#Create a DataFrame object
dfObj = pd.DataFrame(students, columns = ['Name' , 'Age', 'City' , 'Country'], index=['a', 'b', 'c' , 'd' , 'e' , 'f']) 

Delete rows based on condition on a column

Contents of dataframe object dfObj will be,

Original DataFrame pointed by dfObj

Let’s delete all rows for which column ‘Age’ has value 30 i.e.

# Get names of indexes for which column Age has value 30
indexNames = dfObj[ dfObj['Age'] == 30 ].index

# Delete these row indexes from dataFrame
dfObj.drop(indexNames , inplace=True)

Contents of updated dataframe object dfObj will be,

DataFrame rows with value 30 in Column Age are deleted

What just happened here ? Let’s understand,

dfObj['Age'] == 30

It will give Series object with True and False. True for entries which has value 30 and False for others i.e.

a False
b True
c False
d False
e False
f False
Name: Age, dtype: bool

Let’s create a new DataFrame object with this series and existing DataFrame object dfObj i.e.

dfObj[ dfObj['Age'] == 30 ]

It will give a new dataframe object that has only that row for which column ‘Age’ has value 30 i.e.

   Name Age City   Country
b Riti    30    Delhi  India

Now, this dataframe contains the rows which we want to delete from original dataframe. So, let’s get the index names from this dataframe object i.e.

dfObj[ dfObj['Age'] == 30 ].index

It will give an Index object containing index labels for which column ‘Age’ has value 30 i.e.

Index(['b'], dtype='object')

Now pass this to dataframe.drop() to delete these rows i.e.

dfObj.drop( dfObj[ dfObj['Age'] == 30 ].index , inplace=True)

It will delete the all rows for which column ‘Age’ has value 30.

Delete rows based on multiple conditions on a column

Suppose Contents of dataframe object dfObj is,

Original DataFrame pointed by dfObj

Let’s delete all rows for which column ‘Age’ has value between 30 to 40 i.e.

# delete all rows with column 'Age' has value 30 to 40 
indexNames = dfObj[ (dfObj['Age'] >= 30) & (dfObj['Age'] <= 40) ].index
dfObj.drop(indexNames , inplace=True)

Contents of modified dataframe object dfObj will be,

Rows with column ‘Age’ value 30 to 40 deleted

basically we need to use & between multiple conditions.

Delete rows based on multiple conditions on different columns

Suppose Contents of dataframe object dfObj is,

Original DataFrame pointed by dfObj

Let’s delete all rows for which column ‘Age’ has value greater than 30 and country is ‘India’

# delete all rows for which column 'Age' has value greater than 30 and Country is India 
indexNames = dfObj[ (dfObj['Age'] >= 30) & (dfObj['Country'] == 'India') ].index

dfObj.drop(indexNames , inplace=True)

Contents of modified dataframe object dfObj will be,

Rows deleted whose Age > 30 & country is India

We need to use & between multiple conditions.

Complete Example is as follows,

import pandas as pd

def main():
    
    # List of Tuples
    students = [ ('jack', 34, 'Sydeny' , 'Australia') ,
                 ('Riti', 30, 'Delhi' , 'India' ) ,
                 ('Vikas', 31, 'Mumbai' , 'India' ) ,
                 ('Neelu', 32, 'Bangalore' , 'India' ) ,
                 ('John', 16, 'New York' , 'US') ,
                 ('Mike', 17, 'las vegas' , 'US')  ]
    
    
    
    
    #Create a DataFrame object
    dfObj = pd.DataFrame(students, columns = ['Name' , 'Age', 'City' , 'Country'], index=['a', 'b', 'c' , 'd' , 'e' , 'f']) 
    
    print("Original Dataframe" , dfObj, sep='\n')
    
    '''
       Delete rows by Column value
    '''
    
    print("Delete all rows for which column 'Age' has value 30")
    
   
    # Get names of indexes for which column Age has value 30
    indexNames = dfObj[ dfObj['Age'] == 30 ].index
    
    # Delete these row indexes from dataFrame
    dfObj.drop(indexNames , inplace=True)

    print("Modified  Dataframe" , dfObj, sep='\n')
    
    '''
        Delete rows based on multiple conditions on a column 
    '''
    
    print("Delete all rows for which column 'Age' has value between 30 to 40")
    
    #Create a DataFrame object
    dfObj = pd.DataFrame(students, columns = ['Name' , 'Age', 'City' , 'Country'], index=['a', 'b', 'c' , 'd' , 'e' , 'f']) 
    
    print("Original Dataframe" , dfObj, sep='\n')
    
    # delete all rows for which column 'Age' has value between 30 to 40 
    indexNames = dfObj[ (dfObj['Age'] >= 30) & (dfObj['Age'] <= 40) ].index
    
    dfObj.drop(indexNames , inplace=True)
    print("Modified  Dataframe" , dfObj, sep='\n')
    
    
    '''
        Delete rows based on multiple conditions on different columns
    '''
    
    print("Delete all rows for which column 'Age' has value greater than 30  and country is 'India' ")
    
    #Create a DataFrame object
    dfObj = pd.DataFrame(students, columns = ['Name' , 'Age', 'City' , 'Country'], index=['a', 'b', 'c' , 'd' , 'e' , 'f']) 
    
    print("Original Dataframe" , dfObj, sep='\n')
    
    # delete all rows for which column 'Age' has value greater than 30 and Country is India 
    indexNames = dfObj[ (dfObj['Age'] >= 30) & (dfObj['Country'] == 'India') ].index
    
    dfObj.drop(indexNames , inplace=True)
    print("Modified  Dataframe" , dfObj, sep='\n')
    
  
    
    
    
if __name__ == '__main__':
    main()

Output:

Original Dataframe
    Name  Age       City    Country
a   jack   34     Sydeny  Australia
b   Riti   30      Delhi      India
c  Vikas   31     Mumbai      India
d  Neelu   32  Bangalore      India
e   John   16   New York         US
f   Mike   17  las vegas         US
Delete all rows for which column 'Age' has value 30
Modified  Dataframe
    Name  Age       City    Country
a   jack   34     Sydeny  Australia
c  Vikas   31     Mumbai      India
d  Neelu   32  Bangalore      India
e   John   16   New York         US
f   Mike   17  las vegas         US
Delete all rows for which column 'Age' has value between 30 to 40
Original Dataframe
    Name  Age       City    Country
a   jack   34     Sydeny  Australia
b   Riti   30      Delhi      India
c  Vikas   31     Mumbai      India
d  Neelu   32  Bangalore      India
e   John   16   New York         US
f   Mike   17  las vegas         US
Modified  Dataframe
   Name  Age       City Country
e  John   16   New York      US
f  Mike   17  las vegas      US
Delete all rows for which column 'Age' has value greater than 30  and country is 'India' 
Original Dataframe
    Name  Age       City    Country
a   jack   34     Sydeny  Australia
b   Riti   30      Delhi      India
c  Vikas   31     Mumbai      India
d  Neelu   32  Bangalore      India
e   John   16   New York         US
f   Mike   17  las vegas         US
Modified  Dataframe
   Name  Age       City    Country
a  jack   34     Sydeny  Australia
e  John   16   New York         US
f  Mike   17  las vegas         US

 

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Scroll to Top