Pandas Tutorial Part #9 – Filter DataFrame Rows

This tutorial will explain how to select rows from a DataFrame based on conditions.

Table of Contents

Select DataFrame rows based on conditions

We can select only those rows from a DataFrame that satisfies a condition. For example, suppose we have DataFrame like this,

    Name Product  Sale
0   Mark  Apples    44
1   Aadi  Mangos    31
2  Shaun  Grapes    30
3   Simi  Apples    32
4   Luka  Mangos    43
5   Mike  Apples    45
6   Arun  Mangos    35
7   Riti  Grapes    37

Now we want to select only those rows in this DataFrame, where column ‘Product’ has value ‘Apples’, like this,

   Name Product  Sale
0  Mark  Apples    44
3  Simi  Apples    32
5  Mike  Apples    45

Let’s see how to do that. First of all we wll create a DataFrame,

Advertisements
import pandas as pd

# List of Tuples
students = [('Mark',  'Apples', 44),
            ('Aadi',  'Mangos', 31),
            ('Shaun', 'Grapes', 30),
            ('Simi',  'Apples', 32),
            ('Luka',  'Mangos', 43),
            ('Mike',  'Apples', 45),
            ('Arun',  'Mangos', 35),
            ('Riti',  'Grapes', 37),]

# Create a DataFrame object
df = pd.DataFrame(  students,
                    columns = ['Name' , 'Product', 'Sale']) 

# Display the DataFrame
print(df)

Output

    Name Product  Sale
0   Mark  Apples    44
1   Aadi  Mangos    31
2  Shaun  Grapes    30
3   Simi  Apples    32
4   Luka  Mangos    43
5   Mike  Apples    45
6   Arun  Mangos    35
7   Riti  Grapes    37

Now select the column ‘Product’ from this DataFrame and apply a condition to it i.e.

boolSeries = df['Product'] == 'Apples'

# Boolean Series
print(boolSeries)

Output

0     True
1    False
2    False
3     True
4    False
5     True
6    False
7    False
Name: Product, dtype: bool

It will return a boolean Series, where each True value indicates the value ‘Apples’ at the corresponding index position in the column. So, basically this Series contains True values for the rows where our condition results in True. Now, if we pass this boolean Series to the subscript operator of DataFrame, then it will select only those rows from the DataFrame for which value in the bool Series is True. For example,

# Select only those rows where,
# column 'Product' has value 'Apples'
df = df[df['Product'] == 'Apples']

# Display the DataFrame
print(df)

Output

   Name Product  Sale
0  Mark  Apples    44
3  Simi  Apples    32
5  Mike  Apples    45

It selected only those rows from the DataFrame where the condition is satisfied i.e. only those rows where column ‘Product’ contains the value ‘Apples’.

Select DataFrame rows based on multiple conditions

Just like in the above solution, we can also apply multiple conditions to filter the contents of the Dataframe. For example, let’s see how to select only those rows from the DataFrame where sales are greater than 30 but less than 40,

# Select only those rows where sale
# value is between 30 and 40
df = df[(df['Sale'] > 30) & (df['Sale'] < 40)]

# Display the DataFrame
print(df)

Output

   Name Product  Sale
1  Aadi  Mangos    31
3  Simi  Apples    32
6  Arun  Mangos    35
7  Riti  Grapes    37

It returned only those rows from DataFrame, where the sale value is between 30 and 40.

How did it work?

  • df[‘Sale’] > 30 gave a Boolean Series, which contains the True where values are greater than 30 only
  • df[‘Sale’] < 40 gave a Boolean Series, which includes the True where values are less than 40.

Then we applied the boolean & operator on these two boolean Series. It will select True values only at those indices where both the conditions are True. Then we passed that final boolean Series to the [] operator of DataFrame. It returned only those rows from the DataFrame for which value in the final Bool series was True.

Summary

We learned about different ways to select elements from DataFrame based on conditions.

Pandas Tutorials -Learn Data Analysis with Python

   

Are you looking to make a career in Data Science with Python?

Data Science is the future, and the future is here now. Data Scientists are now the most sought-after professionals today. To become a good Data Scientist or to make a career switch in Data Science one must possess the right skill set. We have curated a list of Best Professional Certificate in Data Science with Python. These courses will teach you the programming tools for Data Science like Pandas, NumPy, Matplotlib, Seaborn and how to use these libraries to implement Machine learning models.

Checkout the Detailed Review of Best Professional Certificate in Data Science with Python.

Remember, Data Science requires a lot of patience, persistence, and practice. So, start learning today.

Join a LinkedIn Community of Python Developers

Leave a Comment

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Scroll to Top