Select Rows by Condition in Pandas

This tutorial will discuss about different ways to select rows by condition in pandas.

Table Of Contents

Preparing DataSet

First we will create a DataFrame with some hardcoded values.

import pandas as pd

data = {'A': [11, 12, 13, 14, 15, 16, 17],
        'B': [24, 22, 23, 24, 25, 26, 27],
        'C': [31, 32, 33, 34, 35, 36, 37],
        'D': ['Test1', 'Rest1', 'Nest', 'Crest', 'Temp1', 'Temp2', 'Temp3']}

index=['D1', 'D2', 'D3', 'D4', 'D5', 'D6', 'D7']

# Create DataFrame from dictionary
df = pd.DataFrame.from_dict(data)

# Set list index as Index of Dataframe
df.set_index(pd.Index(index), inplace=True)

print (df)

Output

     A   B   C      D
D1  11  24  31  Test1
D2  12  22  32  Rest1
D3  13  23  33   Nest
D4  14  24  34  Crest
D5  15  25  35  Temp1
D6  16  26  36  Temp2
D7  17  27  37  Temp3

Now we will discuss several ways to select rows from this DataFrame based on conditions.

Select DataFrame Rows based on condition on a column

To select rows from a DataFrame based on conditions we are going to use the loc[] attribute of DataFrame.

We can pass a boolean series in the loc[] attribute in it. The length of this boolean series should be equal to the number of rows in the DataFrame. For each True value in this boolean series, loc[] attribute will select the corresponding row from the DataFrame.

Now, to select DataFrame rows based on a condition on any given column we need to apply that condition to every element of the column and get a boolean array or series. Then we need to pass that boolean series in the loc[] attribute of DataFrame, and it will return us only those rows, for which the column values satisfies the given condition.

In the below example, we will select only those rows from the DataFrame for which values in column B is greater than 23.

# Select rows, where values in column 'B' are more than 23
subDf = df.loc[df['B'] > 23]

print(subDf)

Output

     A   B   C      D
D1  11  24  31  Test1
D4  14  24  34  Crest
D5  15  25  35  Temp1
D6  16  26  36  Temp2
D7  17  27  37  Temp3

When we applied a condition to the column B i.e. df['B'] > 23, and it returned as a boolean series. Each True value in that boolean series represents that the corresponding value in the column ‘B’ satisfies the condition. Then we passed this boolean series to the loc[] attribute and it returned us a DataFrame containing rows, for which boolean array had True values. Basically returned DataFrame has only thos rows, for which the values in given column satisfies the given condition.

Select DataFrame Rows based on conditions on multiple columns

Instead of single condition, we can apply multiple conditions on multiple columns in DataFrame, to select few rows. In the below example we will select rows from DataFrame where value in column B is more than 23 and values in column C is more than 35.

# Select rows where values in column 'B' are more than 23 and
# values in column 'C' are more than 35
subDf = df.loc[(df['B'] > 23) & (df['C'] > 35)]

print (subDf)

Output

     A   B   C      D
D6  16  26  36  Temp2
D7  17  27  37  Temp3

Here we used two conditions, and we selected only those rows that satisfies both the conditions. Therefore we used the & operator in between the two conditions.

Select rows based on conditions on string type columns

If you have columns of string type in DataFrame, and you want to select rows from the DataFrame where a particular column contains a given substring, then you can use this logic.

Here we are going to call the str attribute on the DataFrame Column and then we can call the contains() function on that, to check for the values which contains the given substring. It will give us a boolean array and then we will pass that to the loc[] attribute, to select only those rows from DataFrame which contains the given substring in the specific column.

In the below example, we will select rows with column D contains a substring temp.

# Select DataFrame Rows, where column 'D' contains a substring 'Temp'
subDf = df .loc[df['D'].str.contains('Temp') ]

print (subDf)

Output

     A   B   C      D
D5  15  25  35  Temp1
D6  16  26  36  Temp2
D7  17  27  37  Temp3

In the previous example, we selected rows from a DataFrame, which contains a given substring. But what if we want to select only those rows which contains particular a string (not the substring, but exact string).

In the below example, we will select DataFrame rows, where column D has a string “temp1”.

# Select rows where column 'D' has string 'Temp1'
subDf = df.loc[df['D'] == 'Temp1']

print (subDf)

Output

     A   B   C      D
D5  15  25  35  Temp1

Summary

We learned about different ways to select rows from a DataFrame based on conditions on columns. Thanks.

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Scroll to Top