Select Rows by column value in Pandas

This tutorial will discuss about different ways to select rows by column value in pandas.

Table Of Contents

Preparing DataSet

Let’s create a DataFrame with some hadcoded values.

import pandas as pd

data = {'A': [11, 12, 13, 14, 15, 16, 17],
        'B': [24, 22, 23, 24, 25, 26, 27],
        'C': [31, 32, 33, 34, 35, 33, 33],
        'D': ['Test1', 'Rest1', 'Nest', 'Crest', 'Temp1', 'Temp2', 'Temp3']}

index=['D1', 'D2', 'D3', 'D4', 'D5', 'D6', 'D7']

# Create DataFrame from dictionary
df = pd.DataFrame.from_dict(data)

# Set list index as Index of Dataframe
df.set_index(pd.Index(index), inplace=True)

print (df)

Output

     A   B   C      D
D1  11  24  31  Test1
D2  12  22  32  Rest1
D3  13  23  33   Nest
D4  14  24  34  Crest
D5  15  25  35  Temp1
D6  16  26  33  Temp2
D7  17  27  33  Temp3

Now we will select rows from this DataFrame by column values.

Select DataFrame rows where column value is equal to the given value

Suppose we have a DataFrame and we want to select only those rows from this DataFrame, in which a specific column contains a given value.

For that we are going to use the loc[] attribute of the DataFrame and we will pass a boolean series in this loc[] attribute. A True value in these boolean series represents that it need to select that particular row from the DataFrame.

To get this boolean series, we will select the specified column and apply a condition on it. It will return as a boolean series, where each True value represent the fact, that the corresponding value in that column satisfies the condition. Then we will pass that boolean series to the loc[] attribute and it will return a DataFrame containing those rows where specified column has the given value.

Like in the below example, we are going to select only those rows from the DataFrame where column C has value 33.

# Select rows where column 'C' has value 33
subDf = df.loc[df['C'] == 33]

print (subDf)

Output

     A   B   C      D
D3  13  23  33   Nest
D6  16  26  33  Temp2
D7  17  27  33  Temp3

Select DataFrame rows where column value is in a given list

We can use the isin() method of pandas Series to select only those DataFrame rows, where a specified column contains any value from a given list.

In the below example, we will select only those rows from the DataFrame where column a contains any value from the given list. For that, we will select the specified column from the DataFrame, and call the isin() function on that column, and pass the given list as argument in it. It will return as a boolean array, where each True value represents that the corresponding column value contains ant value from the given list.

Then we will pass this boolean array to the loc[] attribute and it will return us the rows where column B Contains a value from the given list.

# Select rows where column 'A' has any value from the given list
subDf = df.loc[df['A'].isin([12, 17, 15])]

print (subDf)

Output

     A   B   C      D
D2  12  22  32  Rest1
D5  15  25  35  Temp1
D7  17  27  33  Temp3

Select rows where column values are in a given range

We can also select specific rows from DataFrame where a column contains a value in a given range. For that we need to apply multiple conditions on the column and get a boolean array and then we need to pass this boolean array to the loc[] attribute of DataFrame.

Like in the below example, we are going to select only those rows from the data from where column C has a value greater than 33 but less than 35.

# Select rows where column 'C' has value
# greater than 33 and less than 35
subDf = df.loc[(df['C'] > 32) & (df['C'] < 35)]

print(subDf)

Output

     A   B   C      D
D3  13  23  33   Nest
D4  14  24  34  Crest
D6  16  26  33  Temp2
D7  17  27  33  Temp3

Select rows where column values satisfies a condition

We can also use the loc[] attribute to select only those rows from the DataFrame where values in a column satisfies a given condition.

For that we need to select that particular column and apply the given condition on that column. It will return as a boolean array, that we will pass in the loc[] attribute and it will return us only those rows for which the values in column satisfies the condition.

In the below example, we are going to select only those rows from the DataFrame where column B contains the even values.

# Select rows where values in column 'B' are even
subDf = df.loc[df['B'] % 2 == 0]

print (subDf)

Output

     A   B   C      D
D1  11  24  31  Test1
D2  12  22  32  Rest1
D4  14  24  34  Crest
D6  16  26  33  Temp2

Summary

We learned about different ways to select DataFrame rows by column value in Pandas. Thanks.

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Scroll to Top