Pandas – Select Rows where a Column contains a String

This tutorial will discuss about different ways to select DataFrame rows where a column contains a string.

Table Of Contents

Preparing DataSet

Let’s create a DataFrame with some hardcoded data.

import pandas as pd

data = {'Col_A': [11, 12, 13, 14, 15, 16, 17],
        'Col_B': [24, 22, 23, 24, 25, 26, 27],
        'Col_C': [31, 32, 33, 34, 35, 33, 33],
        'Col_D': ['Test1', 'Rest1', 'Nest', 'This', 'Temp1', 'Temp2', 'Temp3']}

index=['X1', 'X2', 'X3', 'X4', 'X5', 'X6', 'X7']

# Create DataFrame from dictionary
df = pd.DataFrame.from_dict(data)

# Set list index as Index of Dataframe
df.set_index(pd.Index(index), inplace=True)

print (df)

Output

    Col_A  Col_B  Col_C  Col_D
X1     11     24     31  Test1
X2     12     22     32  Rest1
X3     13     23     33   Nest
X4     14     24     34   This
X5     15     25     35  Temp1
X6     16     26     33  Temp2
X7     17     27     33  Temp3

We will now select rows from this DataFrame where a specific Column contains a String.

Select DataFrame Rows where a Column contains a string

Many times we want to select only those rows from a DataFrame where a specified column contains a given string. For that, we can use the loc[] attribute of the DataFrame. First we will select the specified column, then we will call the str attribute on it, and then we will call the contains() method on that column, with the given string value. It will return us a boolean series, where each True value represent that the corresponding column value contains the given string.

Then we will pass this boolean series into the loc[] attribute DataFrame, and it will returns a DataFrame containing only those rows where the given boolean series had True values i.e. the rows ehere specified column contains the given string.

In the below example, we are going to select only those rows from DataFrame where column Col_D contains the string “es”.

value = "es"

# Select rows where column "Col_D" contains a string "is"
subDf = df.loc[df['Col_D'].str.contains(value) ]

print(subDf)

Output

    Col_A  Col_B  Col_C  Col_D
X1     11     24     31  Test1
X2     12     22     32  Rest1
X3     13     23     33   Nest

Select DataFrame Rows based on Regex matches in a Column

In the previous example, we selected only those rows from a DataFrame, where values in a specified column contains the given substring. What if we want to select only those rows where values of a specific column matches the given regex?

Like in the below example, we are going to select only those rows from the DataFrame where column Col_D contains either “is” or “est”.

# Select rows where column "Col_D" contains either "is" or "est"
subDf = df.loc[df['Col_D'].str.contains('is|est', regex=True) ]

print (subDf)

Output

    Col_A  Col_B  Col_C  Col_D
X1     11     24     31  Test1
X2     12     22     32  Rest1
X3     13     23     33   Nest
X4     14     24     34   This

Select DataFrame Rows where a Column has a string value

In the previous example we selected only those rows where a column contains single or multiple strings. But what if you want to select only those rows where values in a specific column contains a specified string only.

Like the below example, we are going to select only those rows from the DataFrame where column Col_D has the string value “This”.

# Select rows where string value in column "Col_D" is -->"This"
subDf = df.loc[df['Col_D'] == "This"]

print(subDf)

Output

    Col_A  Col_B  Col_C Col_D
X4     14     24     34  This

Summary

We learned how to select DataFrame Rows where values in a column contains a given String. Thanks.

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Scroll to Top