Pandas: Select Rows where column values start with a string

This tutorial shows how to select DataFrame rows where the values in a column start with a given string in Pandas.

Preparing DataSet

Let’s create a DataFrame with some hardcoded data.

import pandas as pd

data = {'Col_A': ["11", "12", "13", "14", "15", "16", "17"],
        'Col_B': ["What", "This", "Hit", "His", "Cube", "Why", "Hill"],
        'Col_C': ["33", "32", "33", "35", "35", "36", "35"]}

index = ["X1", "X2", "X3", "X4", "X5", "X6", "X7"]

# Create a DataFrame from the dictionary, using the list as its index
df = pd.DataFrame(data, index=index)

print(df)

Output

   Col_A Col_B Col_C
X1    11  What    33
X2    12  This    32
X3    13   Hit    33
X4    14   His    35
X5    15  Cube    35
X6    16   Why    36
X7    17  Hill    35

We will now select the rows of this DataFrame where the values in a specified column start with a given string.

Select DataFrame Rows where column values start with a substring

We can use the loc[] attribute of the DataFrame to select only those rows where the values in a specified column start with a given substring.

To do that, first select the column as a Pandas Series, and then call str.startswith() on it, passing the substring. This returns a boolean Series in which each True value marks a row whose value in that column starts with the given substring. Then pass this boolean Series to the loc[] attribute of the DataFrame, and it will return a DataFrame containing only those rows.
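The two steps can be sketched separately to make the intermediate boolean Series visible. This is a minimal illustration only; the small frame named demo and the variable names mask and matches are assumptions made for this sketch and are not part of the tutorial's dataset.

```python
import pandas as pd

# Hypothetical three-row frame, used only for this illustration
demo = pd.DataFrame({'Col_B': ["What", "Hit", "Hill"]},
                    index=["X1", "X2", "X3"])

# Step 1: build the boolean Series (True where the value starts with "Hi")
mask = demo['Col_B'].str.startswith("Hi")

# Step 2: pass the boolean Series to loc[] to keep only the True rows
matches = demo.loc[mask]

print(mask)
print(matches)
```

Keeping the mask in its own variable makes it easy to inspect, or to combine with other conditions before filtering.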

In the example below, we select the rows of the DataFrame where the values in column Col_B start with the string “Hi”.

# The string that the column values should start with
subStr = "Hi"

# Select rows where the values in column "Col_B"
# start with the string "Hi"
subDf = df.loc[df['Col_B'].str.startswith(subStr)]

print(subDf)

Output

   Col_A Col_B Col_C
X3    13   Hit    33
X4    14   His    35
X7    17  Hill    35
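If the column may contain missing values, the result of str.startswith() can contain missing entries as well, and depending on the pandas version and column dtype, loc[] may refuse such a mask. One possible way to handle this, sketched below, is the na parameter of str.startswith(). The frame df_na and its contents are hypothetical and serve only to illustrate the point.

```python
import pandas as pd

# Hypothetical data with one missing value in Col_B
df_na = pd.DataFrame({'Col_B': ["Hit", None, "What", "Hill"]},
                     index=["X1", "X2", "X3", "X4"])

# na=False treats the missing entry as "does not match",
# so the mask contains only True/False values
mask = df_na['Col_B'].str.startswith("Hi", na=False)

print(df_na.loc[mask])
```

Note that str.startswith() matches case-sensitively, so a value such as "hit" would not be selected by the prefix "Hi".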

Summary

We learned how to select DataFrame rows where the values in a specific column start with a string. Thanks.
