This tutorial discusses how to select DataFrame rows where the values in a column start with a given string in Pandas.
Preparing DataSet
Let’s create a DataFrame with some hardcoded data.
import pandas as pd

data = {'Col_A': ["11", "12", "13", "14", "15", "16", "17"],
        'Col_B': ["What", "This", "Hit", "His", "Cube", "Why", "Hill"],
        'Col_C': ["33", "32", "33", "35", "35", "36", "35"]}

index = ["X1", "X2", "X3", "X4", "X5", "X6", "X7"]

# Create a DataFrame from a dictionary
df = pd.DataFrame.from_dict(data)

# Set the index list as Index of DataFrame
df.set_index(pd.Index(index), inplace=True)

print(df)
Output
   Col_A Col_B Col_C
X1    11  What    33
X2    12  This    32
X3    13   Hit    33
X4    14   His    35
X5    15  Cube    35
X6    16   Why    36
X7    17  Hill    35
We will now select rows from this DataFrame where the values in a specified column start with a given string.
Select DataFrame Rows where column values start with a substring
We can use the loc[] attribute of the DataFrame to select only those rows where the values in a specified column start with a given substring.
For that, first select the column as a Pandas Series object, and then call str.startswith() on it to check whether each string value in the column starts with the given substring. It returns a boolean Series in which every True value marks a row whose column value starts with the substring. Then pass this boolean Series to the loc[] attribute of the DataFrame, and it will return a DataFrame containing only the rows for which the values in the specified column start with the given substring.
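To make the intermediate step concrete, the following minimal sketch prints the boolean Series on its own before it is passed to loc[]. It rebuilds only the Col_B column of the sample data so that it runs standalone; the variable name mask is just an illustrative choice.

```python
import pandas as pd

# Rebuild only the column needed for this illustration
df = pd.DataFrame(
    {"Col_B": ["What", "This", "Hit", "His", "Cube", "Why", "Hill"]},
    index=["X1", "X2", "X3", "X4", "X5", "X6", "X7"],
)

# Boolean Series: True where the value in Col_B starts with "Hi"
mask = df["Col_B"].str.startswith("Hi")
print(mask)

# Passing the mask to loc[] keeps only the rows marked True
print(df.loc[mask])
```

Note that the match is case-sensitive, so "This" and "What" are not selected even though they contain "hi".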
In the example below, we select the rows from the DataFrame where the values in column Col_B start with the string “Hi”.
# The string that column values should start with
subStr = "Hi"

# Select rows where values in column "Col_B"
# start with the string "Hi"
subDf = df.loc[df['Col_B'].str.startswith(subStr)]

print(subDf)
Output
   Col_A Col_B Col_C
X3    13   Hit    33
X4    14   His    35
X7    17  Hill    35
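The sample data has no missing values, but real columns often do. As a hedged sketch, assuming a variant of the sample column in which one entry is None (this variant is not part of the original data), the na parameter of str.startswith() can be set to False so that missing entries count as non-matches and the mask stays purely boolean.

```python
import pandas as pd

# Hypothetical variant of the sample column with one missing value (X4)
df = pd.DataFrame(
    {"Col_B": ["What", "This", "Hit", None, "Cube", "Why", "Hill"]},
    index=["X1", "X2", "X3", "X4", "X5", "X6", "X7"],
)

# na=False treats missing values as "does not start with the prefix"
mask = df["Col_B"].str.startswith("Hi", na=False)

# Only rows whose Col_B value starts with "Hi" are kept
subDf = df.loc[mask]
print(subDf)
```

Without na=False, the result for a missing value depends on the column dtype and the Pandas version, so setting it explicitly is the safer choice when the column may contain gaps.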
Summary
We learned how to select DataFrame rows where the values in a specific column start with a given string. Thanks.