This tutorial will discuss about different ways to select DataFrame rows where a column contains a string.
Table Of Contents
Preparing DataSet
Let’s create a DataFrame with some hardcoded data.
import pandas as pd data = {'Col_A': [11, 12, 13, 14, 15, 16, 17], 'Col_B': [24, 22, 23, 24, 25, 26, 27], 'Col_C': [31, 32, 33, 34, 35, 33, 33], 'Col_D': ['Test1', 'Rest1', 'Nest', 'This', 'Temp1', 'Temp2', 'Temp3']} index=['X1', 'X2', 'X3', 'X4', 'X5', 'X6', 'X7'] # Create DataFrame from dictionary df = pd.DataFrame.from_dict(data) # Set list index as Index of Dataframe df.set_index(pd.Index(index), inplace=True) print (df)
Output
Col_A Col_B Col_C Col_D X1 11 24 31 Test1 X2 12 22 32 Rest1 X3 13 23 33 Nest X4 14 24 34 This X5 15 25 35 Temp1 X6 16 26 33 Temp2 X7 17 27 33 Temp3
We will now select rows from this DataFrame where a specific Column contains a String.
Select DataFrame Rows where a Column contains a string
Many times we want to select only those rows from a DataFrame where a specified column contains a given string. For that, we can use the loc[]
attribute of the DataFrame. First we will select the specified column, then we will call the str attribute on it, and then we will call the contains()
method on that column, with the given string value. It will return us a boolean series, where each True
value represent that the corresponding column value contains the given string.
Then we will pass this boolean series into the loc[]
attribute DataFrame, and it will returns a DataFrame containing only those rows where the given boolean series had True values i.e. the rows ehere specified column contains the given string.
Frequently Asked:
- Pandas: Drop last N columns of dataframe
- Count Unique Values in all Columns of Pandas Dataframe
- Select Rows with unique column values in Pandas
- Pandas: Select Rows where column values ends with a string
In the below example, we are going to select only those rows from DataFrame where column Col_D
contains the string “es”.
value = "es" # Select rows where column "Col_D" contains a string "is" subDf = df.loc[df['Col_D'].str.contains(value) ] print(subDf)
Output
Col_A Col_B Col_C Col_D X1 11 24 31 Test1 X2 12 22 32 Rest1 X3 13 23 33 Nest
Select DataFrame Rows based on Regex matches in a Column
In the previous example, we selected only those rows from a DataFrame, where values in a specified column contains the given substring. What if we want to select only those rows where values of a specific column matches the given regex?
Like in the below example, we are going to select only those rows from the DataFrame where column Col_D
contains either “is” or “est”.
# Select rows where column "Col_D" contains either "is" or "est" subDf = df.loc[df['Col_D'].str.contains('is|est', regex=True) ] print (subDf)
Output
Col_A Col_B Col_C Col_D X1 11 24 31 Test1 X2 12 22 32 Rest1 X3 13 23 33 Nest X4 14 24 34 This
Select DataFrame Rows where a Column has a string value
In the previous example we selected only those rows where a column contains single or multiple strings. But what if you want to select only those rows where values in a specific column contains a specified string only.
Like the below example, we are going to select only those rows from the DataFrame where column Col_D has the string value “This”.
# Select rows where string value in column "Col_D" is -->"This" subDf = df.loc[df['Col_D'] == "This"] print(subDf)
Output
Col_A Col_B Col_C Col_D X4 14 24 34 This
Summary
We learned how to select DataFrame Rows where values in a column contains a given String. Thanks.