This tutorial will discuss about different ways to select rows by condition in pandas.
Table Of Contents
Preparing DataSet
First we will create a DataFrame with some hardcoded values.
import pandas as pd data = {'A': [11, 12, 13, 14, 15, 16, 17], 'B': [24, 22, 23, 24, 25, 26, 27], 'C': [31, 32, 33, 34, 35, 36, 37], 'D': ['Test1', 'Rest1', 'Nest', 'Crest', 'Temp1', 'Temp2', 'Temp3']} index=['D1', 'D2', 'D3', 'D4', 'D5', 'D6', 'D7'] # Create DataFrame from dictionary df = pd.DataFrame.from_dict(data) # Set list index as Index of Dataframe df.set_index(pd.Index(index), inplace=True) print (df)
Output
A B C D D1 11 24 31 Test1 D2 12 22 32 Rest1 D3 13 23 33 Nest D4 14 24 34 Crest D5 15 25 35 Temp1 D6 16 26 36 Temp2 D7 17 27 37 Temp3
Now we will discuss several ways to select rows from this DataFrame based on conditions.
Select DataFrame Rows based on condition on a column
To select rows from a DataFrame based on conditions we are going to use the loc[]
attribute of DataFrame.
We can pass a boolean series in the loc[]
attribute in it. The length of this boolean series should be equal to the number of rows in the DataFrame. For each True
value in this boolean series, loc[]
attribute will select the corresponding row from the DataFrame.
Now, to select DataFrame rows based on a condition on any given column we need to apply that condition to every element of the column and get a boolean array or series. Then we need to pass that boolean series in the loc[]
attribute of DataFrame, and it will return us only those rows, for which the column values satisfies the given condition.
In the below example, we will select only those rows from the DataFrame for which values in column B
is greater than 23.
# Select rows, where values in column 'B' are more than 23 subDf = df.loc[df['B'] > 23] print(subDf)
Output
A B C D D1 11 24 31 Test1 D4 14 24 34 Crest D5 15 25 35 Temp1 D6 16 26 36 Temp2 D7 17 27 37 Temp3
When we applied a condition to the column B
i.e. df['B'] > 23
, and it returned as a boolean series. Each True
value in that boolean series represents that the corresponding value in the column ‘B’ satisfies the condition. Then we passed this boolean series to the loc[]
attribute and it returned us a DataFrame containing rows, for which boolean array had True values. Basically returned DataFrame has only thos rows, for which the values in given column satisfies the given condition.
Select DataFrame Rows based on conditions on multiple columns
Instead of single condition, we can apply multiple conditions on multiple columns in DataFrame, to select few rows. In the below example we will select rows from DataFrame where value in column B
is more than 23 and values in column C
is more than 35.
# Select rows where values in column 'B' are more than 23 and # values in column 'C' are more than 35 subDf = df.loc[(df['B'] > 23) & (df['C'] > 35)] print (subDf)
Output
A B C D D6 16 26 36 Temp2 D7 17 27 37 Temp3
Here we used two conditions, and we selected only those rows that satisfies both the conditions. Therefore we used the &
operator in between the two conditions.
Select rows based on conditions on string type columns
If you have columns of string type in DataFrame, and you want to select rows from the DataFrame where a particular column contains a given substring, then you can use this logic.
Here we are going to call the str
attribute on the DataFrame Column and then we can call the contains()
function on that, to check for the values which contains the given substring. It will give us a boolean array and then we will pass that to the loc[]
attribute, to select only those rows from DataFrame which contains the given substring in the specific column.
In the below example, we will select rows with column D
contains a substring temp
.
# Select DataFrame Rows, where column 'D' contains a substring 'Temp' subDf = df .loc[df['D'].str.contains('Temp') ] print (subDf)
Output
A B C D D5 15 25 35 Temp1 D6 16 26 36 Temp2 D7 17 27 37 Temp3
In the previous example, we selected rows from a DataFrame, which contains a given substring. But what if we want to select only those rows which contains particular a string (not the substring, but exact string).
In the below example, we will select DataFrame rows, where column D
has a string “temp1”.
# Select rows where column 'D' has string 'Temp1' subDf = df.loc[df['D'] == 'Temp1'] print (subDf)
Output
A B C D D5 15 25 35 Temp1
Summary
We learned about different ways to select rows from a DataFrame based on conditions on columns. Thanks.