This tutorial will discuss about different ways to select DataFrame rows where two columns are equal in pandas.
Table Of Contents
Preparing DataSet
Let’s create a DataFrame with some hardcoded data.
import pandas as pd data = {'Col_A': [33, 12, 33, 14, 35, 36, 17], 'Col_B': [21, 22, 23, 24, 25, 26, 27], 'Col_C': [33, 32, 33, 35, 35, 36, 35]} index = ['X1', 'X2', 'X3', 'X4', 'X5', 'X6', 'X7'] # Create a DataFrame from a dictionary df = pd.DataFrame.from_dict(data) # Set the index list as Index of DataFrame df.set_index(pd.Index(index), inplace=True) print (df)
Output
Col_A Col_B Col_C X1 33 21 33 X2 12 22 32 X3 33 23 33 X4 14 24 35 X5 35 25 35 X6 36 26 36 X7 17 27 35
We will now select rows from this DataFrame where values in two specified columns are equal.
Method 1: Using DataFrame.query()
We can use the query()
method of DataFrame to select only those rows from DataFrame where values in two specific columns are equal. For example, we want to select only those rows from the DataFrame where column Col_A
and column Col_C
has equal values.
For that, we will apply a condition on both the columns, and pass that to the query()
method. It will return a DataFrame containing only those rows where values in column Col_A
and column Col_C
are equal.
Frequently Asked:
# Select rows where column Col_A and 'Col_C' # has equal values subDf = df.query('Col_A == Col_C') print (subDf)
Output
Col_A Col_B Col_C X1 33 21 33 X3 33 23 33 X5 35 25 35 X6 36 26 36
Method 2: Using DataFrame.loc[] Attribute
You can also use the loc[]
attribute of DataFrame, to select rows from a DataFrame where two given columns has equal values.
Like in the below example, we are going to select rows that has equal values in Columns ‘Col_A’ and ‘Col_C’. For that, we will apply a condition on both the columns, and it will return as a boolean Series, where each True
value represent that that corresponding row has equal values in both the columns i.e. ‘Col_A’ and ‘Col_C’.
Then we will pass that boolean series in the loc[]
attribute and it will return a DataFrame containing the rows where both the given columns has equal values in original DataFrame.
Like in the below example, we are going to select only those rows from DataFrame where column Col_A
and column Col_C
has equal values.
# Select rows where column 'Col_A' and 'Col_C' # has equal values subDf = df.loc[df['Col_A'] == df['Col_C']] print (subDf)
Output
Col_A Col_B Col_C X1 33 21 33 X3 33 23 33 X5 35 25 35 X6 36 26 36
Summary
We learned to select DataFrame rows where tow columns has unequal values. Thanks.