This tutorial will discuss about different ways to select DataFrame rows where two columns are not equal in pandas.
Table Of Contents
Preparing DataSet
Let’s create a DataFrame with some hardcoded data.
import pandas as pd data = {'Col_A': [33, 12, 33, 14, 35, 36, 17], 'Col_B': [21, 22, 23, 24, 25, 26, 27], 'Col_C': [33, 32, 33, 35, 35, 36, 35]} index = ['X1', 'X2', 'X3', 'X4', 'X5', 'X6', 'X7'] # Create a DataFrame from a dictionary df = pd.DataFrame.from_dict(data) # Set the index list as Index of DataFrame df.set_index(pd.Index(index), inplace=True) print (df)
Output
Col_A Col_B Col_C X1 33 21 33 X2 12 22 32 X3 33 23 33 X4 14 24 35 X5 35 25 35 X6 36 26 36 X7 17 27 35
We will now select rows from this DataFrame where values in two specified columns are not equal.
Method 1: Using DataFrame.query()
We can use the query()
method of DataFrame to select only those rows from DataFrame where values in two specific columns are not equal. For example, we want to select only those rows from the DataFrame where column Col_A
and column Col_C
has different values.
For that, we will apply a condition on both the columns, and pass that to the query()
method. It will return a DataFrame containing only those rows where values in column Col_A
and column Col_C
are different.
Frequently Asked:
- Drop Duplicate Rows from Pandas Dataframe
- Get First value of a Column in Pandas DataFrame
- Find Rows with NaN in Pandas
- Select Rows by Index Names in Pandas
# Select rows where column Col_A and 'Col_C' # has different values subDf = df.query('Col_A != Col_C') print (subDf)
Output
Col_A Col_B Col_C X2 12 22 32 X4 14 24 35 X7 17 27 35
Method 2: Using DataFrame.loc[] Attribute
You can also use the loc[]
attribute of DataFrame, to select rows from a DataFrame where two given columns has different values.
Like in the below example, we are going to select rows that has different values in Columns ‘Col_A’ and ‘Col_C’. For that, we will apply a condition on both the columns, and it will return as a boolean Series, where each True
value represent that that corresponding row has different values in column ‘Col_A’ and ‘Col_C’.
Then we will pass that boolean series in the loc[]
attribute and it will return a DataFrame containing the rows where both the given columns has different values in original DataFrame.
Like in the below example, we are going to select only those rows from DataFrame where column Col_A
and column Col_C
has different values.
# Select rows where column 'Col_A' and 'Col_C' # has different values subDf = df.loc[df['Col_A'] != df['Col_C']] print (subDf)
Output
Col_A Col_B Col_C X2 12 22 32 X4 14 24 35 X7 17 27 35
Summary
We learned to select DataFrame rows where tow columns has unequal values. Thanks.