This tutorial will discuss about different ways to select DataFrame rows with unique column values in pandas.
Table Of Contents
Preparing DataSet
Let’s create a DataFrame with some dummy data.
import pandas as pd data = {'Col_A': [11, 12, 13, 14, 15, 16, 17], 'Col_B': [21, 22, 23, 24, 25, 26, 27], 'Col_C': [33, 32, 33, 35, 35, 36, 35]} index = ['X1', 'X2', 'X3', 'X4', 'X5', 'X6', 'X7'] # Create a DataFrame from a dictionary df = pd.DataFrame.from_dict(data) # Set the index list as Index of DataFrame df.set_index(pd.Index(index), inplace=True) print (df)
Output
Col_A Col_B Col_C X1 11 21 33 X2 12 22 32 X3 13 23 33 X4 14 24 35 X5 15 25 35 X6 16 26 36 X7 17 27 35
Now we will learn how to select DataFrame rows with unique values in a specified column in Pandas.
Select rows with unique values in a column
We can use the drop_duplicates()
method of DataFrame to select only those rows from the DataFrame which contain unique values in a given column. For that we will pass the column name as an argument in the drop_duplicates()
method and it will return a DataFrame containing only those rows which has unique values in the given column.
In the below example, we are going to select only those rows from the DataFrame which has unique values in column Col_C
.
Frequently Asked:
- Add Column at a specific position in Pandas DataFrame
- Replace column values based on conditions in Pandas
- Check if all values in column are NaN in Pandas
- Convert Column Names to Uppercase in Pandas Dataframe
# Select rows which has unique values in column 'Col_C' subDf = df.drop_duplicates('Col_C') print (subDf)
Output
Col_A Col_B Col_C X1 11 21 33 X2 12 22 32 X4 14 24 35 X6 16 26 36
Select rows with unique values in a column (keep last in duplicates)
In the previous example, the drop_duplicates()
method give us only those rows which has the unique values in column Col_C
, but it considered the first value out of the duplicate values. If you want to select the last value from duplicate values, then you need to pass the argument keep
as last
in the drop_duplicates()
method.
In the below example, we will select rows from the DataFrame which has unique values in column Col_C
. But for rows with the duplicate values in given column, we will select the one row with the last duplicate value. For example,
# Select rows which has unique values in column 'Col_C' # For rows with duplicate values in column 'Col_C' # Select row with last value subDf = df.drop_duplicates('Col_C', keep='last') print (subDf)
Output
Col_A Col_B Col_C X2 12 22 32 X3 13 23 33 X6 16 26 36 X7 17 27 35
Select rows with unique values in a column (keep None of the duplicates)
Suppose we want to select only those rows which contains the completely unique values in a given column. Basically we want to skip all the rows which has a duplicate values in a given column.For that, we need to pass the keep
argument as False
in the drop_duplicates()
method.
In the below example, we are going to select rows which has unique values in column Col_C
, but we will not select the single row which has any duplicate value in the column Col_C
.
# Select rows which has unique values in column 'Col_C’ # Do not select rows which has any duplicate value in column 'Col_C' subDf = df.drop_duplicates('Col_C', keep=False) print (subDf)
Output
Col_A Col_B Col_C X2 12 22 32 X6 16 26 36
Summary
We learned how to select DataFrame rows with unique values in a specified column. Thanks.