Select Rows with unique column values in Pandas

This tutorial will discuss about different ways to select DataFrame rows with unique column values in pandas.

Table Of Contents

Preparing DataSet

Let’s create a DataFrame with some dummy data.

import pandas as pd

data = {'Col_A': [11, 12, 13, 14, 15, 16, 17],
        'Col_B': [21, 22, 23, 24, 25, 26, 27],
        'Col_C': [33, 32, 33, 35, 35, 36, 35]}

index = ['X1', 'X2', 'X3', 'X4', 'X5', 'X6', 'X7']

# Create a DataFrame from a dictionary
df = pd.DataFrame.from_dict(data)

# Set the index list as Index of DataFrame
df.set_index(pd.Index(index), inplace=True)

print (df)

Output

    Col_A  Col_B  Col_C
X1     11     21     33
X2     12     22     32
X3     13     23     33
X4     14     24     35
X5     15     25     35
X6     16     26     36
X7     17     27     35

Now we will learn how to select DataFrame rows with unique values in a specified column in Pandas.

Select rows with unique values in a column

We can use the drop_duplicates() method of DataFrame to select only those rows from the DataFrame which contain unique values in a given column. For that we will pass the column name as an argument in the drop_duplicates() method and it will return a DataFrame containing only those rows which has unique values in the given column.

In the below example, we are going to select only those rows from the DataFrame which has unique values in column Col_C.

# Select rows which has unique values in column 'Col_C'
subDf = df.drop_duplicates('Col_C')

print (subDf)

Output

    Col_A  Col_B  Col_C
X1     11     21     33
X2     12     22     32
X4     14     24     35
X6     16     26     36

Select rows with unique values in a column (keep last in duplicates)

In the previous example, the drop_duplicates() method give us only those rows which has the unique values in column Col_C, but it considered the first value out of the duplicate values. If you want to select the last value from duplicate values, then you need to pass the argument keep as last in the drop_duplicates() method.

In the below example, we will select rows from the DataFrame which has unique values in column Col_C. But for rows with the duplicate values in given column, we will select the one row with the last duplicate value. For example,

# Select rows which has unique values in column 'Col_C'
# For rows with duplicate values in column 'Col_C'
# Select row with last value
subDf = df.drop_duplicates('Col_C', keep='last')

print (subDf)

Output

    Col_A  Col_B  Col_C
X2     12     22     32
X3     13     23     33
X6     16     26     36
X7     17     27     35

Select rows with unique values in a column (keep None of the duplicates)

Suppose we want to select only those rows which contains the completely unique values in a given column. Basically we want to skip all the rows which has a duplicate values in a given column.For that, we need to pass the keep argument as False in the drop_duplicates() method.

In the below example, we are going to select rows which has unique values in column Col_C, but we will not select the single row which has any duplicate value in the column Col_C.

# Select rows which has unique values in column 'Col_C’
# Do not select rows which has any duplicate value in column 'Col_C'
subDf = df.drop_duplicates('Col_C', keep=False)

print (subDf)

Output

    Col_A  Col_B  Col_C
X2     12     22     32
X6     16     26     36

Summary

We learned how to select DataFrame rows with unique values in a specified column. Thanks.

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Scroll to Top