Pandas : Get unique values in columns of a Dataframe in Python

In this article we will discuss how to find unique elements in a single column, in multiple columns, or in each column of a dataframe.

Series.unique()

It returns a NumPy array of the unique elements in the Series object.

Series.unique(self)

Series.nunique()

Series.nunique(self, dropna=True)

It returns the count of unique elements in the series object.

DataFrame.nunique()

DataFrame.nunique(self, axis=0, dropna=True)

It returns the count of unique elements along the given axis.

  • If axis = 0 : It returns a series object containing the count of unique elements in each column.
  • If axis = 1 : It returns a series object containing the count of unique elements in each row.
  • Default value of axis is 0.
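The effect of the axis argument can be sketched on a small frame. The dataframe `df` and its column names below are made up purely for illustration and are not the dataframe used in the rest of the article.

```python
import numpy as np
import pandas as pd

# Small hypothetical frame, used only to illustrate the axis argument
df = pd.DataFrame({
    'A': [1, 1, 2],
    'B': [1, np.nan, 2],
    'C': [1, 3, 2],
})

# axis=0 (default): one count per column, NaN ignored
per_column = df.nunique(axis=0)

# axis=1: one count per row, NaN ignored
per_row = df.nunique(axis=1)

print(per_column)
print(per_row)
```

Here the column counts are 2, 2 and 3, and the row counts are 1, 2 and 1, because NaN is skipped by default in both directions.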

Now let’s use these functions to find information about the unique elements in a dataframe.

First, create a dataframe:

import pandas as pd
import numpy as np

# List of tuples; np.nan marks a missing value
employees = [('jack', 34, 'Sydney', 5) ,
         ('Riti', 31, 'Delhi' , 7) ,
         ('Aadi', 16, np.nan, 11) ,
         ('Mohit', 31,'Delhi' , 7) ,
         ('Veena', np.nan, 'Delhi' , 4) ,
         ('Shaunak', 35, 'Mumbai', 5 ),
         ('Shaun', 35, 'Colombo', 11)
          ]

# Create a DataFrame object
empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'], index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])

print("Contents of the Dataframe : ")
print(empDfObj)

The contents of this dataframe are:

      Name   Age     City  Experience
a     jack  34.0   Sydney           5
b     Riti  31.0    Delhi           7
c     Aadi  16.0      NaN          11
d    Mohit  31.0    Delhi           7
e    Veena   NaN    Delhi           4
f  Shaunak  35.0   Mumbai           5
g    Shaun  35.0  Colombo          11

Now let’s see how to find the unique values in single or multiple columns of this dataframe.

Find unique values in a single column

To fetch the unique values in column ‘Age’ of the dataframe created above, call the unique() function on that column:

# Get an array of unique values in column 'Age' of the dataframe
uniqueValues = empDfObj['Age'].unique()

print('Unique elements in column "Age" ')
print(uniqueValues)

Output:

Unique elements in column "Age" 
[34. 31. 16. nan 35.]

empDfObj[‘Age’] returns a Series object representing column ‘Age’ of the dataframe. Calling unique() on that Series returns its unique elements, i.e. the unique elements in column ‘Age’ of the dataframe, in order of first appearance. Note that NaN is included in the result.
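If NaN should not be reported, or a sorted result is needed, the array can be post-processed. The following is a minimal sketch that uses a standalone Series named `ages` (a hypothetical name holding the same values as column ‘Age’) rather than the article's dataframe.

```python
import numpy as np
import pandas as pd

# Same values as column 'Age', held in a standalone Series for illustration
ages = pd.Series([34, 31, 16, 31, np.nan, 35, 35])

# unique() keeps values in order of first appearance and retains NaN
print(ages.unique())

# Drop NaN first if missing values should not be reported
without_nan = ages.dropna().unique()
print(without_nan)

# unique() does not sort, so sort explicitly when order matters
sorted_unique = np.sort(without_nan)
print(sorted_unique)
```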

Count unique values in a single column

Suppose we are interested in the count of unique elements in a column rather than the values themselves. In that case we can use the Series.nunique() function:

# Count unique values in column 'Age' of the dataframe
uniqueValues = empDfObj['Age'].nunique()

print('Number of unique values in column "Age" of the dataframe : ')
print(uniqueValues)

Output:

Number of unique values in column "Age" of the dataframe : 
4

It returns the count of unique elements in column ‘Age’ of the dataframe.

Include NaN while counting the unique elements in a column

Using nunique() with default arguments doesn’t include NaN while counting the unique elements. To include NaN in the count, pass dropna=False:

# Count unique values in column 'Age' including NaN
uniqueValues = empDfObj['Age'].nunique(dropna=False)

print('Number of unique values in column "Age" including NaN')
print(uniqueValues)

Output:

Number of unique values in column "Age" including NaN
5

It returns the count of unique elements in column ‘Age’ of the dataframe including NaN.
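When the frequency of each unique value is also of interest, Series.value_counts() is a closely related call. The sketch below again uses a standalone Series named `ages` (a hypothetical name) with the same values as column ‘Age’.

```python
import numpy as np
import pandas as pd

# Same values as column 'Age', held in a standalone Series for illustration
ages = pd.Series([34, 31, 16, 31, np.nan, 35, 35])

# How often each distinct value occurs; dropna=False keeps a row for NaN
counts = ages.value_counts(dropna=False)
print(counts)

# The number of rows in counts matches nunique(dropna=False)
print(len(counts) == ages.nunique(dropna=False))
```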

Count unique values in each column of the dataframe

In DataFrame.nunique() the default value of axis is 0, so it returns the count of unique elements in each column:

# Get a series object containing the count of unique elements
# in each column of dataframe
uniqueValues = empDfObj.nunique()

print('Count of unique values in each column :')
print(uniqueValues)

Output:

Count of unique values in each column :
Name          7
Age           4
City          4
Experience    4
dtype: int64

It didn’t include NaN while counting because the default value of the dropna argument is True. To include NaN, pass dropna=False:

# Count unique elements in each column including NaN
uniqueValues = empDfObj.nunique(dropna=False)

print("Count Unique values in each column including NaN")
print(uniqueValues)

Output:

Count Unique values in each column including NaN
Name          7
Age           5
City          5
Experience    4
dtype: int64

It returns the count of unique elements in each column including NaN. Columns Age and City contain NaN, so their counts of unique elements increased from 4 to 5.

Get unique values in multiple columns
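DataFrame has no unique() method of its own, so the unique values themselves (rather than their counts) for each column can be collected by calling Series.unique() column by column. This is a minimal sketch on a small hypothetical frame `df`; the dictionary name `unique_per_column` is also made up for illustration.

```python
import numpy as np
import pandas as pd

# Small hypothetical frame, used only for illustration
df = pd.DataFrame({
    'City': ['Sydney', 'Delhi', np.nan, 'Delhi'],
    'Experience': [5, 7, 11, 7],
})

# Map each column name to the array of its unique values
unique_per_column = {col: df[col].unique() for col in df.columns}

for col, values in unique_per_column.items():
    print(col, values)
```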

To get the unique values in multiple columns of a dataframe, we can concatenate the contents of those columns into a single Series object and then call unique() on it. Series.append() was removed in pandas 2.0, so pd.concat() is used here:

# Get unique elements in multiple columns i.e. Name & Age
uniqueValues = pd.concat([empDfObj['Name'], empDfObj['Age']]).unique()

print('Unique elements in column "Name" & "Age" :')
print(uniqueValues)

Output:

Unique elements in column "Name" & "Age" :
['jack' 'Riti' 'Aadi' 'Mohit' 'Veena' 'Shaunak' 'Shaun' 34.0 31.0 16.0 nan
 35.0]

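It returns the unique elements found across the selected columns, not their count. Because the combined Series mixes strings and numbers, the resulting array has dtype object.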
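Two related variations may be useful, depending on what "unique in multiple columns" should mean. The sketch below, on a small hypothetical frame `df`, shows flattening the selected columns before deduplicating, and, separately, finding the distinct combinations of values across those columns with drop_duplicates().

```python
import pandas as pd

# Small hypothetical frame, used only for illustration
df = pd.DataFrame({
    'City': ['Sydney', 'Delhi', 'Delhi', 'Mumbai'],
    'Experience': [5, 7, 7, 5],
})

# Flatten the selected columns into one 1-D array, then deduplicate
flat_unique = pd.unique(df[['City', 'Experience']].to_numpy().ravel())
print(flat_unique)

# Distinct combinations of values, i.e. the distinct rows
unique_rows = df[['City', 'Experience']].drop_duplicates()
print(unique_rows)
```

The first form treats every cell as an independent value, while the second keeps values paired by row, so ('Delhi', 7) is reported once even though it appears twice.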

The complete example is as follows:

import pandas as pd
import numpy as np

def main():

    # List of tuples; np.nan marks a missing value
    employees = [('jack', 34, 'Sydney', 5) ,
             ('Riti', 31, 'Delhi' , 7) ,
             ('Aadi', 16, np.nan, 11) ,
             ('Mohit', 31,'Delhi' , 7) ,
             ('Veena', np.nan, 'Delhi' , 4) ,
             ('Shaunak', 35, 'Mumbai', 5 ),
             ('Shaun', 35, 'Colombo', 11)
              ]

    # Create a DataFrame object
    empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'], index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])

    print("Contents of the Dataframe : ")
    print(empDfObj)

    print("*** Find unique values in a single column ***")

    # Get an array of unique values in column 'Age' of the dataframe
    uniqueValues = empDfObj['Age'].unique()

    print('Unique elements in column "Age" ')
    print(uniqueValues)

    print("*** Count unique values in a single column ***")

    # Count unique values in column 'Age' of the dataframe
    uniqueValues = empDfObj['Age'].nunique()

    print('Number of unique values in column "Age" of the dataframe : ')
    print(uniqueValues)

    print("*** Count unique values in a single column including NaN ***")

    # Count unique values in column 'Age' including NaN
    uniqueValues = empDfObj['Age'].nunique(dropna=False)

    print('Number of unique values in column "Age" including NaN')
    print(uniqueValues)

    print("*** Count unique values in each column ***")

    # Get a series object containing the count of unique elements
    # in each column of dataframe
    uniqueValues = empDfObj.nunique()

    print('Count of unique values in each column :')
    print(uniqueValues)


    # Count unique elements in each column including NaN
    uniqueValues = empDfObj.nunique(dropna=False)

    print("Count Unique values in each column including NaN")
    print(uniqueValues)

    print("*** Get unique values in multiple columns ***")

    # Get unique elements in multiple columns i.e. Name & Age
    # (pd.concat replaces Series.append, which was removed in pandas 2.0)
    uniqueValues = pd.concat([empDfObj['Name'], empDfObj['Age']]).unique()

    print('Unique elements in column "Name" & "Age" :')
    print(uniqueValues)


if __name__ == '__main__':
    main()

Output:

Contents of the Dataframe : 
      Name   Age     City  Experience
a     jack  34.0   Sydney           5
b     Riti  31.0    Delhi           7
c     Aadi  16.0      NaN          11
d    Mohit  31.0    Delhi           7
e    Veena   NaN    Delhi           4
f  Shaunak  35.0   Mumbai           5
g    Shaun  35.0  Colombo          11
*** Find unique values in a single column ***
Unique elements in column "Age" 
[34. 31. 16. nan 35.]
*** Count unique values in a single column ***
Number of unique values in column "Age" of the dataframe : 
4
*** Count unique values in a single column including NaN ***
Number of unique values in column "Age" including NaN
5
*** Count unique values in each column ***
Count of unique values in each column :
Name          7
Age           4
City          4
Experience    4
dtype: int64
Count Unique values in each column including NaN
Name          7
Age           5
City          5
Experience    4
dtype: int64
*** Get unique values in multiple columns ***
Unique elements in column "Name" & "Age" :
['jack' 'Riti' 'Aadi' 'Mohit' 'Veena' 'Shaunak' 'Shaun' 34.0 31.0 16.0 nan
 35.0]
