In this article we will discuss how to find the unique elements in a single column, in multiple columns, or in each column of a dataframe.
Series.unique()
It returns a NumPy array of the unique elements in the Series object.
Series.unique(self)
Series.nunique()
Series.nunique(self, dropna=True)
It returns the count of unique elements in the series object.
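As a quick illustration of the two Series functions described above, here is a minimal sketch. The series `ages` is hypothetical sample data, not the dataframe used later in this article.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, used only for illustration
ages = pd.Series([34, 31, np.nan, 31, 35])

# unique() returns a NumPy array, values in order of first appearance
unique_ages = ages.unique()

# nunique() returns an integer count; NaN is ignored by default
count_default = ages.nunique()

# With dropna=False, NaN is counted as one more distinct value
count_with_nan = ages.nunique(dropna=False)

print(unique_ages)
print(count_default, count_with_nan)
```

Note that unique() keeps NaN in its result, while nunique() skips it unless dropna=False is passed.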
DataFrame.nunique(self, axis=0, dropna=True)
It returns the count of unique elements along the requested axis.
- If axis = 0 : It returns a series object containing the count of unique elements in each column.
- If axis = 1 : It returns a series object containing the count of unique elements in each row.
- Default value of axis is 0.
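The effect of the axis argument can be sketched on a small frame. The dataframe `df` below is a hypothetical 3x3 example chosen so that the column counts and row counts differ.

```python
import pandas as pd

# Hypothetical 3x3 frame used only to illustrate the axis argument
df = pd.DataFrame({
    'x': [1, 1, 2],
    'y': [1, 3, 2],
    'z': [1, 3, 3],
})

# axis=0 (the default): one count per column, indexed by column name
per_column = df.nunique()

# axis=1: one count per row, indexed by row label
per_row = df.nunique(axis=1)

print(per_column)
print(per_row)
```

Here column 'y' has three distinct values, while the first row has only one because all of its entries are equal.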
Now let’s use these functions to get information about the unique elements in a dataframe.
First of all, create a dataframe,
import numpy as np
import pandas as pd

# List of tuples
employees = [('jack', 34, 'Sydney', 5),
             ('Riti', 31, 'Delhi', 7),
             ('Aadi', 16, np.nan, 11),
             ('Mohit', 31, 'Delhi', 7),
             ('Veena', np.nan, 'Delhi', 4),
             ('Shaunak', 35, 'Mumbai', 5),
             ('Shaun', 35, 'Colombo', 11)]

# Create a DataFrame object
empDfObj = pd.DataFrame(employees,
                        columns=['Name', 'Age', 'City', 'Experience'],
                        index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])

print("Contents of the Dataframe : ")
print(empDfObj)
Contents of this dataframe are,
      Name   Age     City  Experience
a     jack  34.0   Sydney           5
b     Riti  31.0    Delhi           7
c     Aadi  16.0      NaN          11
d    Mohit  31.0    Delhi           7
e    Veena   NaN    Delhi           4
f  Shaunak  35.0   Mumbai           5
g    Shaun  35.0  Colombo          11
Now let’s see how to find the unique values in single or multiple columns of this dataframe.
Find unique values in a single column
To fetch the unique values in column ‘Age’ of the dataframe created above, we will call the unique() function on that column i.e.

# Get an array of unique values in column 'Age' of the dataframe
uniqueValues = empDfObj['Age'].unique()

print('Unique elements in column "Age" ')
print(uniqueValues)

Output:

Unique elements in column "Age" 
[34. 31. 16. nan 35.]
empDfObj[‘Age’] returns a series object representing column ‘Age’ of the dataframe. Calling the unique() function on that series object then returns the unique elements in that series, i.e. the unique elements in column ‘Age’ of the dataframe.
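The two steps described above (selecting the column, then calling unique() on it) can be checked with a short sketch. Only the 'Age' column of the article's dataframe is rebuilt here, under the name `df`, to keep the example self-contained.

```python
import numpy as np
import pandas as pd

# Only the 'Age' column from the article's dataframe is rebuilt here
df = pd.DataFrame({'Age': [34, 31, 16, 31, np.nan, 35, 35]},
                  index=list('abcdefg'))

# Step 1: selecting a single column gives a Series
age_column = df['Age']

# Step 2: unique() on that Series gives a NumPy array
unique_ages = age_column.unique()

print(type(age_column).__name__)   # Series
print(type(unique_ages).__name__)  # ndarray
print(unique_ages)
```

The values come back in order of first appearance rather than sorted, and NaN is kept as one of the entries.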
Count unique values in a single column
Suppose that instead of the unique values themselves, we are interested in the count of unique elements in a column. Then we can use the Series.nunique() function i.e.

# Count unique values in column 'Age' of the dataframe
uniqueValues = empDfObj['Age'].nunique()

print('Number of unique values in column "Age" of the dataframe : ')
print(uniqueValues)

Output:

Number of unique values in column "Age" of the dataframe : 
4
It returns the count of unique elements in column ‘Age’ of the dataframe.
Include NaN while counting the unique elements in a column
Using nunique() with default arguments doesn’t include NaN while counting the unique elements. If we want to include NaN too, we need to pass the dropna argument as False i.e.

# Count unique values in column 'Age' including NaN
uniqueValues = empDfObj['Age'].nunique(dropna=False)

print('Number of unique values in column "Age" including NaN')
print(uniqueValues)

Output:

Number of unique values in column "Age" including NaN
5
It returns the count of unique elements in column ‘Age’ of the dataframe including NaN.
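One way to see what dropna=False does is to compare it with the length of the array returned by unique(). The sketch below rebuilds the 'Age' values as a standalone series named `ages`, which is an assumption made to keep the example self-contained.

```python
import numpy as np
import pandas as pd

# The 'Age' values from the article, rebuilt as a standalone Series
ages = pd.Series([34, 31, 16, 31, np.nan, 35, 35])

without_nan = ages.nunique()            # NaN skipped
with_nan = ages.nunique(dropna=False)   # NaN counted once

# unique() keeps NaN as a single entry, so its length matches dropna=False
length_of_unique = len(ages.unique())

# The two counts differ by one when the series contains at least one NaN
has_nan = bool(ages.isna().any())

print(without_nan, with_nan, length_of_unique, has_nan)
```

However many NaN entries the column holds, they add at most one to the count.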
Count unique values in each column of the dataframe
In DataFrame.nunique() the default value of axis is 0, so it returns the count of unique elements in each column i.e.
# Get a series object containing the count of unique elements
# in each column of the dataframe
uniqueValues = empDfObj.nunique()

print('Count of unique values in each column :')
print(uniqueValues)

Output:

Count of unique values in each column :
Name          7
Age           4
City          4
Experience    4
dtype: int64
It didn’t include NaN while counting because the default value of the dropna argument is True. To include NaN, pass the dropna argument as False i.e.

# Count unique elements in each column including NaN
uniqueValues = empDfObj.nunique(dropna=False)

print("Count Unique values in each column including NaN")
print(uniqueValues)

Output:

Count Unique values in each column including NaN
Name          7
Age           5
City          5
Experience    4
dtype: int64
It returns the count of unique elements in each column including NaN. Columns Age & City contain NaN, therefore their counts of unique elements increased from 4 to 5.
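The difference between the two counts can also be used to spot which columns contain NaN. This is a small sketch of that idea, not something the pandas API offers as a single call. The frame `df` rebuilds three columns of the article's data, and the names `difference` and `columns_with_nan` are hypothetical.

```python
import numpy as np
import pandas as pd

# Three columns of the article's data, rebuilt to keep the sketch self-contained
df = pd.DataFrame({
    'Age': [34, 31, 16, 31, np.nan, 35, 35],
    'City': ['Sydney', 'Delhi', np.nan, 'Delhi', 'Delhi', 'Mumbai', 'Colombo'],
    'Experience': [5, 7, 11, 7, 4, 5, 11],
})

# Count with NaN minus count without NaN: 1 where a column has NaN, else 0
difference = df.nunique(dropna=False) - df.nunique()

# Keep only the column names where the count changed
columns_with_nan = difference[difference > 0].index.tolist()

print(difference)
print(columns_with_nan)
```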
Get unique values in multiple columns
To get the unique values in multiple columns of a dataframe, we can merge the contents of those columns to create a single series object and then call the unique() function on that series object i.e.
# Get unique elements in multiple columns i.e. Name & Age
uniqueValues = pd.concat([empDfObj['Name'], empDfObj['Age']]).unique()

print('Unique elements in column "Name" & "Age" :')
print(uniqueValues)

Output:

Unique elements in column "Name" & "Age" :
['jack' 'Riti' 'Aadi' 'Mohit' 'Veena' 'Shaunak' 'Shaun' 34.0 31.0 16.0 nan 35.0]
It returns the unique elements found across both columns.
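The same idea extends to any number of columns. Below is a minimal sketch that stacks the chosen columns with pd.concat() and then calls unique(). The helper name `unique_across_columns` and the small frame `df` are hypothetical, and numeric columns are used so the result is easy to check.

```python
import pandas as pd

def unique_across_columns(df, columns):
    """Hypothetical helper: unique values found in any of the given columns."""
    # Stack the selected columns end to end into one Series
    stacked = pd.concat([df[col] for col in columns], ignore_index=True)
    # unique() keeps the order of first appearance in the stacked Series
    return stacked.unique()

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({
    'Name': ['jack', 'Riti', 'Aadi'],
    'Age': [34, 31, 34],
    'Experience': [5, 31, 5],
})

values = unique_across_columns(df, ['Age', 'Experience'])
print(values)
```

The value 31 occurs in both columns but appears only once in the result, because uniqueness is evaluated on the stacked series rather than per column.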
The complete example is as follows,

import numpy as np
import pandas as pd

def main():
    # List of tuples
    employees = [('jack', 34, 'Sydney', 5),
                 ('Riti', 31, 'Delhi', 7),
                 ('Aadi', 16, np.nan, 11),
                 ('Mohit', 31, 'Delhi', 7),
                 ('Veena', np.nan, 'Delhi', 4),
                 ('Shaunak', 35, 'Mumbai', 5),
                 ('Shaun', 35, 'Colombo', 11)]

    # Create a DataFrame object
    empDfObj = pd.DataFrame(employees,
                            columns=['Name', 'Age', 'City', 'Experience'],
                            index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])

    print("Contents of the Dataframe : ")
    print(empDfObj)

    print("*** Find unique values in a single column ***")

    # Get an array of unique values in column 'Age' of the dataframe
    uniqueValues = empDfObj['Age'].unique()

    print('Unique elements in column "Age" ')
    print(uniqueValues)

    print("*** Count unique values in a single column ***")

    # Count unique values in column 'Age' of the dataframe
    uniqueValues = empDfObj['Age'].nunique()

    print('Number of unique values in column "Age" of the dataframe : ')
    print(uniqueValues)

    print("*** Count unique values in a single column including NaN ***")

    # Count unique values in column 'Age' including NaN
    uniqueValues = empDfObj['Age'].nunique(dropna=False)

    print('Number of unique values in column "Age" including NaN')
    print(uniqueValues)

    print("*** Count unique values in each column ***")

    # Get a series object containing the count of unique elements
    # in each column of the dataframe
    uniqueValues = empDfObj.nunique()

    print('Count of unique values in each column :')
    print(uniqueValues)

    # Count unique elements in each column including NaN
    uniqueValues = empDfObj.nunique(dropna=False)

    print("Count Unique values in each column including NaN")
    print(uniqueValues)

    print("*** Get unique values in multiple columns ***")

    # Get unique elements in multiple columns i.e. Name & Age
    uniqueValues = pd.concat([empDfObj['Name'], empDfObj['Age']]).unique()

    print('Unique elements in column "Name" & "Age" :')
    print(uniqueValues)

if __name__ == '__main__':
    main()

Output

Contents of the Dataframe : 
      Name   Age     City  Experience
a     jack  34.0   Sydney           5
b     Riti  31.0    Delhi           7
c     Aadi  16.0      NaN          11
d    Mohit  31.0    Delhi           7
e    Veena   NaN    Delhi           4
f  Shaunak  35.0   Mumbai           5
g    Shaun  35.0  Colombo          11
*** Find unique values in a single column ***
Unique elements in column "Age" 
[34. 31. 16. nan 35.]
*** Count unique values in a single column ***
Number of unique values in column "Age" of the dataframe : 
4
*** Count unique values in a single column including NaN ***
Number of unique values in column "Age" including NaN
5
*** Count unique values in each column ***
Count of unique values in each column :
Name          7
Age           4
City          4
Experience    4
dtype: int64
Count Unique values in each column including NaN
Name          7
Age           5
City          5
Experience    4
dtype: int64
*** Get unique values in multiple columns ***
Unique elements in column "Name" & "Age" :
['jack' 'Riti' 'Aadi' 'Mohit' 'Veena' 'Shaunak' 'Shaun' 34.0 31.0 16.0 nan 35.0]