This article discusses different ways to count the unique values in a DataFrame column in Python.
First, we will create a sample DataFrame from a list of tuples:
import pandas as pd
import numpy as np

# List of tuples (np.nan marks a missing value; the np.NaN alias was removed in NumPy 2.0)
list_of_tuples = [(11, 34, 78, 5, 11, 56),
                  (12, np.nan, 98, 7, 12, 18),
                  (13, 34, 11, 11, 56, 41),
                  (12, 41, 12, 41, 78, 18)]

# Create a DataFrame object
df = pd.DataFrame(list_of_tuples,
                  columns=['A', 'B', 'C', 'D', 'E', 'F'])

print(df)
Contents of the Dataframe are,
    A     B   C   D   E   F
0  11  34.0  78   5  11  56
1  12   NaN  98   7  12  18
2  13  34.0  11  11  56  41
3  12  41.0  12  41  78  18
Column 'F' of the DataFrame contains four values, of which only three are distinct (18 appears twice). Let's see how to find that programmatically.
Count unique values in a Dataframe Column in Pandas using nunique()
We can select a DataFrame column using the subscript operator on the DataFrame object, i.e. df['F']. This gives us a Series object containing the values of that column. We can then call the nunique() function on that Series, which returns the number of distinct values it contains. For example,
import pandas as pd
import numpy as np

# List of tuples (np.nan marks a missing value; the np.NaN alias was removed in NumPy 2.0)
list_of_tuples = [(11, 34, 78, 5, 11, 56),
                  (12, np.nan, 98, 7, 12, 18),
                  (13, 34, 11, 11, 56, 41),
                  (12, 41, 12, 41, 78, 18)]

# Create a DataFrame object
df = pd.DataFrame(list_of_tuples,
                  columns=['A', 'B', 'C', 'D', 'E', 'F'])

print(df)

# Select column 'F' as a Series and count its distinct values
column = df['F']
count = column.nunique()

print('Unique values in Column "F" :', count)
Output:
    A     B   C   D   E   F
0  11  34.0  78   5  11  56
1  12   NaN  98   7  12  18
2  13  34.0  11  11  56  41
3  12  41.0  12  41  78  18
Unique values in Column "F" : 3
We fetched the column ‘F’ from Dataframe as a Series object and then counted the total unique values in that column by calling the nunique() function on the Series Object.
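As a side note, nunique() is also available on the DataFrame itself, where it returns the distinct-value count of every column at once. The snippet below is a minimal sketch of that variant, not something the examples in this article depend on; the variable name per_column is only illustrative.

```python
import pandas as pd
import numpy as np

# Same sample data as in the article
df = pd.DataFrame(
    [(11, 34, 78, 5, 11, 56),
     (12, np.nan, 98, 7, 12, 18),
     (13, 34, 11, 11, 56, 41),
     (12, 41, 12, 41, 78, 18)],
    columns=['A', 'B', 'C', 'D', 'E', 'F'])

# DataFrame.nunique() returns a Series indexed by column name,
# holding the number of distinct values in each column (NaN is skipped by default)
per_column = df.nunique()
print(per_column)

# The count for a single column can then be read by its label
print('Unique values in Column "F" :', per_column['F'])
```

This can be handy when the counts of several columns are needed, since the column does not have to be selected first.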
Include NaN while counting unique values in a Dataframe Column
By default, the Series.nunique() function does not include NaN in the calculation. If you want NaN to be counted as a value, pass dropna=False to nunique(). For example,
column = df['B']

# Count distinct values, treating NaN as one of them
count = column.nunique(dropna=False)

print('Unique values in Column "B" including NaN :', count)
Output:
Unique values in Column "B" including NaN : 3
Column ‘B’ has three unique values if we consider NaN too.
But if we call the nunique() function without the dropna argument, NaN is excluded by default. For example,
column = df['B']

# Count distinct values, skipping NaN (the default behaviour)
count = column.nunique()

print('Unique values in Column "B" :', count)
Output:
Unique values in Column "B" : 2
Column ‘B’ has only two unique values if we skip NaN in the calculation.
Count unique values in a Dataframe Column using unique()
We can select a DataFrame column using the subscript operator on the DataFrame object, i.e. df['F']. This gives us a Series object containing the values of that column. We can then call the unique() function on that Series, which returns a NumPy array of the distinct values in column 'F' of the DataFrame. The length of that array is the number of unique values in the column. For example,
import pandas as pd
import numpy as np

# List of tuples (np.nan marks a missing value; the np.NaN alias was removed in NumPy 2.0)
list_of_tuples = [(11, 34, 78, 5, 11, 56),
                  (12, np.nan, 98, 7, 12, 18),
                  (13, 34, 11, 11, 56, 41),
                  (12, 41, 12, 41, 78, 18)]

# Create a DataFrame object
df = pd.DataFrame(list_of_tuples,
                  columns=['A', 'B', 'C', 'D', 'E', 'F'])

print(df)

# Select column 'F' and count the elements of the array returned by unique()
column = df['F']
count = len(column.unique())

print('Unique values in Column "F":', count)
Output:
    A     B   C   D   E   F
0  11  34.0  78   5  11  56
1  12   NaN  98   7  12  18
2  13  34.0  11  11  56  41
3  12  41.0  12  41  78  18
Unique values in Column "F": 3
We fetched the column ‘F’ from Dataframe as a Series object and then counted the total unique values in that column.
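One point worth keeping in mind with this approach: unlike nunique(), the array returned by unique() keeps NaN as one of its elements, so its length includes NaN. The sketch below illustrates this with the values of column 'B' rebuilt as a standalone Series; the names column_b, count_with_nan and count_without_nan are only illustrative.

```python
import pandas as pd
import numpy as np

# Values of column 'B' from the sample DataFrame
column_b = pd.Series([34, np.nan, 34, 41], name='B')

# unique() keeps NaN as an element of the returned array
values = column_b.unique()
count_with_nan = len(values)

# Dropping NaN first gives the same count as nunique() with its default settings
count_without_nan = len(column_b.dropna().unique())

print('Unique values in Column "B" including NaN:', count_with_nan)
print('Unique values in Column "B" excluding NaN:', count_without_nan)
```

So for a column that contains missing values, len(column.unique()) matches nunique(dropna=False) rather than the default nunique().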
Count unique values in a Dataframe Column using value_counts()
We can select a DataFrame column using the subscript operator on the DataFrame object, i.e. df['F']. This gives us a Series object containing the values of that column. We can then call the value_counts() function on that Series. It returns another Series whose index holds the distinct values of the calling Series and whose values hold the frequency of each one. Because this Series has exactly one entry per distinct value, its length is the count of unique values in the DataFrame column. Note that value_counts() skips NaN by default.
For example,
import pandas as pd
import numpy as np

# List of tuples (np.nan marks a missing value; the np.NaN alias was removed in NumPy 2.0)
list_of_tuples = [(11, 34, 78, 5, 11, 56),
                  (12, np.nan, 98, 7, 12, 18),
                  (13, 34, 11, 11, 56, 41),
                  (12, 41, 12, 41, 78, 18)]

# Create a DataFrame object
df = pd.DataFrame(list_of_tuples,
                  columns=['A', 'B', 'C', 'D', 'E', 'F'])

print(df)

# value_counts() returns one entry per distinct value, so its length is the unique count
column = df['F']
unique_values = column.value_counts()
count = len(unique_values)

print('Unique values in Column "F":', count)
Output:
    A     B   C   D   E   F
0  11  34.0  78   5  11  56
1  12   NaN  98   7  12  18
2  13  34.0  11  11  56  41
3  12  41.0  12  41  78  18
Unique values in Column "F": 3
We fetched the column ‘F’ from Dataframe as a Series object and then counted the total unique values in that column.
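Since value_counts() exposes the frequencies themselves, it can answer two related but different questions: how many distinct values a column has, and how many values occur exactly once. The following is a small sketch of both, along with the dropna flag that value_counts() accepts; the Series are rebuilt from the sample data and the variable names are only illustrative.

```python
import pandas as pd
import numpy as np

# Values of columns 'F' and 'B' from the sample DataFrame
column_f = pd.Series([56, 18, 41, 18], name='F')
column_b = pd.Series([34, np.nan, 34, 41], name='B')

# Frequency of each distinct value in column 'F'
freq = column_f.value_counts()

# Number of distinct values: one entry per value in the frequency Series
distinct_count = len(freq)

# Number of values that occur exactly once (18 is excluded, it appears twice)
once_count = int((freq == 1).sum())

# value_counts() skips NaN unless dropna=False is passed
b_without_nan = len(column_b.value_counts())
b_with_nan = len(column_b.value_counts(dropna=False))

print('Distinct values in Column "F":', distinct_count)
print('Values occurring exactly once in Column "F":', once_count)
print('Distinct values in Column "B" (NaN skipped, NaN included):', b_without_nan, b_with_nan)
```

For column 'F' the two counts differ: there are three distinct values, but only 56 and 41 appear exactly once.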
Read More
- Pandas.Series.unique() – Tutorial and Examples
- Pandas.Series.nunique() – Tutorial and Examples
- Pandas.Series.is_unique – Tutorial and Examples
Summary:
Today we learned how to get the count of unique values in a Dataframe Column in Pandas.