This article will discuss different ways to count unique values in all columns of a dataframe in Pandas.
First of all, we will create a sample Dataframe from a list of tuples i.e.
import pandas as pd import numpy as np # List of Tuples list_of_tuples = [ (11, 34, 67, 5, np.NaN, 34), (12, 34, np.NaN, 11, 12, np.NaN), (13, 34, 78, 11, 12, 18) , (12, 34, 80, 41, 11, 18)] # Create a DataFrame object df = pd.DataFrame( list_of_tuples, columns=['A', 'B', 'C', 'D', 'E', 'F']) print(df)
Contents of the Dataframe are,
A B C D E F 0 11 34 67.0 5 NaN 34.0 1 12 34 NaN 11 12.0 NaN 2 13 34 78.0 11 12.0 18.0 3 12 34 80.0 41 11.0 18.0
Now let’s see how we can get the count of unique values in each of the columns.
Count Unique Values in All Columns using Dataframe.nunique()
In Pandas, the Dataframe provides a member function nunique(). It gives a Series containing unique elements along the requested axis. We can use this to get a count of unique values in each of the columns. For example,
# Get a Series of count of unique values in each column unique_values = df.nunique() print(unique_values)
Output:
A 3 B 1 C 3 D 3 E 2 F 2 dtype: int64
Here we fetched the count of unique values in each of the columns of Dataframe.
By default Dataframe.nunique() doesn’t includes the NaN values. Therefore, if you want to include the NaN values while counting unique values, you need to pass the dropna argument as False to the nunique() function. For example,
# Get a Series of count of unique values in each column # including NaN unique_values = df.nunique(dropna=False) print(unique_values)
Output:
A 3 B 1 C 4 D 3 E 3 F 3 dtype: int64
This time nunique() included the NaN values to while counting the unique elements.
Count Unique Values in All Columns using For Loop
Another simple solution is that we can iterate over all the columns of a Datframe one by one. During iteration, we can count the unique values of each column. For example,
# Iterate over all column names of Dataframe for col in df.columns: # Select the column by name and get count of unique values in it count = df[col].nunique() print('Count of Unique values in Column ', col, ' is : ', count)
Output:
Count of Unique values in Column A is : 3 Count of Unique values in Column B is : 1 Count of Unique values in Column C is : 3 Count of Unique values in Column D is : 3 Count of Unique values in Column E is : 2 Count of Unique values in Column F is : 2
Using Loop, we iterated through all the column names of Dataframe. Then for each column name, we fetched the column as a Series object and then counted the unique values in that column using Series.nunique() function.
By default Series.nunique() doesn’t includes the NaN values. Therefore, if you want to include the NaN values while counting unique values, you need to pass the dropna argument as False to the Series.nunique() function. For example,
# Iterate over all column names of Dataframe and Include NaN for col in df.columns: # Select the column by name and get count of unique values in it # including NaN count = df[col].nunique(dropna=False) print('Count of Unique values in Column ', col, ' is : ', count)
Output:
Count of Unique values in Column A is : 3 Count of Unique values in Column B is : 1 Count of Unique values in Column C is : 4 Count of Unique values in Column D is : 3 Count of Unique values in Column E is : 3 Count of Unique values in Column F is : 3
In this way, if you want, you can also skip specific columns based on conditions.
Read More
- Count unique value in a single Dataframe Column
- Pandas – Series.unique() method
- Pandas – Series.nunique() method
- Pandas – Series.is_unique method
The complete example is as follows,
import pandas as pd import numpy as np # List of Tuples list_of_tuples = [ (11, 34, 67, 5, np.NaN, 34), (12, 34, np.NaN, 11, 12, np.NaN), (13, 34, 78, 11, 12, 18) , (12, 34, 80, 41, 11, 18)] # Create a DataFrame object df = pd.DataFrame( list_of_tuples, columns=['A', 'B', 'C', 'D', 'E', 'F']) print(df) # Get a Series of count of unique values in each column unique_values = df.nunique() print(unique_values) print('***********') # Get a Series of count of unique values in each column # including NaN unique_values = df.nunique(dropna=False) print(unique_values) print('***********') # Iterate over all column names of Dataframe for col in df.columns: # Select the column by name and get count of unique values in it count = df[col].nunique() print('Count of Unique values in Column ', col, ' is : ', count) print('***********') # Iterate over all column names of Dataframe and Include NaN for col in df.columns: # Select the column by name and get count of unique values in it # including NaN count = df[col].nunique(dropna=False) print('Count of Unique values in Column ', col, ' is : ', count)
Output:
A B C D E F 0 11 34 67.0 5 NaN 34.0 1 12 34 NaN 11 12.0 NaN 2 13 34 78.0 11 12.0 18.0 3 12 34 80.0 41 11.0 18.0 A 3 B 1 C 3 D 3 E 2 F 2 dtype: int64 *********** A 3 B 1 C 4 D 3 E 3 F 3 dtype: int64 *********** Count of Unique values in Column A is : 3 Count of Unique values in Column B is : 1 Count of Unique values in Column C is : 3 Count of Unique values in Column D is : 3 Count of Unique values in Column E is : 2 Count of Unique values in Column F is : 2 *********** Count of Unique values in Column A is : 3 Count of Unique values in Column B is : 1 Count of Unique values in Column C is : 4 Count of Unique values in Column D is : 3 Count of Unique values in Column E is : 3 Count of Unique values in Column F is : 3
Summary:
We learned two different ways to count unique values in all columns of the Dataframe in Pandas.
Pandas Tutorials -Learn Data Analysis with Python
-
Pandas Tutorial Part #1 - Introduction to Data Analysis with Python
-
Pandas Tutorial Part #2 - Basics of Pandas Series
-
Pandas Tutorial Part #3 - Get & Set Series values
-
Pandas Tutorial Part #4 - Attributes & methods of Pandas Series
-
Pandas Tutorial Part #5 - Add or Remove Pandas Series elements
-
Pandas Tutorial Part #6 - Introduction to DataFrame
-
Pandas Tutorial Part #7 - DataFrame.loc[] - Select Rows / Columns by Indexing
-
Pandas Tutorial Part #8 - DataFrame.iloc[] - Select Rows / Columns by Label Names
-
Pandas Tutorial Part #9 - Filter DataFrame Rows
-
Pandas Tutorial Part #10 - Add/Remove DataFrame Rows & Columns
-
Pandas Tutorial Part #11 - DataFrame attributes & methods
-
Pandas Tutorial Part #12 - Handling Missing Data or NaN values
-
Pandas Tutorial Part #13 - Iterate over Rows & Columns of DataFrame
-
Pandas Tutorial Part #14 - Sorting DataFrame by Rows or Columns
-
Pandas Tutorial Part #15 - Merging or Concatenating DataFrames
-
Pandas Tutorial Part #16 - DataFrame GroupBy explained with examples
Are you looking to make a career in Data Science with Python?
Data Science is the future, and the future is here now. Data Scientists are now the most sought-after professionals today. To become a good Data Scientist or to make a career switch in Data Science one must possess the right skill set. We have curated a list of Best Professional Certificate in Data Science with Python. These courses will teach you the programming tools for Data Science like Pandas, NumPy, Matplotlib, Seaborn and how to use these libraries to implement Machine learning models.
Checkout the Detailed Review of Best Professional Certificate in Data Science with Python.
Remember, Data Science requires a lot of patience, persistence, and practice. So, start learning today.