Count Frequency of a value in a DataFrame Column

In general data analysis, calculating the frequency of a value in a DataFrame column is important to understand the data distribution. In this tutorial, we will look at multiple ways to count the frequency of a value.

To quickly get started, let’s create a sample DataFrame to experiment with. We’ll use the pandas library with some sample data.

import pandas as pd
import numpy as np

# List of Tuples
employees= [('Shubham', 'Data Scientist', 'Sydney',   5),
            ('Riti', 'Data Scientist', 'Delhi' ,   7),
            ('Shanky', 'Program Manager', 'Delhi' ,   2),
            ('Shreya', 'Graphic Designer', 'Mumbai' ,   2),
            ('Aadi', 'Data Scientist', 'New York', 11)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'City', 'Experience'],
                  index=[0, 1, 2, 3, 4])
print(df)

Contents of the created dataframe are,

      Name       Designation      City  Experience
0  Shubham    Data Scientist    Sydney           5
1     Riti    Data Scientist     Delhi           7
2   Shanky   Program Manager     Delhi           2
3   Shreya  Graphic Designer    Mumbai           2
4     Aadi    Data Scientist  New York          11

Now, let’s look at different ways in which we could count the frequency of a value in a DataFrame column.

Count Frequency of values in DataFrame Column using value_counts() function

This is the most common and easiest way to get the frequency of values in a DataFrame column. Note that Series.value_counts() works on one column at a time. Let’s get the frequency of each value in the “Designation” column.

# count frequency of values for Designation
print(df['Designation'].value_counts())

Output

Data Scientist      3
Program Manager     1
Graphic Designer    1
Name: Designation, dtype: int64

The output contains the column’s unique values and their corresponding counts. Here, we can see that the value “Data Scientist” is present in 3 different rows, and “Graphic Designer” and “Program Manager” are present in 1 row each.

If you want the frequency of a single value, select that value from the Series returned by value_counts(). For example, let’s get the frequency of ‘Data Scientist’ in the ‘Designation’ column.

# count frequency of Designation 'Data Scientist'
print(df['Designation'].value_counts()['Data Scientist'])

Output:

3

It returned the frequency of element ‘Data Scientist’ in column ‘Designation’ of the DataFrame.
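Indexing the result with a value that never occurs in the column raises a KeyError. A minimal sketch of two ways around that, assuming a small stand-in Series named designation (hypothetical, it mirrors the “Designation” column above) and a made-up value ‘Intern’ that is not in the data:

```python
import pandas as pd

# Hypothetical stand-in for the tutorial's "Designation" column
designation = pd.Series(
    ['Data Scientist', 'Data Scientist', 'Program Manager',
     'Graphic Designer', 'Data Scientist'],
    name='Designation')

counts = designation.value_counts()

# Series.get() returns the default instead of raising KeyError
ds_count = counts.get('Data Scientist', 0)
intern_count = counts.get('Intern', 0)   # 'Intern' is absent from the data

# Alternative: compare against the value and sum the boolean mask
mask_count = int((designation == 'Data Scientist').sum())

print(ds_count, intern_count, mask_count)
```

The boolean-mask version skips building the full frequency table, which may be preferable when only one value is of interest.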

If we want proportions, i.e., each value’s count divided by the total number of rows, we can pass normalize=True to value_counts().

# relative frequency of values for Designation
print(df['Designation'].value_counts(normalize=True))

Output

Data Scientist      0.6
Program Manager     0.2
Graphic Designer    0.2
Name: Designation, dtype: float64

Here, the counts are converted to fractions of the total number of rows (0.6 means 60%). This comes in handy whenever we want to understand the distribution of a column.
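If actual percentages are preferred over fractions, the normalized result can be scaled by 100. A short sketch, again assuming a hypothetical stand-in Series named designation; rounding to one decimal place is an arbitrary choice made for this example:

```python
import pandas as pd

# Hypothetical stand-in for the tutorial's "Designation" column
designation = pd.Series(
    ['Data Scientist', 'Data Scientist', 'Program Manager',
     'Graphic Designer', 'Data Scientist'],
    name='Designation')

# Fractions of the total, as returned by normalize=True
shares = designation.value_counts(normalize=True)

# Scale to percentages and round for display
percentages = shares.mul(100).round(1)

print(percentages)
```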

Count Frequency of values in DataFrame Column using groupby() and count() method

Another convenient method is to combine groupby() and count(). This is quite similar to the way we count the frequency of a value in SQL with GROUP BY and COUNT. Let’s analyze the same “Designation” column using groupby() and count().

# get frequency using groupby
print(df.groupby(['Designation'])['Name'].count())

Output

Designation
Data Scientist      3
Graphic Designer    1
Program Manager     1
Name: Name, dtype: int64

The counts here are the same as in the value_counts() output above, although the rows are ordered by group label rather than by count. Here, we have used the “Name” column as a proxy, but feel free to use any other column (containing no missing values) and the output will remain the same.

Note that if the column used for aggregation (the “Name” column here) contains missing values, count() will skip those rows, so the result can undercount.
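To make the missing-value caveat concrete, here is a small sketch using a hypothetical DataFrame named df_missing, in which one “Name” entry has been replaced with NaN. It compares count() on the proxy column with the number of rows per group, obtained from size(), which the next section covers:

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the sample data with one missing Name
df_missing = pd.DataFrame({
    'Name': ['Shubham', np.nan, 'Shanky', 'Shreya', 'Aadi'],
    'Designation': ['Data Scientist', 'Data Scientist', 'Program Manager',
                    'Graphic Designer', 'Data Scientist'],
})

# count() ignores the row where Name is NaN
by_count = df_missing.groupby('Designation')['Name'].count()

# size() counts every row in the group
by_size = df_missing.groupby('Designation').size()

print(by_count['Data Scientist'])  # 2
print(by_size['Data Scientist'])   # 3
```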

Count Frequency of values in DataFrame Column using groupby() and size() method

An alternative that avoids the above problem with missing values is to call size() directly on the groupby object. It counts the rows in each group and doesn’t depend on any other column, as shown below.

# using groupby and size method
print(df.groupby(['Designation']).size())

Output

Designation
Data Scientist      3
Graphic Designer    1
Program Manager     1
dtype: int64

The counts again match the above methods.
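The Series returned by size() has no name, which can be awkward for further processing. One possible way to turn it into a tidy table is sketched below; the column name ‘Count’ and the variable freq are choices made for this example, and the DataFrame is a hypothetical two-column stand-in for the sample data:

```python
import pandas as pd

# Hypothetical stand-in for the tutorial's sample data
df = pd.DataFrame({
    'Name': ['Shubham', 'Riti', 'Shanky', 'Shreya', 'Aadi'],
    'Designation': ['Data Scientist', 'Data Scientist', 'Program Manager',
                    'Graphic Designer', 'Data Scientist'],
})

# Turn the group sizes into a two-column DataFrame, most frequent first
freq = (df.groupby('Designation')
          .size()
          .reset_index(name='Count')
          .sort_values('Count', ascending=False)
          .reset_index(drop=True))

print(freq)
```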

Count Frequency of values in DataFrame Column using collections.Counter

The collections module of the Python standard library is another hidden gem that offers a lot of functionality. Let’s look at one of its classes, Counter, which can count the frequency of values in a pandas DataFrame column. Let’s look at the code below.

# import collections
from collections import Counter

# using Counter and converting to the dictionary for further use
print(dict(Counter(df['Designation'])))

Output

{'Data Scientist': 3, 'Program Manager': 1, 'Graphic Designer': 1}

Here you go, we have the frequency counts stored in a dictionary, which can easily be used for further analysis.
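Keeping the Counter object instead of converting it to a dict right away gives access to a couple of handy extras. A brief sketch, assuming a hypothetical stand-in Series named designation and a made-up value ‘Intern’ that does not occur in the data:

```python
from collections import Counter

import pandas as pd

# Hypothetical stand-in for the tutorial's "Designation" column
designation = pd.Series(
    ['Data Scientist', 'Data Scientist', 'Program Manager',
     'Graphic Designer', 'Data Scientist'],
    name='Designation')

counter = Counter(designation)

# most_common(n) returns the n most frequent values with their counts
top = counter.most_common(1)
print(top)  # [('Data Scientist', 3)]

# A Counter returns 0 for values it has never seen, instead of raising KeyError
intern_count = counter['Intern']
print(intern_count)  # 0
```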

The complete example is as follows,

import pandas as pd
import numpy as np

# List of Tuples
employees = [('Shubham', 'Data Scientist', 'Sydney',   5),
            ('Riti', 'Data Scientist', 'Delhi' ,   7),
            ('Shanky', 'Program Manager', 'Delhi' ,   2),
            ('Shreya', 'Graphic Designer', 'Mumbai' ,   2),
            ('Aadi', 'Data Scientist', 'New York', 11)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'City', 'Experience'],
                  index=[0, 1, 2, 3, 4])
print(df)

print('**** Using value_counts() function ****')

# count frequency of values for Designation
print(df['Designation'].value_counts())

# count frequency of Designation 'Data Scientist'
print(df['Designation'].value_counts()['Data Scientist'])

# relative frequency of values for Designation
print(df['Designation'].value_counts(normalize=True))

print('*** Using groupby() and count() method ***')

# get frequency using groupby
print(df.groupby(['Designation'])['Name'].count())

print('*** Using groupby() and size() method ***')

# using groupby and size method
print(df.groupby(['Designation']).size())

print('*** Using collections.counter() function ***')

# import collections
from collections import Counter

# using Counter and converting to the dictionary for further use
print(dict(Counter(df['Designation'])))

Output:

      Name       Designation      City  Experience
0  Shubham    Data Scientist    Sydney           5
1     Riti    Data Scientist     Delhi           7
2   Shanky   Program Manager     Delhi           2
3   Shreya  Graphic Designer    Mumbai           2
4     Aadi    Data Scientist  New York          11

**** Using value_counts() function ****

Data Scientist      3
Program Manager     1
Graphic Designer    1
Name: Designation, dtype: int64

3

Data Scientist      0.6
Program Manager     0.2
Graphic Designer    0.2
Name: Designation, dtype: float64

*** Using groupby() and count() method ***

Designation
Data Scientist      3
Graphic Designer    1
Program Manager     1
Name: Name, dtype: int64

*** Using groupby() and size() method ***

Designation
Data Scientist      3
Graphic Designer    1
Program Manager     1
dtype: int64

*** Using collections.counter() function ***

{'Data Scientist': 3, 'Program Manager': 1, 'Graphic Designer': 1}

Summary

Great, you made it! In this article, we have discussed multiple ways to count the frequency of a value in a pandas DataFrame. Thanks.
