Pandas : Get frequency of a value in dataframe column/index & find its positions in Python

In this article, we will discuss how to get the frequency count of unique values in a dataframe column or in the dataframe index. We will also see how to count values by grouping them into categories / ranges, and how to get percentages instead of exact counts.

Suppose we have a Dataframe i.e.

import pandas as pd
import numpy as np

# List of Tuples
employees = [('jack', 34, 'Sydney', 5),
             ('Riti', 31, 'Delhi', 7),
             ('Aadi', 16, np.nan, 11),
             ('Aadi', 31, 'Delhi', 7),
             ('Veena', np.nan, 'Delhi', 4),
             ('Shaunak', 35, 'Mumbai', 5),
             ('Shaunak', 35, 'Colombo', 11)
             ]

# Create a DataFrame object
empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'])

# Set column 'Name' as the index of the dataframe
empDfObj = empDfObj.set_index('Name')

print(empDfObj)

Contents of the dataframe empDfObj are,

          Age     City  Experience
Name                              
jack     34.0   Sydney           5
Riti     31.0    Delhi           7
Aadi     16.0      NaN          11
Aadi     31.0    Delhi           7
Veena     NaN    Delhi           4
Shaunak  35.0   Mumbai           5
Shaunak  35.0  Colombo          11

Frequency count of elements in the column ‘Age’ is,

35.0    2
31.0    2
16.0    1
34.0    1

Now, to get the frequency count of elements in an index or a column like the one above, we are going to use a function provided by the Series class i.e.

pandas.Series.value_counts

Series.value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True)

Arguments :

  • normalize : boolean, default False
    • If True, it returns relative frequencies instead of exact counts.
  • sort : boolean, default True
    • Sort by frequency count.
  • ascending : boolean, default False
    • If True, sort by frequency count in ascending order.
  • bins : int, optional
    • Instead of counting exact values, group them into this many half-open bins (numeric data only).
  • dropna : boolean, default True
    • If True, don't include counts of NaN.

It returns a Series object containing the frequency count of unique elements in the series.
We can select the dataframe index or any column as a Series. Then using Series.value_counts() we can find the frequency count of elements inside it. Let’s see some examples,
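
To get a quick feel for these arguments, here is a minimal sketch on a small standalone Series. It is not part of the original article's walkthrough and the sample values are just for illustration,

import pandas as pd

# A small Series with repeated values
sample = pd.Series(['a', 'b', 'a', 'c', 'a', 'b'])

# Default: counts sorted in descending order of frequency
print(sample.value_counts())

# Sort counts in ascending order instead
print(sample.value_counts(ascending=True))

# Relative frequencies (fractions that sum to 1) instead of counts
print(sample.value_counts(normalize=True))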

Get Frequency count of values in a Dataframe Column

We can select a column of the dataframe as a Series object using the [] operator. Calling value_counts() on this Series object returns another Series object that contains the frequency counts of the unique values in the calling series i.e. our selected column.
Let's get the frequency count of unique values in column 'Age' of the dataframe empDfObj,

# Get frequency count of values in column 'Age'
frequency = empDfObj['Age'].value_counts()

print("Frequency of value in column 'Age' :")
print(frequency)

Output

Frequency of value in column 'Age' :
35.0    2
31.0    2
16.0    1
34.0    1
Name: Age, dtype: int64
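
If we only need the frequency of one specific value in the column, we can select it from the returned Series. The following sketch is an addition for illustration and the value 31.0 is just an example,

# Get frequency count of values in column 'Age'
ageCounts = empDfObj['Age'].value_counts()

# Frequency of a single value; .get() returns 0 if the value is absent
print("Frequency of 31.0 in column 'Age' : ", ageCounts.get(31.0, 0))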

Get Frequency count of values in a Dataframe Index

On similar lines, we can select the dataframe index as a Series object using Dataframe.index. Then, by calling value_counts() on this Series object, we can get the frequency count of values in the dataframe index.
Let's fetch the frequency count of unique values in the index of dataframe empDfObj,

# Get frequency count of values in Dataframe Index
frequency = empDfObj.index.value_counts()

print("Frequency of value in Index of Dataframe :")
print(frequency)

Output

Frequency of value in Index of Dataframe :
Aadi       2
Shaunak    2
Veena      1
Riti       1
jack       1
Name: Name, dtype: int64
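
To get the frequency count of a single element in the index, select it from the returned Series after checking that the element exists. This is also shown in the complete example at the end,

# First check if element exists in the dataframe index
if 'Riti' in empDfObj.index:
    # Get frequency count of an element in the DataFrame index
    result = empDfObj.index.value_counts()['Riti']
    print('Frequency of "Riti" in Dataframe index is : ', result)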

Get Frequency count of values in a Dataframe Column including NaN

By default, value_counts() skips NaN values in the series while counting the frequency of unique elements. If we pass the dropna argument as False, then it will include NaN too.
For example,

# Get frequency count of values including NaN in column 'Age'
frequency = empDfObj['Age'].value_counts(dropna=False)

print("Frequency of value in column 'Age' including NaN :")
print(frequency)

Output

Frequency of value in column 'Age' including NaN :
 35.0    2
 31.0    2
NaN      1
 16.0    1
 34.0    1
Name: Age, dtype: int64
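
If we only want the number of NaN entries rather than the full frequency table, a common alternative (not part of the original example) is to use isna(),

# Count only the NaN entries in column 'Age'
nanCount = empDfObj['Age'].isna().sum()

print("Number of NaN values in column 'Age' : ", nanCount)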

Get Frequency of values as percentage in a Dataframe Column

Instead of getting the exact frequency count of elements in a dataframe column, we can normalize them and get the relative frequencies on a scale of 0 to 1 by passing the normalize argument as True. Let's get the frequency of values in the column 'City' as percentages i.e.

# Get frequency percentage by values in column 'City'
frequency = empDfObj['City'].value_counts(normalize=True)

print("Frequency of values as percentage in column 'City' :")
print(frequency * 100)

Output

Frequency of values as percentage in column 'City' :
Delhi      50.000000
Mumbai     16.666667
Sydney     16.666667
Colombo    16.666667
Name: City, dtype: float64

Count of column values in grouped categories

Instead of getting the exact frequency count or percentage, we can group the values in a column and get the count of values in those groups.
Let's group the values inside column 'Experience' and get the count of employees in different experience levels (ranges) i.e.

# Group values in a column to categories
frequency = empDfObj['Experience'].value_counts(bins=3)

print('Count of values in grouped categories of column Experience ')
print(frequency)

Output

Count of values in grouped categories of column Experience 
(3.992, 6.333]    3
(8.667, 11.0]     2
(6.333, 8.667]    2
Name: Experience, dtype: int64

So, it basically distributed the values of column 'Experience' into 3 different ranges and returned the count of elements in each range.
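
If we want to choose the bin edges ourselves instead of letting pandas pick them, one way (not used in the original example) is to cut the column with pd.cut() and then count the resulting categories. The edges below are just an assumption for illustration,

# Group column 'Experience' into custom ranges and count the values in each range
customBins = pd.cut(empDfObj['Experience'], bins=[0, 5, 10, 15])
frequency = customBins.value_counts()

print('Count of values in custom ranges of column Experience')
print(frequency)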

The complete example is as follows,

import pandas as pd
import numpy as np

def main():

    # List of Tuples
    employees = [('jack', 34, 'Sydney', 5),
                 ('Riti', 31, 'Delhi', 7),
                 ('Aadi', 16, np.nan, 11),
                 ('Aadi', 31, 'Delhi', 7),
                 ('Veena', np.nan, 'Delhi', 4),
                 ('Shaunak', 35, 'Mumbai', 5),
                 ('Shaunak', 35, 'Colombo', 11)
                 ]

    # Create a DataFrame object
    empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'])

    # Set column 'Name' as the index of the dataframe
    empDfObj = empDfObj.set_index('Name')

    print('Original Dataframe : ')
    print(empDfObj)

    print("*** Get Frequency count of values in a Dataframe Column ***")

    # Get frequency count of values in column 'Age'
    frequency = empDfObj['Age'].value_counts()

    print("Frequency of value in column 'Age' :")
    print(frequency)

    print("*** Get Frequency count of values in a Dataframe Index ***")

    # Get frequency count of values in Dataframe Index
    frequency = empDfObj.index.value_counts()

    print("Frequency of value in Index of Dataframe :")
    print(frequency)

    print('**** Get Frequency Count of an element in Dataframe index ****')

    # First check if element exists in the dataframe index
    if 'Riti' in empDfObj.index:
        # Get Frequency Count of an element in DataFrame index
        result = empDfObj.index.value_counts()['Riti']
        print('Frequency of "Riti" in Dataframe index is : ' , result)

    print("*** Get Frequency count of values in a Dataframe Column including NaN ***")

    # Get frequency count of values including NaN in column 'Age'
    frequency = empDfObj['Age'].value_counts(dropna=False)

    print("Frequency of value in column 'Age' including NaN :")
    print(frequency)

    print("*** Get Frequency of values as percentage in a Dataframe Column ***")

    # Get frequency percentage by values in column 'City'
    frequency = empDfObj['City'].value_counts(normalize=True)

    print("Frequency of values as percentage in column 'City' :")
    print(frequency * 100)


    print("*** Count of column values in grouped categories ***")

    # Group values in a column to categories
    frequency = empDfObj['Experience'].value_counts(bins=3)

    print('Count of values in grouped categories of column Experience ')
    print(frequency)

if __name__ == '__main__':
    main()

Output:

Original Dataframe : 
          Age     City  Experience
Name                              
jack     34.0   Sydney           5
Riti     31.0    Delhi           7
Aadi     16.0      NaN          11
Aadi     31.0    Delhi           7
Veena     NaN    Delhi           4
Shaunak  35.0   Mumbai           5
Shaunak  35.0  Colombo          11
*** Get Frequency count of values in a Dataframe Column ***
Frequency of value in column 'Age' :
35.0    2
31.0    2
16.0    1
34.0    1
Name: Age, dtype: int64
*** Get Frequency count of values in a Dataframe Index ***
Frequency of value in Index of Dataframe :
Aadi       2
Shaunak    2
Riti       1
Veena      1
jack       1
Name: Name, dtype: int64
**** Get Frequency Count of an element in Dataframe index ****
Frequency of "Riti" in Dataframe index is :  1
*** Get Frequency count of values in a Dataframe Column including NaN ***
Frequency of value in column 'Age' including NaN :
 35.0    2
 31.0    2
NaN      1
 16.0    1
 34.0    1
Name: Age, dtype: int64
*** Get Frequency of values as percentage in a Dataframe Column ***
Frequency of values as percentage in column 'City' :
Delhi      50.000000
Sydney     16.666667
Mumbai     16.666667
Colombo    16.666667
Name: City, dtype: float64
*** Count of column values in grouped categories ***
Count of values in grouped categories of column Experience 
(3.992, 6.333]    3
(8.667, 11.0]     2
(6.333, 8.667]    2
Name: Experience, dtype: int64
