Pandas – Count Values in Column greater than N

This article will discuss different ways to count values in a column, which are greater than a given limit.

Let’s first create a Dataframe from a list of tuples i.e.

import pandas as pd
import numpy as np

# List of Tuples
list_of_tuples = [  (11,  34,     78,  5,  11, 56),
                    (12,  np.NaN, 98,  7,  12, 18),
                    (21,  56,     78,  23, 64, 28) ,
                    (56,  98,     35,  63, 27, 45) ,
                    (13,  34,     11,  11, 56, 10) ,
                    (12,  41,     12,  41, 78, 18)]


# Create a DataFrame object
df = pd.DataFrame(  list_of_tuples, 
                    columns=['A', 'B', 'C', 'D', 'E', 'F'])

print(df)

Output

    A     B   C   D   E   F
0  11  34.0  78   5  11  56
1  12   NaN  98   7  12  18
2  21  56.0  78  23  64  28
3  56  98.0  35  63  27  45
4  13  34.0  11  11  56  10
5  12  41.0  12  41  78  18

This Dataframe contains six columns, and each of the columns includes some integers or NaN values. Now let’s see how we can get the count of values greater than a given value in a column.

Technique 1: Get count of column values greater than a value using Series. count()

The steps are as follows,

Advertisements
  • Use subscript operator with the Dataframe object to select the column by the column name i.e. df[column_name]
  • Then select the subset of this Series/Column containing values greater than given limit i.e. df[column_name][df[column_name] > limit]
  • Then get the size/count of that subset Series i.e. df[column_name][df[column_name] > limit].count()

Let’s see an example, where we will fetch the count of values greater than 20 in column ‘C’,

limit = 20
column_name = 'C'

# Select column 'C' from the dataframe
column = df[column_name]

# Get count of values greater than 20 in the column 'C' 
count = column[column > limit].count()

Output

Count of values greater than 20 in Column  C :  4

There were only four values greater than 20 in column ‘C’.

Technique 2: Count column values greater than a limit using Series.sum()

The steps are as follows,

  • Get a bool Series by applying a condition on the column to mark only those values which are greater than a limit i.e., df[column_name] > limit
    • This bool Series will contain True only for those values which are greater than a specific limit.
  • Call sum() function on the bool Series object. It will return the count of values that are greater than the given limit.

Let’s see an example, where we will fetch the count of values greater than 40 in column ‘B’,

limit = 40
column_name = 'B'

# Get count of values greater than 40 in the column 'B' 
count = (df[column_name] > limit).sum()

print('Count of values greater than 40 in Column B : ', count)

Output

Count of values greater than 40 in Column B :  3

There were only four values greater than 40 in column ‘B’.

Technique 3: Count column values greater than a limit using np.count_nonzero()

The steps are as follows,

  • Apply a condition on the column to mark only those values which are greater than a limit i.e., df[column_name] > limit
    • It returns a bool Series that contains True values, only for values greater than the given limit.
  • Pass this bool series to numpy.count_nonzero() function. It will return the count of True values in Series i.e. count of values greater than the given limit in the selected column.

Let’s see an example, where we will fetch the count of values greater than 15 in column ‘F’,

limit = 15
column_name = 'F'

# Get count of values greater than 15 in the column 'F' 
count = np.count_nonzero(df[column_name] > limit)

print('Count of values greater than 20 in Column F : ', count)

Output

Count of values greater than 20 in Column F :  5

There were only five values greater than 15 in column ‘F’.

The complete example is as follows,

import pandas as pd
import numpy as np

# List of Tuples
list_of_tuples = [  (11,  34,     78,  5,  11, 56),
                    (12,  np.NaN, 98,  7,  12, 18),
                    (21,  56,     78,  23, 64, 28) ,
                    (56,  98,     35,  63, 27, 45) ,
                    (13,  34,     11,  11, 56, 10) ,
                    (12,  41,     12,  41, 78, 18)]


# Create a DataFrame object
df = pd.DataFrame(  list_of_tuples, 
                    columns=['A', 'B', 'C', 'D', 'E', 'F'])

print(df)

limit = 20
column_name = 'C'

# Select column 'C' from the dataframe
column = df[column_name]

# Get count of values greater than 20 in the column 'C' 
count = column[column > limit].count()

print('Count of values greater than 20 in Column  C : ', count)


limit = 40
column_name = 'B'

# Get count of values greater than 40 in the column 'B' 
count = (df[column_name] > limit).sum()

print('Count of values greater than 40 in Column B : ', count)


## Technique 3 ##

limit = 15
column_name = 'F'

# Get count of values greater than 15 in the column 'F' 
count = np.count_nonzero(df[column_name] > limit)

print('Count of values greater than 20 in Column F : ', count)

Output

    A     B   C   D   E   F
0  11  34.0  78   5  11  56
1  12   NaN  98   7  12  18
2  21  56.0  78  23  64  28
3  56  98.0  35  63  27  45
4  13  34.0  11  11  56  10
5  12  41.0  12  41  78  18

Count of values greater than 20 in Column  C :  4
Count of values greater than 40 in Column B :  3
Count of values greater than 20 in Column F :  5

Summary

We learned about the different ways to get the count of values greater than a given value in a Pandas Dataframe Column.

Advertisements

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Scroll to Top