Pandas: Count rows in a Dataframe | all rows or only those that satisfy a condition

In this article we will discuss different ways to count all the rows in a Dataframe, or only the rows that satisfy a condition.

Let’s create a Dataframe,

import pandas as pd
import numpy as np

# List of Tuples
# np.nan (lowercase) is used for missing values; the np.NaN alias was removed in NumPy 2.0
employees = [('jack', 34, 'Sydney', 5) ,
             ('Riti', 31, 'Delhi' , 7) ,
             ('Aadi', 16, np.nan, 11) ,
             ('Mohit', np.nan,'Delhi' , 15) ,
             ('Veena', 33, 'Delhi' , 4) ,
             ('Shaunak', 35, 'Mumbai', np.nan ),
             ('Shaun', 35, 'Colombo', 11)
            ]

# Create a DataFrame object
empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'], index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])

The contents of the dataframe empDfObj are,

      Name   Age     City  Experience
a     jack  34.0   Sydney         5.0
b     Riti  31.0    Delhi         7.0
c     Aadi  16.0      NaN        11.0
d    Mohit   NaN    Delhi        15.0
e    Veena  33.0    Delhi         4.0
f  Shaunak  35.0   Mumbai         NaN
g    Shaun  35.0  Colombo        11.0

Now let’s discuss different ways to count rows in this dataframe.

Count all rows in a Pandas Dataframe using Dataframe.shape

Dataframe.shape

Each Dataframe object has an attribute shape, a tuple that contains the dimensions of the dataframe, like,

(Number_of_rows, Number_of_columns)

The first element of the tuple returned by Dataframe.shape is the number of items in the index, i.e. the number of rows in the dataframe. Let’s use this to count the number of rows in the dataframe created above,

# The first element of the tuple returned by shape is the number of rows in the dataframe
numOfRows = empDfObj.shape[0]

print('Number of Rows in dataframe : ' , numOfRows)

Output:

Number of Rows in dataframe :  7
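Since shape is an ordinary tuple, both dimensions can be unpacked in one step. The following is a minimal sketch that rebuilds the same sample data; the names df, num_rows and num_cols are chosen here for illustration only.

```python
import numpy as np
import pandas as pd

# Same sample data as in the article
employees = [('jack', 34, 'Sydney', 5),
             ('Riti', 31, 'Delhi', 7),
             ('Aadi', 16, np.nan, 11),
             ('Mohit', np.nan, 'Delhi', 15),
             ('Veena', 33, 'Delhi', 4),
             ('Shaunak', 35, 'Mumbai', np.nan),
             ('Shaun', 35, 'Colombo', 11)]
df = pd.DataFrame(employees,
                  columns=['Name', 'Age', 'City', 'Experience'],
                  index=list('abcdefg'))

# shape is a (rows, columns) tuple, so it can be unpacked directly
num_rows, num_cols = df.shape

print('Rows :', num_rows)     # Rows : 7
print('Columns :', num_cols)  # Columns : 4
```

Rows that contain NaN are still counted, because shape only looks at the dimensions of the dataframe and not at its values.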

Count all rows in a Pandas Dataframe using Dataframe.index

Dataframe.index

Each Dataframe object has an attribute index that contains the sequence of index (row) labels. The length of that sequence is the number of rows in the dataframe i.e.

# Get row count of dataframe by finding the length of index labels
numOfRows = len(empDfObj.index)

print('Number of Rows in dataframe : ' , numOfRows)

Output:

Number of Rows in dataframe :  7
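As a side note, calling len() directly on the dataframe gives the same row count, whereas Dataframe.count() does something different: it counts the non-NaN values in each column. The sketch below illustrates the difference on the same sample data; the variable names are illustrative and not part of the original example.

```python
import numpy as np
import pandas as pd

# Same sample data as in the article
employees = [('jack', 34, 'Sydney', 5),
             ('Riti', 31, 'Delhi', 7),
             ('Aadi', 16, np.nan, 11),
             ('Mohit', np.nan, 'Delhi', 15),
             ('Veena', 33, 'Delhi', 4),
             ('Shaunak', 35, 'Mumbai', np.nan),
             ('Shaun', 35, 'Colombo', 11)]
df = pd.DataFrame(employees,
                  columns=['Name', 'Age', 'City', 'Experience'],
                  index=list('abcdefg'))

# len() on the dataframe is the same as len() on its index
rows_via_len = len(df)
rows_via_index = len(df.index)

# count() skips NaN, so it is a per-column count of non-missing values,
# not a row count
non_null_per_column = df.count()

print('len(df) :', rows_via_len)              # len(df) : 7
print('len(df.index) :', rows_via_index)      # len(df.index) : 7
print('Non-NaN values in Age :', non_null_per_column['Age'])  # 6
```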

Count rows in a Pandas Dataframe that satisfy a condition using Dataframe.apply()

Using Dataframe.apply() with axis=1 we can apply a function to each row of a dataframe to check whether that row satisfies a condition or not. The call returns a bool series with one entry per row. By counting the number of True values in the returned series we can find out the number of rows in the dataframe that satisfy the condition.

Let’s see some examples,

Example 1:

Count the number of rows in a dataframe for which the ‘Age’ column contains a value greater than 30 i.e.

# Get a bool series that is True for each row in which
# the value of the 'Age' column is more than 30
seriesObj = empDfObj.apply(lambda x: x['Age'] > 30, axis=1)

# Count number of True in series (each True counts as 1 when summed)
numOfRows = seriesObj.sum()

print('Number of Rows in dataframe in which Age > 30 : ', numOfRows)

Output:

Number of Rows in dataframe in which Age > 30 :  5
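For a condition on a single column, the same count can also be obtained without apply(), by comparing the column directly and summing the resulting bool series. This is an alternative sketch rather than the approach used above; the names df and age_over_30 are illustrative.

```python
import numpy as np
import pandas as pd

# Same sample data as in the article
employees = [('jack', 34, 'Sydney', 5),
             ('Riti', 31, 'Delhi', 7),
             ('Aadi', 16, np.nan, 11),
             ('Mohit', np.nan, 'Delhi', 15),
             ('Veena', 33, 'Delhi', 4),
             ('Shaunak', 35, 'Mumbai', np.nan),
             ('Shaun', 35, 'Colombo', 11)]
df = pd.DataFrame(employees,
                  columns=['Name', 'Age', 'City', 'Experience'],
                  index=list('abcdefg'))

# Comparing a column with a scalar gives a bool series, one entry per row.
# NaN > 30 evaluates to False, so the row with a missing age is not counted.
age_over_30 = df['Age'] > 30

# True counts as 1 and False as 0 when summed
num_rows = int(age_over_30.sum())

print('Number of Rows in dataframe in which Age > 30 : ', num_rows)  # 5
```

The comparison runs on the whole column at once instead of calling a Python function for every row, which usually makes a noticeable difference on large dataframes.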

Example 2:

Count the number of rows in a dataframe that contain the value 11 in any column i.e.

# Count number of rows in a dataframe that contain value 11 in any column
seriesObj = empDfObj.apply(lambda x: 11 in list(x), axis=1)
numOfRows = seriesObj.sum()

print('Number of Rows in dataframe which contain 11 in any column : ', numOfRows)

Output:

Number of Rows in dataframe which contain 11 in any column :  2
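The same question can be answered without a row-wise lambda. The sketch below uses Dataframe.isin() to mark every cell equal to 11 and then any(axis=1) to collapse each row to a single bool; it is an alternative to the apply() version, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Same sample data as in the article
employees = [('jack', 34, 'Sydney', 5),
             ('Riti', 31, 'Delhi', 7),
             ('Aadi', 16, np.nan, 11),
             ('Mohit', np.nan, 'Delhi', 15),
             ('Veena', 33, 'Delhi', 4),
             ('Shaunak', 35, 'Mumbai', np.nan),
             ('Shaun', 35, 'Colombo', 11)]
df = pd.DataFrame(employees,
                  columns=['Name', 'Age', 'City', 'Experience'],
                  index=list('abcdefg'))

# isin() returns a bool dataframe of the same shape, True where a cell equals 11
cell_is_11 = df.isin([11])

# any(axis=1) is True for a row if at least one of its cells is True
row_has_11 = cell_is_11.any(axis=1)

num_rows = int(row_has_11.sum())

print('Number of Rows in dataframe which contain 11 in any column : ', num_rows)  # 2
```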

Example 3:

Count the number of rows in a dataframe that contain NaN in any column i.e.

# Count number of rows in a dataframe that contain NaN in any column
seriesObj = empDfObj.apply(lambda x: x.isnull().any(), axis=1)
numOfRows = seriesObj.sum()

print('Number of Rows in dataframe which contain NaN in any column : ', numOfRows)

Output:

Number of Rows in dataframe which contain NaN in any column :  3
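Because isnull() also works on a whole dataframe, the NaN check can be written without apply() as well. The following sketch shows that variant together with a cross-check based on dropna(); both are alternatives to the code above, and the names used are illustrative.

```python
import numpy as np
import pandas as pd

# Same sample data as in the article
employees = [('jack', 34, 'Sydney', 5),
             ('Riti', 31, 'Delhi', 7),
             ('Aadi', 16, np.nan, 11),
             ('Mohit', np.nan, 'Delhi', 15),
             ('Veena', 33, 'Delhi', 4),
             ('Shaunak', 35, 'Mumbai', np.nan),
             ('Shaun', 35, 'Colombo', 11)]
df = pd.DataFrame(employees,
                  columns=['Name', 'Age', 'City', 'Experience'],
                  index=list('abcdefg'))

# isnull() marks every missing cell; any(axis=1) flags rows with at least one
row_has_nan = df.isnull().any(axis=1)
num_rows_with_nan = int(row_has_nan.sum())

# Cross-check: dropna() removes exactly the rows that contain a NaN
num_rows_dropped = len(df) - len(df.dropna())

print('Number of Rows in dataframe which contain NaN in any column : ',
      num_rows_with_nan)  # 3
```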

The complete example is as follows,

import pandas as pd
import numpy as np

def main():

    print('Create a Dataframe')
    # List of Tuples
    # np.nan (lowercase) is used for missing values; the np.NaN alias was removed in NumPy 2.0
    employees = [('jack', 34, 'Sydney', 5) ,
                 ('Riti', 31, 'Delhi' , 7) ,
                 ('Aadi', 16, np.nan, 11) ,
                 ('Mohit', np.nan,'Delhi' , 15) ,
                 ('Veena', 33, 'Delhi' , 4) ,
                 ('Shaunak', 35, 'Mumbai', np.nan ),
                 ('Shaun', 35, 'Colombo', 11)
                ]

    # Create a DataFrame object
    empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'], index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])
    print("Contents of the Dataframe : ")
    print(empDfObj)

    print('**** Get the row count of a Dataframe using Dataframe.shape')

    # The first element of the tuple returned by shape is the number of rows in the dataframe
    numOfRows = empDfObj.shape[0]

    print('Number of Rows in dataframe : ' , numOfRows)

    print('**** Get the row count of a Dataframe using Dataframe.index')

    # Get row count of dataframe by finding the length of index labels
    numOfRows = len(empDfObj.index)

    print('Number of Rows in dataframe : ' , numOfRows)

    print('**** Count Number of Rows in dataframe that satisfy a condition ****')

    # Get a bool series that is True for each row in which
    # the value of the 'Age' column is more than 30
    seriesObj = empDfObj.apply(lambda x: x['Age'] > 30, axis=1)
    # Count number of True in series (each True counts as 1 when summed)
    numOfRows = seriesObj.sum()
    print('Number of Rows in dataframe in which Age > 30 : ', numOfRows)

    print('**** Count Number of Rows in dataframe that contains a value ****')

    # Count number of rows in a dataframe that contain value 11 in any column
    seriesObj = empDfObj.apply(lambda x: 11 in list(x), axis=1)
    numOfRows = seriesObj.sum()

    print('Number of Rows in dataframe which contain 11 in any column : ', numOfRows)

    print('**** Count Number of Rows in dataframe that contains NaN ****')

    # Count number of rows in a dataframe that contain NaN in any column
    seriesObj = empDfObj.apply(lambda x: x.isnull().any(), axis=1)
    numOfRows = seriesObj.sum()

    print('Number of Rows in dataframe which contain NaN in any column : ', numOfRows)


if __name__ == '__main__':
    main()

Output:

Create a Dataframe
Contents of the Dataframe : 
      Name   Age     City  Experience
a     jack  34.0   Sydney         5.0
b     Riti  31.0    Delhi         7.0
c     Aadi  16.0      NaN        11.0
d    Mohit   NaN    Delhi        15.0
e    Veena  33.0    Delhi         4.0
f  Shaunak  35.0   Mumbai         NaN
g    Shaun  35.0  Colombo        11.0
**** Get the row count of a Dataframe using Dataframe.shape
Number of Rows in dataframe :  7
**** Get the row count of a Dataframe using Dataframe.index
Number of Rows in dataframe :  7
**** Count Number of Rows in dataframe that satisfy a condition ****
Number of Rows in dataframe in which Age > 30 :  5
**** Count Number of Rows in dataframe that contains a value ****
Number of Rows in dataframe which contain 11 in any column :  2
**** Count Number of Rows in dataframe that contains NaN ****
Number of Rows in dataframe which contain NaN in any column :  3

 

2 thoughts on “Pandas : count rows in a dataframe | all or those only that satisfy a condition”

  1. hi, thanks, good examples!

    In example 1: “Count the number of rows in a dataframe for which ‘Age’ column contains value more than 30 i.e.” Is there a way to get the cumulative count for each row?

    I have a similar problem where I want to count all “A” in column “Result”. But I want to know the running count for each row, something like this:

    Result A_count
    C 0
    B 0
    A 1
    B 1
    A 2

    and so on…

    Thanks
