In this article, we will discuss different ways to count all the rows in a pandas DataFrame, or only the rows that satisfy a condition.
Let’s create a Dataframe,
import numpy as np
import pandas as pd

# List of tuples; np.nan marks a missing value
employees = [('jack', 34, 'Sydney', 5),
             ('Riti', 31, 'Delhi', 7),
             ('Aadi', 16, np.nan, 11),
             ('Mohit', np.nan, 'Delhi', 15),
             ('Veena', 33, 'Delhi', 4),
             ('Shaunak', 35, 'Mumbai', np.nan),
             ('Shaun', 35, 'Colombo', 11)]

# Create a DataFrame object
empDfObj = pd.DataFrame(employees,
                        columns=['Name', 'Age', 'City', 'Experience'],
                        index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])
Contents of the dataframe empDfObj are,
      Name   Age     City  Experience
a     jack  34.0   Sydney         5.0
b     Riti  31.0    Delhi         7.0
c     Aadi  16.0      NaN        11.0
d    Mohit   NaN    Delhi        15.0
e    Veena  33.0    Delhi         4.0
f  Shaunak  35.0   Mumbai         NaN
g    Shaun  35.0  Colombo        11.0
Now let’s discuss different ways to count rows in this dataframe.
Count all rows in a Pandas Dataframe using Dataframe.shape
Dataframe.shape
Each Dataframe object has an attribute shape, a tuple that holds the dimensions of the dataframe:
(number_of_rows, number_of_columns)
The first element of the tuple returned by Dataframe.shape is the number of entries in the index, i.e. the number of rows in the dataframe. Let’s use it to count the rows of the dataframe created above:
# The first element of the tuple returned by shape is the number of rows
numOfRows = empDfObj.shape[0]
print('Number of Rows in dataframe : ' , numOfRows)
Output:
Number of Rows in dataframe : 7
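As a small sketch of the same idea, the tuple returned by shape can also be unpacked to get the row and column counts in one step. The names numRows and numCols are illustrative and not part of the original example; the sample data is repeated so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Same sample data as in the article
employees = [('jack', 34, 'Sydney', 5),
             ('Riti', 31, 'Delhi', 7),
             ('Aadi', 16, np.nan, 11),
             ('Mohit', np.nan, 'Delhi', 15),
             ('Veena', 33, 'Delhi', 4),
             ('Shaunak', 35, 'Mumbai', np.nan),
             ('Shaun', 35, 'Colombo', 11)]
empDfObj = pd.DataFrame(employees,
                        columns=['Name', 'Age', 'City', 'Experience'],
                        index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])

# shape is a (rows, columns) tuple, so it can be unpacked directly
numRows, numCols = empDfObj.shape
print('Rows :', numRows)
print('Columns :', numCols)
```

Rows that contain NaN are still counted here, because shape only looks at the dimensions of the dataframe and not at its values.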
Count all rows in a Pandas Dataframe using Dataframe.index
Dataframe.index
Each Dataframe object has an attribute index that holds the row labels. The length of that index is the number of rows in the dataframe:
# Get row count of dataframe by finding the length of index labels
numOfRows = len(empDfObj.index)
print('Number of Rows in dataframe : ' , numOfRows)
Output:
Number of Rows in dataframe : 7
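A related sketch, not part of the original example: calling len() on the dataframe itself gives the same number as len() of its index, while Dataframe.count() answers a different question, because it counts the non-NaN values in each column. The variable names rowsViaLen and nonNullCounts are illustrative.

```python
import numpy as np
import pandas as pd

# Same sample data as in the article
employees = [('jack', 34, 'Sydney', 5),
             ('Riti', 31, 'Delhi', 7),
             ('Aadi', 16, np.nan, 11),
             ('Mohit', np.nan, 'Delhi', 15),
             ('Veena', 33, 'Delhi', 4),
             ('Shaunak', 35, 'Mumbai', np.nan),
             ('Shaun', 35, 'Colombo', 11)]
empDfObj = pd.DataFrame(employees,
                        columns=['Name', 'Age', 'City', 'Experience'],
                        index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])

# len() of a dataframe is the number of rows, same as len(empDfObj.index)
rowsViaLen = len(empDfObj)
print('Number of Rows in dataframe :', rowsViaLen)

# count() returns the number of non-NaN values per column,
# so it is lower than the row count for columns with missing data
nonNullCounts = empDfObj.count()
print(nonNullCounts)
```

So count() is only a reliable row count for columns that have no missing values, such as 'Name' in this data.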
Count rows in a Pandas Dataframe that satisfy a condition using Dataframe.apply()
Using Dataframe.apply() with axis=1 we can apply a function to every row of a dataframe to check whether that row satisfies a condition.
The result is a bool series with one entry per row. Counting the True values in that series gives the number of rows in the dataframe that satisfy the condition.
Let’s see some examples,
Example 1:
Count the number of rows in a dataframe for which ‘Age’ column contains value more than 30 i.e.
# Get a bool series representing which rows satisfy the condition, i.e. True for
# each row in which the value of the 'Age' column is more than 30
seriesObj = empDfObj.apply(lambda x: x['Age'] > 30, axis=1)

# Count number of True in series
numOfRows = len(seriesObj[seriesObj == True].index)
print('Number of Rows in dataframe in which Age > 30 : ', numOfRows)
Output:
Number of Rows in dataframe in which Age > 30 : 5
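For a condition on a single column, the same count can be obtained without apply(). The sketch below is an alternative to the approach above, not the method the article uses: it compares the whole 'Age' column at once and sums the resulting bool series, since True counts as 1. The name ageMask is illustrative.

```python
import numpy as np
import pandas as pd

# Same sample data as in the article
employees = [('jack', 34, 'Sydney', 5),
             ('Riti', 31, 'Delhi', 7),
             ('Aadi', 16, np.nan, 11),
             ('Mohit', np.nan, 'Delhi', 15),
             ('Veena', 33, 'Delhi', 4),
             ('Shaunak', 35, 'Mumbai', np.nan),
             ('Shaun', 35, 'Colombo', 11)]
empDfObj = pd.DataFrame(employees,
                        columns=['Name', 'Age', 'City', 'Experience'],
                        index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])

# Compare the whole column at once; NaN > 30 evaluates to False
ageMask = empDfObj['Age'] > 30

# Summing a bool series counts its True values
numOfRows = int(ageMask.sum())
print('Number of Rows in dataframe in which Age > 30 : ', numOfRows)
```

The row with a missing age is not counted, which matches the result of the apply() version.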
Example 2:
Count the number of rows in a dataframe that contain 11 in any column i.e.
# Count number of rows in a dataframe that contain the value 11 in any column
seriesObj = empDfObj.apply(lambda x: 11 in list(x), axis=1)
numOfRows = len(seriesObj[seriesObj == True].index)
print('Number of Rows in dataframe which contain 11 in any column : ', numOfRows)
Output:
Number of Rows in dataframe which contain 11 in any column : 2
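A possible alternative to the apply() call above, offered as a sketch rather than as the article's method: Dataframe.isin() marks every cell that equals one of the given values, and any(axis=1) then reduces that to one bool per row. The name hasEleven is illustrative.

```python
import numpy as np
import pandas as pd

# Same sample data as in the article
employees = [('jack', 34, 'Sydney', 5),
             ('Riti', 31, 'Delhi', 7),
             ('Aadi', 16, np.nan, 11),
             ('Mohit', np.nan, 'Delhi', 15),
             ('Veena', 33, 'Delhi', 4),
             ('Shaunak', 35, 'Mumbai', np.nan),
             ('Shaun', 35, 'Colombo', 11)]
empDfObj = pd.DataFrame(employees,
                        columns=['Name', 'Age', 'City', 'Experience'],
                        index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])

# True for each row that has the value 11 in at least one column
hasEleven = empDfObj.isin([11]).any(axis=1)

# Summing the bool series counts the matching rows
numOfRows = int(hasEleven.sum())
print('Number of Rows in dataframe which contain 11 in any column : ', numOfRows)
```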
Example 3:
Count the number of rows in a dataframe that contain NaN in any column i.e.
# Count number of rows in a dataframe that contain NaN in any column
seriesObj = empDfObj.apply(lambda x: x.isnull().any(), axis=1)
numOfRows = len(seriesObj[seriesObj == True].index)
print('Number of Rows in dataframe which contain NaN in any column : ', numOfRows)
Output:
Number of Rows in dataframe which contain NaN in any column : 3
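The NaN check can also be written without apply(), because isnull() and any() both work on the whole dataframe. The sketch below is an alternative to the code above; it also counts the complementary case, rows with no NaN at all, using dropna(). The names rowsWithNaN and completeRows are illustrative.

```python
import numpy as np
import pandas as pd

# Same sample data as in the article
employees = [('jack', 34, 'Sydney', 5),
             ('Riti', 31, 'Delhi', 7),
             ('Aadi', 16, np.nan, 11),
             ('Mohit', np.nan, 'Delhi', 15),
             ('Veena', 33, 'Delhi', 4),
             ('Shaunak', 35, 'Mumbai', np.nan),
             ('Shaun', 35, 'Colombo', 11)]
empDfObj = pd.DataFrame(employees,
                        columns=['Name', 'Age', 'City', 'Experience'],
                        index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])

# isnull() flags every NaN cell; any(axis=1) reduces that to one bool per row
rowsWithNaN = int(empDfObj.isnull().any(axis=1).sum())
print('Number of Rows in dataframe which contain NaN in any column : ', rowsWithNaN)

# dropna() removes rows that contain any NaN, leaving the complete rows
completeRows = len(empDfObj.dropna())
print('Number of Rows in dataframe without any NaN : ', completeRows)
```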
The complete example is as follows:
import numpy as np
import pandas as pd


def main():
    print('Create a Dataframe')

    # List of tuples; np.nan marks a missing value
    employees = [('jack', 34, 'Sydney', 5),
                 ('Riti', 31, 'Delhi', 7),
                 ('Aadi', 16, np.nan, 11),
                 ('Mohit', np.nan, 'Delhi', 15),
                 ('Veena', 33, 'Delhi', 4),
                 ('Shaunak', 35, 'Mumbai', np.nan),
                 ('Shaun', 35, 'Colombo', 11)]

    # Create a DataFrame object
    empDfObj = pd.DataFrame(employees,
                            columns=['Name', 'Age', 'City', 'Experience'],
                            index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])

    print("Contents of the Dataframe : ")
    print(empDfObj)

    print('**** Get the row count of a Dataframe using Dataframe.shape')

    # The first element of the tuple returned by shape is the number of rows
    numOfRows = empDfObj.shape[0]
    print('Number of Rows in dataframe : ' , numOfRows)

    print('**** Get the row count of a Dataframe using Dataframe.index')

    # Get row count of dataframe by finding the length of index labels
    numOfRows = len(empDfObj.index)
    print('Number of Rows in dataframe : ' , numOfRows)

    print('**** Count Number of Rows in dataframe that satisfy a condition ****')

    # Get a bool series representing which rows satisfy the condition, i.e. True for
    # each row in which the value of the 'Age' column is more than 30
    seriesObj = empDfObj.apply(lambda x: x['Age'] > 30, axis=1)

    # Count number of True in series
    numOfRows = len(seriesObj[seriesObj == True].index)
    print('Number of Rows in dataframe in which Age > 30 : ', numOfRows)

    print('**** Count Number of Rows in dataframe that contains a value ****')

    # Count number of rows in a dataframe that contain the value 11 in any column
    seriesObj = empDfObj.apply(lambda x: 11 in list(x), axis=1)
    numOfRows = len(seriesObj[seriesObj == True].index)
    print('Number of Rows in dataframe which contain 11 in any column : ', numOfRows)

    print('**** Count Number of Rows in dataframe that contains NaN ****')

    # Count number of rows in a dataframe that contain NaN in any column
    seriesObj = empDfObj.apply(lambda x: x.isnull().any(), axis=1)
    numOfRows = len(seriesObj[seriesObj == True].index)
    print('Number of Rows in dataframe which contain NaN in any column : ', numOfRows)


if __name__ == '__main__':
    main()
Output:
Create a Dataframe
Contents of the Dataframe : 
      Name   Age     City  Experience
a     jack  34.0   Sydney         5.0
b     Riti  31.0    Delhi         7.0
c     Aadi  16.0      NaN        11.0
d    Mohit   NaN    Delhi        15.0
e    Veena  33.0    Delhi         4.0
f  Shaunak  35.0   Mumbai         NaN
g    Shaun  35.0  Colombo        11.0
**** Get the row count of a Dataframe using Dataframe.shape
Number of Rows in dataframe : 7
**** Get the row count of a Dataframe using Dataframe.index
Number of Rows in dataframe : 7
**** Count Number of Rows in dataframe that satisfy a condition ****
Number of Rows in dataframe in which Age > 30 : 5
**** Count Number of Rows in dataframe that contains a value ****
Number of Rows in dataframe which contain 11 in any column : 2
**** Count Number of Rows in dataframe that contains NaN ****
Number of Rows in dataframe which contain NaN in any column : 3
hi, thanks, good examples!
In example 1: “Count the number of rows in a dataframe for which ‘Age’ column contains value more than 30 i.e.” Is there a way to get the cumulative count for each row?
I have a similar problem where I want to count all “A” in column “Result”. But I want to know the count for each row, something like this:
Result A_count
C 0
B 0
A 1
B 1
A 2
and so on…
Thanks
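Regarding the cumulative count asked about above, one possible approach (a sketch that is not from the article, assuming the column is named 'Result' as in the question) is to compare the column with "A" and take the cumulative sum of the resulting bool series with cumsum():

```python
import pandas as pd

# Small frame matching the example in the question
df = pd.DataFrame({'Result': ['C', 'B', 'A', 'B', 'A']})

# True where Result is 'A'; cumsum() then keeps a running count of the Trues
df['A_count'] = (df['Result'] == 'A').cumsum()
print(df)
```

Each row of A_count then holds the number of "A" values seen up to and including that row.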
Very useful!