Pandas – Count values in a Column

In this article, we will discuss different ways to count values in a Dataframe column. First, we will count only non-NaN values in a column, and then we will explore ways to count all values in a column, including NaN values.

Let’s first create a Dataframe from a list of tuples i.e.

import pandas as pd
import numpy as np

# List of Tuples
list_of_tuples = [  (11,  34,     78,  5,  11, 56),
                    (12,  np.NaN, 98,  7,  12, np.NaN),
                    (13,  14,     11,  11, 56, 41) ,
                    (13,  41,     11,  11, 56, 41) ,
                    (13,  34,     11,  11, 56, 41) ,
                    (12,  41,     12,  41, 78, np.NaN)]


# Create a DataFrame object
df = pd.DataFrame(  list_of_tuples, 
                    columns=['A', 'B', 'C', 'D', 'E', 'F'])

print(df)

Output

    A     B   C   D   E     F
0  11  34.0  78   5  11  56.0
1  12   NaN  98   7  12   NaN
2  13  14.0  11  11  56  41.0
3  13  41.0  11  11  56  41.0
4  13  34.0  11  11  56  41.0
5  12  41.0  12  41  78   NaN

This Dataframe contains six columns, and in each column, there are some integers or NaN values. Now we will explore different ways to count values in a specific column.

Count non-NaN values in a Dataframe Column using Series.count()

By using the subscript operator on Dataframe, we can select any specific column as a Series object. Then we can call the count() function on that Series object. It will give us the count of non-NaN values in that column. For example,

Advertisements
# Count non NaN values in column A
count = df['A'].count()

print(' Count of non NaN values in column A : ', count)

Output

 Count of non NaN values in column A :  6

As column ‘A’ of the dataframe had no NaN values, therefore it returned 6. But column ‘B’ has a NaN value. So, let’s check the count of non-NaN values in column ‘B’,

# Count non NaN values in column B
count = df['B'].count()

print(' Count of non NaN values in column B : ', count)

Output

 Count of non NaN values in column B :  5

Now let’s see another way to do the same thing.

Count non-NaN values in a Dataframe Column using Series.value_count()

Select a column from the dataframe using the column name. It will be a Series object. Then call the value_count() on the Series/column. It will return a series containing the occurrence count of each element from the calling series object, i.e., the selected column in our case. For example,

# Occurrence count of all non values in column B
occurrence_count = df['B'].value_counts()

print(occurrence_count)

Output

41.0    2
34.0    2
14.0    1
Name: B, dtype: int64

It returned the occurrence count of each distinct value in column ‘B’ except NaN. We can call the sum() on this Series object to get the count of all non-NaN values from the column ‘B’ i.e.

# Total Count of non values in column B
count = df['B'].value_counts().sum()

print(' Count of non NaN values in column B : ', count)

Output

Count of non NaN values in column B :  5

So, this way, we can get the count of non-NaN values in a Dataframe column.

The complete example till now is as follows,

import pandas as pd
import numpy as np

# List of Tuples
list_of_tuples = [  (11,  34,     78,  5,  11, 56),
                    (12,  np.NaN, 98,  7,  12, np.NaN),
                    (13,  14,     11,  11, 56, 41) ,
                    (13,  41,     11,  11, 56, 41) ,
                    (13,  34,     11,  11, 56, 41) ,
                    (12,  41,     12,  41, 78, np.NaN)]


# Create a DataFrame object
df = pd.DataFrame(  list_of_tuples, 
                    columns=['A', 'B', 'C', 'D', 'E', 'F'])

print(df)

# Count non NaN values in column A
count = df['A'].count()

print(' Count of non NaN values in column A : ', count)


# Count non NaN values in column B
count = df['B'].count()

print(' Count of non NaN values in column B : ', count)

# Occurrence count of all non values in column B
occurrence_count = df['B'].value_counts()

print(occurrence_count)

# Total Count of non values in column B
count = df['B'].value_counts().sum()

print(' Count of non NaN values in column B : ', count)

Output

    A     B   C   D   E     F
0  11  34.0  78   5  11  56.0
1  12   NaN  98   7  12   NaN
2  13  14.0  11  11  56  41.0
3  13  41.0  11  11  56  41.0
4  13  34.0  11  11  56  41.0
5  12  41.0  12  41  78   NaN

Count of non NaN values in column A :  6
Count of non NaN values in column B :  5

41.0    2
34.0    2
14.0    1
Name: B, dtype: int64

Count of non NaN values in column B :  5

Count all values in a Dataframe Column

What if we want to count all values in a selected column instead of only non-NaN values. In that case, we can choose the column of Dataframe using its name, and then we can use the following techniques,

Technique 1: Call len() function on selected column / series object,

# Count of total values in column B
count = len(df['B'])

print(' Count of total values in column B : ', count)

Output

Count of total values in column B :  6

It will give us the count of values in column ‘B’, including NaN values.

Technique 2: Use shape attribute of the selected column/series object,

# Count of total values in column B
count = df['B'].shape[0]

print(' Count of total values in column B : ', count)

Output

Count of total values in column B :  6

It returns the count of values in column ‘B’, including NaN values.

Technique 3: Use value_count(dropna=False) with of the selected column / series object

# Count of total values in column B
count = df['B'].value_counts(dropna=False).sum()

print(' Count of total values in column B : ', count)

Output

Count of total values in column B :  6

Series.value_count(dropa=False) returned the occurrence count of each distinct value in column ‘B’ including NaN. Then we called the sum() on this Series object to get the count of all values from the column ‘B’ i.e.

The complete example is as follows,

import pandas as pd
import numpy as np

# List of Tuples
list_of_tuples = [  (11,  34,     78,  5,  11, 56),
                    (12,  np.NaN, 98,  7,  12, np.NaN),
                    (13,  14,     11,  11, 56, 41) ,
                    (13,  41,     11,  11, 56, 41) ,
                    (13,  34,     11,  11, 56, 41) ,
                    (12,  41,     12,  41, 78, np.NaN)]


# Create a DataFrame object
df = pd.DataFrame(  list_of_tuples, 
                    columns=['A', 'B', 'C', 'D', 'E', 'F'])

print(df)

# Count of total values in column B
count = len(df['B'])

print(' Count of total values in column B : ', count)


# Count of total values in column B
count = df['B'].shape[0]

print(' Count of total values in column B : ', count)

# Count of total values in column B
count = df['B'].value_counts(dropna=False).sum()

print(' Count of total values in column B : ', count)

Output

    A     B   C   D   E     F
0  11  34.0  78   5  11  56.0
1  12   NaN  98   7  12   NaN
2  13  14.0  11  11  56  41.0
3  13  41.0  11  11  56  41.0
4  13  34.0  11  11  56  41.0
5  12  41.0  12  41  78   NaN

Count of total values in column B :  6
Count of total values in column B :  6
Count of total values in column B :  6

Summary:

We learned the different ways to get the count of values in a Dataframe column in Pandas.

Advertisements

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Scroll to Top