Convert Pandas GroupBy output to DataFrame

In this article, we will discuss how to convert a pandas GroupBy output from Series to DataFrame. This is a very common issue in data analysis or performing feature engineering. Let’s look at multiple ways to achieve that.

Table of Contents

To quickly get started, let’s create a sample dataframe to experiment. We’ll use the pandas library with some random data.

import pandas as pd
import numpy as np

# List of Tuples
employees = [('Shubham', 'Data Scientist', 'Mumbai',   5),
            ('Riti', 'Data Scientist', 'Delhi' ,   7),
            ('Shanky', 'Program Manager', 'Delhi' ,   2),
            ('Shreya', 'Graphic Designer', 'Mumbai' ,   2),
            ('Aadi', 'Data Scientist', 'Mumbai', 11)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'City', 'Experience'],
                  index=[0, 1, 2, 3, 4])
print(df)

Contents of the created dataframe are,

      Name       Designation      City  Experience
0  Shubham    Data Scientist    Sydney           5
1     Riti      Data Analyst     Delhi           7
2   Shanky   Program Manager     Delhi           2
3   Shreya  Graphic Designer    Mumbai           2
4     Aadi  Data Engineering  New York          11

Now, let’s create a groupBy object for the experimentation. For example, let’s calculate the count by the columns “City” and “Designation”.

Advertisements
grouped = df.groupby(['City', 'Designation']).size()
print (grouped)

Output

City    Designation     
Delhi   Data Scientist      1
        Program Manager     1
Mumbai  Data Scientist      2
        Graphic Designer    1
dtype: int64

Also, we can see below that the grouped object is a pandas Series and we need to convert this to DataFrame.

print(type(grouped))

Output

<class 'pandas.core.series.Series'>

Using reset_index method

This is the most common and easiest method to convert the grouped object from Series to DataFrame. Let’s quickly try to convert the above grouped object using this method.

grouped = df.groupby(['City', 'Designation']).size()

# converting grouped object to DataFrame using reset_index
groupedDf = grouped.reset_index()

print (groupedDf)

Output

     City       Designation  0
0   Delhi    Data Scientist  1
1   Delhi   Program Manager  1
2  Mumbai    Data Scientist  2
3  Mumbai  Graphic Designer  1

As observed above, the Series is now converted to a pandas DataFrame (stored as groupedDf). We can validate the same by looking at the type of the groupedDf object again.

# type check
print (type(groupedDf))

Output

<class 'pandas.core.frame.DataFrame'>

This confirms that the grouped output is converted into pandas DataFrame. However, the new column created has been automatically assigned “0” as the column name. We can control this by using the “name” attribute in the reset_index function as shown below.

grouped = df.groupby(['City', 'Designation']).size()

# converting grouped object to DataFrame using reset_index with column name
groupedDf = grouped.reset_index(name = "Count")

print (groupedDf)

Output

     City       Designation  Count
0   Delhi    Data Scientist      1
1   Delhi   Program Manager      1
2  Mumbai    Data Scientist      2
3  Mumbai  Graphic Designer      1

Here, we have the final DataFrame output with the assigned column name.

The Complete example is as follows,

import pandas as pd
import numpy as np

# List of Tuples
employees = [('Shubham', 'Data Scientist', 'Mumbai',   5),
            ('Riti', 'Data Scientist', 'Delhi' ,   7),
            ('Shanky', 'Program Manager', 'Delhi' ,   2),
            ('Shreya', 'Graphic Designer', 'Mumbai' ,   2),
            ('Aadi', 'Data Scientist', 'Mumbai', 11)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'City', 'Experience'],
                  index=[0, 1, 2, 3, 4])
print(df)

grouped = df.groupby(['City', 'Designation']).size()

print(grouped)

# converting grouped object to DataFrame using reset_index with column name
groupedDf = grouped.reset_index(name = "Count")

print (groupedDf)

print (type(groupedDf))

Output:

      Name       Designation    City  Experience
0  Shubham    Data Scientist  Mumbai           5
1     Riti    Data Scientist   Delhi           7
2   Shanky   Program Manager   Delhi           2
3   Shreya  Graphic Designer  Mumbai           2
4     Aadi    Data Scientist  Mumbai          11

City    Designation     
Delhi   Data Scientist      1
        Program Manager     1
Mumbai  Data Scientist      2
        Graphic Designer    1
dtype: int64

     City       Designation  Count
0   Delhi    Data Scientist      1
1   Delhi   Program Manager      1
2  Mumbai    Data Scientist      2
3  Mumbai  Graphic Designer      1

<class 'pandas.core.frame.DataFrame'>

Using as_index attribute in GroupBy function

Another simple method is to set “as_index” attribute in the groupBy() function directly. We can set it to “False” in case we don’t want to use the groupBy columns as indexes and rather store it as a DataFrame.

# using the as_index attribute
groupedDf = df.groupby(['City', 'Designation'], as_index=False).size()

print (groupedDf)

Output

     City       Designation  size
0   Delhi    Data Scientist     1
1   Delhi   Program Manager     1
2  Mumbai    Data Scientist     2
3  Mumbai  Graphic Designer     1

We have the grouped output directly as pandas DataFrame. Although, the column name for the new column is automatically assigned based on the operation used, which can be further changed.

The complete example is as follows,

import pandas as pd
import numpy as np

# List of Tuples
employees = [('Shubham', 'Data Scientist', 'Mumbai',   5),
            ('Riti', 'Data Scientist', 'Delhi' ,   7),
            ('Shanky', 'Program Manager', 'Delhi' ,   2),
            ('Shreya', 'Graphic Designer', 'Mumbai' ,   2),
            ('Aadi', 'Data Scientist', 'Mumbai', 11)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'City', 'Experience'],
                  index=[0, 1, 2, 3, 4])
print(df)

# using the as_index attribute
groupedDf = df.groupby(['City', 'Designation'], as_index=False).size()

print (groupedDf)

print (type(groupedDf))

Output:

      Name       Designation    City  Experience
0  Shubham    Data Scientist  Mumbai           5
1     Riti    Data Scientist   Delhi           7
2   Shanky   Program Manager   Delhi           2
3   Shreya  Graphic Designer  Mumbai           2
4     Aadi    Data Scientist  Mumbai          11

     City       Designation  size
0   Delhi    Data Scientist     1
1   Delhi   Program Manager     1
2  Mumbai    Data Scientist     2
3  Mumbai  Graphic Designer     1

<class 'pandas.core.frame.DataFrame'>

Using to_frame method

The “to_frame()” method is also frequently used to convert any pandas Series to a DataFrame object. Let’s quickly try it on our grouped object.

grouped = df.groupby(['City', 'Designation']).size()

# using the to_frame method
groupedDf = grouped.to_frame()

print (groupedDf)

print (type(groupedDf))

Output

                         0
City   Designation        
Delhi  Data Scientist    1
       Program Manager   1
Mumbai Data Scientist    2
       Graphic Designer  1

<class 'pandas.core.frame.DataFrame'>

We can also set the column name here using the “name” attribute in the to_frame function as shown below.

grouped = df.groupby(['City', 'Designation']).size()

# using the to_frame method
groupedDf = grouped.to_frame(name = 'Count')

print (groupedDf)

print (type(groupedDf))

Output

                         Count
City   Designation            
Delhi  Data Scientist        1
       Program Manager       1
Mumbai Data Scientist        2
       Graphic Designer      1
<class 'pandas.core.frame.DataFrame'>

Note that the “City” and “Designation” columns are still present as indexes here. Therefore, in case we want to use these columns further, we will have to use “reset_index” again here.

The complete example is as follows,

import pandas as pd
import numpy as np

# List of Tuples
employees = [('Shubham', 'Data Scientist', 'Mumbai',   5),
            ('Riti', 'Data Scientist', 'Delhi' ,   7),
            ('Shanky', 'Program Manager', 'Delhi' ,   2),
            ('Shreya', 'Graphic Designer', 'Mumbai' ,   2),
            ('Aadi', 'Data Scientist', 'Mumbai', 11)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'City', 'Experience'],
                  index=[0, 1, 2, 3, 4])
print(df)

grouped = df.groupby(['City', 'Designation']).size()

# using the to_frame method
groupedDf = grouped.to_frame(name = 'Count')

print (groupedDf)

print (type(groupedDf))

Output:

      Name       Designation    City  Experience
0  Shubham    Data Scientist  Mumbai           5
1     Riti    Data Scientist   Delhi           7
2   Shanky   Program Manager   Delhi           2
3   Shreya  Graphic Designer  Mumbai           2
4     Aadi    Data Scientist  Mumbai          11

                         Count
City   Designation            
Delhi  Data Scientist        1
       Program Manager       1
Mumbai Data Scientist        2
       Graphic Designer      1

<class 'pandas.core.frame.DataFrame'>

Summary

Great, you made it! In this article, we have discussed multiple ways to convert a Pandas GroupBy output from Series to DataFrame

Pandas Tutorials -Learn Data Analysis with Python

   

Are you looking to make a career in Data Science with Python?

Data Science is the future, and the future is here now. Data Scientists are now the most sought-after professionals today. To become a good Data Scientist or to make a career switch in Data Science one must possess the right skill set. We have curated a list of Best Professional Certificate in Data Science with Python. These courses will teach you the programming tools for Data Science like Pandas, NumPy, Matplotlib, Seaborn and how to use these libraries to implement Machine learning models.

Checkout the Detailed Review of Best Professional Certificate in Data Science with Python.

Remember, Data Science requires a lot of patience, persistence, and practice. So, start learning today.

Join a LinkedIn Community of Python Developers

Leave a Comment

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Scroll to Top