In this article, we will discuss how to convert a pandas GroupBy output from Series to DataFrame. This is a very common issue in data analysis or performing feature engineering. Let’s look at multiple ways to achieve that.
Table of Contents
To quickly get started, let’s create a sample dataframe to experiment. We’ll use the pandas library with some random data.
import pandas as pd import numpy as np # List of Tuples employees = [('Shubham', 'Data Scientist', 'Mumbai', 5), ('Riti', 'Data Scientist', 'Delhi' , 7), ('Shanky', 'Program Manager', 'Delhi' , 2), ('Shreya', 'Graphic Designer', 'Mumbai' , 2), ('Aadi', 'Data Scientist', 'Mumbai', 11)] # Create a DataFrame object from list of tuples df = pd.DataFrame(employees, columns=['Name', 'Designation', 'City', 'Experience'], index=[0, 1, 2, 3, 4]) print(df)
Contents of the created dataframe are,
Name Designation City Experience 0 Shubham Data Scientist Sydney 5 1 Riti Data Analyst Delhi 7 2 Shanky Program Manager Delhi 2 3 Shreya Graphic Designer Mumbai 2 4 Aadi Data Engineering New York 11
Now, let’s create a groupBy object for the experimentation. For example, let’s calculate the count by the columns “City” and “Designation”.
grouped = df.groupby(['City', 'Designation']).size() print (grouped)
Output
City Designation Delhi Data Scientist 1 Program Manager 1 Mumbai Data Scientist 2 Graphic Designer 1 dtype: int64
Also, we can see below that the grouped object is a pandas Series and we need to convert this to DataFrame.
Frequently Asked:
print(type(grouped))
Output
<class 'pandas.core.series.Series'>
Using reset_index method
This is the most common and easiest method to convert the grouped object from Series to DataFrame. Let’s quickly try to convert the above grouped object using this method.
grouped = df.groupby(['City', 'Designation']).size() # converting grouped object to DataFrame using reset_index groupedDf = grouped.reset_index() print (groupedDf)
Output
City Designation 0 0 Delhi Data Scientist 1 1 Delhi Program Manager 1 2 Mumbai Data Scientist 2 3 Mumbai Graphic Designer 1
As observed above, the Series is now converted to a pandas DataFrame (stored as groupedDf). We can validate the same by looking at the type of the groupedDf object again.
# type check print (type(groupedDf))
Output
<class 'pandas.core.frame.DataFrame'>
This confirms that the grouped output is converted into pandas DataFrame. However, the new column created has been automatically assigned “0” as the column name. We can control this by using the “name” attribute in the reset_index function as shown below.
grouped = df.groupby(['City', 'Designation']).size() # converting grouped object to DataFrame using reset_index with column name groupedDf = grouped.reset_index(name = "Count") print (groupedDf)
Output
City Designation Count 0 Delhi Data Scientist 1 1 Delhi Program Manager 1 2 Mumbai Data Scientist 2 3 Mumbai Graphic Designer 1
Here, we have the final DataFrame output with the assigned column name.
The Complete example is as follows,
import pandas as pd import numpy as np # List of Tuples employees = [('Shubham', 'Data Scientist', 'Mumbai', 5), ('Riti', 'Data Scientist', 'Delhi' , 7), ('Shanky', 'Program Manager', 'Delhi' , 2), ('Shreya', 'Graphic Designer', 'Mumbai' , 2), ('Aadi', 'Data Scientist', 'Mumbai', 11)] # Create a DataFrame object from list of tuples df = pd.DataFrame(employees, columns=['Name', 'Designation', 'City', 'Experience'], index=[0, 1, 2, 3, 4]) print(df) grouped = df.groupby(['City', 'Designation']).size() print(grouped) # converting grouped object to DataFrame using reset_index with column name groupedDf = grouped.reset_index(name = "Count") print (groupedDf) print (type(groupedDf))
Output:
Name Designation City Experience 0 Shubham Data Scientist Mumbai 5 1 Riti Data Scientist Delhi 7 2 Shanky Program Manager Delhi 2 3 Shreya Graphic Designer Mumbai 2 4 Aadi Data Scientist Mumbai 11 City Designation Delhi Data Scientist 1 Program Manager 1 Mumbai Data Scientist 2 Graphic Designer 1 dtype: int64 City Designation Count 0 Delhi Data Scientist 1 1 Delhi Program Manager 1 2 Mumbai Data Scientist 2 3 Mumbai Graphic Designer 1 <class 'pandas.core.frame.DataFrame'>
Using as_index attribute in GroupBy function
Another simple method is to set “as_index” attribute in the groupBy() function directly. We can set it to “False” in case we don’t want to use the groupBy columns as indexes and rather store it as a DataFrame.
# using the as_index attribute groupedDf = df.groupby(['City', 'Designation'], as_index=False).size() print (groupedDf)
Output
City Designation size 0 Delhi Data Scientist 1 1 Delhi Program Manager 1 2 Mumbai Data Scientist 2 3 Mumbai Graphic Designer 1
We have the grouped output directly as pandas DataFrame. Although, the column name for the new column is automatically assigned based on the operation used, which can be further changed.
The complete example is as follows,
import pandas as pd import numpy as np # List of Tuples employees = [('Shubham', 'Data Scientist', 'Mumbai', 5), ('Riti', 'Data Scientist', 'Delhi' , 7), ('Shanky', 'Program Manager', 'Delhi' , 2), ('Shreya', 'Graphic Designer', 'Mumbai' , 2), ('Aadi', 'Data Scientist', 'Mumbai', 11)] # Create a DataFrame object from list of tuples df = pd.DataFrame(employees, columns=['Name', 'Designation', 'City', 'Experience'], index=[0, 1, 2, 3, 4]) print(df) # using the as_index attribute groupedDf = df.groupby(['City', 'Designation'], as_index=False).size() print (groupedDf) print (type(groupedDf))
Output:
Name Designation City Experience 0 Shubham Data Scientist Mumbai 5 1 Riti Data Scientist Delhi 7 2 Shanky Program Manager Delhi 2 3 Shreya Graphic Designer Mumbai 2 4 Aadi Data Scientist Mumbai 11 City Designation size 0 Delhi Data Scientist 1 1 Delhi Program Manager 1 2 Mumbai Data Scientist 2 3 Mumbai Graphic Designer 1 <class 'pandas.core.frame.DataFrame'>
Using to_frame method
The “to_frame()” method is also frequently used to convert any pandas Series to a DataFrame object. Let’s quickly try it on our grouped object.
grouped = df.groupby(['City', 'Designation']).size() # using the to_frame method groupedDf = grouped.to_frame() print (groupedDf) print (type(groupedDf))
Output
0 City Designation Delhi Data Scientist 1 Program Manager 1 Mumbai Data Scientist 2 Graphic Designer 1 <class 'pandas.core.frame.DataFrame'>
We can also set the column name here using the “name” attribute in the to_frame function as shown below.
grouped = df.groupby(['City', 'Designation']).size() # using the to_frame method groupedDf = grouped.to_frame(name = 'Count') print (groupedDf) print (type(groupedDf))
Output
Count City Designation Delhi Data Scientist 1 Program Manager 1 Mumbai Data Scientist 2 Graphic Designer 1 <class 'pandas.core.frame.DataFrame'>
Note that the “City” and “Designation” columns are still present as indexes here. Therefore, in case we want to use these columns further, we will have to use “reset_index” again here.
The complete example is as follows,
import pandas as pd import numpy as np # List of Tuples employees = [('Shubham', 'Data Scientist', 'Mumbai', 5), ('Riti', 'Data Scientist', 'Delhi' , 7), ('Shanky', 'Program Manager', 'Delhi' , 2), ('Shreya', 'Graphic Designer', 'Mumbai' , 2), ('Aadi', 'Data Scientist', 'Mumbai', 11)] # Create a DataFrame object from list of tuples df = pd.DataFrame(employees, columns=['Name', 'Designation', 'City', 'Experience'], index=[0, 1, 2, 3, 4]) print(df) grouped = df.groupby(['City', 'Designation']).size() # using the to_frame method groupedDf = grouped.to_frame(name = 'Count') print (groupedDf) print (type(groupedDf))
Output:
Name Designation City Experience 0 Shubham Data Scientist Mumbai 5 1 Riti Data Scientist Delhi 7 2 Shanky Program Manager Delhi 2 3 Shreya Graphic Designer Mumbai 2 4 Aadi Data Scientist Mumbai 11 Count City Designation Delhi Data Scientist 1 Program Manager 1 Mumbai Data Scientist 2 Graphic Designer 1 <class 'pandas.core.frame.DataFrame'>
Summary
Great, you made it! In this article, we have discussed multiple ways to convert a Pandas GroupBy output from Series to DataFrame