In this article, we will discuss how to sort within groups using the pandas GroupBy. We are going to use the sort_values function to sort the Series within the groups.
Table of Contents
Preparing DataSet
To quickly get started, let’s create a sample dataframe to experiment. We’ll use the pandas library with some random data.
import pandas as pd # List of Tuples employees = [('Shubham', 'India', 'Tech', 5), ('Riti', 'India', 'Design' , 7), ('Shanky', 'India', 'PMO' , 2), ('Shreya', 'India', 'Design' , 2), ('Aadi', 'US', 'PMO', 11), ('Sim', 'US', 'Tech', 4)] # Create a DataFrame object from list of tuples df = pd.DataFrame(employees, columns=['Name', 'Location', 'Team', 'Experience']) print(df)
Contents of the created dataframe are,
Name Location Team Experience 0 Shubham India Tech 5 1 Riti India Design 7 2 Shanky India PMO 2 3 Shreya India Design 2 4 Aadi US PMO 11 5 Sim US Tech 4
Sorting the data before applying pandas GroupBy
The easier way to sort within groups is to first sort the data and then use GroupBy on top of it. The GroupBy function maintains the order of the DataFrame in its output. Let’s understand with an example, say, we need to sort the DataFrame in terms of Experience (in descending order) by Team.
Frequently Asked:
# first sort and then groupby modDf = df.sort_values(["Team","Experience"], ascending=False).groupby("Team").head(10) print(modDf)
Output
Name Location Team Experience 0 Shubham India Tech 5 5 Sim US Tech 4 4 Aadi US PMO 11 2 Shanky India PMO 2 1 Riti India Design 7 3 Shreya India Design 2
As observed, we first sorted the columns “Team” and “Experience” and then used the groupby function. Now the Experience column is sorted in descending order within each Team group.
Using pandas.DataFrame.nlargest function
Another way is to use pandas.DataFrame.nlargest() method which is used to get the first n rows ordered by columns in descending order.
# using nlargest function modDf = df.groupby(["Team"])['Experience'].nlargest(3) print(modDf)
Output
Latest Python - Video Tutorial
Team Design 1 7 3 2 PMO 4 11 2 2 Tech 0 5 5 4 Name: Experience, dtype: int64
Here, we have got the row indexes ordered within groups, which can be traced back to the original DataFrame.
Using the apply() function
We can also use the sort_values() function within the apply method to sort the values within groups. Let’s understand with the same example as above.
# using apply function modDf = df.groupby(["Team"]).apply(lambda x: x.sort_values(['Experience'], ascending=False)) print(modDf)
Output
Name Location Team Experience Team Design 1 Riti India Design 7 3 Shreya India Design 2 PMO 4 Aadi US PMO 11 2 Shanky India PMO 2 Tech 0 Shubham India Tech 5 5 Sim US Tech 4
The complete example
import pandas as pd # List of Tuples employees = [('Shubham', 'India', 'Tech', 5), ('Riti', 'India', 'Design' , 7), ('Shanky', 'India', 'PMO' , 2), ('Shreya', 'India', 'Design' , 2), ('Aadi', 'US', 'PMO', 11), ('Sim', 'US', 'Tech', 4)] # Create a DataFrame object from list of tuples df = pd.DataFrame(employees, columns=['Name', 'Location', 'Team', 'Experience']) # first sort and then groupby modDf = df.sort_values(["Team","Experience"], ascending=False).groupby("Team").head(10) print(modDf) # using nlargest function modDf = df.groupby(["Team"])['Experience'].nlargest(3) print(modDf) # using apply function modDf = df.groupby(["Team"]).apply(lambda x: x.sort_values(['Experience'], ascending=False)) print(modDf)
Output :
Name Location Team Experience 0 Shubham India Tech 5 5 Sim US Tech 4 4 Aadi US PMO 11 2 Shanky India PMO 2 1 Riti India Design 7 3 Shreya India Design 2 Team Design 1 7 3 2 PMO 4 11 2 2 Tech 0 5 5 4 Name: Experience, dtype: int64 Name Location Team Experience Team Design 1 Riti India Design 7 3 Shreya India Design 2 PMO 4 Aadi US PMO 11 2 Shanky India PMO 2 Tech 0 Shubham India Tech 5 5 Sim US Tech 4
Summary
In this article, we have discussed how to sort values within groups using pandas GroupBy.
Latest Video Tutorials