How to sort within groups using Pandas GroupBy?

In this article, we will discuss how to sort within groups using the pandas GroupBy. We are going to use the sort_values function to sort the Series within the groups.

Table of Contents

Preparing DataSet
Sorting the data before applying pandas GroupBy
Using pandas.DataFrame.nlargest function
Using the apply() function
The complete example
Summary

Preparing DataSet

To quickly get started, let’s create a sample dataframe to experiment. We’ll use the pandas library with some random data.

import pandas as pd

# List of Tuples
employees = [('Shubham', 'India', 'Tech',   5),
            ('Riti', 'India', 'Design' ,   7),
            ('Shanky', 'India', 'PMO' ,   2),
            ('Shreya', 'India', 'Design' ,   2),
            ('Aadi', 'US', 'PMO', 11),
            ('Sim', 'US', 'Tech', 4)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Location', 'Team', 'Experience'])
print(df)

Contents of the created dataframe are,

      Name Location    Team  Experience
0  Shubham    India    Tech           5
1     Riti    India  Design           7
2   Shanky    India     PMO           2
3   Shreya    India  Design           2
4     Aadi       US     PMO          11
5      Sim       US    Tech           4

Sorting the data before applying pandas GroupBy

The easier way to sort within groups is to first sort the data and then use GroupBy on top of it. The GroupBy function maintains the order of the DataFrame in its output. Let’s understand with an example, say, we need to sort the DataFrame in terms of Experience (in descending order) by Team.

# first sort and then groupby
modDf = df.sort_values(["Team","Experience"], ascending=False).groupby("Team").head(10)

print(modDf)

Output

      Name Location    Team  Experience
0  Shubham    India    Tech           5
5      Sim       US    Tech           4
4     Aadi       US     PMO          11
2   Shanky    India     PMO           2
1     Riti    India  Design           7
3   Shreya    India  Design           2

As observed, we first sorted the columns “Team” and “Experience” and then used the groupby function. Now the Experience column is sorted in descending order within each Team group.

Frequently Asked:

Using pandas.DataFrame.nlargest function

Another way is to use pandas.DataFrame.nlargest() method which is used to get the first n rows ordered by columns in descending order.

# using nlargest function
modDf = df.groupby(["Team"])['Experience'].nlargest(3)

print(modDf)

Output

Team     
Design  1     7
        3     2
PMO     4    11
        2     2
Tech    0     5
        5     4
Name: Experience, dtype: int64

Here, we have got the row indexes ordered within groups, which can be traced back to the original DataFrame.

Using the apply() function

We can also use the sort_values() function within the apply method to sort the values within groups. Let’s understand with the same example as above.

# using apply function
modDf = df.groupby(["Team"]).apply(lambda x: x.sort_values(['Experience'], ascending=False))

print(modDf)

Output

             Name Location    Team  Experience
Team                                          
Design 1     Riti    India  Design           7
       3   Shreya    India  Design           2
PMO    4     Aadi       US     PMO          11
       2   Shanky    India     PMO           2
Tech   0  Shubham    India    Tech           5
       5      Sim       US    Tech           4

The complete example

import pandas as pd

# List of Tuples
employees = [('Shubham', 'India', 'Tech',   5),
            ('Riti', 'India', 'Design' ,   7),
            ('Shanky', 'India', 'PMO' ,   2),
            ('Shreya', 'India', 'Design' ,   2),
            ('Aadi', 'US', 'PMO', 11),
            ('Sim', 'US', 'Tech', 4)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Location', 'Team', 'Experience'])

# first sort and then groupby
modDf = df.sort_values(["Team","Experience"], ascending=False).groupby("Team").head(10)

print(modDf)

# using nlargest function
modDf = df.groupby(["Team"])['Experience'].nlargest(3)

print(modDf)


# using apply function
modDf = df.groupby(["Team"]).apply(lambda x: x.sort_values(['Experience'], ascending=False))

print(modDf)

Output :

      Name Location    Team  Experience
0  Shubham    India    Tech           5
5      Sim       US    Tech           4
4     Aadi       US     PMO          11
2   Shanky    India     PMO           2
1     Riti    India  Design           7
3   Shreya    India  Design           2

Team     
Design  1     7
        3     2
PMO     4    11
        2     2
Tech    0     5
        5     4
Name: Experience, dtype: int64

             Name Location    Team  Experience
Team                                          
Design 1     Riti    India  Design           7
       3   Shreya    India  Design           2
PMO    4     Aadi       US     PMO          11
       2   Shanky    India     PMO           2
Tech   0  Shubham    India    Tech           5
       5      Sim       US    Tech           4

Summary

In this article, we have discussed how to sort values within groups using pandas GroupBy.

Preparing DataSet

Sorting the data before applying pandas GroupBy

Frequently Asked:

Using pandas.DataFrame.nlargest function

Using the apply() function

The complete example

Summary

Related posts:

Share your love

Leave a Comment Cancel Reply