How to sort within groups using Pandas GroupBy?

In this article, we will discuss how to sort within groups using the pandas GroupBy. We are going to use the sort_values function to sort the Series within the groups.

Table of Contents

Preparing DataSet

To quickly get started, let’s create a sample dataframe to experiment. We’ll use the pandas library with some random data.

import pandas as pd

# List of Tuples
employees = [('Shubham', 'India', 'Tech',   5),
            ('Riti', 'India', 'Design' ,   7),
            ('Shanky', 'India', 'PMO' ,   2),
            ('Shreya', 'India', 'Design' ,   2),
            ('Aadi', 'US', 'PMO', 11),
            ('Sim', 'US', 'Tech', 4)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Location', 'Team', 'Experience'])
print(df)

Contents of the created dataframe are,

      Name Location    Team  Experience
0  Shubham    India    Tech           5
1     Riti    India  Design           7
2   Shanky    India     PMO           2
3   Shreya    India  Design           2
4     Aadi       US     PMO          11
5      Sim       US    Tech           4

Sorting the data before applying pandas GroupBy

The easier way to sort within groups is to first sort the data and then use GroupBy on top of it. The GroupBy function maintains the order of the DataFrame in its output. Let’s understand with an example, say, we need to sort the DataFrame in terms of Experience (in descending order) by Team.

Advertisements
# first sort and then groupby
modDf = df.sort_values(["Team","Experience"], ascending=False).groupby("Team").head(10)

print(modDf)

Output

      Name Location    Team  Experience
0  Shubham    India    Tech           5
5      Sim       US    Tech           4
4     Aadi       US     PMO          11
2   Shanky    India     PMO           2
1     Riti    India  Design           7
3   Shreya    India  Design           2

As observed, we first sorted the columns “Team” and “Experience” and then used the groupby function. Now the Experience column is sorted in descending order within each Team group.

Using pandas.DataFrame.nlargest function

Another way is to use pandas.DataFrame.nlargest() method which is used to get the first n rows ordered by columns in descending order.

# using nlargest function
modDf = df.groupby(["Team"])['Experience'].nlargest(3)

print(modDf)

Output

Team     
Design  1     7
        3     2
PMO     4    11
        2     2
Tech    0     5
        5     4
Name: Experience, dtype: int64

Here, we have got the row indexes ordered within groups, which can be traced back to the original DataFrame.

Using the apply() function

We can also use the sort_values() function within the apply method to sort the values within groups. Let’s understand with the same example as above.

# using apply function
modDf = df.groupby(["Team"]).apply(lambda x: x.sort_values(['Experience'], ascending=False))

print(modDf)

Output

             Name Location    Team  Experience
Team                                          
Design 1     Riti    India  Design           7
       3   Shreya    India  Design           2
PMO    4     Aadi       US     PMO          11
       2   Shanky    India     PMO           2
Tech   0  Shubham    India    Tech           5
       5      Sim       US    Tech           4

The complete example

import pandas as pd

# List of Tuples
employees = [('Shubham', 'India', 'Tech',   5),
            ('Riti', 'India', 'Design' ,   7),
            ('Shanky', 'India', 'PMO' ,   2),
            ('Shreya', 'India', 'Design' ,   2),
            ('Aadi', 'US', 'PMO', 11),
            ('Sim', 'US', 'Tech', 4)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Location', 'Team', 'Experience'])

# first sort and then groupby
modDf = df.sort_values(["Team","Experience"], ascending=False).groupby("Team").head(10)

print(modDf)

# using nlargest function
modDf = df.groupby(["Team"])['Experience'].nlargest(3)

print(modDf)


# using apply function
modDf = df.groupby(["Team"]).apply(lambda x: x.sort_values(['Experience'], ascending=False))

print(modDf)

Output :

      Name Location    Team  Experience
0  Shubham    India    Tech           5
5      Sim       US    Tech           4
4     Aadi       US     PMO          11
2   Shanky    India     PMO           2
1     Riti    India  Design           7
3   Shreya    India  Design           2

Team     
Design  1     7
        3     2
PMO     4    11
        2     2
Tech    0     5
        5     4
Name: Experience, dtype: int64

             Name Location    Team  Experience
Team                                          
Design 1     Riti    India  Design           7
       3   Shreya    India  Design           2
PMO    4     Aadi       US     PMO          11
       2   Shanky    India     PMO           2
Tech   0  Shubham    India    Tech           5
       5      Sim       US    Tech           4

Summary

In this article, we have discussed how to sort values within groups using pandas GroupBy.

Pandas Tutorials -Learn Data Analysis with Python

   

Are you looking to make a career in Data Science with Python?

Data Science is the future, and the future is here now. Data Scientists are now the most sought-after professionals today. To become a good Data Scientist or to make a career switch in Data Science one must possess the right skill set. We have curated a list of Best Professional Certificate in Data Science with Python. These courses will teach you the programming tools for Data Science like Pandas, NumPy, Matplotlib, Seaborn and how to use these libraries to implement Machine learning models.

Checkout the Detailed Review of Best Professional Certificate in Data Science with Python.

Remember, Data Science requires a lot of patience, persistence, and practice. So, start learning today.

Join a LinkedIn Community of Python Developers

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Scroll to Top