How to sort a Pandas DataFrame by one column?

In this article, we will discuss how to use the sort_values() function to sort a pandas DataFrame by one column. Although, the sort_values() is very simple to use, but we will try to cover multiple situation.

Table of Contents

Preparing dataset for solution

To quickly get started, let’s create a sample DataFrame to experiment. We’ll use the pandas library with some random data.

import pandas as pd
import numpy as np

# List of Tuples
employees= [('Shubham', 'Data Scientist', 'Tech',   5),
            ('Riti', 'Data Engineer', 'Tech' ,   7),
            ('Shanky', 'Program Manager', 'PMO' ,   2),
            ('Shreya', 'Graphic Designer', 'Design' ,   2),
            ('Aadi', 'Backend Developer', 'Tech', 11),
            ('Sim', 'Data Engineer', 'Tech', 4)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'Team', 'Experience'],
                  index=[0, 1, 2, 3, 4, 5])
print(df)

Contents of the created dataframe are,

      Name        Designation    Team  Experience
0  Shubham     Data Scientist    Tech           5
1     Riti      Data Engineer    Tech           7
2   Shanky    Program Manager     PMO           2
3   Shreya   Graphic Designer  Design           2
4     Aadi  Backend Developer    Tech          11
5      Sim      Data Engineer    Tech           4

Sorting DataFrame based on a single column

As mentioned above, we are going to use the “sort_values()” function. In this, we just need to pass a single argument “by”, containing the column name. Using this column name, it will sort the DataFrame. Let’s take an example below, where we need to sort the above DataFrame on the “Experience” column.

Advertisements
# sort values by Experience column
print(df.sort_values(by = 'Experience'))

Output

      Name        Designation    Team  Experience
2   Shanky    Program Manager     NaN           2
3   Shreya   Graphic Designer  Design           2
5      Sim      Data Engineer     NaN           4
0  Shubham     Data Scientist    Tech           5
1     Riti      Data Engineer    Tech           7
4     Aadi  Backend Developer    Tech          11

This sorts our DataFrame using the column values from “Experience”, showing the employees with the lowest experience first. Note that, it does not modify the original DataFrame but only creates a view.

Let’s see the complete example,

import pandas as pd
import numpy as np

# List of Tuples
employees= [('Shubham', 'Data Scientist', 'Tech',   5),
            ('Riti', 'Data Engineer', 'Tech' ,   7),
            ('Shanky', 'Program Manager', 'PMO' ,   2),
            ('Shreya', 'Graphic Designer', 'Design' ,   2),
            ('Aadi', 'Backend Developer', 'Tech', 11),
            ('Sim', 'Data Engineer', 'Tech', 4)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'Team', 'Experience'],
                  index=[0, 1, 2, 3, 4, 5])
print(df)

# sort values by Experience column
print(df.sort_values(by = 'Experience'))

Output

      Name        Designation    Team  Experience
0  Shubham     Data Scientist    Tech           5
1     Riti      Data Engineer    Tech           7
2   Shanky    Program Manager     PMO           2
3   Shreya   Graphic Designer  Design           2
4     Aadi  Backend Developer    Tech          11
5      Sim      Data Engineer    Tech           4

      Name        Designation    Team  Experience
2   Shanky    Program Manager     PMO           2
3   Shreya   Graphic Designer  Design           2
5      Sim      Data Engineer    Tech           4
0  Shubham     Data Scientist    Tech           5
1     Riti      Data Engineer    Tech           7
4     Aadi  Backend Developer    Tech          11

Sorting DataFrame in Descending Order based on a single column

By default, the sort_values() function sorts the data in ascending order. In case we want to change it, we can use the “ascending” argument as shown below.

# sort values by Experience column
print(df.sort_values(by = 'Experience', ascending=False))

Output

      Name        Designation    Team  Experience
4     Aadi  Backend Developer    Tech          11
1     Riti      Data Engineer    Tech           7
0  Shubham     Data Scientist    Tech           5
5      Sim      Data Engineer     NaN           4
2   Shanky    Program Manager     NaN           2
3   Shreya   Graphic Designer  Design           2

As observed, the DataFrame is now sorted on the “Experience” column in descending order.

Let’s see the complete example,

import pandas as pd
import numpy as np

# List of Tuples
employees= [('Shubham', 'Data Scientist', 'Tech',   5),
            ('Riti', 'Data Engineer', 'Tech' ,   7),
            ('Shanky', 'Program Manager', 'PMO' ,   2),
            ('Shreya', 'Graphic Designer', 'Design' ,   2),
            ('Aadi', 'Backend Developer', 'Tech', 11),
            ('Sim', 'Data Engineer', 'Tech', 4)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'Team', 'Experience'],
                  index=[0, 1, 2, 3, 4, 5])
print(df)

# sort values by Experience column
print(df.sort_values(by = 'Experience', ascending=False))

Output

      Name        Designation    Team  Experience
0  Shubham     Data Scientist    Tech           5
1     Riti      Data Engineer    Tech           7
2   Shanky    Program Manager     PMO           2
3   Shreya   Graphic Designer  Design           2
4     Aadi  Backend Developer    Tech          11
5      Sim      Data Engineer    Tech           4

      Name        Designation    Team  Experience
4     Aadi  Backend Developer    Tech          11
1     Riti      Data Engineer    Tech           7
0  Shubham     Data Scientist    Tech           5
5      Sim      Data Engineer    Tech           4
2   Shanky    Program Manager     PMO           2
3   Shreya   Graphic Designer  Design           2

Sorting DataFrame and Handling missing values

Sorting columns containing missing values can be challenging. The sort_values function automatically detects the missing values and sorts them at the bottom of the DataFrame. For better understanding, let’s quickly try sorting our DataFrame on the “Team” column which contains some missing values.

# sort Dataframe on Team column
print (df.sort_values(by = 'Team'))

Let’s see the complete example,

import pandas as pd
import numpy as np

# List of Tuples
employees= [('Shubham', 'Data Scientist', 'Tech',   5),
            ('Riti', 'Data Engineer', np.NaN ,   7),
            ('Shanky', 'Program Manager', 'PMO' ,   2),
            ('Shreya', 'Graphic Designer', 'Design' ,   2),
            ('Aadi', 'Backend Developer', 'Tech', 11),
            ('Sim', 'Data Engineer', np.NaN, 4)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'Team', 'Experience'],
                  index=[0, 1, 2, 3, 4, 5])
print(df)

# sort Dataframe on Team column
print (df.sort_values(by = 'Team'))

Output

      Name        Designation    Team  Experience
0  Shubham     Data Scientist    Tech           5
1     Riti      Data Engineer     NaN           7
2   Shanky    Program Manager     PMO           2
3   Shreya   Graphic Designer  Design           2
4     Aadi  Backend Developer    Tech          11
5      Sim      Data Engineer     NaN           4

      Name        Designation    Team  Experience
3   Shreya   Graphic Designer  Design           2
2   Shanky    Program Manager     PMO           2
0  Shubham     Data Scientist    Tech           5
4     Aadi  Backend Developer    Tech          11
1     Riti      Data Engineer     NaN           7
5      Sim      Data Engineer     NaN           4

As expected, it sorted all the values from the “Team” column in ascending order but kept all the missing values at the bottom of the DataFrame. We can bring these null values at the top using the “na_position” argument as shown below.

# sort Dataframe on Team column
print (df.sort_values(by = 'Team', na_position='first'))

Output

      Name        Designation    Team  Experience
1     Riti      Data Engineer     NaN           7
5      Sim      Data Engineer     NaN           4
3   Shreya   Graphic Designer  Design           2
2   Shanky    Program Manager     PMO           2
0  Shubham     Data Scientist    Tech           5
4     Aadi  Backend Developer    Tech          11

The column is still sorted ascending on the “Team” column but now the missing values are at the top of the DataFrame. However, it is recommended to either impute or drop these missing values before sorting the DataFrame for better interpretation.

Sort DataFrame in place

As briefly mentioned above, the “sort_values()” function does not modify the original DataFrame but only creates a view. However, in case we want to modify the original DataFrame with the sorted view, we can use the “inplace” argument.

# sort with inplace
df.sort_values(by = 'Experience', ascending= False, inplace=True)
print (df)

Let’s see the complete example,

import pandas as pd
import numpy as np

# List of Tuples
employees= [('Shubham', 'Data Scientist', 'Tech',   5),
            ('Riti', 'Data Engineer', 'Tech' ,   7),
            ('Shanky', 'Program Manager', 'PMO' ,   2),
            ('Shreya', 'Graphic Designer', 'Design' ,   2),
            ('Aadi', 'Backend Developer', 'Tech', 11),
            ('Sim', 'Data Engineer', 'Tech', 4)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'Team', 'Experience'],
                  index=[0, 1, 2, 3, 4, 5])
print(df)

# sort with inplace
df.sort_values(by = 'Experience', ascending= False, inplace=True)

print (df)

Output

      Name        Designation    Team  Experience
0  Shubham     Data Scientist    Tech           5
1     Riti      Data Engineer    Tech           7
2   Shanky    Program Manager     PMO           2
3   Shreya   Graphic Designer  Design           2
4     Aadi  Backend Developer    Tech          11
5      Sim      Data Engineer    Tech           4

      Name        Designation    Team  Experience
4     Aadi  Backend Developer    Tech          11
1     Riti      Data Engineer    Tech           7
0  Shubham     Data Scientist    Tech           5
5      Sim      Data Engineer    Tech           4
2   Shanky    Program Manager     PMO           2
3   Shreya   Graphic Designer  Design           2

Here you go, now the DataFrame contains values sorted by the column “Experience”.

Summary

In this article, we have discussed how to sort the DataFrame using a single column. Thanks.

Pandas Tutorials -Learn Data Analysis with Python

   

Are you looking to make a career in Data Science with Python?

Data Science is the future, and the future is here now. Data Scientists are now the most sought-after professionals today. To become a good Data Scientist or to make a career switch in Data Science one must possess the right skill set. We have curated a list of Best Professional Certificate in Data Science with Python. These courses will teach you the programming tools for Data Science like Pandas, NumPy, Matplotlib, Seaborn and how to use these libraries to implement Machine learning models.

Checkout the Detailed Review of Best Professional Certificate in Data Science with Python.

Remember, Data Science requires a lot of patience, persistence, and practice. So, start learning today.

Join a LinkedIn Community of Python Developers

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Scroll to Top