In this article, we will discuss how to use the sort_values() function to sort a pandas DataFrame by one column. Although, the sort_values() is very simple to use, but we will try to cover multiple situation.
Table of Contents
Preparing dataset for solution
To quickly get started, let’s create a sample DataFrame to experiment. We’ll use the pandas library with some random data.
import pandas as pd import numpy as np # List of Tuples employees= [('Shubham', 'Data Scientist', 'Tech', 5), ('Riti', 'Data Engineer', 'Tech' , 7), ('Shanky', 'Program Manager', 'PMO' , 2), ('Shreya', 'Graphic Designer', 'Design' , 2), ('Aadi', 'Backend Developer', 'Tech', 11), ('Sim', 'Data Engineer', 'Tech', 4)] # Create a DataFrame object from list of tuples df = pd.DataFrame(employees, columns=['Name', 'Designation', 'Team', 'Experience'], index=[0, 1, 2, 3, 4, 5]) print(df)
Contents of the created dataframe are,
Name Designation Team Experience 0 Shubham Data Scientist Tech 5 1 Riti Data Engineer Tech 7 2 Shanky Program Manager PMO 2 3 Shreya Graphic Designer Design 2 4 Aadi Backend Developer Tech 11 5 Sim Data Engineer Tech 4
Sorting DataFrame based on a single column
As mentioned above, we are going to use the “sort_values()” function. In this, we just need to pass a single argument “by”, containing the column name. Using this column name, it will sort the DataFrame. Let’s take an example below, where we need to sort the above DataFrame on the “Experience” column.
# sort values by Experience column print(df.sort_values(by = 'Experience'))
Output
Name Designation Team Experience 2 Shanky Program Manager NaN 2 3 Shreya Graphic Designer Design 2 5 Sim Data Engineer NaN 4 0 Shubham Data Scientist Tech 5 1 Riti Data Engineer Tech 7 4 Aadi Backend Developer Tech 11
This sorts our DataFrame using the column values from “Experience”, showing the employees with the lowest experience first. Note that, it does not modify the original DataFrame but only creates a view.
Frequently Asked:
- Pandas – Select Column by Name
- Count unique values per group in Pandas
- Add a header to a CSV file in Python
- Pandas: Set value of a Cell in Dataframe
Let’s see the complete example,
import pandas as pd import numpy as np # List of Tuples employees= [('Shubham', 'Data Scientist', 'Tech', 5), ('Riti', 'Data Engineer', 'Tech' , 7), ('Shanky', 'Program Manager', 'PMO' , 2), ('Shreya', 'Graphic Designer', 'Design' , 2), ('Aadi', 'Backend Developer', 'Tech', 11), ('Sim', 'Data Engineer', 'Tech', 4)] # Create a DataFrame object from list of tuples df = pd.DataFrame(employees, columns=['Name', 'Designation', 'Team', 'Experience'], index=[0, 1, 2, 3, 4, 5]) print(df) # sort values by Experience column print(df.sort_values(by = 'Experience'))
Output
Name Designation Team Experience 0 Shubham Data Scientist Tech 5 1 Riti Data Engineer Tech 7 2 Shanky Program Manager PMO 2 3 Shreya Graphic Designer Design 2 4 Aadi Backend Developer Tech 11 5 Sim Data Engineer Tech 4 Name Designation Team Experience 2 Shanky Program Manager PMO 2 3 Shreya Graphic Designer Design 2 5 Sim Data Engineer Tech 4 0 Shubham Data Scientist Tech 5 1 Riti Data Engineer Tech 7 4 Aadi Backend Developer Tech 11
Sorting DataFrame in Descending Order based on a single column
By default, the sort_values() function sorts the data in ascending order. In case we want to change it, we can use the “ascending” argument as shown below.
# sort values by Experience column print(df.sort_values(by = 'Experience', ascending=False))
Output
Name Designation Team Experience 4 Aadi Backend Developer Tech 11 1 Riti Data Engineer Tech 7 0 Shubham Data Scientist Tech 5 5 Sim Data Engineer NaN 4 2 Shanky Program Manager NaN 2 3 Shreya Graphic Designer Design 2
As observed, the DataFrame is now sorted on the “Experience” column in descending order.
Let’s see the complete example,
import pandas as pd import numpy as np # List of Tuples employees= [('Shubham', 'Data Scientist', 'Tech', 5), ('Riti', 'Data Engineer', 'Tech' , 7), ('Shanky', 'Program Manager', 'PMO' , 2), ('Shreya', 'Graphic Designer', 'Design' , 2), ('Aadi', 'Backend Developer', 'Tech', 11), ('Sim', 'Data Engineer', 'Tech', 4)] # Create a DataFrame object from list of tuples df = pd.DataFrame(employees, columns=['Name', 'Designation', 'Team', 'Experience'], index=[0, 1, 2, 3, 4, 5]) print(df) # sort values by Experience column print(df.sort_values(by = 'Experience', ascending=False))
Output
Name Designation Team Experience 0 Shubham Data Scientist Tech 5 1 Riti Data Engineer Tech 7 2 Shanky Program Manager PMO 2 3 Shreya Graphic Designer Design 2 4 Aadi Backend Developer Tech 11 5 Sim Data Engineer Tech 4 Name Designation Team Experience 4 Aadi Backend Developer Tech 11 1 Riti Data Engineer Tech 7 0 Shubham Data Scientist Tech 5 5 Sim Data Engineer Tech 4 2 Shanky Program Manager PMO 2 3 Shreya Graphic Designer Design 2
Sorting DataFrame and Handling missing values
Sorting columns containing missing values can be challenging. The sort_values function automatically detects the missing values and sorts them at the bottom of the DataFrame. For better understanding, let’s quickly try sorting our DataFrame on the “Team” column which contains some missing values.
# sort Dataframe on Team column print (df.sort_values(by = 'Team'))
Let’s see the complete example,
import pandas as pd import numpy as np # List of Tuples employees= [('Shubham', 'Data Scientist', 'Tech', 5), ('Riti', 'Data Engineer', np.NaN , 7), ('Shanky', 'Program Manager', 'PMO' , 2), ('Shreya', 'Graphic Designer', 'Design' , 2), ('Aadi', 'Backend Developer', 'Tech', 11), ('Sim', 'Data Engineer', np.NaN, 4)] # Create a DataFrame object from list of tuples df = pd.DataFrame(employees, columns=['Name', 'Designation', 'Team', 'Experience'], index=[0, 1, 2, 3, 4, 5]) print(df) # sort Dataframe on Team column print (df.sort_values(by = 'Team'))
Output
Name Designation Team Experience 0 Shubham Data Scientist Tech 5 1 Riti Data Engineer NaN 7 2 Shanky Program Manager PMO 2 3 Shreya Graphic Designer Design 2 4 Aadi Backend Developer Tech 11 5 Sim Data Engineer NaN 4 Name Designation Team Experience 3 Shreya Graphic Designer Design 2 2 Shanky Program Manager PMO 2 0 Shubham Data Scientist Tech 5 4 Aadi Backend Developer Tech 11 1 Riti Data Engineer NaN 7 5 Sim Data Engineer NaN 4
As expected, it sorted all the values from the “Team” column in ascending order but kept all the missing values at the bottom of the DataFrame. We can bring these null values at the top using the “na_position” argument as shown below.
# sort Dataframe on Team column print (df.sort_values(by = 'Team', na_position='first'))
Output
Name Designation Team Experience 1 Riti Data Engineer NaN 7 5 Sim Data Engineer NaN 4 3 Shreya Graphic Designer Design 2 2 Shanky Program Manager PMO 2 0 Shubham Data Scientist Tech 5 4 Aadi Backend Developer Tech 11
The column is still sorted ascending on the “Team” column but now the missing values are at the top of the DataFrame. However, it is recommended to either impute or drop these missing values before sorting the DataFrame for better interpretation.
Sort DataFrame in place
As briefly mentioned above, the “sort_values()” function does not modify the original DataFrame but only creates a view. However, in case we want to modify the original DataFrame with the sorted view, we can use the “inplace” argument.
# sort with inplace df.sort_values(by = 'Experience', ascending= False, inplace=True) print (df)
Let’s see the complete example,
import pandas as pd import numpy as np # List of Tuples employees= [('Shubham', 'Data Scientist', 'Tech', 5), ('Riti', 'Data Engineer', 'Tech' , 7), ('Shanky', 'Program Manager', 'PMO' , 2), ('Shreya', 'Graphic Designer', 'Design' , 2), ('Aadi', 'Backend Developer', 'Tech', 11), ('Sim', 'Data Engineer', 'Tech', 4)] # Create a DataFrame object from list of tuples df = pd.DataFrame(employees, columns=['Name', 'Designation', 'Team', 'Experience'], index=[0, 1, 2, 3, 4, 5]) print(df) # sort with inplace df.sort_values(by = 'Experience', ascending= False, inplace=True) print (df)
Output
Name Designation Team Experience 0 Shubham Data Scientist Tech 5 1 Riti Data Engineer Tech 7 2 Shanky Program Manager PMO 2 3 Shreya Graphic Designer Design 2 4 Aadi Backend Developer Tech 11 5 Sim Data Engineer Tech 4 Name Designation Team Experience 4 Aadi Backend Developer Tech 11 1 Riti Data Engineer Tech 7 0 Shubham Data Scientist Tech 5 5 Sim Data Engineer Tech 4 2 Shanky Program Manager PMO 2 3 Shreya Graphic Designer Design 2
Here you go, now the DataFrame contains values sorted by the column “Experience”.
Summary
In this article, we have discussed how to sort the DataFrame using a single column. Thanks.