In this article, we will discuss how to convert an existing column of a pandas Dataframe into its index, along with various scenarios associated with it.
The pandas Dataframe class provides a function set_index(). Let’s first have a look at that,
DataFrame.set_index()
DataFrame.set_index(self, keys, drop=True, append=False, inplace=False, verify_integrity=False)
It accepts column names in the keys argument and sets them as the index of the dataframe.
The important arguments are,
- keys: a single column name or a list of column names that we want to set as the index of the dataframe
- drop: bool, default True
  - If True, the column is deleted after being converted to the index, i.e., the column is moved to the index.
  - If False, the column is copied to the index, i.e., the column is not deleted.
- append: bool, default False
  - If True, the given column is added to the existing index; if False, it replaces the current index.
- inplace: bool, default False
  - If True, the changes are made in the calling dataframe object; if False, a modified copy of the dataframe is returned.
- verify_integrity: bool, default False
  - If True, the new index is checked for duplicate entries, and an error is raised if any are found.
We will use this function to convert columns of a dataframe into an index of the dataframe.
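Since keys also accepts multiple column names, here is a minimal sketch of passing a list. The small dataframe df and the name multi are made up for this illustration only; they are not part of the examples that follow.

```python
import pandas as pd

# Hypothetical two-row dataframe, used only for this illustration
df = pd.DataFrame(
    [('jack', 'Sydney', 70000), ('Riti', 'Delhi', 77000)],
    columns=['Name', 'City', 'Salary'])

# Passing a list of column names builds an index with one level per column
multi = df.set_index(['City', 'Name'])

print(list(multi.index.names))  # ['City', 'Name']
print(list(multi.columns))      # ['Salary']
```

Both columns are moved out of the data and into the index, because drop defaults to True.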
For our examples, we will create a dataframe from a list of tuples, i.e.
import pandas as pd

# List of Tuples
employees = [('jack', 34, 'Sydney', 70000),
             ('Riti', 31, 'Delhi', 77000),
             ('Aadi', 16, 'Mumbai', 81000),
             ('Mohit', 31, 'Delhi', 90000),
             ('Veena', 12, 'Delhi', 91000),
             ('Shaunak', 35, 'Mumbai', 75000),
             ('Mark', 35, 'Colombo', 63000)
             ]

# Create a DataFrame object
empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Salary'])

# Rename index of dataframe to 'ID'
empDfObj.index.rename('ID', inplace=True)
Contents of the dataframe empDfObj are as follows,
      Name  Age     City  Salary
ID
0     jack   34   Sydney   70000
1     Riti   31    Delhi   77000
2     Aadi   16   Mumbai   81000
3    Mohit   31    Delhi   90000
4    Veena   12    Delhi   91000
5  Shaunak   35   Mumbai   75000
6     Mark   35  Colombo   63000
This dataframe has a default index, which we named ID. What if we want to make some other column the index of this dataframe?
Convert a column of Dataframe into an index of the Dataframe
Suppose we want to convert the column ‘Name’ into the index of the dataframe. For that, we need to pass the column name to the set_index() function of the dataframe, i.e.
# set column 'Name' as the index of the Dataframe
modifiedDF = empDfObj.set_index('Name')

print('Modified Dataframe :')
print(modifiedDF)
Output:
Modified Dataframe :
         Age     City  Salary
Name
jack      34   Sydney   70000
Riti      31    Delhi   77000
Aadi      16   Mumbai   81000
Mohit     31    Delhi   90000
Veena     12    Delhi   91000
Shaunak   35   Mumbai   75000
Mark      35  Colombo   63000
Here set_index() returned a copy of the dataframe with modified contents, in which the column ‘Name’ is converted to the index of the dataframe and the old index ‘ID’ is discarded. It didn’t modify the original dataframe; it copied the dataframe, made the changes in the copy, and returned the modified copy.
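To confirm that the original dataframe is left untouched, we can compare the two objects after the call. This is a small sketch that rebuilds a two-row version of the employee dataframe so it can run on its own; the index name is set by direct assignment, which has the same effect as the rename call used earlier.

```python
import pandas as pd

# Two rows from the employee data, enough to show the behaviour
employees = [('jack', 34, 'Sydney', 70000),
             ('Riti', 31, 'Delhi', 77000)]
empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Salary'])
empDfObj.index.name = 'ID'

modifiedDF = empDfObj.set_index('Name')

# The original still has 'Name' as a column and 'ID' as its index
print('Name' in empDfObj.columns)    # True
print(empDfObj.index.name)           # ID

# The returned copy has 'Name' as its index and no 'Name' column
print('Name' in modifiedDF.columns)  # False
print(modifiedDF.index.name)         # Name
```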
Convert a column of Dataframe into index without deleting the column
In the above example, the column ‘Name’ is converted to the index of the dataframe, i.e., the column ‘Name’ no longer exists after that. What if we want to keep the column ‘Name’ as it is but also use it as the index? For that, we need to pass the drop argument as False in the set_index() function, i.e.
# set copy of column 'Name' as the index of the Dataframe
modifiedDF = empDfObj.set_index('Name', drop=False)

print('Modified Dataframe')
print(modifiedDF)
Output:
Modified Dataframe
            Name  Age     City  Salary
Name
jack        jack   34   Sydney   70000
Riti        Riti   31    Delhi   77000
Aadi        Aadi   16   Mumbai   81000
Mohit      Mohit   31    Delhi   90000
Veena      Veena   12    Delhi   91000
Shaunak  Shaunak   35   Mumbai   75000
Mark        Mark   35  Colombo   63000
In the returned copy of the dataframe, a copy of the column ‘Name’ is now the index, but the column ‘Name’ still exists in that dataframe.
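A quick way to check this is to look at the column list and compare the index values with the column values. The sketch below rebuilds a two-row version of the employee dataframe so it is runnable on its own.

```python
import pandas as pd

# Two rows from the employee data, enough to show the behaviour
employees = [('jack', 34, 'Sydney', 70000),
             ('Riti', 31, 'Delhi', 77000)]
empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Salary'])

modifiedDF = empDfObj.set_index('Name', drop=False)

# 'Name' is still present among the columns
print(list(modifiedDF.columns))  # ['Name', 'Age', 'City', 'Salary']

# The index holds the same values as the 'Name' column
print(list(modifiedDF.index) == list(modifiedDF['Name']))  # True
```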
Append a Dataframe column to the index to make it a Multi-Index Dataframe
In both the above examples, we set the ‘Name’ column as the index of the dataframe, but it replaced the old index ‘ID’. What if we want to keep the index ‘ID’ as it is and add another level to it by converting the column ‘Name’ into an index?
For that, we need to pass the append argument as True in the set_index() function, i.e.
# Append column 'Name' to the existing index of dataframe
# to make it multi-index dataframe
modifiedDF = empDfObj.set_index('Name', append=True)

print('Modified Dataframe')
print(modifiedDF)
Output:
Modified Dataframe
            Age     City  Salary
ID Name
0  jack      34   Sydney   70000
1  Riti      31    Delhi   77000
2  Aadi      16   Mumbai   81000
3  Mohit     31    Delhi   90000
4  Veena     12    Delhi   91000
5  Shaunak   35   Mumbai   75000
6  Mark      35  Colombo   63000
The dataframe is now a multi-index dataframe whose index has two levels, i.e. ID & Name.
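As a small sketch of what the two-level index looks like in practice, we can inspect the level names and select a value with an (ID, Name) pair. The two-row dataframe is rebuilt here so the snippet runs on its own, and the name city is chosen only for this illustration.

```python
import pandas as pd

# Two rows from the employee data, enough to show the behaviour
employees = [('jack', 34, 'Sydney', 70000),
             ('Riti', 31, 'Delhi', 77000)]
empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Salary'])
empDfObj.index.name = 'ID'

modifiedDF = empDfObj.set_index('Name', append=True)

# The index now has two levels
print(list(modifiedDF.index.names))  # ['ID', 'Name']
print(modifiedDF.index.nlevels)      # 2

# A row is addressed by a tuple holding one value per index level
city = modifiedDF.loc[(0, 'jack'), 'City']
print(city)                          # Sydney
```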
Check for duplicates in the new index
If we want to make sure that, after converting a column to the index, the index does not contain any duplicate values, then we pass the argument verify_integrity as True in the set_index() function, i.e.
# check for duplicates in the new index
modifiedDF = empDfObj.set_index('City', verify_integrity=True)

print('Modified Dataframe')
print(modifiedDF)
If the new index contains any duplicate value, set_index() will raise an error like this,
ValueError: Index has duplicate keys:
As the ‘City’ column contains duplicates, the above call raises this error.
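If we would rather handle the failure than let the program stop, one option is to catch the ValueError. The sketch below uses three rows of the employee data, two of which share a city; the flag raised is introduced only for this illustration. It also shows Series.is_unique as a way to test a column before trying to use it as the index.

```python
import pandas as pd

# Three rows from the employee data; 'Delhi' appears twice
employees = [('jack', 34, 'Sydney', 70000),
             ('Riti', 31, 'Delhi', 77000),
             ('Mohit', 31, 'Delhi', 90000)]
empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Salary'])

raised = False
try:
    empDfObj.set_index('City', verify_integrity=True)
except ValueError as err:
    # Reached because 'City' contains duplicate values
    raised = True
    print('Could not set index:', err)

# Checking a column up front avoids the exception altogether
print(empDfObj['City'].is_unique)  # False
print(empDfObj['Name'].is_unique)  # True
```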
Modify the existing Dataframe by converting a column into the index
In the examples so far, we saw that set_index() returns a modified copy of the original dataframe. What if we want to make the changes in the existing dataframe? Well, we can do that in two ways,
First way,
empDfObj = empDfObj.set_index('Name')
Assign the returned dataframe object to the original variable, so that the variable now points to the updated dataframe.
Second way,
Pass the inplace argument as True. It makes the changes in the existing dataframe, i.e.
empDfObj.set_index('Name', inplace=True)
After either of the above statements, empDfObj refers to the modified dataframe, in which the column ‘Name’ has been converted to the index, i.e.
print('Original Dataframe contents :')
print(empDfObj)
Output:
Original Dataframe contents :
         Age     City  Salary
Name
jack      34   Sydney   70000
Riti      31    Delhi   77000
Aadi      16   Mumbai   81000
Mohit     31    Delhi   90000
Veena     12    Delhi   91000
Shaunak   35   Mumbai   75000
Mark      35  Colombo   63000
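One thing to watch for is mixing the two ways. With inplace=True, set_index() returns None, so assigning its result back to the variable would lose the dataframe. A minimal sketch, with a two-row dataframe rebuilt so it runs on its own and a variable result used only for this illustration:

```python
import pandas as pd

# Two rows from the employee data, enough to show the behaviour
employees = [('jack', 34, 'Sydney', 70000),
             ('Riti', 31, 'Delhi', 77000)]
empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Salary'])

# With inplace=True the dataframe is changed and nothing is returned
result = empDfObj.set_index('Name', inplace=True)

print(result is None)            # True
print(empDfObj.index.name)       # Name
print(list(empDfObj.columns))    # ['Age', 'City', 'Salary']
```

So use either the assignment or inplace=True, but not both in the same statement.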
The complete example is as follows,
import pandas as pd


def main():
    # List of Tuples
    employees = [('jack', 34, 'Sydney', 70000),
                 ('Riti', 31, 'Delhi', 77000),
                 ('Aadi', 16, 'Mumbai', 81000),
                 ('Mohit', 31, 'Delhi', 90000),
                 ('Veena', 12, 'Delhi', 91000),
                 ('Shaunak', 35, 'Mumbai', 75000),
                 ('Mark', 35, 'Colombo', 63000)
                 ]

    # Create a DataFrame object
    empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Salary'])

    # Rename index of dataframe to 'ID'
    empDfObj.index.rename('ID', inplace=True)

    print("Contents of the Dataframe : ")
    print(empDfObj)

    print('*** Convert a column of Dataframe into index of the Dataframe ***')

    # set column 'Name' as the index of the Dataframe
    modifiedDF = empDfObj.set_index('Name')

    print('Modified Dataframe :')
    print(modifiedDF)

    print('*** Convert a column of Dataframe into index without deleting the column ***')

    # set copy of column 'Name' as the index of the Dataframe
    modifiedDF = empDfObj.set_index('Name', drop=False)

    print('Modified Dataframe')
    print(modifiedDF)

    print('*** Append a Dataframe column of into index to make it Multi-Index Dataframe ***')

    # Append column 'Name' to the existing index of dataframe
    # to make it multi-index dataframe
    modifiedDF = empDfObj.set_index('Name', append=True)

    print('Modified Dataframe')
    print(modifiedDF)

    print('*** While converting column to index, check for duplicates in the new index ***')

    # check for duplicates in the new index
    # ('Name' has no duplicates, so this call does not raise)
    modifiedDF = empDfObj.set_index('Name', verify_integrity=True)
    print('Modified Dataframe')
    print(modifiedDF)

    print('*** Modify existing Dataframe by converting into index ***')

    empDfObj.set_index('Name', inplace=True)

    print('Original Dataframe contents :')
    print(empDfObj)


if __name__ == '__main__':
    main()
Output:
Contents of the Dataframe : 
      Name  Age     City  Salary
ID
0     jack   34   Sydney   70000
1     Riti   31    Delhi   77000
2     Aadi   16   Mumbai   81000
3    Mohit   31    Delhi   90000
4    Veena   12    Delhi   91000
5  Shaunak   35   Mumbai   75000
6     Mark   35  Colombo   63000
*** Convert a column of Dataframe into index of the Dataframe ***
Modified Dataframe :
         Age     City  Salary
Name
jack      34   Sydney   70000
Riti      31    Delhi   77000
Aadi      16   Mumbai   81000
Mohit     31    Delhi   90000
Veena     12    Delhi   91000
Shaunak   35   Mumbai   75000
Mark      35  Colombo   63000
*** Convert a column of Dataframe into index without deleting the column ***
Modified Dataframe
            Name  Age     City  Salary
Name
jack        jack   34   Sydney   70000
Riti        Riti   31    Delhi   77000
Aadi        Aadi   16   Mumbai   81000
Mohit      Mohit   31    Delhi   90000
Veena      Veena   12    Delhi   91000
Shaunak  Shaunak   35   Mumbai   75000
Mark        Mark   35  Colombo   63000
*** Append a Dataframe column of into index to make it Multi-Index Dataframe ***
Modified Dataframe
            Age     City  Salary
ID Name
0  jack      34   Sydney   70000
1  Riti      31    Delhi   77000
2  Aadi      16   Mumbai   81000
3  Mohit     31    Delhi   90000
4  Veena     12    Delhi   91000
5  Shaunak   35   Mumbai   75000
6  Mark      35  Colombo   63000
*** While converting column to index, check for duplicates in the new index ***
Modified Dataframe
         Age     City  Salary
Name
jack      34   Sydney   70000
Riti      31    Delhi   77000
Aadi      16   Mumbai   81000
Mohit     31    Delhi   90000
Veena     12    Delhi   91000
Shaunak   35   Mumbai   75000
Mark      35  Colombo   63000
*** Modify existing Dataframe by converting into index ***
Original Dataframe contents :
         Age     City  Salary
Name
jack      34   Sydney   70000
Riti      31    Delhi   77000
Aadi      16   Mumbai   81000
Mohit     31    Delhi   90000
Veena     12    Delhi   91000
Shaunak   35   Mumbai   75000
Mark      35  Colombo   63000