Pandas: Convert a Dataframe column into an index using set_index() in Python

In this article, we will discuss how to convert an existing column of a pandas Dataframe into its index, and walk through several related scenarios.

The pandas Dataframe class provides the function set_index(). Let’s first have a look at it.

DataFrame.set_index()

DataFrame.set_index(self, keys, drop=True, append=False, inplace=False, verify_integrity=False)

It accepts column names in the keys argument and sets them as the index of dataframe.

The important arguments are:

  • keys: a single column name or a list of column names
    • The column(s) that we want to set as the index of the dataframe.
  • drop: bool, default True
    • If True, deletes the column after converting it to the index, i.e., moves the column to the index.
    • If False, copies the column to the index, i.e., doesn’t delete the column.
  • append: bool, default False
    • If True, adds the given column to the existing index; if False, replaces the current index.
  • inplace: bool, default False
    • If True, modifies the calling dataframe object and returns None; if False, returns a modified copy of the dataframe.
  • verify_integrity: bool, default False
    • If True, checks the new index for duplicate entries and raises an error if any are found.
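The keys argument also accepts more than one column name. As a minimal sketch, assuming a small three-row dataframe in the same shape as the sample data used in this article (the variable names df and multi_df are only illustrative), passing a list of column names moves all of them into the index:

```python
import pandas as pd

# Small illustrative dataset (assumed for this sketch)
employees = [('jack', 34, 'Sydney', 70000),
             ('Riti', 31, 'Delhi', 77000),
             ('Aadi', 16, 'Mumbai', 81000)]
df = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Salary'])

# A list of column names builds a multi-level index from those columns
multi_df = df.set_index(['City', 'Name'])

print(list(multi_df.index.names))   # ['City', 'Name']
print(list(multi_df.columns))       # ['Age', 'Salary']
```

With the default drop=True, both columns leave the column list and become levels of the index, in the order given.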

We will use this function to convert columns of a dataframe into an index of the dataframe.

For our examples, we will create a dataframe from a list of tuples, i.e.

import pandas as pd

# List of Tuples
employees = [('jack', 34, 'Sydney', 70000),
             ('Riti', 31, 'Delhi', 77000),
             ('Aadi', 16, 'Mumbai', 81000),
             ('Mohit', 31, 'Delhi', 90000),
             ('Veena', 12, 'Delhi', 91000),
             ('Shaunak', 35, 'Mumbai', 75000),
             ('Mark', 35, 'Colombo', 63000)
             ]
# Create a DataFrame object
empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Salary'])
# Rename index of dataframe to 'ID'
empDfObj.index.rename('ID', inplace=True)

Contents of the dataframe empDfObj are as follows,

       Name  Age     City  Salary
ID                               
0      jack   34   Sydney   70000
1      Riti   31    Delhi   77000
2      Aadi   16   Mumbai   81000
3     Mohit   31    Delhi   90000
4     Veena   12    Delhi   91000
5   Shaunak   35   Mumbai   75000
6      Mark   35  Colombo   63000

This dataframe has a default index, which we named ID. What if we want to use another column as the index of this dataframe?

Convert a column of Dataframe into an index of the Dataframe

Suppose we want to convert the column ‘Name’ into the index of the dataframe. For that, we pass the column name to the set_index() function of the dataframe, i.e.

# set column 'Name' as the index of the Dataframe
modifiedDF = empDfObj.set_index('Name')

print('Modified Dataframe :')
print(modifiedDF)

Output:

Modified Dataframe :
         Age     City  Salary
Name                         
jack      34   Sydney   70000
Riti      31    Delhi   77000
Aadi      16   Mumbai   81000
Mohit     31    Delhi   90000
Veena     12    Delhi   91000
Shaunak   35   Mumbai   75000
Mark      35  Colombo   63000

Here set_index() returned a copy of the dataframe with modified contents, in which the column ‘Name’ is converted to the index of the dataframe and the old index ‘ID’ is dropped. It didn’t modify the original dataframe; it copied the dataframe, made the changes in the copy, and returned that modified copy.
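A quick way to confirm this behaviour is to inspect both objects after the call. The following is a minimal sketch on a two-row dataframe; the names df and result are assumptions made for this illustration:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['jack', 'Riti'], 'Age': [34, 31]})
df.index.name = 'ID'

result = df.set_index('Name')

# set_index() returned a new object; the original is untouched
print(result is df)          # False
print(list(df.columns))      # ['Name', 'Age']
print(df.index.name)         # ID
print(result.index.name)     # Name
```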

Convert a column of Dataframe into index without deleting the column

In the above example, the column ‘Name’ is converted to the index of the dataframe, i.e., the column ‘Name’ no longer exists after that. What if we want to keep the column ‘Name’ as it is but also use it as the index? For that, we need to pass the drop argument as False in the set_index() function, i.e.

# set copy of column 'Name' as the index of the Dataframe
modifiedDF = empDfObj.set_index('Name', drop=False)

print('Modified Dataframe')
print(modifiedDF)

Output:

Modified Dataframe
            Name  Age     City  Salary
Name                                  
jack        jack   34   Sydney   70000
Riti        Riti   31    Delhi   77000
Aadi        Aadi   16   Mumbai   81000
Mohit      Mohit   31    Delhi   90000
Veena      Veena   12    Delhi   91000
Shaunak  Shaunak   35   Mumbai   75000
Mark        Mark   35  Colombo   63000

In the returned copy of the dataframe, a copy of the column ‘Name’ is now the index, but the column ‘Name’ itself still exists in the dataframe.
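The difference between the two settings of drop can be checked directly. This is a minimal sketch on a three-row dataframe; the names df, kept and moved are illustrative assumptions:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['jack', 'Riti', 'Aadi'],
                   'City': ['Sydney', 'Delhi', 'Mumbai']})

kept = df.set_index('Name', drop=False)   # column copied to the index
moved = df.set_index('Name')              # drop=True is the default

print('Name' in kept.columns)     # True
print('Name' in moved.columns)    # False

# With drop=False the same values can be used as a label or as a column
print(kept.loc['Riti', 'City'])                        # Delhi
print(kept[kept['Name'] == 'Riti']['City'].iloc[0])    # Delhi
```

One thing to keep in mind: with drop=False the label ‘Name’ refers to both an index level and a column, so operations that accept either one, such as sort_values('Name'), may reject it as ambiguous.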

Append a Dataframe column to the index to make it a Multi-Index Dataframe

In both the above examples, we set the ‘Name’ column as the index of the dataframe, but it replaced the old index ‘ID’. What if we want to keep the index ‘ID’ as it is and append another level to it by converting the column ‘Name’ into an index?

For that, we need to pass the append argument as True in the set_index() function, i.e.

# Append column 'Name' to the existing index of dataframe
# to make it multi-index dataframe
modifiedDF = empDfObj.set_index('Name', append=True)

print('Modified Dataframe')
print(modifiedDF)

Output:

Modified Dataframe
            Age     City  Salary
ID Name                         
0  jack      34   Sydney   70000
1  Riti      31    Delhi   77000
2  Aadi      16   Mumbai   81000
3  Mohit     31    Delhi   90000
4  Veena     12    Delhi   91000
5  Shaunak   35   Mumbai   75000
6  Mark      35  Colombo   63000

The dataframe is now a multi-index dataframe whose index has two levels, i.e., ID and Name.
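To see what the two levels mean in practice, the sketch below inspects the index and looks up a single row. It uses a reduced three-row dataframe, and the names df and multi are assumptions for this illustration:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['jack', 'Riti', 'Aadi'],
                   'Age': [34, 31, 16]})
df.index.name = 'ID'

multi = df.set_index('Name', append=True)

print(multi.index.nlevels)          # 2
print(list(multi.index.names))      # ['ID', 'Name']

# A row is addressed by a tuple holding one value per index level
print(multi.loc[(1, 'Riti'), 'Age'])    # 31
```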

Check for duplicates in the new index

If we want to make sure that, after converting a column to the index, the index does not contain any duplicate values, we pass the argument verify_integrity as True in the set_index() function, i.e.

# check for duplicates in the new index
modifiedDF = empDfObj.set_index('City', verify_integrity=True)

print('Modified Dataframe')
print(modifiedDF)

If the new index contains any duplicate values, set_index() will raise an error like this,

ValueError: Index has duplicate keys:

As the City column contains duplicates, this call raises the error.
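In a script, this error usually has to be handled or avoided. The following is a minimal sketch of two possible approaches on a three-row dataframe: catching the ValueError, or checking the column for duplicates before calling set_index(). The names df, raised and has_duplicates are assumptions made for this illustration:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Riti', 'Aadi', 'Mohit'],
                   'City': ['Delhi', 'Mumbai', 'Delhi']})

# Option 1: let set_index() raise and handle the error
try:
    df.set_index('City', verify_integrity=True)
    raised = False
except ValueError as err:
    raised = True
    print('Could not set index:', err)

# Option 2: check the column before converting it
has_duplicates = df['City'].duplicated().any()
print(has_duplicates)           # True
print(df['Name'].is_unique)     # True
```

The up-front check is handy when we want to choose a different column instead of stopping with an error.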

Modify the existing Dataframe by converting a column into the index

In the examples so far, set_index() returned a modified copy of the original dataframe. What if we want to change the existing dataframe itself? We can do that in either of two ways.

First way,

empDfObj = empDfObj.set_index('Name')

Assign the returned dataframe object to the original variable, so that the variable now points to the updated dataframe.

Second way,

Pass the inplace argument as True. It makes the changes in the existing dataframe, i.e.

empDfObj.set_index('Name', inplace=True)

After either of the above statements, the dataframe empDfObj is modified, and the column ‘Name’ becomes the index of the dataframe, i.e.

print('Original Dataframe contents :')
print(empDfObj)

Output:

Original Dataframe contents :
         Age     City  Salary
Name                         
jack      34   Sydney   70000
Riti      31    Delhi   77000
Aadi      16   Mumbai   81000
Mohit     31    Delhi   90000
Veena     12    Delhi   91000
Shaunak   35   Mumbai   75000
Mark      35  Colombo   63000
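Two details of the in-place variant are easy to trip over: the call returns None, and the two ways should not be combined or run one after the other. A minimal sketch on a two-row dataframe, where the names df, result and second_call_failed are illustrative assumptions:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['jack', 'Riti'], 'Age': [34, 31]})

# With inplace=True the dataframe is changed and nothing is returned
result = df.set_index('Name', inplace=True)
print(result)             # None
print(df.index.name)      # Name

# 'Name' is no longer a column, so setting it as index again fails
try:
    df.set_index('Name', inplace=True)
    second_call_failed = False
except KeyError as err:
    second_call_failed = True
    print('Second call failed:', err)

# Pitfall: this would replace the dataframe with None
# df = df.set_index('Age', inplace=True)
```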

The complete example is as follows,

import pandas as pd


def main():
    # List of Tuples
    employees = [('jack', 34, 'Sydney', 70000),
                 ('Riti', 31, 'Delhi', 77000),
                 ('Aadi', 16, 'Mumbai', 81000),
                 ('Mohit', 31, 'Delhi', 90000),
                 ('Veena', 12, 'Delhi', 91000),
                 ('Shaunak', 35, 'Mumbai', 75000),
                 ('Mark', 35, 'Colombo', 63000)
                 ]

    # Create a DataFrame object
    empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Salary'])
    # Rename index of dataframe to 'ID'
    empDfObj.index.rename('ID', inplace=True)

    print("Contents of the Dataframe : ")
    print(empDfObj)

    print('*** Convert a column of Dataframe into index of the Dataframe ***')

    # set column 'Name' as the index of the Dataframe
    modifiedDF = empDfObj.set_index('Name')

    print('Modified Dataframe :')
    print(modifiedDF)

    print('*** Convert a column of Dataframe into index without deleting the column ***')

    # set copy of column 'Name' as the index of the Dataframe
    modifiedDF = empDfObj.set_index('Name', drop=False)

    print('Modified Dataframe')
    print(modifiedDF)

    print('*** Append a Dataframe column of into index to make it Multi-Index Dataframe  ***')

    # Append column 'Name' to the existing index of dataframe
    # to make it multi-index dataframe
    modifiedDF = empDfObj.set_index('Name', append=True)

    print('Modified Dataframe')
    print(modifiedDF)

    print('*** While converting column to index, check for duplicates in the new index ***')

    # check for duplicates in the new index
    # ('Name' values are unique here, so no error is raised)
    modifiedDF = empDfObj.set_index('Name', verify_integrity=True)
    print('Modified Dataframe')
    print(modifiedDF)

    print('*** Modify existing Dataframe by converting into index ***')

    empDfObj.set_index('Name', inplace=True)

    print('Original Dataframe contents :')
    print(empDfObj)

if __name__ == '__main__':
    main()

Output:

Contents of the Dataframe : 
       Name  Age     City  Salary
ID                               
0      jack   34   Sydney   70000
1      Riti   31    Delhi   77000
2      Aadi   16   Mumbai   81000
3     Mohit   31    Delhi   90000
4     Veena   12    Delhi   91000
5   Shaunak   35   Mumbai   75000
6      Mark   35  Colombo   63000
*** Convert a column of Dataframe into index of the Dataframe ***
Modified Dataframe :
         Age     City  Salary
Name                         
jack      34   Sydney   70000
Riti      31    Delhi   77000
Aadi      16   Mumbai   81000
Mohit     31    Delhi   90000
Veena     12    Delhi   91000
Shaunak   35   Mumbai   75000
Mark      35  Colombo   63000
*** Convert a column of Dataframe into index without deleting the column ***
Modified Dataframe
            Name  Age     City  Salary
Name                                  
jack        jack   34   Sydney   70000
Riti        Riti   31    Delhi   77000
Aadi        Aadi   16   Mumbai   81000
Mohit      Mohit   31    Delhi   90000
Veena      Veena   12    Delhi   91000
Shaunak  Shaunak   35   Mumbai   75000
Mark        Mark   35  Colombo   63000
*** Append a Dataframe column of into index to make it Multi-Index Dataframe  ***
Modified Dataframe
            Age     City  Salary
ID Name                         
0  jack      34   Sydney   70000
1  Riti      31    Delhi   77000
2  Aadi      16   Mumbai   81000
3  Mohit     31    Delhi   90000
4  Veena     12    Delhi   91000
5  Shaunak   35   Mumbai   75000
6  Mark      35  Colombo   63000
*** While converting column to index, check for duplicates in the new index ***
Modified Dataframe
         Age     City  Salary
Name                         
jack      34   Sydney   70000
Riti      31    Delhi   77000
Aadi      16   Mumbai   81000
Mohit     31    Delhi   90000
Veena     12    Delhi   91000
Shaunak   35   Mumbai   75000
Mark      35  Colombo   63000
*** Modify existing Dataframe by converting into index ***
Original Dataframe contents :
         Age     City  Salary
Name                         
jack      34   Sydney   70000
Riti      31    Delhi   77000
Aadi      16   Mumbai   81000
Mohit     31    Delhi   90000
Veena     12    Delhi   91000
Shaunak   35   Mumbai   75000
Mark      35  Colombo   63000

 
