In this article, we will discuss how to convert an existing column of a pandas Dataframe into an index, along with the various scenarios associated with it.

The pandas Dataframe class provides a function set_index(). Let’s first have a look at it.

DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)

It accepts column names in the keys argument and sets them as the index of the dataframe.

Important arguments are,

  • keys: a single column name or a list of column names to set as the index of the dataframe
  • drop: bool, default True
    • If True, the column is deleted after it is converted to the index, i.e., the column is moved to the index.
    • If False, the column is copied to the index, i.e., the column is not deleted.
  • append: bool, default False
    • If True, the given column is added to the existing index; if False, it replaces the current index.
  • inplace: bool, default False
    • If True, the calling dataframe object is modified and None is returned; if False, a modified copy of the dataframe is returned.
  • verify_integrity: bool, default False
    • If True, checks the new index for duplicate entries and raises an error if any are found.

We will use this function to convert columns of a dataframe into an index of the dataframe.

For our examples, we will create a dataframe from a list of tuples.
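A minimal sketch of such a dataframe is shown here. The records and the column names ‘Age’ and ‘Salary’ are illustrative assumptions; ‘Name’ and ‘City’ are the columns the article works with, and ‘City’ deliberately contains a duplicate value.

```python
import pandas as pd

# Illustrative records (name, age, city, salary); assumed sample data.
employees = [
    ("jack", 34, "Sydney", 70000),
    ("Riti", 31, "Delhi", 77000),
    ("Aadi", 16, "Mumbai", 81000),
    ("Mohit", 31, "Delhi", 90000),
]

# Build the dataframe; pandas assigns a default integer index (0, 1, 2, ...).
empDFObj = pd.DataFrame(employees, columns=["Name", "Age", "City", "Salary"])

# Give the default index the name 'ID'.
empDFObj.index.name = "ID"

print(empDFObj)
```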

The dataframe empDFObj has a default integer index, which we have named ID. What if we want to make some other column the index of this dataframe?

Convert a column of Dataframe into an index of the Dataframe

Suppose we want to convert the column ‘Name’ into the index of the dataframe. For that, we need to pass the column name to the set_index() function of the dataframe.
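This call can be sketched as follows. The sample records and the variable name modDfObj are assumptions made for illustration.

```python
import pandas as pd

# Assumed sample data; only the 'Name' column matters for this step.
employees = [
    ("jack", 34, "Sydney", 70000),
    ("Riti", 31, "Delhi", 77000),
    ("Aadi", 16, "Mumbai", 81000),
    ("Mohit", 31, "Delhi", 90000),
]
empDFObj = pd.DataFrame(employees, columns=["Name", "Age", "City", "Salary"])
empDFObj.index.name = "ID"

# Move the 'Name' column into the index; a new dataframe is returned.
modDfObj = empDFObj.set_index("Name")

print(modDfObj)   # 'Name' is now the index and no longer a column
print(empDFObj)   # the original dataframe still has the 'ID' index
```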


Here set_index() returned a copy of the dataframe in which the column ‘Name’ is converted to the index and the old index is removed. It did not modify the original dataframe; it copied the dataframe, made the changes in the copy, and returned the modified copy.

Convert a column of Dataframe into index without deleting the column

In the above example, the column ‘Name’ is converted to the index of the dataframe, i.e., the column ‘Name’ no longer exists after that. What if we want to keep the column ‘Name’ as it is but also use it as the index? For that, we need to pass the drop argument as False to the set_index() function.
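A short sketch of this variant, again on assumed sample data:

```python
import pandas as pd

# Assumed sample data for illustration.
employees = [
    ("jack", 34, "Sydney", 70000),
    ("Riti", 31, "Delhi", 77000),
    ("Aadi", 16, "Mumbai", 81000),
    ("Mohit", 31, "Delhi", 90000),
]
empDFObj = pd.DataFrame(employees, columns=["Name", "Age", "City", "Salary"])
empDFObj.index.name = "ID"

# drop=False copies 'Name' into the index and keeps it as a column too.
modDfObj = empDFObj.set_index("Name", drop=False)

print(modDfObj)
```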


In the returned copy of the dataframe, a copy of the column ‘Name’ is now the index, but the column ‘Name’ still exists in that dataframe.

Append a Dataframe column to the index to make a Multi-Index Dataframe

In both the above examples, we set the ‘Name’ column as the index of the dataframe, but it replaced the old index ‘ID’. What if we want to keep the index ‘ID’ as it is and append another level to it by converting the column ‘Name’ into an index?

For that, we need to pass the append argument as True to the set_index() function.
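A sketch of appending to the existing index, using assumed sample data:

```python
import pandas as pd

# Assumed sample data for illustration.
employees = [
    ("jack", 34, "Sydney", 70000),
    ("Riti", 31, "Delhi", 77000),
    ("Aadi", 16, "Mumbai", 81000),
    ("Mohit", 31, "Delhi", 90000),
]
empDFObj = pd.DataFrame(employees, columns=["Name", "Age", "City", "Salary"])
empDFObj.index.name = "ID"

# append=True keeps the existing 'ID' index and adds 'Name' as a second level.
modDfObj = empDFObj.set_index("Name", append=True)

print(modDfObj)
print(modDfObj.index.names)
```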


The dataframe is now a multi-index dataframe with two index levels, i.e., ID and Name.

Check for duplicates in the new index

If we want to make sure that, after converting a column to the index, the index does not contain any duplicate values, we can pass the argument verify_integrity as True to the set_index() function.

If the new index contains any duplicate values, set_index() will raise a ValueError.

For example, if the City column contains duplicates, setting it as the index with verify_integrity=True raises the error.
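The check can be sketched as follows. The sample data is an assumption in which ‘Delhi’ appears twice in the ‘City’ column; the flag duplicate_found is a hypothetical helper variable used only to record that the error occurred.

```python
import pandas as pd

# Assumed sample data; 'Delhi' appears twice in the 'City' column.
employees = [
    ("jack", 34, "Sydney", 70000),
    ("Riti", 31, "Delhi", 77000),
    ("Aadi", 16, "Mumbai", 81000),
    ("Mohit", 31, "Delhi", 90000),
]
empDFObj = pd.DataFrame(employees, columns=["Name", "Age", "City", "Salary"])
empDFObj.index.name = "ID"

duplicate_found = False
try:
    # verify_integrity=True makes set_index() reject duplicate index values.
    modDfObj = empDFObj.set_index("City", verify_integrity=True)
except ValueError as err:
    duplicate_found = True
    print("Error:", err)
```

Setting a column with unique values, such as ‘Name’ in this sample, as the index with verify_integrity=True succeeds without an error.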

Modify the existing Dataframe by converting a column into the index

In the examples so far, we saw that set_index() returns a copy of the original dataframe with modifications. What if we want to make the changes in the existing dataframe? We can do that in two ways.

First way,

Assign the returned dataframe object to the original variable, so that the variable now points to the updated dataframe.

Second way,

Pass the argument inplace as True. It makes the changes in the existing dataframe.

With either of the above statements, the dataframe empDFObj is modified, and the column ‘Name’ is converted to the index of the dataframe.
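Both ways can be sketched side by side. To compare them, this sketch works on two copies of an assumed sample dataframe; the names dfReassigned and dfInplace are hypothetical and stand in for empDFObj.

```python
import pandas as pd

# Assumed sample data for illustration.
employees = [
    ("jack", 34, "Sydney", 70000),
    ("Riti", 31, "Delhi", 77000),
    ("Aadi", 16, "Mumbai", 81000),
    ("Mohit", 31, "Delhi", 90000),
]
empDFObj = pd.DataFrame(employees, columns=["Name", "Age", "City", "Salary"])
empDFObj.index.name = "ID"

# First way: assign the returned dataframe back to the same variable.
dfReassigned = empDFObj.copy()
dfReassigned = dfReassigned.set_index("Name")

# Second way: modify the dataframe in place; the call itself returns None.
dfInplace = empDFObj.copy()
result = dfInplace.set_index("Name", inplace=True)

print(dfReassigned)
print(dfInplace)
```

Both variables end up with the same contents. Reassignment is often preferred because it allows method chaining, but that is a style choice rather than a requirement.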


The complete example is as follows,
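A compact sketch that brings the steps of this article together. The sample records and all variable names other than empDFObj are assumptions made for illustration.

```python
import pandas as pd

# Assumed sample data; 'Delhi' appears twice in the 'City' column.
employees = [
    ("jack", 34, "Sydney", 70000),
    ("Riti", 31, "Delhi", 77000),
    ("Aadi", 16, "Mumbai", 81000),
    ("Mohit", 31, "Delhi", 90000),
]
empDFObj = pd.DataFrame(employees, columns=["Name", "Age", "City", "Salary"])
empDFObj.index.name = "ID"
print("Original dataframe:", empDFObj, sep="\n")

# Convert the 'Name' column into the index (returns a modified copy).
byName = empDFObj.set_index("Name")
print("Column 'Name' as index:", byName, sep="\n")

# Convert 'Name' into the index but keep it as a column as well.
keptColumn = empDFObj.set_index("Name", drop=False)
print("Index 'Name' with the column kept:", keptColumn, sep="\n")

# Append 'Name' to the existing 'ID' index to get a multi-index.
multiIndexed = empDFObj.set_index("Name", append=True)
print("Multi-index (ID, Name):", multiIndexed, sep="\n")

# Check for duplicates in the new index.
try:
    empDFObj.set_index("City", verify_integrity=True)
except ValueError as err:
    print("Error:", err)

# Modify the existing dataframe in place.
empDFObj.set_index("Name", inplace=True)
print("Dataframe modified in place:", empDFObj, sep="\n")
```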


 
