Get Column Index from Column Name in Pandas DataFrame

In this article, we will discuss different ways to get the index position of a column from its name in a Pandas DataFrame.

In Python, the Pandas module provides a data structure called DataFrame. It stores data in tabular format, i.e. in rows and columns. Let’s create a DataFrame from a list of tuples:

import pandas as pd

# List of Tuples
students = [('Mark',    24, 'Berlin',    'Germany',        89000),
            ('Rita',    20, 'Seoul',     'South Korea',    93000),
            ('Vicki',   21, 'Amsterdam', 'Netherlands',    95670),
            ('Justin',  22, 'Singapore', 'Singapore',      78900),
            ('John',    36, 'Paris',     'France',         98711),
            ('Michal',  37, 'London',    'United Kingdom', 90000)]

# Create a DataFrame object
df = pd.DataFrame( students,
                   columns =['Name', 'Age', 'City', 'Country', 'Budget'],
                   index =['a', 'b', 'c', 'd', 'e', 'f'])

# Display the DataFrame
print(df)

Output:

     Name  Age       City         Country  Budget
a    Mark   24     Berlin         Germany   89000
b    Rita   20      Seoul     South Korea   93000
c   Vicki   21  Amsterdam     Netherlands   95670
d  Justin   22  Singapore       Singapore   78900
e    John   36      Paris          France   98711
f  Michal   37     London  United Kingdom   90000

This DataFrame contains five columns and six rows. Each column has a name (label) associated with it. Now suppose we want to find a column’s index position based on its name. Positions are zero-based, for example:

  • Column at index position 0 has label ‘Name’
  • Column at index position 1 has label ‘Age’
  • Column at index position 2 has label ‘City’
  • Column at index position 3 has label ‘Country’
  • Column at index position 4 has label ‘Budget’
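The mapping listed above can be printed directly from the DataFrame. This is only a minimal sketch: it uses an empty stand-in DataFrame with the same column labels, and the name positions is chosen here for illustration.

```python
import pandas as pd

# Stand-in for the DataFrame above; only the column labels matter here
df = pd.DataFrame(columns=['Name', 'Age', 'City', 'Country', 'Budget'])

# Pair each column label with its zero-based index position
positions = {label: pos for pos, label in enumerate(df.columns)}

print(positions)
# {'Name': 0, 'Age': 1, 'City': 2, 'Country': 3, 'Budget': 4}
```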

Now let’s see how to get the column index position by its name.

Get column index from column name in DataFrame using get_loc()

In Pandas, the DataFrame class provides an attribute columns, which gives us an Index object containing all the column names of the DataFrame. The Index object has a method get_loc(label), which returns the index position of the given label (an integer when the label is unique). If the given label doesn’t exist in the Index, it raises a KeyError. We can use the columns attribute together with get_loc() to get a column’s index from its name. For example,

# Get column index position of column 'City'
col_index = df.columns.get_loc('City')

print(col_index)

Output:

2

It returned the column index position of column ‘City’ from the DataFrame i.e. 2.
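Once we have the position, we can use it to look the label up again or to select the column by position with iloc. The following is a small sketch on a reduced two-row DataFrame (an assumption made to keep the example short), not the full DataFrame from above.

```python
import pandas as pd

# Reduced stand-in for the DataFrame above
df = pd.DataFrame({'Name': ['Mark', 'Rita'],
                   'Age':  [24, 20],
                   'City': ['Berlin', 'Seoul']})

# Get column index position of column 'City'
col_index = df.columns.get_loc('City')

# The position maps back to the same label
print(df.columns[col_index])    # City

# Select the column by its position instead of its name
city_values = df.iloc[:, col_index]
print(city_values.tolist())     # ['Berlin', 'Seoul']
```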

What if the column name does not exist in the DataFrame?

If the given column name does not exist in the DataFrame, then the get_loc() function raises a KeyError. For example,

# Get column index position of column 'Town'
col_index = df.columns.get_loc('Town')

print(col_index)

Error:

Traceback (most recent call last):
  File ".\temp.py", line 20, in <module>
    col_index = df.columns.get_loc('Town')
  File "C:\Python\Python37\lib\site-packages\pandas\core\indexes\base.py", line 2648, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas\_libs\index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc      
  File "pandas\_libs\hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Town'

As there is no column named ‘Town’ in the DataFrame, get_loc() raised a KeyError. We can avoid this either by using try/except or by first checking whether a column with the given name exists. For example,

if 'Town' in df.columns:
    # Get column index position of column 'Town'
    col_index = df.columns.get_loc('Town')
    print(col_index)
else:
    print('Column does not exist in the DataFrame')

Output:

Column does not exist in the DataFrame

This way we can avoid the KeyError.
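The try/except alternative mentioned earlier could look like the sketch below. The helper column_position and its default parameter are hypothetical names introduced here for illustration; they are not part of Pandas.

```python
import pandas as pd

# Stand-in for the DataFrame above; only the column labels matter here
df = pd.DataFrame(columns=['Name', 'Age', 'City', 'Country', 'Budget'])

def column_position(frame, name, default=-1):
    """Hypothetical helper: position of column `name`, or `default` if it is missing."""
    try:
        return frame.columns.get_loc(name)
    except KeyError:
        # get_loc() raises KeyError when the label is not in the Index
        return default

print(column_position(df, 'City'))   # 2
print(column_position(df, 'Town'))   # -1
```

Returning a sentinel such as -1 is just one design choice; depending on the use case, returning None or re-raising with a clearer message may be preferable.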

Get column index from column name in DataFrame using list()

The columns attribute of the DataFrame gives an Index object containing the column names. If we pass it to the list() function, we get a list of the DataFrame’s column names. Then, using the list’s index() method, we can get the index position of a column by its name. For example,

import pandas as pd

# List of Tuples
students = [('Mark',    24, 'Berlin',    'Germany',        89000),
            ('Rita',    20, 'Seoul',     'South Korea',    93000),
            ('Vicki',   21, 'Amsterdam', 'Netherlands',    95670),
            ('Justin',  22, 'Singapore', 'Singapore',      78900),
            ('John',    36, 'Paris',     'France',         98711),
            ('Michal',  37, 'London',    'United Kingdom', 90000)]

# Create a DataFrame object
df = pd.DataFrame( students,
                   columns =['Name', 'Age', 'City', 'Country', 'Budget'],
                   index =['a', 'b', 'c', 'd', 'e', 'f'])

# Display the DataFrame
print(df)

# Get column index position of column 'City'
col_index = list(df.columns).index('City')

print("Index position of column 'City' is ", col_index)

Output:

     Name  Age       City         Country  Budget
a    Mark   24     Berlin         Germany   89000
b    Rita   20      Seoul     South Korea   93000
c   Vicki   21  Amsterdam     Netherlands   95670
d  Justin   22  Singapore       Singapore   78900
e    John   36      Paris          France   98711
f  Michal   37     London  United Kingdom   90000


Index position of column 'City' is  2  

It returned the column index position of column ‘City’ from the DataFrame i.e. 2.

What if the column name does not exist in the DataFrame?

If the given column name does not exist in the DataFrame, then the index() function raises a ValueError. For example,

# Get column index position of column 'Town'
col_index = list(df.columns).index('Town')

print("Index position of column 'Town' is ", col_index)

Error:

Traceback (most recent call last):
  File ".\temp.py", line 20, in <module>
    col_index = list(df.columns).index('Town')
ValueError: 'Town' is not in list

As there is no column named ‘Town’ in the DataFrame, index() raised a ValueError. We can avoid this either by using try/except or by first checking whether a column with the given name exists. For example,

# Build the list of column names once and reuse it
column_names = list(df.columns)
if 'Town' in column_names:
    # Get column index position of column 'Town'
    col_index = column_names.index('Town')
    print("Index position of column 'Town' is ", col_index)
else:
    print('Column does not exist in the DataFrame')

Output:

Column does not exist in the DataFrame

This way we can avoid ValueError.
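The same result can be reached with try/except instead of a membership check. This is a minimal sketch on a stand-in DataFrame with the same column labels; using None as the "not found" value is an assumption made here.

```python
import pandas as pd

# Stand-in for the DataFrame above; only the column labels matter here
df = pd.DataFrame(columns=['Name', 'Age', 'City', 'Country', 'Budget'])

try:
    # Get column index position of column 'Town'
    col_index = list(df.columns).index('Town')
    print("Index position of column 'Town' is ", col_index)
except ValueError:
    # list.index() raises ValueError when the value is not in the list
    col_index = None
    print('Column does not exist in the DataFrame')
```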

Summary:

We learned two techniques to get a column’s index position from its name: the get_loc() function of the columns Index, and the index() function of a list of column names.
