In this article, we will discuss different ways to get the index position of a column from its name in a Pandas DataFrame.
Table of Contents:
- Get column index from column name in DataFrame using get_loc()
- Get column index from column name in DataFrame using list()
In Python, the Pandas module provides the DataFrame data structure, which stores data in tabular form, i.e. in rows and columns. Let’s create a DataFrame from a list of tuples in Python,
import pandas as pd

# List of Tuples
students = [('Mark', 24, 'Berlin', 'Germany', 89000),
            ('Rita', 20, 'Seoul', 'South Korea', 93000),
            ('Vicki', 21, 'Amsterdam', 'Netherlands', 95670),
            ('Justin', 22, 'Singapore', 'Singapore', 78900),
            ('John', 36, 'Paris', 'France', 98711),
            ('Michal', 37, 'London', 'United Kingdom', 90000)]

# Create a DataFrame object
df = pd.DataFrame(students,
                  columns=['Name', 'Age', 'City', 'Country', 'Budget'],
                  index=['a', 'b', 'c', 'd', 'e', 'f'])

# Display the DataFrame
print(df)
Output:
     Name  Age       City         Country  Budget
a    Mark   24     Berlin         Germany   89000
b    Rita   20      Seoul     South Korea   93000
c   Vicki   21  Amsterdam     Netherlands   95670
d  Justin   22  Singapore       Singapore   78900
e    John   36      Paris          France   98711
f  Michal   37     London  United Kingdom   90000
This DataFrame contains five columns and six rows. Each column has a name associated with it. Now suppose we want to know a column’s index position based on its name. For example,
- Column at index position 0 has label ‘Name’
- Column at index position 1 has label ‘Age’
- Column at index position 2 has label ‘City’
- Column at index position 3 has label ‘Country’
- Column at index position 4 has label ‘Budget’
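The mapping listed above can be reproduced programmatically. The following is a minimal sketch, assuming a one-row DataFrame with the same column labels as the article's example; the name positions is introduced here only for illustration.

```python
import pandas as pd

# Same column labels as the article's DataFrame; a single row keeps the sketch short
df = pd.DataFrame(
    [('Mark', 24, 'Berlin', 'Germany', 89000)],
    columns=['Name', 'Age', 'City', 'Country', 'Budget'])

# Pair each column label with its zero-based position
# ('positions' is an illustrative name, not part of pandas)
positions = {label: pos for pos, label in enumerate(df.columns)}

for label, pos in positions.items():
    print(f"Column at index position {pos} has label '{label}'")
```

Building such a dictionary once can be handy when many lookups are needed, but it has to be rebuilt whenever the columns change.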
Now let’s see how to get a column’s index position from its name.
Get column index from column name in DataFrame using get_loc()
In Pandas, the DataFrame class provides an attribute columns, which gives us an Index object containing all the column names of the DataFrame. The Index object has a method get_loc(label), which returns the index position of the given label. If the label doesn’t exist in the Index, it raises a KeyError. We can use the columns attribute and the get_loc() method to get a column’s index from its name. For example,
# Get column index position of column 'City'
col_index = df.columns.get_loc('City')
print(col_index)
Output:
2
It returned 2, the index position of the column ‘City’ in the DataFrame.
What if the column name does not exist in the DataFrame?
If the given column name does not exist in the DataFrame, then the get_loc() function will raise a KeyError. For example,
# Get column index position of column 'Town'
col_index = df.columns.get_loc('Town')
print(col_index)
Error:
Traceback (most recent call last):
  File ".\temp.py", line 20, in <module>
    col_index = df.columns.get_loc('Town')
  File "C:\Python\Python37\lib\site-packages\pandas\core\indexes\base.py", line 2648, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas\_libs\index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Town'
Because the DataFrame has no column named ‘Town’, get_loc() raised a KeyError. We can handle this either by using try/except or by first checking whether a column with the given name exists. For example,
if 'Town' in df.columns:
    # Get column index position of column 'Town'
    col_index = df.columns.get_loc('Town')
    print(col_index)
else:
    print('Column does not exist in the DataFrame')
Output:
Column does not exist in the DataFrame
This way we can avoid the KeyError when the column is missing.
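The try/except alternative mentioned above might look like the following. This is a minimal sketch, assuming a one-row DataFrame with the article's column labels; column_position is a hypothetical helper written for this illustration and is not part of pandas.

```python
import pandas as pd

# One-row DataFrame with the same column labels as the article's example
df = pd.DataFrame(
    [('Mark', 24, 'Berlin', 'Germany', 89000)],
    columns=['Name', 'Age', 'City', 'Country', 'Budget'])

def column_position(frame, name):
    """Hypothetical helper: return the position of column `name`, or None if it is missing."""
    try:
        return frame.columns.get_loc(name)
    except KeyError:
        # get_loc() raises KeyError for a label that is not in the Index
        return None

city_pos = column_position(df, 'City')
town_pos = column_position(df, 'Town')
print(city_pos)
print(town_pos)
```

Catching the exception avoids a separate membership check, which some readers may prefer when a missing column is rare.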
Get column index from column name in DataFrame using list()
The columns attribute of the DataFrame gives an Index object containing the column names. If we pass it to the list() function, we get a list of the DataFrame’s column names. Then, using the index() method of the list, we can get the index position of a column by its name. For example,
import pandas as pd

# List of Tuples
students = [('Mark', 24, 'Berlin', 'Germany', 89000),
            ('Rita', 20, 'Seoul', 'South Korea', 93000),
            ('Vicki', 21, 'Amsterdam', 'Netherlands', 95670),
            ('Justin', 22, 'Singapore', 'Singapore', 78900),
            ('John', 36, 'Paris', 'France', 98711),
            ('Michal', 37, 'London', 'United Kingdom', 90000)]

# Create a DataFrame object
df = pd.DataFrame(students,
                  columns=['Name', 'Age', 'City', 'Country', 'Budget'],
                  index=['a', 'b', 'c', 'd', 'e', 'f'])

# Display the DataFrame
print(df)

# Get column index position of column 'City'
col_index = list(df.columns).index('City')
print("Index position of column 'City' is ", col_index)
Output:
     Name  Age       City         Country  Budget
a    Mark   24     Berlin         Germany   89000
b    Rita   20      Seoul     South Korea   93000
c   Vicki   21  Amsterdam     Netherlands   95670
d  Justin   22  Singapore       Singapore   78900
e    John   36      Paris          France   98711
f  Michal   37     London  United Kingdom   90000
Index position of column 'City' is  2
It returned 2, the index position of the column ‘City’ in the DataFrame.
What if the column name does not exist in the DataFrame?
If the given column name does not exist in the DataFrame, then the index() function will raise a ValueError. For example,
# Get column index position of column 'Town'
col_index = list(df.columns).index('Town')
print("Index position of column 'Town' is ", col_index)
Error:
Traceback (most recent call last):
  File ".\temp.py", line 20, in <module>
    col_index = list(df.columns).index('Town')
ValueError: 'Town' is not in list
Because the DataFrame has no column named ‘Town’, index() raised a ValueError. We can handle this either by using try/except or by first checking whether a column with the given name exists. For example,
column_names = list(df.columns)

if 'Town' in column_names:
    # Get column index position of column 'Town'
    col_index = column_names.index('Town')
    print("Index position of column 'Town' is ", col_index)
else:
    print('Column does not exist in the DataFrame')
Output:
Column does not exist in the DataFrame
This way we can avoid ValueError.
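For comparison, the try/except variant for the list-based approach could be sketched as follows. It assumes a one-row DataFrame with the article's column labels; the variable message is introduced only for this illustration.

```python
import pandas as pd

# One-row DataFrame with the same column labels as the article's example
df = pd.DataFrame(
    [('Mark', 24, 'Berlin', 'Germany', 89000)],
    columns=['Name', 'Age', 'City', 'Country', 'Budget'])

column_names = list(df.columns)

try:
    # list.index() raises ValueError when the value is not in the list
    col_index = column_names.index('Town')
    message = f"Index position of column 'Town' is {col_index}"
except ValueError:
    message = 'Column does not exist in the DataFrame'

print(message)
```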
Summary:
We learned about two different techniques to get the column index position by the column name.