In this article we will discuss different ways to iterate over all or certain columns of a DataFrame.
Let’s first create a DataFrame:
import pandas as pd

# List of tuples
employees = [('jack', 34, 'Sydney'),
             ('Riti', 31, 'Delhi'),
             ('Aadi', 16, 'New York'),
             ('Mohit', 32, 'Delhi')]

# Create a DataFrame object
empDfObj = pd.DataFrame(employees,
                        columns=['Name', 'Age', 'City'],
                        index=['a', 'b', 'c', 'd'])
Contents of the created DataFrame empDfObj are:
    Name  Age      City
a   jack   34    Sydney
b   Riti   31     Delhi
c   Aadi   16  New York
d  Mohit   32     Delhi
Iterate over columns of a DataFrame using DataFrame.items()
The DataFrame class provides a member function items():
DataFrame.items()
It returns an iterator over all the columns of the DataFrame. For each column it yields a tuple containing the column name and the column contents as a Series. Older tutorials call this function iteritems(); that name was deprecated in pandas 1.5 and removed in pandas 2.0, and items() behaves the same way.
Let’s use items() to iterate over the columns of the DataFrame created above:
# items() yields a tuple of column name and Series for each column in the DataFrame
# (on pandas older than 2.0 the same loop also works with iteritems())
for (columnName, columnData) in empDfObj.items():
    print('Column Name : ', columnName)
    print('Column Contents : ', columnData.values)
Output:
Column Name :  Name
Column Contents :  ['jack' 'Riti' 'Aadi' 'Mohit']
Column Name :  Age
Column Contents :  [34 31 16 32]
Column Name :  City
Column Contents :  ['Sydney' 'Delhi' 'New York' 'Delhi']
As there are 3 columns, 3 tuples were returned during iteration.
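Because each item is a (name, Series) pair, the loop body can call any Series method on the column. The following is a minimal sketch under that assumption, using the same data as above; the `summary` dictionary is a name introduced here only for illustration.

```python
import pandas as pd

employees = [('jack', 34, 'Sydney'),
             ('Riti', 31, 'Delhi'),
             ('Aadi', 16, 'New York'),
             ('Mohit', 32, 'Delhi')]
empDfObj = pd.DataFrame(employees,
                        columns=['Name', 'Age', 'City'],
                        index=['a', 'b', 'c', 'd'])

# Hypothetical helper dict: column name -> number of distinct values
summary = {}
for columnName, columnData in empDfObj.items():
    # columnData is a pandas Series, so Series methods such as nunique() work
    summary[columnName] = columnData.nunique()

print(summary)
```

Here 'Delhi' appears twice in the City column, so that column reports one fewer distinct value than the others.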
Iterate over columns in dataframe using Column Names
DataFrame.columns returns a sequence of column names, and iterating over the DataFrame object itself yields the same names. We can iterate over these column names and, for each one, select the column contents with the [] operator:
# Iterate over the sequence of column names
for column in empDfObj:
    # Select column contents by column name using [] operator
    columnSeriesObj = empDfObj[column]
    print('Column Name : ', column)
    print('Column Contents : ', columnSeriesObj.values)
Output:
Column Name :  Name
Column Contents :  ['jack' 'Riti' 'Aadi' 'Mohit']
Column Name :  Age
Column Contents :  [34 31 16 32]
Column Name :  City
Column Contents :  ['Sydney' 'Delhi' 'New York' 'Delhi']
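To make the relationship between the two spellings concrete, the short sketch below compares the labels obtained by iterating over the DataFrame directly with those from DataFrame.columns. The variable names `namesFromFrame` and `namesFromColumns` are illustrative and not part of the original example.

```python
import pandas as pd

employees = [('jack', 34, 'Sydney'),
             ('Riti', 31, 'Delhi'),
             ('Aadi', 16, 'New York'),
             ('Mohit', 32, 'Delhi')]
empDfObj = pd.DataFrame(employees,
                        columns=['Name', 'Age', 'City'],
                        index=['a', 'b', 'c', 'd'])

# Iterating over the DataFrame itself yields its column labels
namesFromFrame = list(empDfObj)

# DataFrame.columns holds the same labels as an Index object
namesFromColumns = list(empDfObj.columns)

print(namesFromFrame)
print(namesFromFrame == namesFromColumns)
```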
Iterate over certain columns in dataframe
Suppose we want to iterate over only two columns, Name and City, in the DataFrame created above. To do that, we can select just those columns from the DataFrame and then iterate over them:
# Iterate over two given columns only from the DataFrame
for column in empDfObj[['Name', 'City']]:
    # Select column contents by column name using [] operator
    columnSeriesObj = empDfObj[column]
    print('Column Name : ', column)
    print('Column Contents : ', columnSeriesObj.values)
Output:
Column Name :  Name
Column Contents :  ['jack' 'Riti' 'Aadi' 'Mohit']
Column Name :  City
Column Contents :  ['Sydney' 'Delhi' 'New York' 'Delhi']
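Since the loop only needs the column names, one possible variation is to iterate over a plain list of names and skip building the intermediate two-column DataFrame. This is a sketch of that alternative, not the approach used above; `selectedColumns` and `collected` are hypothetical names.

```python
import pandas as pd

employees = [('jack', 34, 'Sydney'),
             ('Riti', 31, 'Delhi'),
             ('Aadi', 16, 'New York'),
             ('Mohit', 32, 'Delhi')]
empDfObj = pd.DataFrame(employees,
                        columns=['Name', 'Age', 'City'],
                        index=['a', 'b', 'c', 'd'])

# Hypothetical list of the column names we care about
selectedColumns = ['Name', 'City']

collected = {}
for column in selectedColumns:
    # Select each column by name and store its values as a Python list
    collected[column] = empDfObj[column].tolist()

print(collected)
```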
Iterate Over columns in dataframe in reverse order
Since DataFrame.columns returns a sequence of column names, we can iterate over these names in reverse with reversed() and, for each name, select the column contents by column name:
# Iterate over the sequence of column names in reverse order
for column in reversed(empDfObj.columns):
    # Select column contents by column name using [] operator
    columnSeriesObj = empDfObj[column]
    print('Column Name : ', column)
    print('Column Contents : ', columnSeriesObj.values)
Output:
Column Name :  City
Column Contents :  ['Sydney' 'Delhi' 'New York' 'Delhi']
Column Name :  Age
Column Contents :  [34 31 16 32]
Column Name :  Name
Column Contents :  ['jack' 'Riti' 'Aadi' 'Mohit']
It printed all the columns of the DataFrame in reverse order.
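As an aside, slicing offers another way to reverse the column order. The sketch below reverses the column Index with [::-1] and, separately, reverses the whole DataFrame column-wise with iloc. Both are shown as assumptions about what may be convenient, and `reversedNames` and `reversedFrame` are illustrative names.

```python
import pandas as pd

employees = [('jack', 34, 'Sydney'),
             ('Riti', 31, 'Delhi'),
             ('Aadi', 16, 'New York'),
             ('Mohit', 32, 'Delhi')]
empDfObj = pd.DataFrame(employees,
                        columns=['Name', 'Age', 'City'],
                        index=['a', 'b', 'c', 'd'])

# Reverse the column labels by slicing the Index with a negative step
reversedNames = list(empDfObj.columns[::-1])
print(reversedNames)

# Reverse the columns of the DataFrame itself; all rows are kept
reversedFrame = empDfObj.iloc[:, ::-1]
print(list(reversedFrame.columns))
```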
Iterate Over columns in dataframe by index using iloc[]
To iterate over the columns of a DataFrame by position, we can iterate over a range from 0 to the number of columns (exclusive) and, for each index, select the column contents using iloc[]. Let’s see how to iterate over all columns of the DataFrame from index 0 to the last index:
# Iterate over the index range from 0 to the number of columns in the DataFrame
for index in range(empDfObj.shape[1]):
    print('Column Number : ', index)
    # Select column by index position using iloc[]
    columnSeriesObj = empDfObj.iloc[:, index]
    print('Column Contents : ', columnSeriesObj.values)
Output:
Column Number :  0
Column Contents :  ['jack' 'Riti' 'Aadi' 'Mohit']
Column Number :  1
Column Contents :  [34 31 16 32]
Column Number :  2
Column Contents :  ['Sydney' 'Delhi' 'New York' 'Delhi']
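When both the position and the name of a column are needed, enumerate() over DataFrame.columns provides the two together. The sketch below also checks that selecting by position with iloc[] and selecting by name return the same Series; the `matches` list is a hypothetical name used only to collect the results.

```python
import pandas as pd

employees = [('jack', 34, 'Sydney'),
             ('Riti', 31, 'Delhi'),
             ('Aadi', 16, 'New York'),
             ('Mohit', 32, 'Delhi')]
empDfObj = pd.DataFrame(employees,
                        columns=['Name', 'Age', 'City'],
                        index=['a', 'b', 'c', 'd'])

matches = []
for position, column in enumerate(empDfObj.columns):
    # Select the same column twice: once by position, once by name
    byPosition = empDfObj.iloc[:, position]
    byName = empDfObj[column]
    # Series.equals() compares the values of the two selections
    matches.append((position, column, bool(byPosition.equals(byName))))

for entry in matches:
    print(entry)
```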
The complete example is as follows:
import pandas as pd


def main():
    # List of tuples
    employees = [('jack', 34, 'Sydney'),
                 ('Riti', 31, 'Delhi'),
                 ('Aadi', 16, 'New York'),
                 ('Mohit', 32, 'Delhi')]

    # Create a DataFrame object
    empDfObj = pd.DataFrame(employees,
                            columns=['Name', 'Age', 'City'],
                            index=['a', 'b', 'c', 'd'])

    print("Contents of the Dataframe : ")
    print(empDfObj)

    print('**** Iterate Over columns in Dataframe using Dataframe.items() ****')
    # items() yields a tuple of column name and Series for each column
    # (iteritems() was removed in pandas 2.0)
    for (columnName, columnData) in empDfObj.items():
        print('Column Name : ', columnName)
        print('Column Contents : ', columnData.values)

    print('*** Iterate over columns in dataframe using Column Names ***')
    # Iterate over the sequence of column names
    for column in empDfObj:
        # Select column contents by column name using [] operator
        columnSeriesObj = empDfObj[column]
        print('Column Name : ', column)
        print('Column Contents : ', columnSeriesObj.values)

    print('*** Iterate over certain columns in dataframe ***')
    # Iterate over two given columns only from the DataFrame
    for column in empDfObj[['Name', 'City']]:
        # Select column contents by column name using [] operator
        columnSeriesObj = empDfObj[column]
        print('Column Name : ', column)
        print('Column Contents : ', columnSeriesObj.values)

    print('**** Iterate Over columns in dataframe in reverse order ****')
    # Iterate over the sequence of column names in reverse order
    for column in reversed(empDfObj.columns):
        # Select column contents by column name using [] operator
        columnSeriesObj = empDfObj[column]
        print('Column Name : ', column)
        print('Column Contents : ', columnSeriesObj.values)

    print('**** Iterate Over columns in dataframe by index using iloc[] ****')
    # Iterate over the index range from 0 to the number of columns in the DataFrame
    for index in range(empDfObj.shape[1]):
        print('Column Number : ', index)
        # Select column by index position using iloc[]
        columnSeriesObj = empDfObj.iloc[:, index]
        print('Column Contents : ', columnSeriesObj.values)


if __name__ == '__main__':
    main()
Output:
Contents of the Dataframe :
    Name  Age      City
a   jack   34    Sydney
b   Riti   31     Delhi
c   Aadi   16  New York
d  Mohit   32     Delhi
**** Iterate Over columns in Dataframe using Dataframe.items() ****
Column Name :  Name
Column Contents :  ['jack' 'Riti' 'Aadi' 'Mohit']
Column Name :  Age
Column Contents :  [34 31 16 32]
Column Name :  City
Column Contents :  ['Sydney' 'Delhi' 'New York' 'Delhi']
*** Iterate over columns in dataframe using Column Names ***
Column Name :  Name
Column Contents :  ['jack' 'Riti' 'Aadi' 'Mohit']
Column Name :  Age
Column Contents :  [34 31 16 32]
Column Name :  City
Column Contents :  ['Sydney' 'Delhi' 'New York' 'Delhi']
*** Iterate over certain columns in dataframe ***
Column Name :  Name
Column Contents :  ['jack' 'Riti' 'Aadi' 'Mohit']
Column Name :  City
Column Contents :  ['Sydney' 'Delhi' 'New York' 'Delhi']
**** Iterate Over columns in dataframe in reverse order ****
Column Name :  City
Column Contents :  ['Sydney' 'Delhi' 'New York' 'Delhi']
Column Name :  Age
Column Contents :  [34 31 16 32]
Column Name :  Name
Column Contents :  ['jack' 'Riti' 'Aadi' 'Mohit']
**** Iterate Over columns in dataframe by index using iloc[] ****
Column Number :  0
Column Contents :  ['jack' 'Riti' 'Aadi' 'Mohit']
Column Number :  1
Column Contents :  [34 31 16 32]
Column Number :  2
Column Contents :  ['Sydney' 'Delhi' 'New York' 'Delhi']