This article will discuss six different techniques to iterate over a dataframe row by row. Then we will also discuss how to update the contents of a Dataframe while iterating over it row by row.
Table of Contents
Suppose we have a dataframe i.e
import pandas as pd # List of Tuples empoyees = [('jack', 34, 'Sydney', 5), ('Riti', 31, 'Delhi' , 7), ('Aadi', 16, 'New York', 11)] # Create a DataFrame object from list of tuples df = pd.DataFrame( empoyees, columns=['Name', 'Age', 'City', 'Experience'], index=['a', 'b', 'c']) print(df)
Contents of the created dataframe are,
Name Age City Experience a jack 34 Sydney 5 b Riti 31 Delhi 7 c Aadi 16 New York 11
Let’s see different ways to iterate over the rows of this dataframe,
Loop over Rows of Pandas Dataframe using iterrows()
Dataframe class provides a member function iterrows() i.e.
DataFrame.iterrows()
Frequently Asked:
- Pandas – Select Rows where column value is in List
- Drop multiple Columns from a Pandas DataFrame
- Add a header to a CSV file in Python
- How to Find & Drop duplicate columns in a DataFrame | Python Pandas
It yields an iterator which can can be used to iterate over all the rows of a dataframe in tuples. For each row it returns a tuple containing the index label and row contents as series.
Let’s iterate over all the rows of above created dataframe using iterrows() i.e.
# Loop through all rows of Dataframe along with index label for (index_label, row_series) in df.iterrows(): print('Row Index label : ', index_label) print('Row Content as Series : ', row_series.values)
Output:
Row Index label : a Row Content as Series : ['jack' 34 'Sydney' 5] Row Index label : b Row Content as Series : ['Riti' 31 'Delhi' 7] Row Index label : c Row Content as Series : ['Aadi' 16 'New York' 11]
Important points about Dataframe.iterrows()
- Do not Preserve the data types:
- As iterrows() returns each row contents as series but it does not preserve dtypes of values in the rows.
- We can not modify something while iterating over the rows using iterrows(). The iterator does not returns a view instead it returns a copy. So, making any modification in returned row contents will have no effect on actual dataframe
Loop over Rows of Pandas Dataframe using itertuples()
Dataframe class provides a member function itertuples() i.e.
DataFrame.itertuples()
For each row it yields a named tuple containing the all the column names and their value for that row. Let’s use it to iterate over all the rows of above created dataframe i.e.
# Iterate over the Dataframe rows as named tuples for namedTuple in df.itertuples(): print(namedTuple)
Output:
Pandas(Index='a', Name='jack', Age=34, City='Sydney', Experience=5) Pandas(Index='b', Name='Riti', Age=31, City='Delhi', Experience=7) Pandas(Index='c', Name='Aadi', Age=16, City='New York', Experience=11)
For every row in the dataframe a named tuple is returned. From named tuple you can access the individual values by indexing i.e.
To access the 1st value i.e. value with tag ‘index’ use,
print(namedTuple[0] )
Output:
c
To access the 2nd value i.e. value with tag ‘Name’ use
print(namedTuple[1] )
Output:
Aadi
Named Tuples without index
If we don’t want index column to be included in these named tuple then we can pass argument index=False i.e.
# Iterate over the Dataframe rows as named tuples without index for namedTuple in df.itertuples(index=False): print(namedTuple)
Output:
Pandas(Name='jack', Age=34, City='Sydney', Experience=5) Pandas(Name='Riti', Age=31, City='Delhi', Experience=7) Pandas(Name='Aadi', Age=16, City='New York', Experience=11)
Named Tuples with custom names
By default named tuple returned is with name Pandas, we can provide our custom names too by providing name argument i.e.
# Give Custom Name to the tuple while Iterating over the Dataframe rows for row in df.itertuples(name='Employee'): print(row)
Output:
Employee(Index='a', Name='jack', Age=34, City='Sydney', Experience=5) Employee(Index='b', Name='Riti', Age=31, City='Delhi', Experience=7) Employee(Index='c', Name='Aadi', Age=16, City='New York', Experience=11)
Pandas – Iterate over Rows as dictionary
We can also iterate over the rows of dataframe and convert them to dictionary for accessing by column label using same itertuples() i.e.
# itertuples() yields an iterate to named tuple for row in df.itertuples(name='Employee'): # Convert named tuple to dictionary dictRow = row._asdict() # Print dictionary print(dictRow) # Access elements from dict i.e. row contents print(dictRow['Name'] , ' is from ' , dictRow['City'])
Output:
OrderedDict([('Index', 'a'), ('Name', 'jack'), ('Age', 34), ('City', 'Sydney'), ('Experience', 5)]) jack is from Sydney OrderedDict([('Index', 'b'), ('Name', 'Riti'), ('Age', 31), ('City', 'Delhi'), ('Experience', 7)]) Riti is from Delhi OrderedDict([('Index', 'c'), ('Name', 'Aadi'), ('Age', 16), ('City', 'New York'), ('Experience', 11)]) Aadi is from New York
Iterate over Rows of Pandas Dataframe using index position and iloc
We can calculate the number of rows in a dataframe. Then loop through 0th index to last row and access each row by index position using iloc[] i.e.
# Loop through rows of dataframe by index i.e. # from 0 to number of rows for i in range(0, df.shape[0]): # get row contents as series using iloc{] # and index position of row rowSeries = df.iloc[i] # print row contents print(rowSeries.values)
Output:
['jack' 34 'Sydney' 5] ['Riti' 31 'Delhi' 7] ['Aadi' 16 'New York' 11]
Iterate over rows in Dataframe in reverse using index position and iloc
Get the number of rows in a dataframe. Then loop through last index to 0th index and access each row by index position using iloc[] i.e.
# Loop through rows of dataframe by index in reverse # i.e. from last row to row at 0th index. for i in range(df.shape[0] - 1, -1, -1): # get row contents as series using iloc{] & index pos of row rowSeries = df.iloc[i] # print row contents print(rowSeries.values)
Output:
['Aadi' 16 'New York' 11] ['Riti' 31 'Delhi' 7] ['jack' 34 'Sydney' 5]
Iterate over rows in dataframe using index labels and loc[]
As Dataframe.index returns a sequence of index labels, so we can iterate over those labels and access each row by index label i.e.
# loop through all the names in index # label sequence of dataframe for index in df.index: # For each index label, # access the row contents as series rowSeries = df.loc[index] # print row contents print(rowSeries.values)
Output:
['jack' 34 'Sydney' 5] ['Riti' 31 'Delhi' 7] ['Aadi' 16 'New York' 11]
Pandas : Iterate over rows and update
What if we want to change values while iterating over the rows of a Pandas Dataframe?
As Dataframe.iterrows() returns a copy of the dataframe contents in tuple, so updating it will have no effect on actual dataframe. So, to update the contents of dataframe we need to iterate over the rows of dataframe using iterrows() and then access each row using at() to update it’s contents.
Let’s see an example,
Suppose we have a dataframe i.e
# List of Tuples salaries = [(11, 5, 70000, 1000) , (12, 7, 72200, 1100) , (13, 11, 84999, 1000) ] # Create a DataFrame object df = pd.DataFrame( salaries, columns=['ID', 'Experience' , 'Salary', 'Bonus']) print(df)
Contents of the created dataframe df are,
ID Experience Salary Bonus 0 11 5 70000 1000 1 12 7 72200 1100 2 13 11 84999 1000
Let’s update each value in column ‘Bonus’ by multiplying it with 2 while iterating over the dataframe row by row i.e.
# iterate over the dataframe row by row for index_label, row_series in df.iterrows(): # For each row update the 'Bonus' value to it's double df.at[index_label , 'Bonus'] = row_series['Bonus'] * 2 print(df)
Output:
ID Experience Salary Bonus 0 11 5 70000 2000 1 12 7 72200 2200 2 13 11 84999 2000
Dataframe got updated i.e. we changed the values while iterating over the rows of Dataframe. Bonus value for each row became double.
The complete example is as follows,
import pandas as pd # List of Tuples empoyees = [('jack', 34, 'Sydney', 5), ('Riti', 31, 'Delhi' , 7), ('Aadi', 16, 'New York', 11)] # Create a DataFrame object from list of tuples df = pd.DataFrame( empoyees, columns=['Name', 'Age', 'City', 'Experience'], index=['a', 'b', 'c']) print(df) print('**** Example 1 *********') # Loop through all rows of Dataframe along with index label for (index_label, row_series) in df.iterrows(): print('Row Index label : ', index_label) print('Row Content as Series : ', row_series.values) print('**** Example 2 *********') # Iterate over the Dataframe rows as named tuples for namedTuple in df.itertuples(): print(namedTuple) print(namedTuple[0] ) print(namedTuple[1] ) print('**** Example 3 *********') # Iterate over the Dataframe rows as named tuples without index for namedTuple in df.itertuples(index=False): print(namedTuple) print('**** Example 4 *********') # Give Custom Name to the tuple while Iterating over the Dataframe rows for row in df.itertuples(name='Employee'): print(row) print('**** Example 5 *********') # itertuples() yields an iterate to named tuple for row in df.itertuples(name='Employee'): # Convert named tuple to dictionary dictRow = row._asdict() # Print dictionary print(dictRow) # Access elements from dict i.e. row contents print(dictRow['Name'] , ' is from ' , dictRow['City']) print('**** Example 6 *********') # Loop through rows of dataframe by index i.e. # from 0 to number of rows for i in range(0, df.shape[0]): # get row contents as series using iloc{] # and index position of row rowSeries = df.iloc[i] # print row contents print(rowSeries.values) print('**** Example 7 *********') # Loop through rows of dataframe by index in reverse # i.e. from last row to row at 0th index. for i in range(df.shape[0] - 1, -1, -1): # get row contents as series using iloc{] & index pos of row rowSeries = df.iloc[i] # print row contents print(rowSeries.values) print('**** Example 8 *********') # loop through all the names in index # label sequence of dataframe for index in df.index: # For each index label, # access the row contents as series rowSeries = df.loc[index] # print row contents print(rowSeries.values) print('**** Example 9 *********') # List of Tuples salaries = [(11, 5, 70000, 1000) , (12, 7, 72200, 1100) , (13, 11, 84999, 1000) ] # Create a DataFrame object df = pd.DataFrame( salaries, columns=['ID', 'Experience' , 'Salary', 'Bonus']) print(df) # iterate over the dataframe row by row for index_label, row_series in df.iterrows(): # For each row update the 'Bonus' value to it's double df.at[index_label , 'Bonus'] = row_series['Bonus'] * 2 print(df)
Output:
Name Age City Experience a jack 34 Sydney 5 b Riti 31 Delhi 7 c Aadi 16 New York 11 **** Example 1 ********* Row Index label : a Row Content as Series : ['jack' 34 'Sydney' 5] Row Index label : b Row Content as Series : ['Riti' 31 'Delhi' 7] Row Index label : c Row Content as Series : ['Aadi' 16 'New York' 11] **** Example 2 ********* Pandas(Index='a', Name='jack', Age=34, City='Sydney', Experience=5) Pandas(Index='b', Name='Riti', Age=31, City='Delhi', Experience=7) Pandas(Index='c', Name='Aadi', Age=16, City='New York', Experience=11) c Aadi **** Example 3 ********* Pandas(Name='jack', Age=34, City='Sydney', Experience=5) Pandas(Name='Riti', Age=31, City='Delhi', Experience=7) Pandas(Name='Aadi', Age=16, City='New York', Experience=11) **** Example 4 ********* Employee(Index='a', Name='jack', Age=34, City='Sydney', Experience=5) Employee(Index='b', Name='Riti', Age=31, City='Delhi', Experience=7) Employee(Index='c', Name='Aadi', Age=16, City='New York', Experience=11) **** Example 5 ********* OrderedDict([('Index', 'a'), ('Name', 'jack'), ('Age', 34), ('City', 'Sydney'), ('Experience', 5)]) jack is from Sydney OrderedDict([('Index', 'b'), ('Name', 'Riti'), ('Age', 31), ('City', 'Delhi'), ('Experience', 7)]) Riti is from Delhi OrderedDict([('Index', 'c'), ('Name', 'Aadi'), ('Age', 16), ('City', 'New York'), ('Experience', 11)]) Aadi is from New York **** Example 6 ********* ['jack' 34 'Sydney' 5] ['Riti' 31 'Delhi' 7] ['Aadi' 16 'New York' 11] **** Example 7 ********* ['Aadi' 16 'New York' 11] ['Riti' 31 'Delhi' 7] ['jack' 34 'Sydney' 5] **** Example 8 ********* ['jack' 34 'Sydney' 5] ['Riti' 31 'Delhi' 7] ['Aadi' 16 'New York' 11] **** Example 9 ********* ID Experience Salary Bonus 0 11 5 70000 1000 1 12 7 72200 1100 2 13 11 84999 1000 ID Experience Salary Bonus 0 11 5 70000 2000 1 12 7 72200 2200 2 13 11 84999 2000
Summary
We learned about different ways to iterate over all rows of dataframe and change values while iterating.