In this article we will discuss how to apply a lambda function, a user defined function, or a numpy function to each row or column of a dataframe.
Python’s Pandas library provides a member function in the DataFrame class to apply a function along an axis of the DataFrame, i.e. along each row or each column:
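DataFrame.apply(func, axis=0, broadcast=None, raw=False, reduce=None, result_type=None, args=(), **kwds)
Note that this is the signature from pandas 0.23. The broadcast and reduce parameters were deprecated in that release and removed in pandas 1.0, where result_type covers their use cases.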
Important arguments are:
- func : Function to be applied to each column or row. It receives a series and returns either a series or a single value.
- axis : Axis along which the function is applied in the dataframe. Default value is 0.
  - If the value is 0, the function is applied to each column.
  - If the value is 1, the function is applied to each row.
- args : tuple / list of positional arguments to be passed to the function in addition to the series.
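To make the effect of the axis argument concrete, here is a minimal sketch. It uses a small made-up dataframe (not the one used in the rest of this article), and the names col_lengths and row_lengths are only illustrative. The function simply reports the length of the series it receives, which shows whether it was handed a column or a row.

```python
import pandas as pd

# Small illustrative frame (hypothetical data): 2 rows, 3 columns
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})

# axis=0 (default): the function is called once per column,
# so each series it receives has one entry per row (length 2)
col_lengths = df.apply(lambda s: len(s))

# axis=1: the function is called once per row,
# so each series it receives has one entry per column (length 3)
row_lengths = df.apply(lambda s: len(s), axis=1)

print(col_lengths.tolist())  # [2, 2, 2]
print(row_lengths.tolist())  # [3, 3]
```

The result of the first call is indexed by the column names and the result of the second by the row labels.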
Let’s use this to apply functions to the rows and columns of a dataframe.
Suppose we have the following dataframe:
import pandas as pd

# List of Tuples
matrix = [(222, 34, 23),
          (333, 31, 11),
          (444, 16, 21),
          (555, 32, 22),
          (666, 33, 27),
          (777, 35, 11)
          ]
# Create a DataFrame object
dfObj = pd.DataFrame(matrix, columns=list('abc'))
Contents of the dataframe in object dfObj are,
     a   b   c
0  222  34  23
1  333  31  11
2  444  16  21
3  555  32  22
4  666  33  27
5  777  35  11
Apply a lambda function to each row or each column in Dataframe
Suppose we have a lambda function that accepts a series as its argument and returns a new series object by adding 10 to each value of the given series, i.e.
lambda x : x + 10
Now let’s see how to apply this lambda function to each column or row of our dataframe i.e.
Apply a lambda function to each column:
To apply this lambda function to each column of the dataframe, pass it as the first and only argument to Dataframe.apply() on the dataframe object created above, i.e.
# Apply a lambda function to each column by adding 10 to each value in each column
modDfObj = dfObj.apply(lambda x : x + 10)
print("Modified Dataframe by applying lambda function on each column:")
print(modDfObj)
Output:
Modified Dataframe by applying lambda function on each column:
     a   b   c
0  232  44  33
1  343  41  21
2  454  26  31
3  565  42  32
4  676  43  37
5  787  45  21
As there are 3 columns in the dataframe, our lambda function is called three times, and on each call one column is passed to it as a series. The lambda function returns a new series in which each element of the given column is incremented by 10, and this returned series replaces the column in a copy of the dataframe.
So, basically, Dataframe.apply() calls the passed lambda function for each column and passes the column contents to it as a series. Finally, it returns a new dataframe constructed from the series returned by the lambda function, instead of altering the original dataframe.
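The behaviour described above can be checked with a short sketch. It assumes a cut-down version of the dataframe, and the helper add_ten and the list seen are hypothetical names introduced only for this illustration. The helper records the name of every series it is given, so we can see which columns were passed in, and afterwards we confirm that the original dataframe was left untouched.

```python
import pandas as pd

# Cut-down version of the article's data (first three rows only)
df = pd.DataFrame([(222, 34, 23), (333, 31, 11), (444, 16, 21)],
                  columns=list('abc'))

seen = []  # names of the series passed to the function

def add_ten(col):
    # With the default axis=0, col is one column of df as a series
    seen.append(col.name)
    return col + 10

result = df.apply(add_ten)

# Every column was passed to the function
print(sorted(set(seen)))
# The original dataframe is unchanged, the result is a new object
print(df['a'].tolist())
print(result['a'].tolist())
```

The set is used when inspecting seen because some older pandas versions could call the function an extra time on the first column, so the exact number of calls is best not relied upon.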
Apply a lambda function to each row:
Now, to apply a lambda function to each row of the dataframe, pass the lambda function as the first argument and also pass axis=1 as the second argument to Dataframe.apply() on the dataframe object created above. This time the lambda function adds 5 to each value, i.e.
# Apply a lambda function to each row by adding 5 to each value in each row
modDfObj = dfObj.apply(lambda x: x + 5, axis=1)
print("Modified Dataframe by applying lambda function on each row:")
print(modDfObj)
Output:
Modified Dataframe by applying lambda function on each row:
     a   b   c
0  227  39  28
1  338  36  16
2  449  21  26
3  560  37  27
4  671  38  32
5  782  40  16
So, basically, Dataframe.apply() calls the passed lambda function for each row and passes the contents of each row to it as a series. Finally, it returns a new dataframe constructed from the rows returned by the lambda function, instead of altering the original dataframe.
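Because each row reaches the function as a series labelled by the column names, the function can also read individual cells by label. The following is a minimal sketch, assuming a two-row version of the dataframe. The name a_plus_b is made up for the illustration.

```python
import pandas as pd

# Two rows of the article's data
df = pd.DataFrame([(222, 34, 23), (333, 31, 11)], columns=list('abc'))

# With axis=1 each row arrives as a series indexed by column name,
# so row['a'] and row['b'] are the cells of that row
a_plus_b = df.apply(lambda row: row['a'] + row['b'], axis=1)

print(a_plus_b.tolist())  # [256, 364]
```

Here the lambda returns a single value per row rather than a series, so the result is a series with one entry per row instead of a dataframe.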
Apply a User Defined function with or without arguments to each row or column of a Dataframe
Suppose we have a user defined function that accepts a series and returns a series by multiplying each value by 2 i.e.
# Multiply the given value by 2 and return it
def doubleData(x):
    return x * 2
Now let’s see how to apply this user defined function to each column of our data frame i.e.
# Apply a user defined function to each column by doubling each value in each column
modDfObj = dfObj.apply(doubleData)
print("Modified Dataframe by applying a user defined function to each column in Dataframe :")
print(modDfObj)
Output:
Modified Dataframe by applying a user defined function to each column in Dataframe :
      a   b   c
0   444  68  46
1   666  62  22
2   888  32  42
3  1110  64  44
4  1332  66  54
5  1554  70  22
Similarly we can apply this user defined function to each row instead of column by passing an extra argument i.e.
# Apply a user defined function to each row by doubling each value in each row
modDfObj = dfObj.apply(doubleData, axis=1)
Suppose we have a user defined function that accepts other arguments too. For example, this function accepts a series and a number y, then returns a new series by multiplying each value in the series by y, i.e.
# Returns x*y
def multiplyData(x, y):
    return x * y
Now let’s see how to apply this user defined function with argument to each column of our data frame i.e.
# Apply a user defined function to each column that will multiply each value in each column by given number
modDfObj = dfObj.apply(multiplyData, args=[4])
print("Modified Dataframe by applying a user defined function (with arguments) to each column in Dataframe :")
print(modDfObj)
Output:
Modified Dataframe by applying a user defined function (with arguments) to each column in Dataframe :
      a    b    c
0   888  136   92
1  1332  124   44
2  1776   64   84
3  2220  128   88
4  2664  132  108
5  3108  140   44
Similarly we can apply this user defined function with argument to each row instead of column by passing an extra argument i.e.
# Apply a user defined function to each row that will multiply each value in each row by given number
modDfObj = dfObj.apply(multiplyData, axis=1, args=[3])
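Besides args, the signature of Dataframe.apply() also accepts extra keyword arguments (**kwds), which are forwarded to the function. As a minimal sketch, assuming a two-row version of the dataframe, the extra argument y of multiplyData can be supplied by name instead of by position. The names by_position and by_keyword are only illustrative.

```python
import pandas as pd

# Returns x*y
def multiplyData(x, y):
    return x * y

# Two rows of the article's data
df = pd.DataFrame([(222, 34, 23), (333, 31, 11)], columns=list('abc'))

# Extra argument passed by position through args
by_position = df.apply(multiplyData, args=(4,))

# Same extra argument passed by name; apply() forwards y=4 to multiplyData
by_keyword = df.apply(multiplyData, y=4)

print(by_position.equals(by_keyword))  # True
print(by_keyword['a'].tolist())        # [888, 1332]
```

Passing the argument by name can be easier to read when the function takes several extra parameters.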
Apply a numpy function to each row or column of a Dataframe
In practical scenarios we generally apply existing numpy functions to the columns and rows of a dataframe.
Now let’s see how to apply a numpy function to each column of our data frame i.e.
# Apply a numpy function to each column by squaring each value in each column
modDfObj = dfObj.apply(np.square)
print("Modified Dataframe by applying a numpy function to each column in Dataframe :")
print(modDfObj)
Output:
Modified Dataframe by applying a numpy function to each column in Dataframe :
        a     b    c
0   49284  1156  529
1  110889   961  121
2  197136   256  441
3  308025  1024  484
4  443556  1089  729
5  603729  1225  121
Similarly we can apply a numpy function to each row instead of column by passing an extra argument i.e.
# Apply a numpy function to each row by taking the square root of each value in each row
modDfObj = dfObj.apply(np.sqrt, axis=1)
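For element-wise numpy functions such as np.square, going through apply() is one option, but the function can also be called on the whole dataframe directly. The sketch below, which assumes a two-row version of the dataframe and uses the illustrative names via_apply and direct, compares the two results.

```python
import numpy as np
import pandas as pd

# Two rows of the article's data
df = pd.DataFrame([(222, 34, 23), (333, 31, 11)], columns=list('abc'))

# Element-wise numpy function applied column by column
via_apply = df.apply(np.square)

# Same numpy function called on the whole dataframe at once
direct = np.square(df)

print(via_apply.equals(direct))  # True
print(direct['a'].tolist())      # [49284, 110889]
```

Both calls return a new dataframe with the same labels as the original.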
Apply a reducing function to each row or column of a Dataframe
Until now we have been applying functions that accept each column or row as a series and return a series of the same size. But we can also apply a function that accepts a series and returns a single value instead of a series. For example, let’s apply numpy.sum() to each column of the dataframe to find the sum of the values in each column, i.e.
# Apply a numpy function to get the sum of values in each column
modDfObj = dfObj.apply(np.sum)
print("Modified Dataframe by applying a numpy function to get sum of values in each column :")
print(modDfObj)
Output:
Modified Dataframe by applying a numpy function to get sum of values in each column :
a    2997
b     181
c     115
dtype: int64
Now let’s apply numpy.sum() to each row of the dataframe to find the sum of the values in each row, i.e.
# Apply a numpy function to get the sum of values in each row
modDfObj = dfObj.apply(np.sum, axis=1)
print("Modified Dataframe by applying a numpy function to get sum of values in each row :")
print(modDfObj)
Output:
Modified Dataframe by applying a numpy function to get sum of values in each row :
0    279
1    375
2    481
3    609
4    726
5    823
dtype: int64
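For common reductions such as a sum, the DataFrame class also has built-in methods that take the same axis argument. As a minimal sketch using the same data as above (the names col_sums and row_sums are only illustrative), the results of apply(np.sum) can be compared with those of DataFrame.sum().

```python
import numpy as np
import pandas as pd

matrix = [(222, 34, 23), (333, 31, 11), (444, 16, 21),
          (555, 32, 22), (666, 33, 27), (777, 35, 11)]
df = pd.DataFrame(matrix, columns=list('abc'))

# Reduce each column, then each row, to a single value with apply()
col_sums = df.apply(np.sum)
row_sums = df.apply(np.sum, axis=1)

# The built-in reduction gives the same series in both cases
print(col_sums.equals(df.sum()))        # True
print(row_sums.equals(df.sum(axis=1)))  # True
```

Note that the result of a reduction is a series, indexed by column names for axis=0 and by row labels for axis=1.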
Complete example is as follows:
import pandas as pd
import numpy as np

# Returns x*y
def multiplyData(x, y):
    return x * y

# Multiply the given value by 2 and return it
def doubleData(x):
    return x * 2

def main():
    # List of Tuples
    matrix = [(222, 34, 23),
              (333, 31, 11),
              (444, 16, 21),
              (555, 32, 22),
              (666, 33, 27),
              (777, 35, 11)
              ]
    # Create a DataFrame object
    dfObj = pd.DataFrame(matrix, columns=list('abc'))

    print("Original Dataframe", dfObj, sep='\n')

    print('************* Apply a lambda function to each row or each column in Dataframe *************')

    print('*** Apply a lambda function to each column in Dataframe ***')
    # Apply a lambda function to each column by adding 10 to each value in each column
    modDfObj = dfObj.apply(lambda x : x + 10)
    print("Modified Dataframe by applying lambda function on each column:")
    print(modDfObj)

    print('*** Apply a lambda function to each row in Dataframe ***')
    # Apply a lambda function to each row by adding 5 to each value in each row
    modDfObj = dfObj.apply(lambda x: x + 5, axis=1)
    print("Modified Dataframe by applying lambda function on each row:")
    print(modDfObj)

    print('************* Apply a User Defined function to each row or each column in Dataframe *************')

    print('*** Apply a user defined function to each column in Dataframe ***')
    # Apply a user defined function to each column by doubling each value in each column
    modDfObj = dfObj.apply(doubleData)
    print("Modified Dataframe by applying a user defined function to each column in Dataframe :")
    print(modDfObj)

    print('*** Apply a user defined function to each row in Dataframe ***')
    # Apply a user defined function to each row by doubling each value in each row
    modDfObj = dfObj.apply(doubleData, axis=1)
    print("Modified Dataframe by applying a user defined function to each row in Dataframe :")
    print(modDfObj)

    print('************* Apply a User Defined function (with Arguments) to each row or each column in Dataframe *************')

    print('*** Apply a user defined function ( with arguments ) to each column in Dataframe ***')
    # Apply a user defined function to each column that will multiply each value in each column by given number
    modDfObj = dfObj.apply(multiplyData, args=[4])
    print("Modified Dataframe by applying a user defined function (with arguments) to each column in Dataframe :")
    print(modDfObj)

    print('*** Apply a user defined function ( with arguments ) to each row in Dataframe ***')
    # Apply a user defined function to each row that will multiply each value in each row by given number
    modDfObj = dfObj.apply(multiplyData, axis=1, args=[3])
    print("Modified Dataframe by applying a user defined function (with arguments) to each row in Dataframe :")
    print(modDfObj)

    print('************* Apply a numpy function to each row or each column in Dataframe *************')

    # Apply a numpy function to each column by squaring each value in each column
    modDfObj = dfObj.apply(np.square)
    print("Modified Dataframe by applying a numpy function to each column in Dataframe :")
    print(modDfObj)

    # Apply a numpy function to each row by taking the square root of each value in each row
    modDfObj = dfObj.apply(np.sqrt, axis=1)
    print("Modified Dataframe by applying a numpy function to each row in Dataframe :")
    print(modDfObj)

    print('************* Apply a reducing function to each column or row in DataFrame *************')

    # Apply a numpy function to get the sum of values in each column
    modDfObj = dfObj.apply(np.sum)
    print("Modified Dataframe by applying a numpy function to get sum of values in each column :")
    print(modDfObj)

    # Apply a numpy function to get the sum of values in each row
    modDfObj = dfObj.apply(np.sum, axis=1)
    print("Modified Dataframe by applying a numpy function to get sum of values in each row :")
    print(modDfObj)

if __name__ == '__main__':
    main()
Output:
Original Dataframe
     a   b   c
0  222  34  23
1  333  31  11
2  444  16  21
3  555  32  22
4  666  33  27
5  777  35  11
************* Apply a lambda function to each row or each column in Dataframe *************
*** Apply a lambda function to each column in Dataframe ***
Modified Dataframe by applying lambda function on each column:
     a   b   c
0  232  44  33
1  343  41  21
2  454  26  31
3  565  42  32
4  676  43  37
5  787  45  21
*** Apply a lambda function to each row in Dataframe ***
Modified Dataframe by applying lambda function on each row:
     a   b   c
0  227  39  28
1  338  36  16
2  449  21  26
3  560  37  27
4  671  38  32
5  782  40  16
************* Apply a User Defined function to each row or each column in Dataframe *************
*** Apply a user defined function to each column in Dataframe ***
Modified Dataframe by applying a user defined function to each column in Dataframe :
      a   b   c
0   444  68  46
1   666  62  22
2   888  32  42
3  1110  64  44
4  1332  66  54
5  1554  70  22
*** Apply a user defined function to each row in Dataframe ***
Modified Dataframe by applying a user defined function to each row in Dataframe :
      a   b   c
0   444  68  46
1   666  62  22
2   888  32  42
3  1110  64  44
4  1332  66  54
5  1554  70  22
************* Apply a User Defined function (with Arguments) to each row or each column in Dataframe *************
*** Apply a user defined function ( with arguments ) to each column in Dataframe ***
Modified Dataframe by applying a user defined function (with arguments) to each column in Dataframe :
      a    b    c
0   888  136   92
1  1332  124   44
2  1776   64   84
3  2220  128   88
4  2664  132  108
5  3108  140   44
*** Apply a user defined function ( with arguments ) to each row in Dataframe ***
Modified Dataframe by applying a user defined function (with arguments) to each row in Dataframe :
      a    b   c
0   666  102  69
1   999   93  33
2  1332   48  63
3  1665   96  66
4  1998   99  81
5  2331  105  33
************* Apply a numpy function to each row or each column in Dataframe *************
Modified Dataframe by applying a numpy function to each column in Dataframe :
        a     b    c
0   49284  1156  529
1  110889   961  121
2  197136   256  441
3  308025  1024  484
4  443556  1089  729
5  603729  1225  121
Modified Dataframe by applying a numpy function to each row in Dataframe :
           a         b         c
0  14.899664  5.830952  4.795832
1  18.248288  5.567764  3.316625
2  21.071308  4.000000  4.582576
3  23.558438  5.656854  4.690416
4  25.806976  5.744563  5.196152
5  27.874720  5.916080  3.316625
************* Apply a reducing function to each column or row in DataFrame *************
Modified Dataframe by applying a numpy function to get sum of values in each column :
a    2997
b     181
c     115
dtype: int64
Modified Dataframe by applying a numpy function to get sum of values in each row :
0    279
1    375
2    481
3    609
4    726
5    823
dtype: int64