In this article we will discuss how to apply a given lambda function or user defined function or numpy function to each row or column in a dataframe.
Python’s Pandas Library provides an member function in Dataframe class to apply a function along the axis of the Dataframe i.e. along each row or column i.e.
DataFrame.apply(func, axis=0, broadcast=None, raw=False, reduce=None, result_type=None, args=(), **kwds)
Important Arguments are:
- func : Function to be applied to each column or row. This function accepts a series and returns a series.
- axis : Axis along which the function is applied in dataframe. Default value 0.
- If value is 0 then it applies function to each column.
- If value is 1 then it applies function to each row.
- args : tuple / list of arguments to passed to function.
Let’s use this to apply function to rows and columns of a Dataframe.
Suppose we have a dataframe i.e.
# List of Tuples matrix = [(222, 34, 23), (333, 31, 11), (444, 16, 21), (555, 32, 22), (666, 33, 27), (777, 35, 11) ] # Create a DataFrame object dfObj = pd.DataFrame(matrix, columns=list('abc'))
Contents of the dataframe in object dfObj are,
a b c 0 222 34 23 1 333 31 11 2 444 16 21 3 555 32 22 4 666 33 27 5 777 35 11
Apply a lambda function to each row or each column in Dataframe
Suppose we have a lambda function that accepts a series as argument returns a new series object by adding 10 in each value of the
given series i.e.
lambda x : x + 10
Now let’s see how to apply this lambda function to each column or row of our dataframe i.e.
Apply a lambda function to each column:
To apply this lambda function to each column in dataframe, pass the lambda function as first and only argument in Dataframe.apply()
with above created dataframe object i.e.
# Apply a lambda function to each column by adding 10 to each value in each column modDfObj = dfObj.apply(lambda x : x + 10) print("Modified Dataframe by applying lambda function on each column:") print(modDfObj)
Output:
Modified Dataframe by applying lambda function on each column: a b c 0 232 44 33 1 343 41 21 2 454 26 31 3 565 42 32 4 676 43 37 5 787 45 21
As there were 3 columns in dataframe, so our lambda function is called three times and for each call a column will passed as argument to
the lambda function as argument. As, our lambda function returns a copy of series by infringement the value of each element in given column by 10. This returned series replaces the column in a copy of dataframe.
So, basically Dataframe.apply() calls the passed lambda function for each column and pass the column contents as series to this lambda function. Finally it returns a modified copy of dataframe constructed with columns returned by lambda functions, instead of altering original dataframe.
Apply a lambda function to each row:
Now, to apply this lambda function to each row in dataframe, pass the lambda function as first argument and also pass axis=1 as second argument in Dataframe.apply() with above created dataframe object i.e.
# Apply a lambda function to each row by adding 5 to each value in each column modDfObj = dfObj.apply(lambda x: x + 5, axis=1) print("Modified Dataframe by applying lambda function on each row:") print(modDfObj)
Output:
Modified Dataframe by applying lambda function on each row: a b c 0 227 39 28 1 338 36 16 2 449 21 26 3 560 37 27 4 671 38 32 5 782 40 16
So, basically Dataframe.apply() calls the passed lambda function for each row and passes each row contents as series to this lambda function. Finally it returns a modified copy of dataframe constructed with rows returned by lambda functions, instead of altering original dataframe.
Apply a User Defined function with or without arguments to each row or column of a Dataframe
Suppose we have a user defined function that accepts a series and returns a series by multiplying each value by 2 i.e.
# Multiply given value by 2 and returns def doubleData(x): return x * 2
Now let’s see how to apply this user defined function to each column of our data frame i.e.
# Apply a user defined function to each column by doubling each value in each column modDfObj = dfObj.apply(doubleData) print("Modified Dataframe by applying a user defined function to each column in Dataframe :") print(modDfObj)
Output:
Modified Dataframe by applying a user defined function to each column in Dataframe : a b c 0 444 68 46 1 666 62 22 2 888 32 42 3 1110 64 44 4 1332 66 54 5 1554 70 22
Similarly we can apply this user defined function to each row instead of column by passing an extra argument i.e.
# Apply a user defined function to each row by doubling each value in each column modDfObj = dfObj.apply(doubleData, axis=1)
Suppose we have a user defined function that accepts other arguments too. For example, this function accepts a series and a number y then
returns a new series by multiplying each value in series by y i.e.
# Returns x*y def multiplyData(x, y): return x * y
Now let’s see how to apply this user defined function with argument to each column of our data frame i.e.
# Apply a user defined function to each column that will multiply each value in each column by given number modDfObj = dfObj.apply(multiplyData, args=[4]) print("Modified Dataframe by applying a user defined function (with arguments) to each column in Dataframe :") print(modDfObj)
Output:
Modified Dataframe by applying a user defined function (with arguments) to each column in Dataframe : a b c 0 888 136 92 1 1332 124 44 2 1776 64 84 3 2220 128 88 4 2664 132 108 5 3108 140 44
Similarly we can apply this user defined function with argument to each row instead of column by passing an extra argument i.e.
# Apply a user defined function to each row by doubling each value in each column modDfObj = dfObj.apply(multiplyData, axis=1, args=[3])
Apply a numpy functions to a to each row or column of a Dataframe
Generally in practical scenarios we apply already present numpy functions to column and rows in dataframe i.e.
Now let’s see how to apply a numpy function to each column of our data frame i.e.
# Apply a numpy function to each column by doubling each value in each column modDfObj = dfObj.apply(np.square) print("Modified Dataframe by applying a numpy function to each column in Dataframe :") print(modDfObj)
Output:
Modified Dataframe by applying a numpy function to each column in Dataframe : a b c 0 49284 1156 529 1 110889 961 121 2 197136 256 441 3 308025 1024 484 4 443556 1089 729 5 603729 1225 121
Similarly we can apply a numpy function to each row instead of column by passing an extra argument i.e.
# Apply a numpy function to each row by square root each value in each column modDfObj = dfObj.apply(np.sqrt, axis=1)
Apply a Reducing functions to a to each row or column of a Dataframe
Till now we have applying a kind of function that accepts every column or row as series and returns a series of same size. But we can also call the function that accepts a series and returns a single variable instead of series. For example let’s apply numpy.sum() to each column in dataframe to find out the sum of each values in each column i.e.
# Apply a numpy function to get the sum of values in each column modDfObj = dfObj.apply(np.sum) print("Modified Dataframe by applying a numpy function to get sum of values in each column :") print(modDfObj)
Output:
Modified Dataframe by applying a numpy function to get sum of values in each column : a 2997 b 181 c 115 dtype: int64
Now let’s apply numpy.sum() to each row in dataframe to find out the sum of each values in each row i.e.
# Apply a numpy function to get the sum of values in each row modDfObj = dfObj.apply(np.sum, axis=1) print("Modified Dataframe by applying a numpy function to get sum of values in each row :") print(modDfObj)
Output:
Modified Dataframe by applying a numpy function to get sum of values in each row : 0 279 1 375 2 481 3 609 4 726 5 823 dtype: int64
Complete example is as follows:
import pandas as pd import numpy as np # Returns x*y def multiplyData(x, y): return x * y # Multiply given value by 2 and returns def doubleData(x): return x * 2 def main(): # List of Tuples matrix = [(222, 34, 23), (333, 31, 11), (444, 16, 21), (555, 32, 22), (666, 33, 27), (777, 35, 11) ] # Create a DataFrame object dfObj = pd.DataFrame(matrix, columns=list('abc')) print("Original Dataframe", dfObj, sep='\n') print('************* Apply a lambda function to each row or each column in Dataframe *************') print('*** Apply a lambda function to each column in Dataframe ***') # Apply a lambda function to each column by adding 10 to each value in each column modDfObj = dfObj.apply(lambda x : x + 10) print("Modified Dataframe by applying lambda function on each column:") print(modDfObj) print('*** Apply a lambda function to each row in Dataframe ***') # Apply a lambda function to each row by adding 5 to each value in each column modDfObj = dfObj.apply(lambda x: x + 5, axis=1) print("Modified Dataframe by applying lambda function on each row:") print(modDfObj) print('************* Apply a User Defined function to each row or each column in Dataframe *************') print('*** Apply a user defined function to each column in Dataframe ***') # Apply a user defined function to each column by doubling each value in each column modDfObj = dfObj.apply(doubleData) print("Modified Dataframe by applying a user defined function to each column in Dataframe :") print(modDfObj) print('*** Apply a user defined function to each row in Dataframe ***') # Apply a user defined function to each row by doubling each value in each column modDfObj = dfObj.apply(doubleData, axis=1) print("Modified Dataframe by applying a user defined function to each row in Dataframe :") print(modDfObj) print('************* Apply a User Defined function (with Arguments) to each row or each column in Dataframe *************') print('*** Apply a user defined function ( with arguments ) to each column in Dataframe ***') # Apply a user defined function to each column that will multiply each value in each column by given number modDfObj = dfObj.apply(multiplyData, args=[4]) print("Modified Dataframe by applying a user defined function (with arguments) to each column in Dataframe :") print(modDfObj) print('*** Apply a user defined function ( with arguments ) to each row in Dataframe ***') # Apply a user defined function to each row by doubling each value in each column modDfObj = dfObj.apply(multiplyData, axis=1, args=[3]) print("Modified Dataframe by applying a user defined function (with arguments) to each row in Dataframe :") print(modDfObj) print('************* Apply a numpy function to each row or each column in Dataframe *************') # Apply a numpy function to each column by doubling each value in each column modDfObj = dfObj.apply(np.square) print("Modified Dataframe by applying a numpy function to each column in Dataframe :") print(modDfObj) # Apply a numpy function to each row by square root each value in each column modDfObj = dfObj.apply(np.sqrt, axis=1) print("Modified Dataframe by applying a numpy function to each row in Dataframe :") print(modDfObj) print('************* Apply a reducing function to each column or row in DataFrame *************') # Apply a numpy function to get the sum of values in each column modDfObj = dfObj.apply(np.sum) print("Modified Dataframe by applying a numpy function to get sum of values in each column :") print(modDfObj) # Apply a numpy function to get the sum of values in each row modDfObj = dfObj.apply(np.sum, axis=1) print("Modified Dataframe by applying a numpy function to get sum of values in each row :") print(modDfObj) if __name__ == '__main__': main()
Output:
Original Dataframe a b c 0 222 34 23 1 333 31 11 2 444 16 21 3 555 32 22 4 666 33 27 5 777 35 11 ************* Apply a lambda function to each row or each column in Dataframe ************* *** Apply a lambda function to each column in Dataframe *** Modified Dataframe by applying lambda function on each column: a b c 0 232 44 33 1 343 41 21 2 454 26 31 3 565 42 32 4 676 43 37 5 787 45 21 *** Apply a lambda function to each row in Dataframe *** Modified Dataframe by applying lambda function on each row: a b c 0 227 39 28 1 338 36 16 2 449 21 26 3 560 37 27 4 671 38 32 5 782 40 16 ************* Apply a User Defined function to each row or each column in Dataframe ************* *** Apply a user defined function to each column in Dataframe *** Modified Dataframe by applying a user defined function to each column in Dataframe : a b c 0 444 68 46 1 666 62 22 2 888 32 42 3 1110 64 44 4 1332 66 54 5 1554 70 22 *** Apply a user defined function to each row in Dataframe *** Modified Dataframe by applying a user defined function to each row in Dataframe : a b c 0 444 68 46 1 666 62 22 2 888 32 42 3 1110 64 44 4 1332 66 54 5 1554 70 22 ************* Apply a User Defined function (with Arguments) to each row or each column in Dataframe ************* *** Apply a user defined function ( with arguments ) to each column in Dataframe *** Modified Dataframe by applying a user defined function (with arguments) to each column in Dataframe : a b c 0 888 136 92 1 1332 124 44 2 1776 64 84 3 2220 128 88 4 2664 132 108 5 3108 140 44 *** Apply a user defined function ( with arguments ) to each row in Dataframe *** Modified Dataframe by applying a user defined function (with arguments) to each row in Dataframe : a b c 0 666 102 69 1 999 93 33 2 1332 48 63 3 1665 96 66 4 1998 99 81 5 2331 105 33 ************* Apply a numpy function to each row or each column in Dataframe ************* Modified Dataframe by applying a numpy function to each column in Dataframe : a b c 0 49284 1156 529 1 110889 961 121 2 197136 256 441 3 308025 1024 484 4 443556 1089 729 5 603729 1225 121 Modified Dataframe by applying a numpy function to each row in Dataframe : a b c 0 14.899664 5.830952 4.795832 1 18.248288 5.567764 3.316625 2 21.071308 4.000000 4.582576 3 23.558438 5.656854 4.690416 4 25.806976 5.744563 5.196152 5 27.874720 5.916080 3.316625 ************* Apply a reducing function to each column or row in DataFrame ************* Modified Dataframe by applying a numpy function to get sum of values in each column : a 2997 b 181 c 115 dtype: int64 Modified Dataframe by applying a numpy function to get sum of values in each row : 0 279 1 375 2 481 3 609 4 726 5 823 dtype: int64
Pandas Tutorials -Learn Data Analysis with Python
-
Pandas Tutorial Part #1 - Introduction to Data Analysis with Python
-
Pandas Tutorial Part #2 - Basics of Pandas Series
-
Pandas Tutorial Part #3 - Get & Set Series values
-
Pandas Tutorial Part #4 - Attributes & methods of Pandas Series
-
Pandas Tutorial Part #5 - Add or Remove Pandas Series elements
-
Pandas Tutorial Part #6 - Introduction to DataFrame
-
Pandas Tutorial Part #7 - DataFrame.loc[] - Select Rows / Columns by Indexing
-
Pandas Tutorial Part #8 - DataFrame.iloc[] - Select Rows / Columns by Label Names
-
Pandas Tutorial Part #9 - Filter DataFrame Rows
-
Pandas Tutorial Part #10 - Add/Remove DataFrame Rows & Columns
-
Pandas Tutorial Part #11 - DataFrame attributes & methods
-
Pandas Tutorial Part #12 - Handling Missing Data or NaN values
-
Pandas Tutorial Part #13 - Iterate over Rows & Columns of DataFrame
-
Pandas Tutorial Part #14 - Sorting DataFrame by Rows or Columns
-
Pandas Tutorial Part #15 - Merging or Concatenating DataFrames
-
Pandas Tutorial Part #16 - DataFrame GroupBy explained with examples
Are you looking to make a career in Data Science with Python?
Data Science is the future, and the future is here now. Data Scientists are now the most sought-after professionals today. To become a good Data Scientist or to make a career switch in Data Science one must possess the right skill set. We have curated a list of Best Professional Certificate in Data Science with Python. These courses will teach you the programming tools for Data Science like Pandas, NumPy, Matplotlib, Seaborn and how to use these libraries to implement Machine learning models.
Checkout the Detailed Review of Best Professional Certificate in Data Science with Python.
Remember, Data Science requires a lot of patience, persistence, and practice. So, start learning today.
Excellent post: it was very helpful to me!