pandas.apply(): Apply a function to each row/column in Dataframe

In this article we will discuss how to apply a given lambda function or user defined function or numpy function to each row or column in a dataframe.

Python’s Pandas Library provides an member function in Dataframe class to apply a function along the axis of the Dataframe i.e. along each row or column i.e.

DataFrame.apply(func, axis=0, broadcast=None, raw=False, reduce=None, result_type=None, args=(), **kwds)

Important Arguments are:

  • func : Function to be applied to each column or row. This function accepts a series and returns a series.
  • axis : Axis along which the function is applied in dataframe. Default value 0.
    • If value is 0 then it applies function to each column.
    • If value is 1 then it applies function to each row.
  • args : tuple / list of arguments to passed to function.

Let’s use this to apply function to rows and columns of a Dataframe.

Suppose we have a dataframe i.e.

# List of Tuples
matrix = [(222, 34, 23),
         (333, 31, 11),
         (444, 16, 21),
         (555, 32, 22),
         (666, 33, 27),
         (777, 35, 11)
         ]

# Create a DataFrame object
dfObj = pd.DataFrame(matrix, columns=list('abc'))

Contents of the dataframe in object dfObj are,

     a   b   c
0  222  34  23
1  333  31  11
2  444  16  21
3  555  32  22
4  666  33  27
5  777  35  11

Apply a lambda function to each row or each column in Dataframe

Suppose we have a lambda function that accepts a series as argument returns a new series object by adding 10 in each value of the
given series i.e.

lambda x : x + 10

Now let’s see how to apply this lambda function to each column or row of our dataframe i.e.

Apply a lambda function to each column:

To apply this lambda function to each column in dataframe, pass the lambda function as first and only argument in Dataframe.apply()
with above created dataframe object i.e.

# Apply a lambda function to each column by adding 10 to each value in each column
modDfObj = dfObj.apply(lambda x : x + 10)

print("Modified Dataframe by applying lambda function on each column:")
print(modDfObj)

Output:

Modified Dataframe by applying lambda function on each column:
     a   b   c
0  232  44  33
1  343  41  21
2  454  26  31
3  565  42  32
4  676  43  37
5  787  45  21

As there were 3 columns in dataframe, so our lambda function is called three times and for each call a column will passed as argument to
the lambda function as argument. As, our lambda function returns a copy of series by infringement the value of each element in given column by 10. This returned series replaces the column in a copy of dataframe.

So, basically Dataframe.apply() calls the passed lambda function for each column and pass the column contents as series to this lambda function. Finally it returns a modified copy of dataframe constructed with columns returned by lambda functions, instead of altering original dataframe.

Apply a lambda function to each row:

Now, to apply this lambda function to each row in dataframe, pass the lambda function as first argument and also pass axis=1 as second argument in Dataframe.apply() with above created dataframe object i.e.

# Apply a lambda function to each row by adding 5 to each value in each column
modDfObj = dfObj.apply(lambda x: x + 5, axis=1)

print("Modified Dataframe by applying lambda function on each row:")
print(modDfObj)

Output:

Modified Dataframe by applying lambda function on each row:
     a   b   c
0  227  39  28
1  338  36  16
2  449  21  26
3  560  37  27
4  671  38  32
5  782  40  16

So, basically Dataframe.apply() calls the passed lambda function for each row and passes each row contents as series to this lambda function. Finally it returns a modified copy of dataframe constructed with rows returned by lambda functions, instead of altering original dataframe.

Apply a User Defined function with or without arguments to each row or column of a Dataframe

Suppose we have a user defined function that accepts a series and returns a series by multiplying each value by 2 i.e.

# Multiply given value by 2 and returns
def doubleData(x):
   return x * 2

Now let’s see how to apply this user defined function to each column of our data frame i.e.

# Apply a user defined function to each column by doubling each value in each column
modDfObj = dfObj.apply(doubleData)

print("Modified Dataframe by applying a user defined function to each column in Dataframe :")
print(modDfObj)

Output:

Modified Dataframe by applying a user defined function to each column in Dataframe :
      a   b   c
0   444  68  46
1   666  62  22
2   888  32  42
3  1110  64  44
4  1332  66  54
5  1554  70  22

Similarly we can apply this user defined function to each row instead of column by passing an extra argument i.e.

# Apply a user defined function to each row by doubling each value in each column
modDfObj = dfObj.apply(doubleData, axis=1)

Suppose we have a user defined function that accepts other arguments too. For example, this function accepts a series and a number y then
returns a new series by multiplying each value in series by y i.e.

# Returns x*y
def multiplyData(x, y):
   return x * y

Now let’s see how to apply this user defined function with argument to each column of our data frame i.e.

# Apply a user defined function to each column that will multiply each value in each column by given number
modDfObj = dfObj.apply(multiplyData, args=[4])

print("Modified Dataframe by applying a user defined function (with arguments) to each column in Dataframe :")
print(modDfObj)

Output:

Modified Dataframe by applying a user defined function (with arguments) to each column in Dataframe :
      a    b    c
0   888  136   92
1  1332  124   44
2  1776   64   84
3  2220  128   88
4  2664  132  108
5  3108  140   44

Similarly we can apply this user defined function with argument to each row instead of column by passing an extra argument i.e.

# Apply a user defined function to each row by doubling each value in each column
modDfObj = dfObj.apply(multiplyData, axis=1, args=[3])

Apply a numpy functions to a to each row or column of a Dataframe

Generally in practical scenarios we apply already present numpy functions to column and rows in dataframe i.e.

Now let’s see how to apply a numpy function to each column of our data frame i.e.

# Apply a numpy function to each column by doubling each value in each column
modDfObj = dfObj.apply(np.square)

print("Modified Dataframe by applying a numpy function to each column in Dataframe :")
print(modDfObj)

Output:

Modified Dataframe by applying a numpy function to each column in Dataframe :
        a     b    c
0   49284  1156  529
1  110889   961  121
2  197136   256  441
3  308025  1024  484
4  443556  1089  729
5  603729  1225  121

Similarly we can apply a numpy function to each row instead of column by passing an extra argument i.e.

# Apply a numpy function to each row by square root each value in each column
modDfObj = dfObj.apply(np.sqrt, axis=1)

Apply a Reducing functions to a to each row or column of a Dataframe

Till now we have applying a kind of function that accepts every column or row as series and returns a series of same size. But we can also call the function that accepts a series and returns a single variable instead of series. For example let’s apply numpy.sum() to each column in dataframe to find out the sum of each values in each column i.e.

# Apply a numpy function to get the sum of values in each column
modDfObj = dfObj.apply(np.sum)

print("Modified Dataframe by applying a numpy function to get sum of values in each column :")
print(modDfObj)

Output:

Modified Dataframe by applying a numpy function to get sum of values in each column :
a    2997
b     181
c     115
dtype: int64

Now let’s apply numpy.sum() to each row in dataframe to find out the sum of each values in each row i.e.

# Apply a numpy function to get the sum of values in each row
modDfObj = dfObj.apply(np.sum, axis=1)

print("Modified Dataframe by applying a numpy function to get sum of values in each row :")
print(modDfObj)

Output:

Modified Dataframe by applying a numpy function to get sum of values in each row :
0    279
1    375
2    481
3    609
4    726
5    823
dtype: int64

Complete example is as follows:

import pandas as pd
import numpy as np


# Returns x*y
def multiplyData(x, y):
   return x * y

# Multiply given value by 2 and returns
def doubleData(x):
   return x * 2


def main():
    # List of Tuples
    matrix = [(222, 34, 23),
             (333, 31, 11),
             (444, 16, 21),
             (555, 32, 22),
             (666, 33, 27),
             (777, 35, 11)
             ]

    # Create a DataFrame object
    dfObj = pd.DataFrame(matrix, columns=list('abc'))

    print("Original Dataframe", dfObj, sep='\n')

    print('************* Apply a lambda function to each row or each column in Dataframe *************')

    print('*** Apply a lambda function to each column in Dataframe ***')

    # Apply a lambda function to each column by adding 10 to each value in each column
    modDfObj = dfObj.apply(lambda x : x + 10)

    print("Modified Dataframe by applying lambda function on each column:")
    print(modDfObj)

    print('*** Apply a lambda function to each row in Dataframe ***')

    # Apply a lambda function to each row by adding 5 to each value in each column
    modDfObj = dfObj.apply(lambda x: x + 5, axis=1)

    print("Modified Dataframe by applying lambda function on each row:")
    print(modDfObj)

    print('************* Apply a User Defined function to each row or each column in Dataframe *************')

    print('*** Apply a user defined function to each column in Dataframe ***')

    # Apply a user defined function to each column by doubling each value in each column
    modDfObj = dfObj.apply(doubleData)

    print("Modified Dataframe by applying a user defined function to each column in Dataframe :")
    print(modDfObj)

    print('*** Apply a user defined function to each row in Dataframe ***')

    # Apply a user defined function to each row by doubling each value in each column
    modDfObj = dfObj.apply(doubleData, axis=1)

    print("Modified Dataframe by applying a user defined function to each row in Dataframe :")
    print(modDfObj)

    print('************* Apply a User Defined function (with Arguments) to each row or each column in Dataframe *************')


    print('*** Apply a user defined function ( with arguments ) to each column in Dataframe ***')

    # Apply a user defined function to each column that will multiply each value in each column by given number
    modDfObj = dfObj.apply(multiplyData, args=[4])

    print("Modified Dataframe by applying a user defined function (with arguments) to each column in Dataframe :")
    print(modDfObj)

    print('*** Apply a user defined function ( with arguments ) to each row in Dataframe ***')

    # Apply a user defined function to each row by doubling each value in each column
    modDfObj = dfObj.apply(multiplyData, axis=1, args=[3])

    print("Modified Dataframe by applying a user defined function (with arguments) to each row in Dataframe :")
    print(modDfObj)

    print('************* Apply a numpy function to each row or each column in Dataframe *************')

    # Apply a numpy function to each column by doubling each value in each column
    modDfObj = dfObj.apply(np.square)

    print("Modified Dataframe by applying a numpy function to each column in Dataframe :")
    print(modDfObj)

    # Apply a numpy function to each row by square root each value in each column
    modDfObj = dfObj.apply(np.sqrt, axis=1)

    print("Modified Dataframe by applying a numpy function to each row in Dataframe :")
    print(modDfObj)

    print('************* Apply a reducing function to each column or row in DataFrame *************')

    # Apply a numpy function to get the sum of values in each column
    modDfObj = dfObj.apply(np.sum)

    print("Modified Dataframe by applying a numpy function to get sum of values in each column :")
    print(modDfObj)

    # Apply a numpy function to get the sum of values in each row
    modDfObj = dfObj.apply(np.sum, axis=1)

    print("Modified Dataframe by applying a numpy function to get sum of values in each row :")
    print(modDfObj)



if __name__ == '__main__':
   main()

Output:

Original Dataframe
     a   b   c
0  222  34  23
1  333  31  11
2  444  16  21
3  555  32  22
4  666  33  27
5  777  35  11
************* Apply a lambda function to each row or each column in Dataframe *************
*** Apply a lambda function to each column in Dataframe ***
Modified Dataframe by applying lambda function on each column:
     a   b   c
0  232  44  33
1  343  41  21
2  454  26  31
3  565  42  32
4  676  43  37
5  787  45  21
*** Apply a lambda function to each row in Dataframe ***
Modified Dataframe by applying lambda function on each row:
     a   b   c
0  227  39  28
1  338  36  16
2  449  21  26
3  560  37  27
4  671  38  32
5  782  40  16
************* Apply a User Defined function to each row or each column in Dataframe *************
*** Apply a user defined function to each column in Dataframe ***
Modified Dataframe by applying a user defined function to each column in Dataframe :
      a   b   c
0   444  68  46
1   666  62  22
2   888  32  42
3  1110  64  44
4  1332  66  54
5  1554  70  22
*** Apply a user defined function to each row in Dataframe ***
Modified Dataframe by applying a user defined function to each row in Dataframe :
      a   b   c
0   444  68  46
1   666  62  22
2   888  32  42
3  1110  64  44
4  1332  66  54
5  1554  70  22
************* Apply a User Defined function (with Arguments) to each row or each column in Dataframe *************
*** Apply a user defined function ( with arguments ) to each column in Dataframe ***
Modified Dataframe by applying a user defined function (with arguments) to each column in Dataframe :
      a    b    c
0   888  136   92
1  1332  124   44
2  1776   64   84
3  2220  128   88
4  2664  132  108
5  3108  140   44
*** Apply a user defined function ( with arguments ) to each row in Dataframe ***
Modified Dataframe by applying a user defined function (with arguments) to each row in Dataframe :
      a    b   c
0   666  102  69
1   999   93  33
2  1332   48  63
3  1665   96  66
4  1998   99  81
5  2331  105  33
************* Apply a numpy function to each row or each column in Dataframe *************
Modified Dataframe by applying a numpy function to each column in Dataframe :
        a     b    c
0   49284  1156  529
1  110889   961  121
2  197136   256  441
3  308025  1024  484
4  443556  1089  729
5  603729  1225  121
Modified Dataframe by applying a numpy function to each row in Dataframe :
           a         b         c
0  14.899664  5.830952  4.795832
1  18.248288  5.567764  3.316625
2  21.071308  4.000000  4.582576
3  23.558438  5.656854  4.690416
4  25.806976  5.744563  5.196152
5  27.874720  5.916080  3.316625
************* Apply a reducing function to each column or row in DataFrame *************
Modified Dataframe by applying a numpy function to get sum of values in each column :
a    2997
b     181
c     115
dtype: int64
Modified Dataframe by applying a numpy function to get sum of values in each row :
0    279
1    375
2    481
3    609
4    726
5    823
dtype: int64

1 thought on “pandas.apply(): Apply a function to each row/column in Dataframe”

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Scroll to Top