pandas.apply(): Apply a function to each row/column in Dataframe

In this article we will discuss how to apply a given lambda function or user defined function or numpy function to each row or column in a dataframe.

Python’s Pandas Library provides an member function in Dataframe class to apply a function along the axis of the Dataframe i.e. along each row or column i.e.

DataFrame.apply(func, axis=0, broadcast=None, raw=False, reduce=None, result_type=None, args=(), **kwds)

Important Arguments are:

  • func : Function to be applied to each column or row. This function accepts a series and returns a series.
  • axis : Axis along which the function is applied in dataframe. Default value 0.
    • If value is 0 then it applies function to each column.
    • If value is 1 then it applies function to each row.
  • args : tuple / list of arguments to passed to function.

Let’s use this to apply function to rows and columns of a Dataframe.

Suppose we have a dataframe i.e.

# List of Tuples
matrix = [(222, 34, 23),
         (333, 31, 11),
         (444, 16, 21),
         (555, 32, 22),
         (666, 33, 27),
         (777, 35, 11)
         ]

# Create a DataFrame object
dfObj = pd.DataFrame(matrix, columns=list('abc'))

Contents of the dataframe in object dfObj are,
     a   b   c
0  222  34  23
1  333  31  11
2  444  16  21
3  555  32  22
4  666  33  27
5  777  35  11

Advertisements

Apply a lambda function to each row or each column in Dataframe

Suppose we have a lambda function that accepts a series as argument returns a new series object by adding 10 in each value of the
given series i.e.

lambda x : x + 10

Now let’s see how to apply this lambda function to each column or row of our dataframe i.e.

Apply a lambda function to each column:

To apply this lambda function to each column in dataframe, pass the lambda function as first and only argument in Dataframe.apply()
with above created dataframe object i.e.

# Apply a lambda function to each column by adding 10 to each value in each column
modDfObj = dfObj.apply(lambda x : x + 10)

print("Modified Dataframe by applying lambda function on each column:")
print(modDfObj)

Output:
Modified Dataframe by applying lambda function on each column:
     a   b   c
0  232  44  33
1  343  41  21
2  454  26  31
3  565  42  32
4  676  43  37
5  787  45  21

As there were 3 columns in dataframe, so our lambda function is called three times and for each call a column will passed as argument to
the lambda function as argument. As, our lambda function returns a copy of series by infringement the value of each element in given column by 10. This returned series replaces the column in a copy of dataframe.

So, basically Dataframe.apply() calls the passed lambda function for each column and pass the column contents as series to this lambda function. Finally it returns a modified copy of dataframe constructed with columns returned by lambda functions, instead of altering original dataframe.

Apply a lambda function to each row:

Now, to apply this lambda function to each row in dataframe, pass the lambda function as first argument and also pass axis=1 as second argument in Dataframe.apply() with above created dataframe object i.e.

# Apply a lambda function to each row by adding 5 to each value in each column
modDfObj = dfObj.apply(lambda x: x + 5, axis=1)

print("Modified Dataframe by applying lambda function on each row:")
print(modDfObj)

Output:
Modified Dataframe by applying lambda function on each row:
     a   b   c
0  227  39  28
1  338  36  16
2  449  21  26
3  560  37  27
4  671  38  32
5  782  40  16

So, basically Dataframe.apply() calls the passed lambda function for each row and passes each row contents as series to this lambda function. Finally it returns a modified copy of dataframe constructed with rows returned by lambda functions, instead of altering original dataframe.

Apply a User Defined function with or without arguments to each row or column of a Dataframe

Suppose we have a user defined function that accepts a series and returns a series by multiplying each value by 2 i.e.

# Multiply given value by 2 and returns
def doubleData(x):
   return x * 2

Now let’s see how to apply this user defined function to each column of our data frame i.e.
# Apply a user defined function to each column by doubling each value in each column
modDfObj = dfObj.apply(doubleData)

print("Modified Dataframe by applying a user defined function to each column in Dataframe :")
print(modDfObj)

Output:
Modified Dataframe by applying a user defined function to each column in Dataframe :
      a   b   c
0   444  68  46
1   666  62  22
2   888  32  42
3  1110  64  44
4  1332  66  54
5  1554  70  22

Similarly we can apply this user defined function to each row instead of column by passing an extra argument i.e.
# Apply a user defined function to each row by doubling each value in each column
modDfObj = dfObj.apply(doubleData, axis=1)

Suppose we have a user defined function that accepts other arguments too. For example, this function accepts a series and a number y then
returns a new series by multiplying each value in series by y i.e.
# Returns x*y
def multiplyData(x, y):
   return x * y

Now let’s see how to apply this user defined function with argument to each column of our data frame i.e.
# Apply a user defined function to each column that will multiply each value in each column by given number
modDfObj = dfObj.apply(multiplyData, args=[4])

print("Modified Dataframe by applying a user defined function (with arguments) to each column in Dataframe :")
print(modDfObj)

Output:
Modified Dataframe by applying a user defined function (with arguments) to each column in Dataframe :
      a    b    c
0   888  136   92
1  1332  124   44
2  1776   64   84
3  2220  128   88
4  2664  132  108
5  3108  140   44

Similarly we can apply this user defined function with argument to each row instead of column by passing an extra argument i.e.
# Apply a user defined function to each row by doubling each value in each column
modDfObj = dfObj.apply(multiplyData, axis=1, args=[3])

Apply a numpy functions to a to each row or column of a Dataframe

Generally in practical scenarios we apply already present numpy functions to column and rows in dataframe i.e.

Now let’s see how to apply a numpy function to each column of our data frame i.e.

# Apply a numpy function to each column by doubling each value in each column
modDfObj = dfObj.apply(np.square)

print("Modified Dataframe by applying a numpy function to each column in Dataframe :")
print(modDfObj)

Output:
Modified Dataframe by applying a numpy function to each column in Dataframe :
        a     b    c
0   49284  1156  529
1  110889   961  121
2  197136   256  441
3  308025  1024  484
4  443556  1089  729
5  603729  1225  121

Similarly we can apply a numpy function to each row instead of column by passing an extra argument i.e.
# Apply a numpy function to each row by square root each value in each column
modDfObj = dfObj.apply(np.sqrt, axis=1)

Apply a Reducing functions to a to each row or column of a Dataframe

Till now we have applying a kind of function that accepts every column or row as series and returns a series of same size. But we can also call the function that accepts a series and returns a single variable instead of series. For example let’s apply numpy.sum() to each column in dataframe to find out the sum of each values in each column i.e.

# Apply a numpy function to get the sum of values in each column
modDfObj = dfObj.apply(np.sum)

print("Modified Dataframe by applying a numpy function to get sum of values in each column :")
print(modDfObj)

Output:
Modified Dataframe by applying a numpy function to get sum of values in each column :
a    2997
b     181
c     115
dtype: int64

Now let’s apply numpy.sum() to each row in dataframe to find out the sum of each values in each row i.e.
# Apply a numpy function to get the sum of values in each row
modDfObj = dfObj.apply(np.sum, axis=1)

print("Modified Dataframe by applying a numpy function to get sum of values in each row :")
print(modDfObj)

Output:
Modified Dataframe by applying a numpy function to get sum of values in each row :
0    279
1    375
2    481
3    609
4    726
5    823
dtype: int64

Complete example is as follows:
import pandas as pd
import numpy as np


# Returns x*y
def multiplyData(x, y):
   return x * y

# Multiply given value by 2 and returns
def doubleData(x):
   return x * 2


def main():
    # List of Tuples
    matrix = [(222, 34, 23),
             (333, 31, 11),
             (444, 16, 21),
             (555, 32, 22),
             (666, 33, 27),
             (777, 35, 11)
             ]

    # Create a DataFrame object
    dfObj = pd.DataFrame(matrix, columns=list('abc'))

    print("Original Dataframe", dfObj, sep='\n')

    print('************* Apply a lambda function to each row or each column in Dataframe *************')

    print('*** Apply a lambda function to each column in Dataframe ***')

    # Apply a lambda function to each column by adding 10 to each value in each column
    modDfObj = dfObj.apply(lambda x : x + 10)

    print("Modified Dataframe by applying lambda function on each column:")
    print(modDfObj)

    print('*** Apply a lambda function to each row in Dataframe ***')

    # Apply a lambda function to each row by adding 5 to each value in each column
    modDfObj = dfObj.apply(lambda x: x + 5, axis=1)

    print("Modified Dataframe by applying lambda function on each row:")
    print(modDfObj)

    print('************* Apply a User Defined function to each row or each column in Dataframe *************')

    print('*** Apply a user defined function to each column in Dataframe ***')

    # Apply a user defined function to each column by doubling each value in each column
    modDfObj = dfObj.apply(doubleData)

    print("Modified Dataframe by applying a user defined function to each column in Dataframe :")
    print(modDfObj)

    print('*** Apply a user defined function to each row in Dataframe ***')

    # Apply a user defined function to each row by doubling each value in each column
    modDfObj = dfObj.apply(doubleData, axis=1)

    print("Modified Dataframe by applying a user defined function to each row in Dataframe :")
    print(modDfObj)

    print('************* Apply a User Defined function (with Arguments) to each row or each column in Dataframe *************')


    print('*** Apply a user defined function ( with arguments ) to each column in Dataframe ***')

    # Apply a user defined function to each column that will multiply each value in each column by given number
    modDfObj = dfObj.apply(multiplyData, args=[4])

    print("Modified Dataframe by applying a user defined function (with arguments) to each column in Dataframe :")
    print(modDfObj)

    print('*** Apply a user defined function ( with arguments ) to each row in Dataframe ***')

    # Apply a user defined function to each row by doubling each value in each column
    modDfObj = dfObj.apply(multiplyData, axis=1, args=[3])

    print("Modified Dataframe by applying a user defined function (with arguments) to each row in Dataframe :")
    print(modDfObj)

    print('************* Apply a numpy function to each row or each column in Dataframe *************')

    # Apply a numpy function to each column by doubling each value in each column
    modDfObj = dfObj.apply(np.square)

    print("Modified Dataframe by applying a numpy function to each column in Dataframe :")
    print(modDfObj)

    # Apply a numpy function to each row by square root each value in each column
    modDfObj = dfObj.apply(np.sqrt, axis=1)

    print("Modified Dataframe by applying a numpy function to each row in Dataframe :")
    print(modDfObj)

    print('************* Apply a reducing function to each column or row in DataFrame *************')

    # Apply a numpy function to get the sum of values in each column
    modDfObj = dfObj.apply(np.sum)

    print("Modified Dataframe by applying a numpy function to get sum of values in each column :")
    print(modDfObj)

    # Apply a numpy function to get the sum of values in each row
    modDfObj = dfObj.apply(np.sum, axis=1)

    print("Modified Dataframe by applying a numpy function to get sum of values in each row :")
    print(modDfObj)



if __name__ == '__main__':
   main()


Output:
Original Dataframe
     a   b   c
0  222  34  23
1  333  31  11
2  444  16  21
3  555  32  22
4  666  33  27
5  777  35  11
************* Apply a lambda function to each row or each column in Dataframe *************
*** Apply a lambda function to each column in Dataframe ***
Modified Dataframe by applying lambda function on each column:
     a   b   c
0  232  44  33
1  343  41  21
2  454  26  31
3  565  42  32
4  676  43  37
5  787  45  21
*** Apply a lambda function to each row in Dataframe ***
Modified Dataframe by applying lambda function on each row:
     a   b   c
0  227  39  28
1  338  36  16
2  449  21  26
3  560  37  27
4  671  38  32
5  782  40  16
************* Apply a User Defined function to each row or each column in Dataframe *************
*** Apply a user defined function to each column in Dataframe ***
Modified Dataframe by applying a user defined function to each column in Dataframe :
      a   b   c
0   444  68  46
1   666  62  22
2   888  32  42
3  1110  64  44
4  1332  66  54
5  1554  70  22
*** Apply a user defined function to each row in Dataframe ***
Modified Dataframe by applying a user defined function to each row in Dataframe :
      a   b   c
0   444  68  46
1   666  62  22
2   888  32  42
3  1110  64  44
4  1332  66  54
5  1554  70  22
************* Apply a User Defined function (with Arguments) to each row or each column in Dataframe *************
*** Apply a user defined function ( with arguments ) to each column in Dataframe ***
Modified Dataframe by applying a user defined function (with arguments) to each column in Dataframe :
      a    b    c
0   888  136   92
1  1332  124   44
2  1776   64   84
3  2220  128   88
4  2664  132  108
5  3108  140   44
*** Apply a user defined function ( with arguments ) to each row in Dataframe ***
Modified Dataframe by applying a user defined function (with arguments) to each row in Dataframe :
      a    b   c
0   666  102  69
1   999   93  33
2  1332   48  63
3  1665   96  66
4  1998   99  81
5  2331  105  33
************* Apply a numpy function to each row or each column in Dataframe *************
Modified Dataframe by applying a numpy function to each column in Dataframe :
        a     b    c
0   49284  1156  529
1  110889   961  121
2  197136   256  441
3  308025  1024  484
4  443556  1089  729
5  603729  1225  121
Modified Dataframe by applying a numpy function to each row in Dataframe :
           a         b         c
0  14.899664  5.830952  4.795832
1  18.248288  5.567764  3.316625
2  21.071308  4.000000  4.582576
3  23.558438  5.656854  4.690416
4  25.806976  5.744563  5.196152
5  27.874720  5.916080  3.316625
************* Apply a reducing function to each column or row in DataFrame *************
Modified Dataframe by applying a numpy function to get sum of values in each column :
a    2997
b     181
c     115
dtype: int64
Modified Dataframe by applying a numpy function to get sum of values in each row :
0    279
1    375
2    481
3    609
4    726
5    823
dtype: int64

Pandas Tutorials -Learn Data Analysis with Python

   

Are you looking to make a career in Data Science with Python?

Data Science is the future, and the future is here now. Data Scientists are now the most sought-after professionals today. To become a good Data Scientist or to make a career switch in Data Science one must possess the right skill set. We have curated a list of Best Professional Certificate in Data Science with Python. These courses will teach you the programming tools for Data Science like Pandas, NumPy, Matplotlib, Seaborn and how to use these libraries to implement Machine learning models.

Checkout the Detailed Review of Best Professional Certificate in Data Science with Python.

Remember, Data Science requires a lot of patience, persistence, and practice. So, start learning today.

Join a LinkedIn Community of Python Developers

1 thought on “pandas.apply(): Apply a function to each row/column in Dataframe”

Leave a Comment

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Scroll to Top