Pandas: Apply a function to single or selected columns or rows in Dataframe
In this article we will discuss different ways to apply a given function to selected columns or rows.
Suppose we have a dataframe object i.e.
# List of Tuples matrix = [(22, 34, 23), (33, 31, 11), (44, 16, 21), (55, 32, 22), (66, 33, 27), (77, 35, 11) ] # Create a DataFrame object dfObj = pd.DataFrame(matrix, columns=list('xyz'), index=list('abcdef'))
Contents of the dataframe object dfObj are,
Original Dataframe x y z a 22 34 23 b 33 31 11 c 44 16 21 d 55 32 22 e 66 33 27 f 77 35 11
Now if we want to call / apply a function on all the elements of a single or multiple columns or rows ? For example,
- Multiply all the values in column ‘x’ by 2
- Multiply all the values in row ‘c’ by 10
- Add 10 in all the values in column ‘y’ & ‘z’
Let’s see how to do that using different techniques,
Apply a function to a single column in Dataframe
Suppose we want to square all the values in column ‘z’ for above created dataframe object dfObj. We can do that using different methods i.e.
Method 1 : Using Dataframe.apply()
Apply a lambda function to all the columns in dataframe using Dataframe.apply() and inside this lambda function check if column name is ‘z’ then square all the values in it i.e.
# Apply function numpy.square() to square the value one column only i.e. with column name 'z' modDfObj = dfObj.apply(lambda x: np.square(x) if x.name == 'z' else x) print("Modified Dataframe : Squared the values in column 'z'", modDfObj, sep='\n')
Output:
Modified Dataframe : Squared the values in column 'z' x y z a 22 34 529 b 33 31 121 c 44 16 441 d 55 32 484 e 66 33 729 f 77 35 121
There are 2 other ways to achieve the same effect i.e.
Method 2 : Using [] Operator
Select the column from dataframe as series using [] operator and apply numpy.square() method on it. Then assign it back to column i.e.
# Apply a function to one column and assign it back to the column in dataframe dfObj['z'] = dfObj['z'].apply(np.square)
It will basically square all the values in column ‘z’
Method 3 : Using numpy.square()
# Method 3: # Apply a function to one column and assign it back to the column in dataframe dfObj['z'] = np.square(dfObj['z'])
It will also square all the values in column ‘z’
Apply a function to a single row in Dataframe
Suppose we want to square all the values in row ‘b’ for above created dataframe object dfObj. We can do that using different methods i.e.
Method 1 : Using Dataframe.apply()
Apply a lambda function to all the rows in dataframe using Dataframe.apply() and inside this lambda function check if row index label is ‘b’ then square all the values in it i.e.
# Apply function numpy.square() to square the values of one row only i.e. row with index name 'b' modDfObj = dfObj.apply(lambda x: np.square(x) if x.name == 'b' else x, axis=1) print("Modified Dataframe : Squared the values in row 'b'", modDfObj, sep='\n')
Output:
Modified Dataframe : Squared the values in row 'b' x y z a 22 34 23 b 1089 961 121 c 44 16 21 d 55 32 22 e 66 33 27 f 77 35 11
There are 2 other ways to achieve the same effect i.e.
Method 2 : Using [] Operator
Select the row from dataframe as series using dataframe.loc[] operator and apply numpy.square() method on it. Then assign it back to row i.e.
# Apply a function to one row and assign it back to the row in dataframe dfObj.loc['b'] = dfObj.loc['b'].apply(np.square)
It will basically square all the values in row ‘b’
Method 3 : Using numpy.square()
# Apply a function to one row and assign it back to the column in dataframe dfObj.loc['b'] = np.square(dfObj.loc['b'])
It will also square all the values in row ‘b’.
Apply a function to a certain columns in Dataframe
We can apply a given function to only specified columns too. For example square the values in column ‘x’ & ‘y’ i.e.
# Apply function numpy.square() to square the value 2 column only i.e. with column names 'x' and 'y' only modDfObj = dfObj.apply(lambda x: np.square(x) if x.name in ['x', 'y'] else x) print("Modified Dataframe : Squared the values in column x & y :", modDfObj, sep='\n')
Output:
Modified Dataframe : Squared the values in column x & y : x y z a 484 1156 23 b 1089 961 11 c 1936 256 21 d 3025 1024 22 e 4356 1089 27 f 5929 1225 11
Basically we just modified the if condition in lambda function and squared the values in columns with name x & y.
Apply a function to a certain rows in Dataframe
We can apply a given function to only specified rows too. For example square the values in column ‘b’ & ‘c’ i.e.
# Apply function numpy.square() to square the values of 2 rows only i.e. with row index name 'b' and 'c' only modDfObj = dfObj.apply(lambda x: np.square(x) if x.name in ['b', 'c'] else x, axis=1) print("Modified Dataframe : Squared the values in row b & c :", modDfObj, sep='\n')
Output:
Modified Dataframe : Squared the values in row b & c : x y z a 22 34 23 b 1089 961 121 c 1936 256 441 d 55 32 22 e 66 33 27 f 77 35 11
Basically we just modified the if condition in lambda function and squared the values in rows with name b & c.
Complete example is as follows :
import pandas as pd import numpy as np def main(): # List of Tuples matrix = [(22, 34, 23), (33, 31, 11), (44, 16, 21), (55, 32, 22), (66, 33, 27), (77, 35, 11) ] # Create a DataFrame object dfObj = pd.DataFrame(matrix, columns=list('xyz'), index=list('abcdef')) print("Original Dataframe", dfObj, sep='\n') print('********* Apply a function to a single row or column in DataFrame ********') print('*** Apply a function to a single column *** ') # Method 1: # Apply function numpy.square() to square the value one column only i.e. with column name 'z' modDfObj = dfObj.apply(lambda x: np.square(x) if x.name == 'z' else x) print("Modified Dataframe : Squared the values in column 'z'", modDfObj, sep='\n') # Method 2: # Apply a function to one column and assign it back to the column in dataframe dfObj['z'] = dfObj['z'].apply(np.square) # Method 3: # Apply a function to one column and assign it back to the column in dataframe dfObj['z'] = np.square(dfObj['z']) print('*** Apply a function to a single row *** ') dfObj = pd.DataFrame(matrix, columns=list('xyz'), index=list('abcdef')) # Method 1: # Apply function numpy.square() to square the values of one row only i.e. row with index name 'b' modDfObj = dfObj.apply(lambda x: np.square(x) if x.name == 'b' else x, axis=1) print("Modified Dataframe : Squared the values in row 'b'", modDfObj, sep='\n') # Method 2: # Apply a function to one row and assign it back to the row in dataframe dfObj.loc['b'] = dfObj.loc['b'].apply(np.square) # Method 3: # Apply a function to one row and assign it back to the column in dataframe dfObj.loc['b'] = np.square(dfObj.loc['b']) print('********* Apply a function to certains row or column in DataFrame ********') dfObj = pd.DataFrame(matrix, columns=list('xyz'), index=list('abcdef')) print('Apply a function to certain columns only') # Apply function numpy.square() to square the value 2 column only i.e. with column names 'x' and 'y' only modDfObj = dfObj.apply(lambda x: np.square(x) if x.name in ['x', 'y'] else x) print("Modified Dataframe : Squared the values in column x & y :", modDfObj, sep='\n') print('Apply a function to certain rows only') # Apply function numpy.square() to square the values of 2 rows only i.e. with row index name 'b' and 'c' only modDfObj = dfObj.apply(lambda x: np.square(x) if x.name in ['b', 'c'] else x, axis=1) print("Modified Dataframe : Squared the values in row b & c :", modDfObj, sep='\n') if __name__ == '__main__': main()
Output:
Original Dataframe x y z a 22 34 23 b 33 31 11 c 44 16 21 d 55 32 22 e 66 33 27 f 77 35 11 ********* Apply a function to a single row or column in DataFrame ******** *** Apply a function to a single column *** Modified Dataframe : Squared the values in column 'z' x y z a 22 34 529 b 33 31 121 c 44 16 441 d 55 32 484 e 66 33 729 f 77 35 121 *** Apply a function to a single row *** Modified Dataframe : Squared the values in row 'b' x y z a 22 34 23 b 1089 961 121 c 44 16 21 d 55 32 22 e 66 33 27 f 77 35 11 ********* Apply a function to certains row or column in DataFrame ******** Apply a function to certain columns only Modified Dataframe : Squared the values in column x & y : x y z a 484 1156 23 b 1089 961 11 c 1936 256 21 d 3025 1024 22 e 4356 1089 27 f 5929 1225 11 Apply a function to certain rows only Modified Dataframe : Squared the values in row b & c : x y z a 22 34 23 b 1089 961 121 c 1936 256 441 d 55 32 22 e 66 33 27 f 77 35 11
Your Python content is far superior than many others. May be if you can structure them in progressive order of learning it will be more beneficial for the visitors.
Good stuff and comprehensive.
Thanks
Anand