In this article, we will discuss different ways to apply a given function to selected columns or rows of a Pandas DataFrame.
Table Of Contents
- Apply a function to a single column in Dataframe.
- Apply a function to a single row in Dataframe.
- Apply a function to a certain columns in Dataframe.
- Apply a function to a certain rows in Dataframe.
- Summary
Suppose we have a dataframe object i.e.
import pandas as pd # List of Tuples matrix = [(22, 34, 23), (33, 31, 11), (44, 16, 21), (55, 32, 22), (66, 33, 27), (77, 35, 11)] # Create a DataFrame object dfObj = pd.DataFrame(matrix, columns=list('xyz'), index=list('abcdef')) print(dfObj)
Contents of the dataframe object dfObj are,
x y z a 22 34 23 b 33 31 11 c 44 16 21 d 55 32 22 e 66 33 27 f 77 35 11
Now if we want to call or apply a function on some of the elements of DataFrame. Like on a single or multiple columns or rows of DataFrame? For example,
- Apply a function on a column, that should multiply all the values in column ‘x’ by 2
- Apply a function on a row, that should multiply all the values in row ‘c’ by 10
- Apply a function on a two columns, that should add 10 in all the values in column ‘y’ & ‘z’
Let’s see how to do that using different techniques,
Apply a function to a single column in Dataframe
Suppose we want to square all the values in column ‘z’ for above created DataFrame object dfObj. We can do that using different methods i.e.
Frequently Asked:
- Drop multiple Columns from a Pandas DataFrame
- Pandas – Select Rows where column value is in List
- pandas.apply(): Apply a function to each row/column in Dataframe
- Replace NaN with 0 in Pandas DataFrame
Method 1 : Using Dataframe.apply()
Apply a lambda function to all the columns in dataframe using Dataframe.apply() and inside this lambda function check if column name is ‘z’ then square all the values in it i.e.
import pandas as pd import numpy as np # List of Tuples matrix = [(22, 34, 23), (33, 31, 11), (44, 16, 21), (55, 32, 22), (66, 33, 27), (77, 35, 11)] # Create a DataFrame object dfObj = pd.DataFrame(matrix, columns=list('xyz'), index=list('abcdef')) print(dfObj) # Apply function numpy.square() to square the value one column only i.e. with column name 'z' modDfObj = dfObj.apply(lambda x: np.square(x) if x.name == 'z' else x) print("Modified Dataframe : Squared the values in column 'z'", modDfObj, sep='\n')
Output
Output:
x y z a 22 34 23 b 33 31 11 c 44 16 21 d 55 32 22 e 66 33 27 f 77 35 11 Modified Dataframe : Squared the values in column 'z' x y z a 22 34 529 b 33 31 121 c 44 16 441 d 55 32 484 e 66 33 729 f 77 35 121
There are 2 other ways to achieve the same effect i.e.
Method 2 : Using [] Operator
Select the column from dataframe as series using [] operator and apply numpy.square() method on it. Then assign it back to column i.e.
# Apply a function to one column and assign it back to the column in dataframe dfObj['z'] = dfObj['z'].apply(np.square)
It will basically square all the values in column ‘z’
Method 3 : Using numpy.square()
# Method 3: # Apply a function to one column and assign it back to the column in dataframe dfObj['z'] = np.square(dfObj['z'])
It will also square all the values in column ‘z’
Apply a function to a single row in Dataframe
Suppose we want to square all the values in row ‘b’ for above created dataframe object dfObj. We can do that using different methods i.e.
Method 1 : Using Dataframe.apply()
Apply a lambda function to all the rows in dataframe using Dataframe.apply() and inside this lambda function check if row index label is ‘b’ then square all the values in it i.e.
import pandas as pd import numpy as np # List of Tuples matrix = [(22, 34, 23), (33, 31, 11), (44, 16, 21), (55, 32, 22), (66, 33, 27), (77, 35, 11)] # Create a DataFrame object dfObj = pd.DataFrame(matrix, columns=list('xyz'), index=list('abcdef')) print(dfObj) # Apply function numpy.square() to square the values of one row only i.e. row with index name 'b' modDfObj = dfObj.apply(lambda x: np.square(x) if x.name == 'b' else x, axis=1) print("Modified Dataframe : Squared the values in row 'b'", modDfObj, sep='\n')
Output:
x y z a 22 34 23 b 33 31 11 c 44 16 21 d 55 32 22 e 66 33 27 f 77 35 11 Modified Dataframe : Squared the values in row 'b' x y z a 22 34 23 b 1089 961 121 c 44 16 21 d 55 32 22 e 66 33 27 f 77 35 11
There are 2 other ways to achieve the same effect i.e.
Method 2 : Using [] Operator
Select the row from dataframe as series using dataframe.loc[] operator and apply numpy.square() method on it. Then assign it back to row i.e.
# Apply a function to one row and assign it back to the row in dataframe dfObj.loc['b'] = dfObj.loc['b'].apply(np.square)
It will basically square all the values in row ‘b’
Method 3 : Using numpy.square()
# Apply a function to one row and assign it back to the column in dataframe dfObj.loc['b'] = np.square(dfObj.loc['b'])
It will also square all the values in row ‘b’.
Apply a function to a certain columns in Dataframe
We can apply a given function to only specified columns too. For example square the values in column ‘x’ & ‘y’ i.e.
import pandas as pd import numpy as np # List of Tuples matrix = [(22, 34, 23), (33, 31, 11), (44, 16, 21), (55, 32, 22), (66, 33, 27), (77, 35, 11)] # Create a DataFrame object dfObj = pd.DataFrame(matrix, columns=list('xyz'), index=list('abcdef')) print(dfObj) # Apply function numpy.square() to square the value 2 column only i.e. with column names 'x' and 'y' only modDfObj = dfObj.apply(lambda x: np.square(x) if x.name in ['x', 'y'] else x) print("Modified Dataframe : Squared the values in column x & y :", modDfObj, sep='\n')
Output:
x y z a 22 34 23 b 33 31 11 c 44 16 21 d 55 32 22 e 66 33 27 f 77 35 11 Modified Dataframe : Squared the values in column x & y : x y z a 484 1156 23 b 1089 961 11 c 1936 256 21 d 3025 1024 22 e 4356 1089 27 f 5929 1225 11
Basically we just modified the if condition in lambda function and squared the values in columns with name x & y.
Apply a function to a certain rows in Dataframe
We can apply a given function to only specified rows too. For example square the values in column ‘b’ & ‘c’ i.e.
import pandas as pd import numpy as np # List of Tuples matrix = [(22, 34, 23), (33, 31, 11), (44, 16, 21), (55, 32, 22), (66, 33, 27), (77, 35, 11)] # Create a DataFrame object dfObj = pd.DataFrame(matrix, columns=list('xyz'), index=list('abcdef')) print(dfObj) # Apply function numpy.square() to square the values of 2 rows # only i.e. with row index name 'b' and 'c' only modDfObj = dfObj.apply(lambda x: np.square(x) if x.name in ['b', 'c'] else x, axis=1) print("Modified Dataframe : Squared the values in row b & c :", modDfObj, sep='\n')
Output:
x y z a 22 34 23 b 33 31 11 c 44 16 21 d 55 32 22 e 66 33 27 f 77 35 11 Modified Dataframe : Squared the values in row b & c : x y z a 22 34 23 b 1089 961 121 c 1936 256 441 d 55 32 22 e 66 33 27 f 77 35 11
Basically we just modified the if condition in lambda function and squared the values in rows with name b & c.
Complete example is as follows :
import pandas as pd import numpy as np # List of Tuples matrix = [(22, 34, 23), (33, 31, 11), (44, 16, 21), (55, 32, 22), (66, 33, 27), (77, 35, 11) ] # Create a DataFrame object dfObj = pd.DataFrame(matrix, columns=list('xyz'), index=list('abcdef')) print("Original Dataframe", dfObj, sep='\n') print('********* Apply a function to a single row or column in DataFrame ********') print('*** Apply a function to a single column *** ') # Method 1: # Apply function numpy.square() to square the value one column only i.e. with column name 'z' modDfObj = dfObj.apply(lambda x: np.square(x) if x.name == 'z' else x) print("Modified Dataframe : Squared the values in column 'z'", modDfObj, sep='\n') # Method 2: # Apply a function to one column and assign it back to the column in dataframe dfObj['z'] = dfObj['z'].apply(np.square) # Method 3: # Apply a function to one column and assign it back to the column in dataframe dfObj['z'] = np.square(dfObj['z']) print('*** Apply a function to a single row *** ') dfObj = pd.DataFrame(matrix, columns=list('xyz'), index=list('abcdef')) # Method 1: # Apply function numpy.square() to square the values of one row only i.e. row with index name 'b' modDfObj = dfObj.apply(lambda x: np.square(x) if x.name == 'b' else x, axis=1) print("Modified Dataframe : Squared the values in row 'b'", modDfObj, sep='\n') # Method 2: # Apply a function to one row and assign it back to the row in dataframe dfObj.loc['b'] = dfObj.loc['b'].apply(np.square) # Method 3: # Apply a function to one row and assign it back to the column in dataframe dfObj.loc['b'] = np.square(dfObj.loc['b']) print('********* Apply a function to certains row or column in DataFrame ********') dfObj = pd.DataFrame(matrix, columns=list('xyz'), index=list('abcdef')) print('Apply a function to certain columns only') # Apply function numpy.square() to square the value 2 column only i.e. with column names 'x' and 'y' only modDfObj = dfObj.apply(lambda x: np.square(x) if x.name in ['x', 'y'] else x) print("Modified Dataframe : Squared the values in column x & y :", modDfObj, sep='\n') print('Apply a function to certain rows only') # Apply function numpy.square() to square the values of 2 rows only i.e. with row index name 'b' and 'c' only modDfObj = dfObj.apply(lambda x: np.square(x) if x.name in ['b', 'c'] else x, axis=1) print("Modified Dataframe : Squared the values in row b & c :", modDfObj, sep='\n')
Output:
Original Dataframe x y z a 22 34 23 b 33 31 11 c 44 16 21 d 55 32 22 e 66 33 27 f 77 35 11 ********* Apply a function to a single row or column in DataFrame ******** *** Apply a function to a single column *** Modified Dataframe : Squared the values in column 'z' x y z a 22 34 529 b 33 31 121 c 44 16 441 d 55 32 484 e 66 33 729 f 77 35 121 *** Apply a function to a single row *** Modified Dataframe : Squared the values in row 'b' x y z a 22 34 23 b 1089 961 121 c 44 16 21 d 55 32 22 e 66 33 27 f 77 35 11 ********* Apply a function to certains row or column in DataFrame ******** Apply a function to certain columns only Modified Dataframe : Squared the values in column x & y : x y z a 484 1156 23 b 1089 961 11 c 1936 256 21 d 3025 1024 22 e 4356 1089 27 f 5929 1225 11 Apply a function to certain rows only Modified Dataframe : Squared the values in row b & c : x y z a 22 34 23 b 1089 961 121 c 1936 256 441 d 55 32 22 e 66 33 27 f 77 35 11
Summary
We learned about different ways to apply a function to DataFrame columns or rows in Pandas.
Your Python content is far superior than many others. May be if you can structure them in progressive order of learning it will be more beneficial for the visitors.
Good stuff and comprehensive.
Thanks
Anand