In pandas, a DataFrame is a two-dimensional, labeled data structure with rows and columns. While working with a pandas DataFrame, we often need to remap the values of a specific column using a dictionary while preserving NaNs. In this article, we will learn how to do that.
One way to do that is to create a new column by looking up each value of an existing column as a key in the dictionary. Another is to replace the values of the column in place.
There are different methods to remap values in a pandas DataFrame column with a dictionary and preserve NaNs. Let’s discuss each method one by one.
Remap values in a Column with Dictionary using Series.map()
We can create a new column by passing a dictionary to the map() method of an existing column. A single column of a DataFrame, such as df['Course'], is a pandas Series, so the method used here is Series.map(). Each value in the column is looked up as a key in the dictionary, and the corresponding dictionary value is placed in the new column. Values that are not keys in the dictionary, including NaN, become NaN in the result.
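A minimal sketch of how Series.map() treats values that have no entry in the dictionary. The small Series and the "PhD" value are made up for illustration: the unmapped value becomes NaN, and an existing NaN also stays NaN because it is not a key either.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one mapped value, one unmapped value, one NaN
courses = pd.Series(["BCA", "PhD", np.nan])
codes = {"BCA": "BC", "BSc": "BS"}

# Each value is looked up as a key; misses (including NaN) become NaN
mapped = courses.map(codes)
print(mapped.tolist())
```

So a plain dictionary lookup already keeps missing values missing; the trade-off is that genuinely unmapped values are lost as well.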
Example of remapping column values with a dict using Series.map()
A script to create a new column, Course_code, by remapping the values of the Course column with a dictionary using Series.map().
import pandas as pd
import numpy as np

student = {'Rollno': [1, 2, 3, 4, 5],
           'Name': ["Reema", "Rekha", "Jaya", "Susma", "Meena"],
           'Duration': ['120days', '150days', '130days', None, np.nan],
           'Course': ["BCA", "BSc", "MCA", "MSc", "BBA"]}

df = pd.DataFrame(student)
print(df)

# Define a dict with the key-value pairs to remap.
dict_course_code = {"BCA": 'BC', "BSc": 'BS', "MCA": 'MC', "MSc": 'MS', "BBA": 'BB'}

# Create a new column by mapping values of an existing column
df['Course_code'] = df['Course'].map(dict_course_code)
print(df)
Output
   Rollno   Name Duration Course
0       1  Reema  120days    BCA
1       2  Rekha  150days    BSc
2       3   Jaya  130days    MCA
3       4  Susma     None    MSc
4       5  Meena      NaN    BBA
   Rollno   Name Duration Course Course_code
0       1  Reema  120days    BCA          BC
1       2  Rekha  150days    BSc          BS
2       3   Jaya  130days    MCA          MC
3       4  Susma     None    MSc          MS
4       5  Meena      NaN    BBA          BB
In the above script, Series.map() looks up each value of the Course column in the dictionary and stores the result in a new column, Course_code, which contains the remapped code for each course.
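Series.map() also accepts a function instead of a dictionary. In this minimal sketch on a made-up Series, the lambda falls back to the original value when there is no mapping, and na_action='ignore' tells map() to pass NaN values through without calling the function, so they are preserved.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one mapped value, one unmapped value, one NaN
courses = pd.Series(["BCA", "PhD", np.nan])
codes = {"BCA": "BC", "BSc": "BS"}

# Use the mapped code if there is one, otherwise keep the original value;
# na_action="ignore" skips NaN entries and leaves them as NaN
remapped = courses.map(lambda v: codes.get(v, v), na_action="ignore")
print(remapped.tolist())
```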
Example of remapping column values while preserving unmapped values
A script that fills the NaN values produced by map() with the original column value when a record's value has no mapping in the dictionary.
import pandas as pd
import numpy as np

student = {'Rollno': [1, 2, 3, 4, 5],
           'Name': ["Reema", "Rekha", "Jaya", "Susma", "Meena"],
           'Duration': ['120days', '150days', '130days', None, np.nan],
           'Course': ["BCA", "BSc", "MCA", "MSc", "BBA"]}

df = pd.DataFrame(student)
print(df)

# Define a dict with the key-value pairs to remap ("MSc" and "BBA" have no entry).
dict_course_code = {"BCA": 'BC', "BSc": 'BS', "MCA": 'MC'}

# Create a new column by mapping values of an existing column.
# Values without a mapping become NaN, so fill them with the original Course value.
df['Course_code'] = df['Course'].map(dict_course_code).fillna(df['Course'])
print(df)
Output
   Rollno   Name Duration Course
0       1  Reema  120days    BCA
1       2  Rekha  150days    BSc
2       3   Jaya  130days    MCA
3       4  Susma     None    MSc
4       5  Meena      NaN    BBA
   Rollno   Name Duration Course Course_code
0       1  Reema  120days    BCA          BC
1       2  Rekha  150days    BSc          BS
2       3   Jaya  130days    MCA          MC
3       4  Susma     None    MSc         MSc
4       5  Meena      NaN    BBA         BBA
In the above script, we created a DataFrame with four columns and a dictionary that maps only some values of the Course column to a code. There is no mapping for MSc and BBA, so map() returns NaN for those rows. fillna() then replaces each of these NaN values with the original value from the Course column, so unmapped courses keep their name instead of becoming NaN.
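The Course column in the example above has no missing values. A minimal sketch with a hypothetical Course column that does contain NaN shows that the same map() plus fillna() pattern still preserves it: fillna() copies the original value back, and for a missing course that original value is NaN again.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: "MSc" has no mapping and the last course is missing
df = pd.DataFrame({"Course": ["BCA", "MSc", np.nan]})
codes = {"BCA": "BC", "BSc": "BS"}

# map() yields NaN for "MSc" and for the missing course; fillna() copies the
# original value back, which is "MSc" for row 1 and NaN again for row 2
df["Course_code"] = df["Course"].map(codes).fillna(df["Course"])
print(df)
```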
Remap values in a Column with Dictionary using DataFrame.replace()
The DataFrame.replace() method accepts its arguments in several forms. One of them takes a dictionary (dict) to remap column values. In a flat dictionary, each key is an existing value and each value is its replacement. In a nested dictionary such as {column: {old: new}}, the replacements are applied only to the named column.
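A minimal sketch of the difference between a nested and a flat dictionary in DataFrame.replace(). The two-column DataFrame (including the Minor column) is made up for illustration: the nested form only touches the named column, while the flat form replaces matching values in every column.

```python
import pandas as pd

# Hypothetical frame where the same string appears in two columns
df = pd.DataFrame({"Course": ["BCA", "BSc"], "Minor": ["BCA", "MSc"]})

# Nested dict {column: {old: new}} limits the replacement to "Course"
only_course = df.replace({"Course": {"BCA": "BC"}})

# Flat dict {old: new} applies the replacement to every column
everywhere = df.replace({"BCA": "BC"})

print(only_course)
print(everywhere)
```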
Example of Remap Column Values with a Dict Using Pandas DataFrame.replace()
A script to replace each course name with its code using DataFrame.replace().
import pandas as pd
import numpy as np

student = {'Rollno': [1, 2, 3, 4, 5],
           'Name': ["Reema", "Rekha", "Jaya", "Susma", "Meena"],
           'Duration': ['120days', '150days', '130days', None, np.nan],
           'Course': ["BCA", "BSc", "MCA", "MSc", "BBA"]}

df = pd.DataFrame(student)
print(df)

# Define Dict with the key-value pair to remap.
dictObj = {"BCA": 'BC', "BSc": 'BS', "MCA": 'MC', "MSc": 'MS', "BBA": 'BB'}

df = df.replace({"Course": dictObj})
print(df)
Output
   Rollno   Name Duration Course
0       1  Reema  120days    BCA
1       2  Rekha  150days    BSc
2       3   Jaya  130days    MCA
3       4  Susma     None    MSc
4       5  Meena      NaN    BBA
   Rollno   Name Duration Course
0       1  Reema  120days     BC
1       2  Rekha  150days     BS
2       3   Jaya  130days     MC
3       4  Susma     None     MS
4       5  Meena      NaN     BB
In the above script, we first created a DataFrame with four columns: Rollno, Name, Duration and Course. Then we defined a dictionary of key-value pairs and used DataFrame.replace() to replace each course name in the Course column with its code.
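Unlike map(), replace() only changes values that appear as keys in the dictionary; everything else, including NaN, is left as it was. A minimal sketch on a made-up Series with an unmapped value ("PhD") and a missing value:

```python
import numpy as np
import pandas as pd

# Hypothetical sample with an unmapped value and a missing value
courses = pd.Series(["BCA", "PhD", np.nan])
codes = {"BCA": "BC", "BSc": "BS"}

# replace() only touches values that are keys; "PhD" and NaN are kept as-is
replaced = courses.replace(codes)
print(replaced.tolist())
```

This is why replace() needs no fillna() step to preserve unmapped values or NaNs.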
Example of Remapping None or NaN Column Values
A script to remap the values of the Duration column with DataFrame.replace(), including mapping None and NaN values to '150'.
import pandas as pd
import numpy as np

students = {'Rollno': [1, 2, 3, 4, 5],
            'Name': ["Reema", "Rekha", "Jaya", "Susma", "Meena"],
            'Duration': ['120days', '150days', '130days', None, np.nan],
            'Course': ["BCA", "BSc", "MCA", "MSc", "BBA"]}

df = pd.DataFrame(students)
print(df)

# Define Dict with the key-value pairs to remap
dict_duration = {"120days": '120', "150days": '150', "130days": '130', np.nan: '150'}

# Remap all values in 'Duration' column with a dictionary
df.replace({"Duration": dict_duration}, inplace=True)
print(df)
Output
   Rollno   Name Duration Course
0       1  Reema  120days    BCA
1       2  Rekha  150days    BSc
2       3   Jaya  130days    MCA
3       4  Susma     None    MSc
4       5  Meena      NaN    BBA
   Rollno   Name Duration Course
0       1  Reema      120    BCA
1       2  Rekha      150    BSc
2       3   Jaya      130    MCA
3       4  Susma      150    MSc
4       5  Meena      150    BBA
In the above script, we first created a DataFrame with four columns: Rollno, Name, Duration and Course. Then we created a dictionary whose key-value pairs map the values of the Duration column, including an np.nan key that maps missing durations to '150'. Finally, we used DataFrame.replace() to remap the values of 'Duration' with the dictionary. As the output shows, both the None and the NaN entries were replaced with '150'.
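If you would rather keep NaN out of the remapping dictionary, one possible alternative (a sketch, not the method used above) is to fill the missing durations first with fillna(), which treats both None and NaN as missing, and then remap with replace(). The Duration values below are the same as in the example; the dictionary is the example's dict_duration without the np.nan key.

```python
import numpy as np
import pandas as pd

# Same Duration values as in the example above
durations = pd.Series(["120days", "150days", "130days", None, np.nan])
dict_duration = {"120days": "120", "150days": "150", "130days": "130"}

# fillna() fills both None and NaN, then replace() remaps every value
remapped = durations.fillna("150days").replace(dict_duration)
print(remapped.tolist())
```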
Remap Values in Multiple Columns with a Single DataFrame.replace() Call
A script to remap two columns, Course and Duration, each with its own dictionary.
import pandas as pd
import numpy as np

student = {'Rollno': [1, 2, 3, 4, 5],
           'Name': ["Reema", "Rekha", "Jaya", "Susma", "Meena"],
           'Duration': ['120days', '150days', '130days', None, np.nan],
           'Course': ["BCA", "BSc", "MCA", "MSc", "BBA"]}

df = pd.DataFrame(student)
print(df)

# Define Dictionaries with the key-value pair to remap.
dict_obj = {"BCA": 'BC', "BSc": 'BS', "MCA": 'MC', "MSc": 'MS', "BBA": 'BB'}
dict_duration = {"120days": '120', "150days": '150', "130days": '130', np.nan: '150'}

# Map column Course with first dictionary
# Map column Duration with second dictionary
df.replace({"Course": dict_obj, "Duration": dict_duration}, inplace=True)
print(df)
Output
   Rollno   Name Duration Course
0       1  Reema  120days    BCA
1       2  Rekha  150days    BSc
2       3   Jaya  130days    MCA
3       4  Susma     None    MSc
4       5  Meena      NaN    BBA
   Rollno   Name Duration Course
0       1  Reema      120     BC
1       2  Rekha      150     BS
2       3   Jaya      130     MC
3       4  Susma      150     MS
4       5  Meena      150     BB
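To remap several columns with map() instead, while still keeping unmapped values and missing values, one option is to loop over a {column: {old: new}} dictionary and combine map() with fillna(). A minimal sketch on a cut-down, hypothetical version of the example data (the mappings dictionary is illustrative, and "MSc" and "130days" are deliberately left unmapped):

```python
import numpy as np
import pandas as pd

# Cut-down sample data with one unmapped value and one missing value per column
df = pd.DataFrame({
    "Duration": ["120days", "150days", None],
    "Course": ["BCA", "MSc", np.nan],
})

# Hypothetical {column: {old: new}} mapping, the same shape replace() accepts
mappings = {
    "Course": {"BCA": "BC", "BSc": "BS"},
    "Duration": {"120days": "120", "150days": "150"},
}

# map() each column, then restore unmapped values; None/NaN stay missing
for column, mapping in mappings.items():
    df[column] = df[column].map(mapping).fillna(df[column])

print(df)
```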
Summary
In this article, we learned how to remap the values of a pandas DataFrame column with a dictionary using Series.map() and DataFrame.replace(), and how to preserve or replace NaNs along the way. Happy Learning.