This article discusses different ways to replace the values in a Pandas DataFrame column using a dictionary in Python.
A DataFrame is a data structure that stores data in rows and columns. We can create a DataFrame using the pandas.DataFrame() constructor. Let’s create a DataFrame with four rows and two columns.
import pandas as pd

# Create the DataFrame with two columns and four rows
df = pd.DataFrame({
    'Name': ['sravan', 'harsha', 'ojaswi', 'jyothika'],
    'Subjects': ['java', 'php', 'html', 'jsp']
})

# Display the DataFrame
print(df)
Output:
       Name Subjects
0    sravan     java
1    harsha      php
2    ojaswi     html
3  jyothika      jsp
Replace column values with a Dictionary in Dataframe using replace()
Pandas provides the DataFrame.replace() method to change the contents of a DataFrame. One of the argument forms it accepts is a dictionary of dictionaries, like this,
DataFrame.replace({
    'column_name_1': {
        'to_replace_1': 'value_1',
        'to_replace_2': 'value_2',
        'to_replace_3': 'value_3'},
    'column_name_2': {
        'to_replace_4': 'value_4',
        'to_replace_5': 'value_5',
        'to_replace_6': 'value_6'}})
In this dictionary, the key is the column name, and the associated value is another dictionary, which contains the values to be replaced and replacement values. For example, the above statement will replace the following items in the Dataframe,
- In column “column_name_1” it will replace,
  - “to_replace_1” with “value_1”
  - “to_replace_2” with “value_2”
  - “to_replace_3” with “value_3”
- In column “column_name_2” it will replace,
  - “to_replace_4” with “value_4”
  - “to_replace_5” with “value_5”
  - “to_replace_6” with “value_6”
Let’s use this to replace column values with a dictionary.
Replace single column in dataframe using dictionary
To replace the values of a column using a dictionary, create a dictionary with the column name as the key. As its value, pass another dictionary that maps each value to be replaced to its replacement. For example,
import pandas as pd

# Create the DataFrame with two columns and four rows
df = pd.DataFrame({
    'Name': ['sravan', 'harsha', 'ojaswi', 'jyothika'],
    'Subjects': ['java', 'php', 'html', 'jsp']
})

# Display the DataFrame
print(df)

# Create a dictionary to replace the values in the Name column
# with full names
replace_data = {"sravan": 'Sravan Kumar',
                "harsha": 'Harsh Vardhan',
                "ojaswi": 'Pinkey',
                "jyothika": 'Jyothika Chowdary'}

# Replace the values in the 'Name' column using the dictionary
df = df.replace({"Name": replace_data})

# Display the DataFrame
print(df)
Output:
       Name Subjects
0    sravan     java
1    harsha      php
2    ojaswi     html
3  jyothika      jsp
                Name Subjects
0       Sravan Kumar     java
1      Harsh Vardhan      php
2             Pinkey     html
3  Jyothika Chowdary      jsp
It replaced the values in column ‘Name’ according to the dictionary and left column ‘Subjects’ unchanged.
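One property worth seeing in isolation is that the nested dictionary is scoped to its column: values that are not keys of the inner dictionary stay as they are, and identical values in other columns are not touched. The following is a minimal sketch using a small hypothetical DataFrame (the data is invented for illustration and is not part of the example above):

```python
import pandas as pd

# Hypothetical data: the string 'java' appears in both columns
df = pd.DataFrame({
    'Name': ['sravan', 'java', 'ojaswi'],
    'Subjects': ['java', 'php', 'html']
})

# Replace only inside the 'Subjects' column.
# 'php' and 'html' are not keys of the inner dictionary.
result = df.replace({'Subjects': {'java': 'OOPS'}})

# 'java' in the 'Name' column and the unlisted subjects are unchanged
print(result)
```

Because replace() returns a new DataFrame by default, the original df is left unmodified unless the result is assigned back to it.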
Replace values in multiple columns using dictionary
To replace the contents of multiple columns using a dictionary, create a dictionary of dictionaries in which each column name is associated with a nested dictionary of the values to be replaced. For example, let’s see how to replace the values of columns ‘Name’ and ‘Subjects’ in a DataFrame with a dictionary,
import pandas as pd

# Create the DataFrame with two columns and four rows
df = pd.DataFrame({
    'Name': ['sravan', 'harsha', 'ojaswi', 'jyothika'],
    'Subjects': ['java', 'php', 'html', 'jsp']
})

# Display the DataFrame
print(df)

# Create a dictionary to replace the values in the Name column
# with full names
name_data = {"sravan": 'Sravan Kumar',
             "harsha": 'Harsh Vardhan',
             "ojaswi": 'Pinkey',
             "jyothika": 'Jyothika Chowdary'}

# Create a dictionary to replace the values in the Subjects column
# with other values
subject_data = {"java": 'OOPS',
                "php": 'PPH - MYSQL',
                "html": 'FRONTEND DEVELOPMENT',
                "jsp": 'SERVER_SIDE DEVELOPMENT'}

# Replace the values in the 'Name' and 'Subjects' columns using the dictionaries.
# The keys must match the column names exactly ('Subjects', not 'Subject');
# a key that matches no column leaves the data unchanged.
df = df.replace({"Name": name_data,
                 "Subjects": subject_data})

# Display the DataFrame
print(df)
Output:
       Name Subjects
0    sravan     java
1    harsha      php
2    ojaswi     html
3  jyothika      jsp
                Name                 Subjects
0       Sravan Kumar                     OOPS
1      Harsh Vardhan              PPH - MYSQL
2             Pinkey     FRONTEND DEVELOPMENT
3  Jyothika Chowdary  SERVER_SIDE DEVELOPMENT
Replace column values with a Dictionary using map()
In Pandas, the Series class provides a function map(), which accepts a dictionary as an argument. It returns a new Series in which each value of the calling Series is replaced according to the mapping in the provided dictionary. However, values that are not keys in the dictionary are converted to NaN.
We can select a column of DataFrame as a Series object, call the map() function, and pass a dictionary as an argument. The dictionary will contain the mapping of values to be replaced. For example,
df['Name'].map({
    'old_value_1': 'new_value_1',
    'old_value_2': 'new_value_2',
    'old_value_3': 'new_value_3'})
This expression returns a new Series with the following changes; the DataFrame itself is modified only when the result is assigned back to the column,
- In Column ‘Name’, it will replace,
  - ‘old_value_1’ with ‘new_value_1’
  - ‘old_value_2’ with ‘new_value_2’
  - ‘old_value_3’ with ‘new_value_3’
- All other values in column ‘Name’ will be replaced by NaN
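The NaN behavior described in this list can be seen in a short sketch. It assumes a standalone Series holding the same names as the article's DataFrame and a dictionary that covers only two of them; the variable names are illustrative:

```python
import pandas as pd

# The same names as in the 'Name' column, as a standalone Series
names = pd.Series(['sravan', 'harsha', 'ojaswi', 'jyothika'])

# The dictionary covers only the first two names
mapped = names.map({'sravan': 'Sravan Kumar',
                    'harsha': 'Harsh Vardhan'})

# The two unmapped names come back as NaN
print(mapped)

# Boolean mask marking which entries were not found in the dictionary
print(mapped.isna().tolist())
```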
The mapping dictionary may contain only a few of the values in the column, in which case every other value is set to NaN. To prevent that, call fillna() on the result and pass the original column as the argument. Every NaN produced by map() is then filled with the original value at the same position, so values that are not keys in the dictionary are retained. Its syntax looks like this,
df['Name'].map({
    'old_value_1': 'new_value_1',
    'old_value_2': 'new_value_2',
    'old_value_3': 'new_value_3'}).fillna(df['Name'])
Let’s use this technique to replace a few values in a DataFrame column through a dictionary,
import pandas as pd

# Create the DataFrame with two columns and four rows
df = pd.DataFrame({
    'Name': ['sravan', 'harsha', 'ojaswi', 'jyothika'],
    'Subjects': ['java', 'php', 'html', 'jsp']
})

# Display the DataFrame
print(df)

# Create a dictionary to replace some values in the Name column
# with full names
name_data = {"sravan": 'Sravan Kumar',
             "harsha": 'Harsh Vardhan'}

# Replace values in the column based on the dictionary,
# keeping the original value wherever there is no mapping
df['Name'] = df['Name'].map(name_data).fillna(df['Name'])

# Display the DataFrame
print(df)
Output:
       Name Subjects
0    sravan     java
1    harsha      php
2    ojaswi     html
3  jyothika      jsp
            Name Subjects
0   Sravan Kumar     java
1  Harsh Vardhan      php
2         ojaswi     html
3       jyothika      jsp
We replaced only two values in the column ‘Name’. All other values remained unchanged.
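For a partial dictionary like this one, the map() plus fillna() combination and replace() should give the same result. The sketch below compares the two on the same data; it uses Series.replace() on the single column rather than the nested-dictionary form, and the variable names via_replace and via_map are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['sravan', 'harsha', 'ojaswi', 'jyothika'],
    'Subjects': ['java', 'php', 'html', 'jsp']
})

# Partial mapping: only two of the four names are covered
name_data = {"sravan": 'Sravan Kumar',
             "harsha": 'Harsh Vardhan'}

# Approach 1: replace() leaves unmatched values as they are
via_replace = df['Name'].replace(name_data)

# Approach 2: map() yields NaN for unmatched values,
# and fillna() restores the originals at those positions
via_map = df['Name'].map(name_data).fillna(df['Name'])

# Compare the two results element by element
print(via_replace.tolist() == via_map.tolist())
```

One caveat with the second approach: if the column already contains NaN values, or the dictionary maps a value to NaN, fillna() cannot distinguish those from unmatched entries.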
Summary
In this article, we learned how to replace the values of a DataFrame column using a dictionary in Pandas, with the replace() and map() methods.