Replace Column values with Dictionary in Pandas Dataframe

This article discusses different ways to replace the values in a Pandas DataFrame column using a dictionary in Python.

A DataFrame is a data structure that stores data in rows and columns. We can create a DataFrame using the pandas.DataFrame() constructor. Let’s create a DataFrame with four rows and two columns.

import pandas as pd

# Create the DataFrame with two columns and four rows
df = pd.DataFrame({ 'Name': ['sravan', 'harsha', 'ojaswi', 'jyothika'],
                    'Subjects':['java','php','html','jsp'] })

# Display the DataFrame
print(df)  

Output:

       Name Subjects
0    sravan     java
1    harsha      php
2    ojaswi     html
3  jyothika      jsp

Replace column values with a Dictionary in Dataframe using replace()

In Python, the Pandas module provides a function replace() to change the content of a DataFrame. In one of its forms, it accepts a dictionary of dictionaries like this,

DataFrame.replace({ 'column_name_1': { 'to_replace_1': 'value_1',
                                       'to_replace_2': 'value_2',
                                       'to_replace_3': 'value_3'},
                    'column_name_2': { 'to_replace_4': 'value_4',
                                       'to_replace_5': 'value_5',
                                       'to_replace_6': 'value_6'}})

In this dictionary, the key is the column name, and the associated value is another dictionary, which contains the values to be replaced and replacement values. For example, the above statement will replace the following items in the Dataframe,

  • In column “column_name_1” it will replace,
    • “to_replace_1” with “value_1”
    • “to_replace_2” with “value_2”
    • “to_replace_3” with “value_3”
  • In column “column_name_2” it will replace,
    • “to_replace_4” with “value_4”
    • “to_replace_5” with “value_5”
    • “to_replace_6” with “value_6”
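To make the column scoping concrete, here is a minimal sketch. The small DataFrame `demo` is hypothetical and is built so that the same string appears in both columns; only the column named in the outer dictionary should change.

```python
import pandas as pd

# Hypothetical data: the string 'java' appears in both columns
demo = pd.DataFrame({'Name': ['java', 'harsha'],
                     'Subjects': ['java', 'php']})

# Only the 'Subjects' column is named in the outer dictionary,
# so 'java' in the 'Name' column is left alone
result = demo.replace({'Subjects': {'java': 'OOPS'}})

print(result)
```

Note that replace() returns a new DataFrame here, so `demo` itself keeps its original values.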

Let’s use this to replace the values of a column with a dictionary.

Replace single column in dataframe using dictionary

To replace the values of a column with a dictionary in a DataFrame, create a dictionary with the column name as the key. As its value, pass another dictionary that contains the values to be replaced and their replacements. For example,

import pandas as pd

# Create the DataFrame with two columns and four rows
df = pd.DataFrame({ 'Name': ['sravan', 'harsha', 'ojaswi', 'jyothika'],
                    'Subjects':['java','php','html','jsp'] })

# Display the DataFrame
print(df)  

# create a dictionary to replace the Name column
# with Full names
replace_data = { "sravan": 'Sravan Kumar',
                 "harsha": 'Harsh Vardhan',
                 "ojaswi": 'Pinkey',
                 "jyothika": 'Jyothika Chowdary'}

# Replace the values in 'Name' column with the dictionary
df = df.replace({"Name": replace_data})

# Display the DataFrame
print(df)  

Output:

       Name Subjects
0    sravan     java
1    harsha      php
2    ojaswi     html
3  jyothika      jsp

                Name Subjects
0       Sravan Kumar     java
1      Harsh Vardhan      php
2             Pinkey     html
3  Jyothika Chowdary      jsp

It replaced the values in column ‘Name’ based on the dictionary.
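When only one column is involved, the same result can probably be reached by selecting the column as a Series and calling replace() on it with the flat dictionary. This is a sketch of that alternative, not the approach used above; it reuses the same sample data.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['sravan', 'harsha', 'ojaswi', 'jyothika'],
                   'Subjects': ['java', 'php', 'html', 'jsp']})

replace_data = {'sravan': 'Sravan Kumar',
                'harsha': 'Harsh Vardhan',
                'ojaswi': 'Pinkey',
                'jyothika': 'Jyothika Chowdary'}

# Series.replace() takes the flat mapping directly,
# so no outer dictionary keyed by column name is needed
df['Name'] = df['Name'].replace(replace_data)

print(df)
```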

Replace values in multiple columns using dictionary

To replace the contents of multiple columns with a dictionary, create a dictionary of dictionaries where each column name is associated with a nested dictionary of values to be replaced. For example, let’s see how to replace the values of columns ‘Name’ and ‘Subjects’ in a DataFrame with a dictionary,

import pandas as pd

# Create the DataFrame with two columns and four rows
df = pd.DataFrame({ 'Name': ['sravan', 'harsha', 'ojaswi', 'jyothika'],
                    'Subjects':['java','php','html','jsp'] })

# Display the DataFrame
print(df)  

# create a dictionary to replace the Name column
# with Full names
name_data = { "sravan": 'Sravan Kumar',
              "harsha": 'Harsh Vardhan',
              "ojaswi": 'Pinkey',
              "jyothika": 'Jyothika Chowdary'}

# create a dictionary to replace the Subjects column
# with other values
subject_data = {"java": 'OOPS',
                "php" : 'PHP - MYSQL',
                "html": 'FRONTEND DEVELOPMENT',
                "jsp" : 'SERVER_SIDE DEVELOPMENT'}

# Replace the values in 'Name' & 'Subjects' columns with the dictionaries.
# The keys must match the column names exactly.
df = df.replace({"Name": name_data,
                 "Subjects": subject_data})

# Display the DataFrame
print(df)  

Output:

       Name Subjects
0    sravan     java
1    harsha      php
2    ojaswi     html
3  jyothika      jsp

                Name                 Subjects
0       Sravan Kumar                     OOPS
1      Harsh Vardhan              PHP - MYSQL
2             Pinkey     FRONTEND DEVELOPMENT
3  Jyothika Chowdary  SERVER_SIDE DEVELOPMENT
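The outer keys must match the column names exactly. replace() generally skips keys that are not columns instead of raising an error, so a misspelled key such as 'Subject' in place of 'Subjects' would leave that column untouched. A minimal sketch of a guard against this follows; `replace_checked` is a hypothetical helper written for this example and is not part of Pandas.

```python
import pandas as pd

def replace_checked(df, column_maps):
    """Hypothetical helper: fail loudly if a key is not a column of df."""
    missing = [col for col in column_maps if col not in df.columns]
    if missing:
        raise KeyError(f"Columns not found in DataFrame: {missing}")
    return df.replace(column_maps)

df = pd.DataFrame({'Name': ['sravan', 'harsha', 'ojaswi', 'jyothika'],
                   'Subjects': ['java', 'php', 'html', 'jsp']})

# Correct column name: the replacement is applied
checked = replace_checked(df, {'Subjects': {'java': 'OOPS'}})
print(checked)

# Misspelled column name: the helper raises instead of doing nothing
try:
    replace_checked(df, {'Subject': {'java': 'OOPS'}})
    typo_detected = False
except KeyError as err:
    typo_detected = True
    print(err)
```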

Replace column values with a Dictionary using map()

In Pandas, the Series class provides a function map(), which accepts a dictionary as an argument. It returns a new Series in which each value of the calling Series is replaced according to the mapping in the provided dictionary. Values that are not present in the dictionary as keys are converted to NaN.

We can select a column of DataFrame as a Series object, call the map() function, and pass a dictionary as an argument. The dictionary will contain the mapping of values to be replaced. For example,

df['Name'].map({ 'old_value_1' : 'new_value_1',
                 'old_value_2' : 'new_value_2',
                 'old_value_3' : 'new_value_3'})

This expression returns a new Series with the following changes. The DataFrame itself is not modified unless the result is assigned back to the column,

  • In column ‘Name’, it will replace,
    • ‘old_value_1’ with ‘new_value_1’
    • ‘old_value_2’ with ‘new_value_2’
    • ‘old_value_3’ with ‘new_value_3’
  • All other values in column ‘Name’ will be replaced by NaN
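The NaN behaviour is easy to see with a partial dictionary. The sketch below reuses the sample data from this article; the variable names `partial` and `mapped` are chosen only for this example.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['sravan', 'harsha', 'ojaswi', 'jyothika'],
                   'Subjects': ['java', 'php', 'html', 'jsp']})

# The dictionary covers only two of the four names
partial = {'sravan': 'Sravan Kumar',
           'harsha': 'Harsh Vardhan'}

# map() returns a new Series; names without an entry become NaN
mapped = df['Name'].map(partial)

print(mapped)
print(mapped.isna())
```

The original column is untouched at this point, because the result has not been assigned back to df['Name'].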

The mapping dictionary may contain only a few of the values in the column. In that case, all other values in the column will be set to NaN. To prevent that, call the fillna() function on the result and pass the original column as the argument. It fills each NaN with the original value at the same position, so values that are not present in the dictionary are retained. Its syntax will be like,

df['Name'].map({ 'old_value_1' : 'new_value_1',
                 'old_value_2' : 'new_value_2',
                 'old_value_3' : 'new_value_3'}).fillna(df['Name'])

Let’s use this technique to replace a few values in a DataFrame column through a dictionary,

import pandas as pd

# Create the DataFrame with two columns and four rows
df = pd.DataFrame({ 'Name': ['sravan', 'harsha', 'ojaswi', 'jyothika'],
                    'Subjects':['java','php','html','jsp'] })

# Display the DataFrame
print(df)  

# create a dictionary to replace the Name column
# with Full names
name_data = { "sravan": 'Sravan Kumar',
              "harsha": 'Harsh Vardhan'}

# Replace values in a column based on the dictionary 
df['Name'] = df['Name'].map(name_data).fillna(df['Name'])

# Display the DataFrame
print(df)  

Output:

       Name Subjects
0    sravan     java
1    harsha      php
2    ojaswi     html
3  jyothika      jsp


            Name Subjects
0   Sravan Kumar     java
1  Harsh Vardhan      php
2         ojaswi     html
3       jyothika      jsp

We replaced only two values in the column ‘Name’. All other values remained unchanged.
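Since map() also accepts a function, another way to keep unmapped values is to look each value up with dict.get() and fall back to the value itself. This is a sketch of an alternative to the fillna() step, using the same sample data; it is not the technique shown above.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['sravan', 'harsha', 'ojaswi', 'jyothika'],
                   'Subjects': ['java', 'php', 'html', 'jsp']})

name_data = {'sravan': 'Sravan Kumar',
             'harsha': 'Harsh Vardhan'}

# Look up each name; when it has no entry, return the name unchanged
df['Name'] = df['Name'].map(lambda name: name_data.get(name, name))

print(df)
```

The function is called once per value, so on large columns this may be slower than passing the dictionary directly.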

Summary

In this article, we learned how to replace the values of a DataFrame column with a dictionary in Pandas using the replace() and map() methods.
