This article will discuss different ways to replace a Pandas DataFrame column with a dictionary in Python.
A DataFrame is a data structure that stores data in rows and columns. We can create a DataFrame using the pandas.DataFrame() constructor. Let’s create a DataFrame with four rows and two columns.
import pandas as pd

# Create the DataFrame with two columns and four rows
df = pd.DataFrame({
    'Name': ['sravan', 'harsha', 'ojaswi', 'jyothika'],
    'Subjects': ['java', 'php', 'html', 'jsp']
})

# Display the DataFrame
print(df)
Output:
       Name  Subjects
0    sravan      java
1    harsha       php
2    ojaswi      html
3  jyothika       jsp
Replace column values with a Dictionary in Dataframe using replace()
Pandas provides the DataFrame.replace() method to change the contents of a DataFrame. One of the argument forms it accepts is a dictionary of dictionaries, like this:
DataFrame.replace({
    'column_name_1': {'to_replace_1': 'value_1',
                      'to_replace_2': 'value_2',
                      'to_replace_3': 'value_3'},
    'column_name_2': {'to_replace_4': 'value_4',
                      'to_replace_5': 'value_5',
                      'to_replace_6': 'value_6'}
})
In this dictionary, the key is the column name, and the associated value is another dictionary, which contains the values to be replaced and replacement values. For example, the above statement will replace the following items in the Dataframe,
- In column “column_name_1” it will replace,
  - “to_replace_1” with “value_1”
  - “to_replace_2” with “value_2”
  - “to_replace_3” with “value_3”
- In column “column_name_2” it will replace,
  - “to_replace_4” with “value_4”
  - “to_replace_5” with “value_5”
  - “to_replace_6” with “value_6”
Let’s use this to replace a column’s values with a dictionary.
Replace single column in dataframe using dictionary
To replace a column value with a dictionary in a DataFrame, create a dictionary with column name as key. In the value field, pass another dictionary that contains the values to be replaced and their replacement. For example,
import pandas as pd

# Create the DataFrame with two columns and four rows
df = pd.DataFrame({
    'Name': ['sravan', 'harsha', 'ojaswi', 'jyothika'],
    'Subjects': ['java', 'php', 'html', 'jsp']
})

# Display the DataFrame
print(df)

# Create a dictionary to replace the values in the Name column
# with full names
replace_data = {"sravan": 'Sravan Kumar',
                "harsha": 'Harsh Vardhan',
                "ojaswi": 'Pinkey',
                "jyothika": 'Jyothika Chowdary'}

# Replace the values in the 'Name' column using the dictionary
df = df.replace({"Name": replace_data})

# Display the DataFrame
print(df)
Output:
       Name  Subjects
0    sravan      java
1    harsha       php
2    ojaswi      html
3  jyothika       jsp
                Name  Subjects
0       Sravan Kumar      java
1      Harsh Vardhan       php
2             Pinkey      html
3  Jyothika Chowdary       jsp
It replaced each value in column ‘Name’ with the corresponding value from the dictionary.
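The dictionary does not have to cover every value in the column. The short sketch below, which uses a hypothetical partial_names dictionary, shows that replace() leaves values without a matching key untouched and returns a new DataFrame rather than modifying the original.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['sravan', 'harsha', 'ojaswi', 'jyothika'],
    'Subjects': ['java', 'php', 'html', 'jsp']
})

# Hypothetical partial mapping: only two of the four names are listed
partial_names = {"sravan": 'Sravan Kumar',
                 "harsha": 'Harsh Vardhan'}

# replace() returns a new DataFrame; df itself is not modified
result = df.replace({"Name": partial_names})

# 'ojaswi' and 'jyothika' have no key in the mapping, so they stay as they are
print(result)
```

Because the original DataFrame is left alone, the result has to be assigned to a variable (or back to df) for the change to persist.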
Replace values in multiple columns using dictionary
To replace the contents of multiple columns, create a dictionary of dictionaries in which each column name is associated with a nested dictionary of the values to be replaced and their replacements. For example, let’s see how to replace the values of columns ‘Name’ and ‘Subjects’ in a DataFrame with a dictionary,
import pandas as pd

# Create the DataFrame with two columns and four rows
df = pd.DataFrame({
    'Name': ['sravan', 'harsha', 'ojaswi', 'jyothika'],
    'Subjects': ['java', 'php', 'html', 'jsp']
})

# Display the DataFrame
print(df)

# Create a dictionary to replace the values in the Name column
# with full names
name_data = {"sravan": 'Sravan Kumar',
             "harsha": 'Harsh Vardhan',
             "ojaswi": 'Pinkey',
             "jyothika": 'Jyothika Chowdary'}

# Create a dictionary to replace the values in the Subjects column
# with other values
subject_data = {"java": 'OOPS',
                "php": 'PHP - MYSQL',
                "html": 'FRONTEND DEVELOPMENT',
                "jsp": 'SERVER_SIDE DEVELOPMENT'}

# Replace the values in the 'Name' and 'Subjects' columns using the dictionaries.
# The outer keys must match the column labels exactly.
df = df.replace({"Name": name_data,
                 "Subjects": subject_data})

# Display the DataFrame
print(df)
Output:
       Name  Subjects
0    sravan      java
1    harsha       php
2    ojaswi      html
3  jyothika       jsp
                Name                 Subjects
0       Sravan Kumar                     OOPS
1      Harsh Vardhan              PHP - MYSQL
2             Pinkey     FRONTEND DEVELOPMENT
3  Jyothika Chowdary  SERVER_SIDE DEVELOPMENT
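A caution when using the nested-dictionary form: replace() appears to skip, without raising an error, any outer key that does not match a column label, so a misspelled column name can go unnoticed. The sketch below adds a small guard for this. Note that replace_checked is a hypothetical helper name introduced here for illustration, not part of Pandas.

```python
import pandas as pd


def replace_checked(df, mapping):
    """Hypothetical helper: apply a nested replace() dictionary,
    but fail loudly if an outer key is not a column of df."""
    missing = [col for col in mapping if col not in df.columns]
    if missing:
        raise KeyError(f"Columns not found in DataFrame: {missing}")
    return df.replace(mapping)


df = pd.DataFrame({
    'Name': ['sravan', 'harsha'],
    'Subjects': ['java', 'php']
})

# Correct column label: the value in 'Subjects' is replaced
result = replace_checked(df, {"Subjects": {"java": 'OOPS'}})
print(result)

# Misspelled column label: the helper raises instead of silently doing nothing
try:
    replace_checked(df, {"Subject": {"java": 'OOPS'}})
except KeyError as err:
    print("Error:", err)
```

Whether such a check is worth having depends on how the mapping is built; it is mainly useful when column names come from user input or configuration.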
Replace column values with a Dictionary using map()
In Pandas, the Series class provides a function map(), which accepts a dictionary as an argument. It returns a new Series in which each value of the calling Series is replaced according to the mapping in the provided dictionary. However, values that are not keys in the dictionary are converted to NaN.
We can select a column of DataFrame as a Series object, call the map() function, and pass a dictionary as an argument. The dictionary will contain the mapping of values to be replaced. For example,
df['Name'].map({'old_value_1': 'new_value_1',
                'old_value_2': 'new_value_2',
                'old_value_3': 'new_value_3'})
This expression returns a new Series with the following changes (the DataFrame itself changes only when the result is assigned back to the column),
- In column ‘Name’, it will replace,
  - ‘old_value_1’ with ‘new_value_1’
  - ‘old_value_2’ with ‘new_value_2’
  - ‘old_value_3’ with ‘new_value_3’
- All other values in column ‘Name’ will be replaced by NaN
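The NaN behaviour listed above is easy to see on a small Series. This is a minimal sketch with made-up variable names (names, mapped), using the same sample values as the rest of the article.

```python
import pandas as pd

names = pd.Series(['sravan', 'harsha', 'ojaswi', 'jyothika'])

# Only two of the four values have a key in the mapping
mapped = names.map({'sravan': 'Sravan Kumar',
                    'harsha': 'Harsh Vardhan'})

# The two unmapped values come back as NaN
print(mapped)

# True marks the positions that were turned into NaN
print(mapped.isna().tolist())
```

The original Series is not changed; map() returns a new one.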
Often the mapping dictionary contains only some of the values in the column, in which case map() sets every other value to NaN. To prevent that, call fillna() on the result and pass the original column as the argument. Each NaN is then filled with the original value at the same position, so values that have no key in the dictionary are retained. Its syntax is as follows,
df['Name'].map({'old_value_1': 'new_value_1',
                'old_value_2': 'new_value_2',
                'old_value_3': 'new_value_3'}).fillna(df['Name'])
Let’s use this technique to replace a few values in a DataFrame column through a dictionary,
import pandas as pd

# Create the DataFrame with two columns and four rows
df = pd.DataFrame({
    'Name': ['sravan', 'harsha', 'ojaswi', 'jyothika'],
    'Subjects': ['java', 'php', 'html', 'jsp']
})

# Display the DataFrame
print(df)

# Create a dictionary to replace some values in the Name column
# with full names
name_data = {"sravan": 'Sravan Kumar',
             "harsha": 'Harsh Vardhan'}

# Replace values in the column based on the dictionary,
# keeping the original value where the dictionary has no key
df['Name'] = df['Name'].map(name_data).fillna(df['Name'])

# Display the DataFrame
print(df)
Output:
       Name  Subjects
0    sravan      java
1    harsha       php
2    ojaswi      html
3  jyothika       jsp
            Name  Subjects
0   Sravan Kumar      java
1  Harsh Vardhan       php
2         ojaswi      html
3       jyothika       jsp
We replaced only two values in the column ‘Name’. All other values remained unchanged.
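As an alternative to chaining fillna(), map() also accepts a function, so the fallback can be expressed with the dictionary’s own get() method. The sketch below shows that variant; it is not what the example above uses, and it assumes the same name_data dictionary.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['sravan', 'harsha', 'ojaswi', 'jyothika'],
    'Subjects': ['java', 'php', 'html', 'jsp']
})

name_data = {"sravan": 'Sravan Kumar',
             "harsha": 'Harsh Vardhan'}

# dict.get(value, value) returns the replacement if the key exists,
# otherwise it returns the original value unchanged
df['Name'] = df['Name'].map(lambda value: name_data.get(value, value))

print(df)
```

One difference from the fillna() approach: a value that the dictionary deliberately maps to NaN stays NaN here, whereas fillna() would restore the original value.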
Summary
In this article, we learned how to replace the values of a DataFrame column using a dictionary in Pandas, with the replace() and map() methods.