Replace column values in a Pandas DataFrame

In this article, we will discuss several ways to replace column values in a pandas DataFrame.

Introduction

The pandas library provides a function to replace any value with a new value:

pandas.DataFrame.replace(to_replace, value, inplace, ...)

It accepts several more arguments, but here we will discuss only the most important ones.

Arguments:

  • to_replace : a value or multiple values that need to be replaced
  • value : a value or multiple values to replace any values matching with to_replace
  • inplace : if True, modifies the current DataFrame; if False (the default), returns a new DataFrame and leaves the original unchanged
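As a rough illustration of these three arguments, the sketch below uses a small hypothetical frame (the `demo` data and its column names are made up for this example and are not part of the article's dataset). It shows that the default call returns a new DataFrame, while inplace=True changes the object it is called on.

```python
import pandas as pd

# Hypothetical two-row frame, used only to illustrate the arguments
demo = pd.DataFrame({"City": ["Pune", "Austin"], "Code": ["IN", "US"]})

# Default (inplace=False): demo is left alone and a new DataFrame is returned
replaced = demo.replace(to_replace="IN", value="India")

# inplace=True: the DataFrame the method is called on is modified directly
changed = demo.copy()
changed.replace(to_replace="US", value="United States", inplace=True)

print(demo["Code"].tolist())      # ['IN', 'US']
print(replaced["Code"].tolist())  # ['India', 'US']
print(changed["Code"].tolist())   # ['IN', 'United States']
```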

Preparing DataSet

To quickly get started, let’s create a sample DataFrame to experiment with. We’ll use the pandas library with some sample data.

import pandas as pd

# List of tuples: (Name, Location, Team, Experience)
employees = [('Shubham', 'India', 'Tech India', 5),
             ('Riti', 'India', 'India', 7),
             ('Shanky', 'India', 'PMO', 2),
             ('Shreya', 'India', 'Design', 2),
             ('Aadi', 'US', 'Tech', 11),
             ('Sim', 'US', 'Tech', 4)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Location', 'Team', 'Experience'])
print(df)

Contents of the created dataframe are,

      Name Location        Team  Experience
0  Shubham    India  Tech India           5
1     Riti    India       India           7
2   Shanky    India         PMO           2
3   Shreya    India      Design           2
4     Aadi       US        Tech          11
5      Sim       US        Tech           4

Replace a single value with a new value in a DataFrame Column

We will use the replace function from pandas to replace a single value in a column with a new value. Let’s try to understand with an example, by replacing the value “India” in the “Location” column with “India HQ”.

# replace values
df['Location'] = df['Location'].replace("India", "India HQ")

print(df)

Output

      Name  Location        Team  Experience
0  Shubham  India HQ  Tech India           5
1     Riti  India HQ       India           7
2   Shanky  India HQ         PMO           2
3   Shreya  India HQ      Design           2
4     Aadi        US        Tech          11
5      Sim        US        Tech           4

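As observed, the value “India” in the Location column is now replaced with the new value “India HQ”. Here we saved the result by assigning it back to the same column. The replace function also accepts inplace=True, but calling it on a selected column is a chained operation that may not update the original DataFrame in recent pandas versions, so assigning the result back is the safer choice.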
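Another way to limit a replacement to one column, without selecting the column first, is to pass a nested dictionary to the DataFrame's replace function. The following is a minimal sketch on a hypothetical two-row frame (`staff` is an illustrative name, not the article's DataFrame): the outer key names the column, and the inner dictionary maps the old value to the new one.

```python
import pandas as pd

# Hypothetical frame where "India" appears in two different columns
staff = pd.DataFrame({"Location": ["India", "US"],
                      "Team": ["India", "Tech"]})

# Outer key = column to search, inner dict = {old value: new value}
updated = staff.replace({"Location": {"India": "India HQ"}})

print(updated["Location"].tolist())  # ['India HQ', 'US']
print(updated["Team"].tolist())      # ['India', 'Tech'] (untouched)
```

Only the Location column is searched, so the “India” entry in Team is left as it was.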

Replace multiple values with multiple values in a DataFrame Column

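Now, let’s understand how we can replace multiple values with multiple new values using the same replace function. Starting again from the original DataFrame, say we need to change “India” to “India HQ” and “US” to “US HQ” in the Location column. The two lists are matched by position, so the first value is replaced by the first new value, and so on.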

# replace multiple values
df['Location'] = df['Location'].replace(["India", "US"], ["India HQ", "US HQ"])

print(df['Location'])

Output

0    India HQ
1    India HQ
2    India HQ
3    India HQ
4       US HQ
5       US HQ
Name: Location, dtype: object

Here you go, both the values are now replaced with their respective new values.
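The same old-to-new pairing can also be written as a dictionary, which keeps each old value next to its replacement. This is a small sketch on a hypothetical Series (`locations` and `mapping` are names chosen for illustration only).

```python
import pandas as pd

# Hypothetical Series standing in for the Location column
locations = pd.Series(["India", "US", "India"], name="Location")

# Each key is a value to find, each dict value is its replacement
mapping = {"India": "India HQ", "US": "US HQ"}
renamed = locations.replace(mapping)

print(renamed.tolist())  # ['India HQ', 'US HQ', 'India HQ']
```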

Replace multiple values with a single value in a DataFrame Column

Instead of replacing them with their respective values, say we want to replace both “India” and “US” in the original DataFrame with a single new value, “HQ”. We can achieve this using the code below.

# replace multiple values with a single value in column 'Location'
df['Location'] = df['Location'].replace(["India", "US"], "HQ")

print(df['Location'])

Output

0    HQ
1    HQ
2    HQ
3    HQ
4    HQ
5    HQ
Name: Location, dtype: object
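In the sample data every location is either “India” or “US”, so every row ends up as “HQ”. As a quick sketch of what happens when other values are present, the hypothetical Series below (`offices` is an illustrative name) includes a value that is not in the list and is therefore left unchanged.

```python
import pandas as pd

# Hypothetical Series with one value ("UK") that is not in the to_replace list
offices = pd.Series(["India", "US", "UK"], name="Location")

# Every value found in the list is replaced by the single new value
merged = offices.replace(["India", "US"], "HQ")

print(merged.tolist())  # ['HQ', 'HQ', 'UK']
```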

Replace values in the entire DataFrame

Now, let’s consider that we want to replace a value with a new value across all the columns in a DataFrame. We can again use the replace function, but this time we call it on the DataFrame itself instead of on a single column. Starting again from the original DataFrame, let’s replace the value “India” with “India HQ” everywhere.

# replace "India" with "India HQ" in entire DataFrame
df = df.replace(["India"], "India HQ")

print(df)

Output

      Name  Location        Team  Experience
0  Shubham  India HQ  Tech India           5
1     Riti  India HQ    India HQ           7
2   Shanky  India HQ         PMO           2
3   Shreya  India HQ      Design           2
4     Aadi        US        Tech          11
5      Sim        US        Tech           4

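Here you go, the value “India” is now replaced with “India HQ” in both the columns “Location” and “Team”. Note that “Tech India” in the Team column is unchanged, because replace matches whole cell values by default, not parts of a string.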
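If the goal is to replace text inside longer strings as well, the replace function accepts a regex=True argument, which treats to_replace as a regular expression and substitutes every match within each string. The sketch below compares the two behaviours on a small hypothetical frame (`teams` is an illustrative name, not the article's DataFrame).

```python
import pandas as pd

# Hypothetical frame with "India" both as a whole value and inside a longer string
teams = pd.DataFrame({
    "Location": ["India", "US"],
    "Team": ["Tech India", "India"],
    "Experience": [5, 11],
})

# Default: only cells that are exactly "India" are replaced
exact = teams.replace("India", "India HQ")

# regex=True: "India" is treated as a pattern and replaced wherever it occurs
partial = teams.replace("India", "India HQ", regex=True)

print(exact["Team"].tolist())    # ['Tech India', 'India HQ']
print(partial["Team"].tolist())  # ['Tech India HQ', 'India HQ']
```

The numeric Experience column is unaffected in both cases, since the pattern only applies to string values.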

Summary

In this article, we discussed several ways to replace column values in a pandas DataFrame. Thanks.
