In this article, we will discuss several ways to replace column values in a pandas DataFrame.
Introduction
The pandas library provides a function to replace any value with a new value:
pandas.DataFrame.replace(to_replace, value, inplace, ...)
It accepts a few more arguments as well, but here we will discuss only the most important ones.
Arguments:
- to_replace : a value or multiple values that need to be replaced
- value : a value or multiple values to replace any values matching with to_replace
- inplace : True modifies the current DataFrame; False (the default) leaves it unchanged and returns a new DataFrame
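The effect of the inplace argument can be sketched as follows. The small `demo` frame and its city names are made up for illustration and are not part of the article's dataset.

```python
import pandas as pd

# Hypothetical two-row frame, used only to illustrate the inplace argument
demo = pd.DataFrame({"City": ["Delhi", "Pune"], "Code": [1, 2]})

# Default (inplace=False): the original is untouched, a new DataFrame is returned
replaced = demo.replace("Delhi", "New Delhi")
print(demo["City"].tolist())
print(replaced["City"].tolist())

# inplace=True: the original DataFrame itself is modified
demo.replace("Pune", "Mumbai", inplace=True)
print(demo["City"].tolist())
```

Assigning the returned DataFrame to a variable, as in the first call, is usually the easier pattern to follow when reading the code later.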
Preparing DataSet
To get started quickly, let's create a sample DataFrame to experiment with. We'll use the pandas library and a small list of made-up employee records.
import pandas as pd

# List of Tuples
employees = [('Shubham', 'India', 'Tech India', 5),
             ('Riti', 'India', 'India', 7),
             ('Shanky', 'India', 'PMO', 2),
             ('Shreya', 'India', 'Design', 2),
             ('Aadi', 'US', 'Tech', 11),
             ('Sim', 'US', 'Tech', 4)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Location', 'Team', 'Experience'])
print(df)
The contents of the created DataFrame are:
      Name  Location        Team  Experience
0  Shubham     India  Tech India           5
1     Riti     India       India           7
2   Shanky     India         PMO           2
3   Shreya     India      Design           2
4     Aadi        US        Tech          11
5      Sim        US        Tech           4
Replace a single value with a new value in a DataFrame Column
We will use the replace function from pandas to replace a single value in a column with a new value. Let’s try to understand with an example, by replacing the value “India” in the “Location” column with “India HQ”.
# replace values
df['Location'] = df['Location'].replace("India", "India HQ")
print(df)
Output
      Name  Location        Team  Experience
0  Shubham  India HQ  Tech India           5
1     Riti  India HQ       India           7
2   Shanky  India HQ         PMO           2
3   Shreya  India HQ      Design           2
4     Aadi        US        Tech          11
5      Sim        US        Tech           4
As observed, the value “India” in the Location column is now replaced with the new value “India HQ”. We can either assign the result back to the same column, as done here, or call replace on the DataFrame itself with the “inplace” parameter set to True.
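As a sketch of the in-place route, one option is to call replace on the DataFrame with a nested dictionary of the form {column: {old: new}}, which restricts the replacement to the named column. The shortened three-row dataset below is an assumption made to keep the example small.

```python
import pandas as pd

# Shortened copy of the sample data (three rows only, for brevity)
employees = [('Shubham', 'India', 'Tech India', 5),
             ('Riti', 'India', 'India', 7),
             ('Aadi', 'US', 'Tech', 11)]
df = pd.DataFrame(employees,
                  columns=['Name', 'Location', 'Team', 'Experience'])

# {column: {old: new}} limits the replacement to the 'Location' column;
# inplace=True modifies df directly instead of returning a new DataFrame
df.replace({'Location': {'India': 'India HQ'}}, inplace=True)
print(df)
```

The “India” entry in the Team column is left alone here because only the Location column is named in the dictionary.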
Replace multiple values with multiple values in a DataFrame Column
Now, let’s understand how we can replace multiple values with multiple new values using the same replace function. Starting again from the original DataFrame, say we need to change “India” to “India HQ” and “US” to “US HQ” in the Location column.
# replace multiple values
df['Location'] = df['Location'].replace(["India", "US"], ["India HQ", "US HQ"])
print(df['Location'])
Output
0    India HQ
1    India HQ
2    India HQ
3    India HQ
4       US HQ
5       US HQ
Name: Location, dtype: object
Here you go, both the values are now replaced with their respective new values.
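The same old-to-new pairing can also be written as a dictionary, which keeps each value next to its replacement. This is a minimal sketch on a standalone Series; the names `location`, `mapping`, and `updated` are chosen here for illustration.

```python
import pandas as pd

# Standalone Series mirroring the 'Location' column
location = pd.Series(['India', 'India', 'US'], name='Location')

# Keys are the values to find, values are their replacements
mapping = {'India': 'India HQ', 'US': 'US HQ'}
updated = location.replace(mapping)
print(updated)
```

With two parallel lists, the pairing depends on position, so a dictionary can be easier to maintain as the number of values grows.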
Replace multiple values with a single value in a DataFrame Column
Instead of replacing each value with its own new value, say we want to replace both “India” and “US” in the original DataFrame with a single new value, “HQ”. We can achieve this using the code below.
# replace multiple values with a single value in column 'Location'
df['Location'] = df['Location'].replace(["India", "US"], "HQ")
print(df['Location'])
Output
0    HQ
1    HQ
2    HQ
3    HQ
4    HQ
5    HQ
Name: Location, dtype: object
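Because replace matches whole cell values, the result depends on what the column currently holds. The sketch below, using a made-up standalone Series, shows that a second pass over already-replaced data finds nothing to change, which is why each example in this article is meant to start from the original data.

```python
import pandas as pd

# Standalone Series standing in for the original 'Location' column
location = pd.Series(['India', 'India', 'US', 'US'], name='Location')

# First pass: both values collapse into the single value "HQ"
merged = location.replace(['India', 'US'], 'HQ')
print(merged.unique())

# Second pass: the column no longer contains "India" or "US",
# so nothing matches and the data is returned unchanged
again = merged.replace(['India', 'US'], 'Office')
print(again.equals(merged))
```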
Replace values in the entire DataFrame
Now, let’s consider that we want to replace a value with a new value in all the columns of a DataFrame. We can again use the replace function, but this time we call it on the DataFrame rather than on a selected column. Starting from the original DataFrame, let’s replace the value “India” with “India HQ” across the entire DataFrame.
# replace "India" with "India HQ" in entire DataFrame
df = df.replace(["India"], "India HQ")
print(df)
Output
      Name  Location        Team  Experience
0  Shubham  India HQ  Tech India           5
1     Riti  India HQ    India HQ           7
2   Shanky  India HQ         PMO           2
3   Shreya  India HQ      Design           2
4     Aadi        US        Tech          11
5      Sim        US        Tech           4
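Here you go, the value “India” is now replaced with “India HQ” in both the “Location” and “Team” columns. Note that “Tech India” is left unchanged, because replace matches whole cell values rather than substrings by default.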
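If substrings should be replaced as well, one option is the regex parameter of replace, which treats to_replace as a regular expression. The following is a minimal sketch on a reduced two-row frame (an assumption made for brevity) that compares the default behaviour with regex=True.

```python
import pandas as pd

# Reduced two-row version of the sample data
df = pd.DataFrame({'Location': ['India', 'US'],
                   'Team': ['Tech India', 'Tech'],
                   'Experience': [5, 11]})

# Default: only cells equal to "India" are replaced
exact = df.replace('India', 'India HQ')

# regex=True: "India" is treated as a pattern, so it is also
# replaced where it appears inside a longer string
partial = df.replace('India', 'India HQ', regex=True)

print(exact)
print(partial)
```

Since the pattern is a regular expression, any special characters in the search text would need escaping, for example with `re.escape`.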
Summary
In this article, we have discussed multiple scenarios to replace column values in a pandas DataFrame. Thanks.