Replace column values based on conditions in Pandas

In this article, we will discuss various methods to replace the column values based on conditions in a pandas DataFrame. Let’s look at the table of contents describing the list of methods.

Table of Contents

Preparing DataSet

To quickly get started, let’s create a sample dataframe to experiment. We’ll use the pandas library with some random data.

import pandas as pd
import numpy as np

# List of Tuples
employees= [('Shubham', 'India', 'Tech',   5, 4),
            ('Riti', 'India', 'Design' ,   7, 7),
            ('Shanky', 'India', 'PMO' ,   2, 2),
            ('Shreya', 'India', 'Design' ,   2, 0),
            ('Aadi', 'US', 'PMO', 11, 5),
            ('Sim', 'US', 'Tech', 4, 4)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Location', 'Team', 'Experience', 'RelevantExperience'],
                  index = ['A', 'B', 'C', 'D', 'E', 'F'])
print(df)

Contents of the created dataframe are,

      Name Location    Team  Experience  RelevantExperience
A  Shubham    India    Tech           5                   4
B     Riti    India  Design           7                   7
C   Shanky    India     PMO           2                   2
D   Shreya    India  Design           2                   0
E     Aadi       US     PMO          11                   5
F      Sim       US    Tech           4                   4

Method 1: Using .loc property of DataFrame

The loc property comes in handy whenever we want to filter the DataFrame based on certain conditions. Here, we will use the loc to filter the DataFrame and assign a new value to the filtered rows. Let’s understand with an example, say, we need to replace the “Tech” value in the “Team” column with “Tech & Data”.

Advertisements
# replace Tech with "Tech & Data" using loc
df.loc[df['Team'] == 'Tech', 'Team'] = 'Tech & Data'

print (df)

Output

      Name Location         Team  Experience  RelevantExperience
A  Shubham    India  Tech & Data           5                   4
B     Riti    India       Design           7                   7
C   Shanky    India          PMO           2                   2
D   Shreya    India       Design           2                   0
E     Aadi       US          PMO          11                   5
F      Sim       US  Tech & Data           4                   4

As observed, we first filtered all the rows satisfying the condition and then replaced the value by assigning the new value to the filtered rows.

Method 2: Using numpy.where() method

Another method is to use the numpy.where() function to replace values based on the condition. Let’s look at the function syntax and implement it in the above example.

np.where(condition, value if condition is TRUE, value if condition is False)

# replace Tech with "Tech & Data" using np.where
df['Team'] = np.where(df['Team'] == 'Tech', 'Tech & Data', df['Team'])

print (df)

Output

      Name Location         Team  Experience  RelevantExperience
A  Shubham    India  Tech & Data           5                   4
B     Riti    India       Design           7                   7
C   Shanky    India          PMO           2                   2
D   Shreya    India       Design           2                   0
E     Aadi       US          PMO          11                   5
F      Sim       US  Tech & Data           4                   4

Method 3: Using DataFrame.where() method

There is a where() method in pandas DataFrame as well. Let’s look at the syntax and implementation here.

pd.DataFrame['column_name'].where(~(condition), other=value if condition is True, inplace=True)

Let’s implement it on the same example discussed above.

# replace Tech with "Tech & Data" using DataFrame.where
df['Team'].where(~(df.Team == 'Tech'), other='Tech & Data', inplace=True)

print(df)

Output

      Name Location         Team  Experience  RelevantExperience
A  Shubham    India  Tech & Data           5                   4
B     Riti    India       Design           7                   7
C   Shanky    India          PMO           2                   2
D   Shreya    India       Design           2                   0
E     Aadi       US          PMO          11                   5
F      Sim       US  Tech & Data           4                   4

The inplace=True helps us store the changes directly in the same DataFrame.

Method 4: Using mask() function from pandas

The final method is to use the masking function from pandas which are generally used for replacing the values of any row/column based on certain conditions. Let’s implement using the above example.

# replace Tech with "Tech & Data" using masking
df['Team'].mask(lambda col: col == 'Tech', 'Tech & Data', inplace=True)

print (df)

Output

      Name Location         Team  Experience  RelevantExperience
A  Shubham    India  Tech & Data           5                   4
B     Riti    India       Design           7                   7
C   Shanky    India          PMO           2                   2
D   Shreya    India       Design           2                   0
E     Aadi       US          PMO          11                   5
F      Sim       US  Tech & Data           4                   4

As observed, we have got a similar output as the above methods.

Summary

In this article, we have discussed how to replace column values based on conditions in pandas DataFrame. Thanks.

Advertisements

Thanks for reading.

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Scroll to Top