Replace column values with regex in Pandas

This tutorial will discuss about different ways to replace column values with regex in pandas.

Table Of Contents

Preparing DataSet

First we we will prepare a DataFrame with six rows and four columns.

import pandas as pd
import re

# List of Tuples
employees= [('Mark',   'USA', 'Tech',   5),
            ('Riti',   'India', 'Tech' ,  7),
            ('Shanky', 'India', 'PMO' ,   2),
            ('Shreya', 'UK', 'Design', 2),
            ('Aadi',   'USA', 'Tech',   11),
            ('Sim',    'USA', 'Tech',   4)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Location', 'Team', 'Experience'])

print(df)

Output

Name Location    Team  Experience
0    Mark      USA    Tech           5
1    Riti    India    Tech           7
2  Shanky    India     PMO           2
3  Shreya       UK  Design           2
4    Aadi      USA    Tech          11
5     Sim      USA    Tech           4

Now we will see how to replace values in column ‘Location’ of this DataFrame using regex patterns.

Method 1: Using replace()

To replace string values in a DataFrame column, select the column, and call replace() function on it with following arguments,

  • A Pattern object containing the compiled regular expression.
  • Replacement string
  • regex parameter with value True
  • inplace parameter with value True

Syntax will be like,

df['column_name'].replace(
                re.compile(regex_pattern),
                replacement,
                regex=True,
                inplace=True)

It will replace only those string values in column with name column_name, which matches the given ‘regex_pattern’, with the given replacement string.

Let’s use this to replace all values in column ‘Location’ that starts with “U” with a new value i.e. ‘Australia’.

# Replace only those values in column 'Location'
# that starts with letter "U", with the replacement
# string "Australia"
df['Location'].replace(
                re.compile("^U.*"), # Regex Patterm
                "Australia",        # replacement string
                regex=True,
                inplace=True)

print(df)

Output

Name   Location    Team  Experience
0    Mark  Australia    Tech           5
1    Riti      India    Tech           7
2  Shanky      India     PMO           2
3  Shreya  Australia  Design           2
4    Aadi  Australia    Tech          11
5     Sim  Australia    Tech           4

It replaced all the values that starts with “U” in column Location with “Australia”. Basically, “UK” and “US” got replaced by “Australia” in column “Location”.

Method 2: Using apply()

Apply a lambda function on each value of the specified column. Inside lambda function replace the value it matches the given regex pattern. Let’s see an example where we will replace all the values that starts with “U” in column Location with “Australia”.

# Replace only those values in column 'Location'
# that starts with letter "U", with the replacement
# string "Australia"
df['Location'] = df['Location'].apply(lambda val: re.sub("^U.*", "Australia", val))

print(df)

Output

Name   Location    Team  Experience
0    Mark  Australia    Tech           5
1    Riti      India    Tech           7
2  Shanky      India     PMO           2
3  Shreya  Australia  Design           2
4    Aadi  Australia    Tech          11
5     Sim  Australia    Tech           4

It replaced all the values that starts with “U” in column Location with “Australia”. Basically, “UK” and “US” got replaced by “Australia” in column “Location”.

Summary

We learned about different ways to replace column values based on regex in Pandas.

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Scroll to Top