This tutorial will discuss about different ways to replace column values with regex in pandas.
Table Of Contents
Preparing DataSet
First we we will prepare a DataFrame with six rows and four columns.
import pandas as pd import re # List of Tuples employees= [('Mark', 'USA', 'Tech', 5), ('Riti', 'India', 'Tech' , 7), ('Shanky', 'India', 'PMO' , 2), ('Shreya', 'UK', 'Design', 2), ('Aadi', 'USA', 'Tech', 11), ('Sim', 'USA', 'Tech', 4)] # Create a DataFrame object from list of tuples df = pd.DataFrame(employees, columns=['Name', 'Location', 'Team', 'Experience']) print(df)
Output
Name Location Team Experience 0 Mark USA Tech 5 1 Riti India Tech 7 2 Shanky India PMO 2 3 Shreya UK Design 2 4 Aadi USA Tech 11 5 Sim USA Tech 4
Now we will see how to replace values in column ‘Location’ of this DataFrame using regex patterns.
Method 1: Using replace()
To replace string values in a DataFrame column, select the column, and call replace()
function on it with following arguments,
- A Pattern object containing the compiled regular expression.
- Replacement string
- regex parameter with value True
- inplace parameter with value True
Syntax will be like,
Frequently Asked:
df['column_name'].replace( re.compile(regex_pattern), replacement, regex=True, inplace=True)
It will replace only those string values in column with name column_name
, which matches the given ‘regex_pattern’, with the given replacement
string.
Let’s use this to replace all values in column ‘Location’ that starts with “U” with a new value i.e. ‘Australia’.
# Replace only those values in column 'Location' # that starts with letter "U", with the replacement # string "Australia" df['Location'].replace( re.compile("^U.*"), # Regex Patterm "Australia", # replacement string regex=True, inplace=True) print(df)
Output
Name Location Team Experience 0 Mark Australia Tech 5 1 Riti India Tech 7 2 Shanky India PMO 2 3 Shreya Australia Design 2 4 Aadi Australia Tech 11 5 Sim Australia Tech 4
It replaced all the values that starts with “U” in column Location
with “Australia”. Basically, “UK” and “US” got replaced by “Australia” in column “Location”.
Method 2: Using apply()
Apply a lambda function on each value of the specified column. Inside lambda function replace the value it matches the given regex pattern. Let’s see an example where we will replace all the values that starts with “U” in column Location
with “Australia”.
# Replace only those values in column 'Location' # that starts with letter "U", with the replacement # string "Australia" df['Location'] = df['Location'].apply(lambda val: re.sub("^U.*", "Australia", val)) print(df)
Output
Name Location Team Experience 0 Mark Australia Tech 5 1 Riti India Tech 7 2 Shanky India PMO 2 3 Shreya Australia Design 2 4 Aadi Australia Tech 11 5 Sim Australia Tech 4
It replaced all the values that starts with “U” in column Location
with “Australia”. Basically, “UK” and “US” got replaced by “Australia” in column “Location”.
Summary
We learned about different ways to replace column values based on regex in Pandas.