How to deal with SettingWithCopyWarning in Pandas?

We often ignore warnings when running code, but the real question is whether a given warning is safe to ignore. In this article, we discuss one such warning, “SettingWithCopyWarning”, which can affect our work frequently if we are not careful.

Introduction

To quickly get started, let’s create a sample DataFrame to experiment with. We’ll use the pandas library with some sample employee data.

import pandas as pd
import numpy as np

# List of Tuples
employees = [('Shubham', 'Data Scientist', 'Tech',   5),
            ('Riti', 'Data Engineer', 'Tech' ,   7),
            ('Shanky', 'Program Manager', 'Tech' ,   2),
            ('Shreya', 'Graphic Designer', 'Design' ,   2),
            ('Aadi', 'Backend Developer', 'Tech', 11),
            ('Sim', 'Data Engineer', 'Tech', 4)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'Team', 'Experience'],
                  index=[0, 1, 2, 3, 4, 5])
print(df)

Contents of the created dataframe are,

      Name        Designation    Team  Experience
0  Shubham     Data Scientist    Tech           5
1     Riti      Data Engineer    Tech           7
2   Shanky    Program Manager    Tech           2
3   Shreya   Graphic Designer  Design           2
4     Aadi  Backend Developer    Tech          11
5      Sim      Data Engineer    Tech           4

Understanding the SettingWithCopyWarning in Pandas – Case 1

Before getting into solving these warnings, first let’s try to understand the root cause of such warnings. Consider an example, say, we need to change the Team of all the “Program Managers” to “PMO”. Let’s try to change it using the code below.

# change Team value to PMO for program managers
df[df['Designation'] == "Program Manager"]['Team'] = 'PMO'

Output

<ipython-input-2-768a2accd2a0>:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[df['Designation'] == "Program Manager"]['Team'] = 'PMO'

As we can see, the code ran without any errors, but we do get a “SettingWithCopyWarning”. Since there are no errors, it is tempting to simply move on. But wait, let’s look at the data to check whether the change actually worked.

# check data
print (df[df['Designation'] == "Program Manager"])

Output

     Name      Designation  Team  Experience
2  Shanky  Program Manager  Tech           2

Our value is not updated in the DataFrame, so clearly there is an issue, which is what the “SettingWithCopyWarning” was trying to convey. Let’s read it more carefully. It says: “A value is trying to be set on a copy of a slice from a DataFrame.”

This is exactly the problem: we are using a chained statement, i.e., first a get operation (to filter the rows) and then a set operation (to set the Team value to PMO). The filter returns a new, temporary DataFrame, so the assignment lands on that temporary object and never reaches “df”. This is why pandas provides the .loc and .iloc indexers, which select and assign in a single operation.
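To make the two steps visible, the chained statement can be written out by hand. The following is only a sketch on a reduced two-row data set; the name `filtered` is hypothetical, and the explicit `.copy()` is added here just to show the temporary object that the chained form creates implicitly.

```python
import pandas as pd

# Reduced, illustrative data set (not the full employee table)
df = pd.DataFrame({
    'Name': ['Shubham', 'Shanky'],
    'Designation': ['Data Scientist', 'Program Manager'],
    'Team': ['Tech', 'Tech'],
})

# Step 1 (get): boolean filtering builds a new DataFrame.
# `filtered` is a hypothetical name for that temporary object.
filtered = df[df['Designation'] == "Program Manager"].copy()

# Step 2 (set): the assignment changes only the temporary object
filtered['Team'] = 'PMO'

print(filtered['Team'].tolist())  # ['PMO']
print(df['Team'].tolist())        # ['Tech', 'Tech']
```

The update is visible in `filtered`, while `df` still holds the old value, which matches what we observed above.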

Solution for SettingWithCopyWarning in Case 1

Now that we understand the warning, let’s look at the recommended way to execute such statements. Let’s perform the same update using the .loc indexer, as shown below.

# update the value using loc statement
df.loc[df['Designation'] == "Program Manager", "Team"] = 'PMO'

print (df)

Output

      Name        Designation    Team  Experience
0  Shubham     Data Scientist    Tech           5
1     Riti      Data Engineer    Tech           7
2   Shanky    Program Manager     PMO           2
3   Shreya   Graphic Designer  Design           2
4     Aadi  Backend Developer    Tech          11
5      Sim      Data Engineer    Tech           4

Voila! We didn’t get any warning this time and our DataFrame is also updated with the “PMO” values for “Program Manager”.

Understanding the SettingWithCopyWarning – Case 2

Let’s look at another scenario where “SettingWithCopyWarning” can cause trouble. Starting again from the original DataFrame, suppose this time we save the filtered DataFrame (Tech team) into a new object and then try to update the Team of “Program Manager” to PMO.

# save the filtered dataframe
tech_df = df[df['Team'] == "Tech"]

# updating the Team for program managers
tech_df.loc[tech_df['Designation'] == "Program Manager", "Team"] = 'PMO'

Output

SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  isetter(loc, value)

We are using .loc here, yet we still get a “SettingWithCopyWarning”. What could have gone wrong? First, let’s look at the DataFrame to check whether the values were updated.

# check data
print (tech_df)

Output

      Name        Designation  Team  Experience
0  Shubham     Data Scientist  Tech           5
1     Riti      Data Engineer  Tech           7
2   Shanky    Program Manager   PMO           2
4     Aadi  Backend Developer  Tech          11
5      Sim      Data Engineer  Tech           4

As observed, the DataFrame got updated with the “PMO” value. So, should we continue and ignore the warning? Probably not!

Here, the problem is not with the .loc statement but with the first line, where we store the filtered DataFrame in a new object. Depending on the operation, pandas may return either a view (where changes to the new DataFrame can affect the original) or a copy (where they cannot), and pandas keeps track of the fact that “tech_df” was derived from “df”.

Therefore, when we update “tech_df”, pandas cannot tell whether we meant to change only the subset or the original DataFrame (“df”) as well, and it raises the warning to flag that ambiguity. In this case, boolean filtering produced a copy, so “df” is left unchanged, which is easy to miss if we expected the update to propagate.
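Whether the original DataFrame is affected can be checked directly by comparing both objects after the update. The snippet below is a minimal sketch on a reduced three-row data set. The warning is silenced only to keep the demonstration output clean, not as a recommended practice.

```python
import warnings

import pandas as pd

# Reduced, illustrative data set (not the full employee table)
df = pd.DataFrame({
    'Name': ['Shubham', 'Shanky', 'Shreya'],
    'Designation': ['Data Scientist', 'Program Manager', 'Graphic Designer'],
    'Team': ['Tech', 'Tech', 'Design'],
})

# Save the filtered DataFrame without an explicit copy
tech_df = df[df['Team'] == "Tech"]

with warnings.catch_warnings():
    # Silence any warning for this demonstration only
    warnings.simplefilter("ignore")
    tech_df.loc[tech_df['Designation'] == "Program Manager", "Team"] = 'PMO'

# Compare the same row (index label 1) in both objects
print(tech_df.loc[1, 'Team'])  # PMO
print(df.loc[1, 'Team'])       # Tech
```

For this boolean filter, the update lands in `tech_df` only. The warning exists because pandas cannot know which of the two outcomes we intended.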

Solution for SettingWithCopyWarning in Case 2

Let’s discuss the recommended way to handle this situation: call the .copy() method whenever we save a filtered DataFrame into a different object that we intend to modify. This makes it explicit that the new object is independent of the original. Let’s try it below.

# save the filtered dataframe using copy
tech_df = df[df['Team'] == "Tech"].copy()

# updating the Team for program managers
tech_df.loc[tech_df['Designation'] == "Program Manager", "Team"] = 'PMO'
print (tech_df)

Output

      Name        Designation  Team  Experience
0  Shubham     Data Scientist  Tech           5
1     Riti      Data Engineer  Tech           7
2   Shanky    Program Manager   PMO           2
4     Aadi  Backend Developer  Tech          11
5      Sim      Data Engineer  Tech           4

Here you go: this does not result in any warnings, and our DataFrame is updated with the required value.
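If the intent was instead to change the original DataFrame, one option is to skip the intermediate object and apply both conditions to “df” in a single .loc call. This is a sketch on a reduced data set, and the mask name `is_tech_pm` is hypothetical.

```python
import pandas as pd

# Reduced, illustrative data set (not the full employee table)
df = pd.DataFrame({
    'Name': ['Shubham', 'Shanky', 'Shreya'],
    'Designation': ['Data Scientist', 'Program Manager', 'Graphic Designer'],
    'Team': ['Tech', 'Tech', 'Design'],
})

# Combine both conditions into one boolean mask.
# `is_tech_pm` is a hypothetical name for that mask.
is_tech_pm = (df['Team'] == "Tech") & (df['Designation'] == "Program Manager")

# A single .loc call selects the rows and assigns on df itself
df.loc[is_tech_pm, 'Team'] = 'PMO'

print(df['Team'].tolist())  # ['Tech', 'PMO', 'Design']
```

Because the selection and the assignment happen on “df” in one step, there is no intermediate object for the update to get lost in.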

Summary

In this article, we have discussed multiple scenarios of “SettingWithCopyWarning” and how to resolve them. Thanks.
