In this article, we will discuss how to create a column based on certain conditions in a pandas DataFrame.
Table of Contents
Preparing DataSet
To quickly get started, let’s create two sample dataframe to experiment. We’ll use the pandas library with some random data.
import pandas as pd import numpy as np # List of Tuples employees = [('Shubham', 'India', 'Tech', 5), ('Riti', 'India', 'Tech' , 7), ('Shanky', 'India', 'PMO' , 2), ('Shreya', 'India', 'Design' , 2), ('Aadi', 'US', 'Tech', 11), ('Sim', 'US', 'Tech', 4)] # Create a DataFrame object from list of tuples df = pd.DataFrame(employees, columns=['Name', 'Location', 'Team', 'Experience']) print(df)
Contents of the created dataframe are,
Name Location Team Experience 0 Shubham India Tech 5 1 Riti India Tech 7 2 Shanky India PMO 2 3 Shreya India Design 2 4 Aadi US Tech 11 5 Sim US Tech 4
Method 1: Using numpy.where function
The numpy.where() function is generally used in such cases where we want to create a column based on certain conditions on any other existing column. Let’s try to understand with an example, say, we need to create another column where employees with more than 3 years of experience are marked as “Senior” while others as “Junior”.
# creating a new column based on a condition df['experience_type'] = np.where(df['Experience']>3, "Senior", "Junior") print (df)
Output
Name Location Team Experience experience_type 0 Shubham India Tech 5 Senior 1 Riti India Tech 7 Senior 2 Shanky India PMO 2 Junior 3 Shreya India Design 2 Junior 4 Aadi US Tech 11 Senior 5 Sim US Tech 4 Senior
The numpy.where() function takes three arguments – first is the condition on the column, second is the value to assign if the condition is True, and the last is the value to assign if the condition is False.
Frequently Asked:
Method 2: Using numpy.select() function
The numpy.select() comes in handy when we want to have multiple conditions on the existing column. Let’s slightly modify the above condition, say, we need to have “Leader” for an experience of more than 10 years, “Senior” for experience between 3-10 years, and the rest as “Juniors”.
# define conditions conditions = [ (df['Experience'] < 3), (df['Experience'] >= 3) & (df['Experience'] <= 10), (df['Experience'] > 10) ] # define values to assign values = ['Junior', 'Senior', 'Leader'] # pass it in np.select() df['experience_type'] = np.select(conditions, values) print (df)
Output
Name Location Team Experience experience_type 0 Shubham India Tech 5 Senior 1 Riti India Tech 7 Senior 2 Shanky India PMO 2 Junior 3 Shreya India Design 2 Junior 4 Aadi US Tech 11 Leader 5 Sim US Tech 4 Senior
Method 3: Using custom function
Instead of having multiple conditions as above, we can also create a custom function and use the apply method to assign values. Let’s understand by looking at the code below.
# creating a new column based on a condition def get_experience_type(experience): if experience > 10: return 'Leader' elif experience > 3: return 'Senior' else: return 'Junior' df['experience_type'] = df['Experience'].apply(get_experience_type) print (df)
Output
Name Location Team Experience experience_type 0 Shubham India Tech 5 Senior 1 Riti India Tech 7 Senior 2 Shanky India PMO 2 Junior 3 Shreya India Design 2 Junior 4 Aadi US Tech 11 Leader 5 Sim US Tech 4 Senior
Custom functions offer more flexibility to add more conditions or tweak it the way we want. However, in the case of simple conditions, we should stick to the above methods only.
Method 4: Using List Comprehension
List comprehension is another efficient way to assign values based on some conditions. Let’s try to again have “Senior” and “Junior” based on the experience condition.
# creating a new column based on a condition df['experience_type'] = ['Senior' if i > 3 else 'Junior' for i in df['Experience']] print (df)
Output
Name Location Team Experience experience_type 0 Shubham India Tech 5 Senior 1 Riti India Tech 7 Senior 2 Shanky India PMO 2 Junior 3 Shreya India Design 2 Junior 4 Aadi US Tech 11 Senior 5 Sim US Tech 4 Senior
We are able to achieve it in a single line of code, also it takes comparatively lower time than the other methods.
The complete example is as follows,
import pandas as pd import numpy as np # List of Tuples employees = [('Shubham', 'India', 'Tech', 5), ('Riti', 'India', 'Tech' , 7), ('Shanky', 'India', 'PMO' , 2), ('Shreya', 'India', 'Design' , 2), ('Aadi', 'US', 'Tech', 11), ('Sim', 'US', 'Tech', 4)] # Create a DataFrame object from list of tuples df = pd.DataFrame(employees, columns=['Name', 'Location', 'Team', 'Experience']) print(df) # creating a new column based on a condition df['experience_type'] = np.where(df['Experience']>3, "Senior", "Junior") print (df) # define conditions conditions = [ (df['Experience'] < 3), (df['Experience'] >= 3) & (df['Experience'] <= 10), (df['Experience'] > 10) ] # define values to assign values = ['Junior', 'Senior', 'Leader'] # pass it in np.select() df['experience_type'] = np.select(conditions, values) print (df) # creating a new column based on a condition def get_experience_type(experience): if experience > 10: return 'Leader' elif experience > 3: return 'Senior' else: return 'Junior' df['experience_type'] = df['Experience'].apply(get_experience_type) print (df) # creating a new column based on a condition df['experience_type'] = ['Senior' if i > 3 else 'Junior' for i in df['Experience']] print (df)
Output:
Name Location Team Experience 0 Shubham India Tech 5 1 Riti India Tech 7 2 Shanky India PMO 2 3 Shreya India Design 2 4 Aadi US Tech 11 5 Sim US Tech 4 Name Location Team Experience experience_type 0 Shubham India Tech 5 Senior 1 Riti India Tech 7 Senior 2 Shanky India PMO 2 Junior 3 Shreya India Design 2 Junior 4 Aadi US Tech 11 Senior 5 Sim US Tech 4 Senior Name Location Team Experience experience_type 0 Shubham India Tech 5 Senior 1 Riti India Tech 7 Senior 2 Shanky India PMO 2 Junior 3 Shreya India Design 2 Junior 4 Aadi US Tech 11 Leader 5 Sim US Tech 4 Senior Name Location Team Experience experience_type 0 Shubham India Tech 5 Senior 1 Riti India Tech 7 Senior 2 Shanky India PMO 2 Junior 3 Shreya India Design 2 Junior 4 Aadi US Tech 11 Leader 5 Sim US Tech 4 Senior Name Location Team Experience experience_type 0 Shubham India Tech 5 Senior 1 Riti India Tech 7 Senior 2 Shanky India PMO 2 Junior 3 Shreya India Design 2 Junior 4 Aadi US Tech 11 Senior 5 Sim US Tech 4 Senior
Summary
In this article, we have discussed how to create a column based on conditions in Pandas.