Create a column based on condition in Pandas DataFrame

In this article, we will discuss how to create a column based on certain conditions in a pandas DataFrame.

Table of Contents

Preparing DataSet

To quickly get started, let’s create two sample dataframe to experiment. We’ll use the pandas library with some random data.

import pandas as pd
import numpy as np

# List of Tuples
employees = [('Shubham', 'India', 'Tech',   5),
            ('Riti', 'India', 'Tech' ,   7),
            ('Shanky', 'India', 'PMO' ,   2),
            ('Shreya', 'India', 'Design' ,   2),
            ('Aadi', 'US', 'Tech', 11),
            ('Sim', 'US', 'Tech', 4)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Location', 'Team', 'Experience'])
print(df)

Contents of the created dataframe are,

      Name Location    Team  Experience
0  Shubham    India    Tech           5
1     Riti    India    Tech           7
2   Shanky    India     PMO           2
3   Shreya    India  Design           2
4     Aadi       US    Tech          11
5      Sim       US    Tech           4

Method 1: Using numpy.where function

The numpy.where() function is generally used in such cases where we want to create a column based on certain conditions on any other existing column. Let’s try to understand with an example, say, we need to create another column where employees with more than 3 years of experience are marked as “Senior” while others as “Junior”.

Advertisements
# creating a new column based on a condition
df['experience_type'] = np.where(df['Experience']>3, "Senior", "Junior")

print (df)

Output

      Name Location    Team  Experience experience_type
0  Shubham    India    Tech           5          Senior
1     Riti    India    Tech           7          Senior
2   Shanky    India     PMO           2          Junior
3   Shreya    India  Design           2          Junior
4     Aadi       US    Tech          11          Senior
5      Sim       US    Tech           4          Senior

The numpy.where() function takes three arguments – first is the condition on the column, second is the value to assign if the condition is True, and the last is the value to assign if the condition is False.

Method 2: Using numpy.select() function

The numpy.select() comes in handy when we want to have multiple conditions on the existing column. Let’s slightly modify the above condition, say, we need to have “Leader” for an experience of more than 10 years, “Senior” for experience between 3-10 years, and the rest as “Juniors”.

# define conditions
conditions = [
    (df['Experience'] < 3),
    (df['Experience'] >= 3) & (df['Experience'] <= 10),
    (df['Experience'] > 10)
]

# define values to assign
values = ['Junior', 'Senior', 'Leader']

# pass it in np.select()
df['experience_type'] = np.select(conditions, values)

print (df)

Output

      Name Location    Team  Experience experience_type
0  Shubham    India    Tech           5          Senior
1     Riti    India    Tech           7          Senior
2   Shanky    India     PMO           2          Junior
3   Shreya    India  Design           2          Junior
4     Aadi       US    Tech          11          Leader
5      Sim       US    Tech           4          Senior

Method 3: Using custom function

Instead of having multiple conditions as above, we can also create a custom function and use the apply method to assign values. Let’s understand by looking at the code below.

# creating a new column based on a condition
def get_experience_type(experience):
    if experience > 10:
        return 'Leader'
    elif experience > 3:
        return 'Senior'
    else:
        return 'Junior'

df['experience_type'] = df['Experience'].apply(get_experience_type)
print (df)

Output

      Name Location    Team  Experience experience_type
0  Shubham    India    Tech           5          Senior
1     Riti    India    Tech           7          Senior
2   Shanky    India     PMO           2          Junior
3   Shreya    India  Design           2          Junior
4     Aadi       US    Tech          11          Leader
5      Sim       US    Tech           4          Senior

Custom functions offer more flexibility to add more conditions or tweak it the way we want. However, in the case of simple conditions, we should stick to the above methods only.

Method 4: Using List Comprehension

List comprehension is another efficient way to assign values based on some conditions. Let’s try to again have “Senior” and “Junior” based on the experience condition.

# creating a new column based on a condition
df['experience_type'] = ['Senior' if i > 3 else 'Junior' for i in df['Experience']]

print (df)

Output

      Name Location    Team  Experience experience_type
0  Shubham    India    Tech           5          Senior
1     Riti    India    Tech           7          Senior
2   Shanky    India     PMO           2          Junior
3   Shreya    India  Design           2          Junior
4     Aadi       US    Tech          11          Senior
5      Sim       US    Tech           4          Senior

We are able to achieve it in a single line of code, also it takes comparatively lower time than the other methods.

The complete example is as follows,

import pandas as pd
import numpy as np

# List of Tuples
employees = [('Shubham', 'India', 'Tech',   5),
            ('Riti', 'India', 'Tech' ,   7),
            ('Shanky', 'India', 'PMO' ,   2),
            ('Shreya', 'India', 'Design' ,   2),
            ('Aadi', 'US', 'Tech', 11),
            ('Sim', 'US', 'Tech', 4)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Location', 'Team', 'Experience'])
print(df)

# creating a new column based on a condition
df['experience_type'] = np.where(df['Experience']>3, "Senior", "Junior")

print (df)

# define conditions
conditions = [
    (df['Experience'] < 3),
    (df['Experience'] >= 3) & (df['Experience'] <= 10),
    (df['Experience'] > 10)
]

# define values to assign
values = ['Junior', 'Senior', 'Leader']

# pass it in np.select()
df['experience_type'] = np.select(conditions, values)

print (df)

# creating a new column based on a condition
def get_experience_type(experience):
    if experience > 10:
        return 'Leader'
    elif experience > 3:
        return 'Senior'
    else:
        return 'Junior'

df['experience_type'] = df['Experience'].apply(get_experience_type)
print (df)

# creating a new column based on a condition
df['experience_type'] = ['Senior' if i > 3 else 'Junior' for i in df['Experience']]

print (df)

Output:

      Name Location    Team  Experience
0  Shubham    India    Tech           5
1     Riti    India    Tech           7
2   Shanky    India     PMO           2
3   Shreya    India  Design           2
4     Aadi       US    Tech          11
5      Sim       US    Tech           4

      Name Location    Team  Experience experience_type
0  Shubham    India    Tech           5          Senior
1     Riti    India    Tech           7          Senior
2   Shanky    India     PMO           2          Junior
3   Shreya    India  Design           2          Junior
4     Aadi       US    Tech          11          Senior
5      Sim       US    Tech           4          Senior

      Name Location    Team  Experience experience_type
0  Shubham    India    Tech           5          Senior
1     Riti    India    Tech           7          Senior
2   Shanky    India     PMO           2          Junior
3   Shreya    India  Design           2          Junior
4     Aadi       US    Tech          11          Leader
5      Sim       US    Tech           4          Senior

      Name Location    Team  Experience experience_type
0  Shubham    India    Tech           5          Senior
1     Riti    India    Tech           7          Senior
2   Shanky    India     PMO           2          Junior
3   Shreya    India  Design           2          Junior
4     Aadi       US    Tech          11          Leader
5      Sim       US    Tech           4          Senior

      Name Location    Team  Experience experience_type
0  Shubham    India    Tech           5          Senior
1     Riti    India    Tech           7          Senior
2   Shanky    India     PMO           2          Junior
3   Shreya    India  Design           2          Junior
4     Aadi       US    Tech          11          Senior
5      Sim       US    Tech           4          Senior

Summary

In this article, we have discussed how to create a column based on conditions in Pandas.

Pandas Tutorials -Learn Data Analysis with Python

   

Are you looking to make a career in Data Science with Python?

Data Science is the future, and the future is here now. Data Scientists are now the most sought-after professionals today. To become a good Data Scientist or to make a career switch in Data Science one must possess the right skill set. We have curated a list of Best Professional Certificate in Data Science with Python. These courses will teach you the programming tools for Data Science like Pandas, NumPy, Matplotlib, Seaborn and how to use these libraries to implement Machine learning models.

Checkout the Detailed Review of Best Professional Certificate in Data Science with Python.

Remember, Data Science requires a lot of patience, persistence, and practice. So, start learning today.

Join a LinkedIn Community of Python Developers

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Scroll to Top