Create a column based on condition in Pandas DataFrame

In this article, we will discuss how to create a column based on certain conditions in a pandas DataFrame.

Table of Contents

Preparing DataSet
Method 1: Using numpy.where function
Method 2: Using numpy.select() function
Method 3: Using custom function
Method 4: Using List Comprehension
Summary

Preparing DataSet

To quickly get started, let’s create two sample dataframe to experiment. We’ll use the pandas library with some random data.

import pandas as pd
import numpy as np

# List of Tuples
employees = [('Shubham', 'India', 'Tech',   5),
            ('Riti', 'India', 'Tech' ,   7),
            ('Shanky', 'India', 'PMO' ,   2),
            ('Shreya', 'India', 'Design' ,   2),
            ('Aadi', 'US', 'Tech', 11),
            ('Sim', 'US', 'Tech', 4)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Location', 'Team', 'Experience'])
print(df)

Contents of the created dataframe are,

      Name Location    Team  Experience
0  Shubham    India    Tech           5
1     Riti    India    Tech           7
2   Shanky    India     PMO           2
3   Shreya    India  Design           2
4     Aadi       US    Tech          11
5      Sim       US    Tech           4

Method 1: Using numpy.where function

The numpy.where() function is generally used in such cases where we want to create a column based on certain conditions on any other existing column. Let’s try to understand with an example, say, we need to create another column where employees with more than 3 years of experience are marked as “Senior” while others as “Junior”.

# creating a new column based on a condition
df['experience_type'] = np.where(df['Experience']>3, "Senior", "Junior")

print (df)

Output

      Name Location    Team  Experience experience_type
0  Shubham    India    Tech           5          Senior
1     Riti    India    Tech           7          Senior
2   Shanky    India     PMO           2          Junior
3   Shreya    India  Design           2          Junior
4     Aadi       US    Tech          11          Senior
5      Sim       US    Tech           4          Senior

The numpy.where() function takes three arguments – first is the condition on the column, second is the value to assign if the condition is True, and the last is the value to assign if the condition is False.

Frequently Asked:

Method 2: Using numpy.select() function

The numpy.select() comes in handy when we want to have multiple conditions on the existing column. Let’s slightly modify the above condition, say, we need to have “Leader” for an experience of more than 10 years, “Senior” for experience between 3-10 years, and the rest as “Juniors”.

# define conditions
conditions = [
    (df['Experience'] < 3),
    (df['Experience'] >= 3) & (df['Experience'] <= 10),
    (df['Experience'] > 10)
]

# define values to assign
values = ['Junior', 'Senior', 'Leader']

# pass it in np.select()
df['experience_type'] = np.select(conditions, values)

print (df)

Output

      Name Location    Team  Experience experience_type
0  Shubham    India    Tech           5          Senior
1     Riti    India    Tech           7          Senior
2   Shanky    India     PMO           2          Junior
3   Shreya    India  Design           2          Junior
4     Aadi       US    Tech          11          Leader
5      Sim       US    Tech           4          Senior

Method 3: Using custom function

Instead of having multiple conditions as above, we can also create a custom function and use the apply method to assign values. Let’s understand by looking at the code below.

# creating a new column based on a condition
def get_experience_type(experience):
    if experience > 10:
        return 'Leader'
    elif experience > 3:
        return 'Senior'
    else:
        return 'Junior'

df['experience_type'] = df['Experience'].apply(get_experience_type)
print (df)

Output

      Name Location    Team  Experience experience_type
0  Shubham    India    Tech           5          Senior
1     Riti    India    Tech           7          Senior
2   Shanky    India     PMO           2          Junior
3   Shreya    India  Design           2          Junior
4     Aadi       US    Tech          11          Leader
5      Sim       US    Tech           4          Senior

Custom functions offer more flexibility to add more conditions or tweak it the way we want. However, in the case of simple conditions, we should stick to the above methods only.

Method 4: Using List Comprehension

List comprehension is another efficient way to assign values based on some conditions. Let’s try to again have “Senior” and “Junior” based on the experience condition.

# creating a new column based on a condition
df['experience_type'] = ['Senior' if i > 3 else 'Junior' for i in df['Experience']]

print (df)

Output

      Name Location    Team  Experience experience_type
0  Shubham    India    Tech           5          Senior
1     Riti    India    Tech           7          Senior
2   Shanky    India     PMO           2          Junior
3   Shreya    India  Design           2          Junior
4     Aadi       US    Tech          11          Senior
5      Sim       US    Tech           4          Senior

We are able to achieve it in a single line of code, also it takes comparatively lower time than the other methods.

The complete example is as follows,

import pandas as pd
import numpy as np

# List of Tuples
employees = [('Shubham', 'India', 'Tech',   5),
            ('Riti', 'India', 'Tech' ,   7),
            ('Shanky', 'India', 'PMO' ,   2),
            ('Shreya', 'India', 'Design' ,   2),
            ('Aadi', 'US', 'Tech', 11),
            ('Sim', 'US', 'Tech', 4)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Location', 'Team', 'Experience'])
print(df)

# creating a new column based on a condition
df['experience_type'] = np.where(df['Experience']>3, "Senior", "Junior")

print (df)

# define conditions
conditions = [
    (df['Experience'] < 3),
    (df['Experience'] >= 3) & (df['Experience'] <= 10),
    (df['Experience'] > 10)
]

# define values to assign
values = ['Junior', 'Senior', 'Leader']

# pass it in np.select()
df['experience_type'] = np.select(conditions, values)

print (df)

# creating a new column based on a condition
def get_experience_type(experience):
    if experience > 10:
        return 'Leader'
    elif experience > 3:
        return 'Senior'
    else:
        return 'Junior'

df['experience_type'] = df['Experience'].apply(get_experience_type)
print (df)

# creating a new column based on a condition
df['experience_type'] = ['Senior' if i > 3 else 'Junior' for i in df['Experience']]

print (df)

Output:

      Name Location    Team  Experience
0  Shubham    India    Tech           5
1     Riti    India    Tech           7
2   Shanky    India     PMO           2
3   Shreya    India  Design           2
4     Aadi       US    Tech          11
5      Sim       US    Tech           4

      Name Location    Team  Experience experience_type
0  Shubham    India    Tech           5          Senior
1     Riti    India    Tech           7          Senior
2   Shanky    India     PMO           2          Junior
3   Shreya    India  Design           2          Junior
4     Aadi       US    Tech          11          Senior
5      Sim       US    Tech           4          Senior

      Name Location    Team  Experience experience_type
0  Shubham    India    Tech           5          Senior
1     Riti    India    Tech           7          Senior
2   Shanky    India     PMO           2          Junior
3   Shreya    India  Design           2          Junior
4     Aadi       US    Tech          11          Leader
5      Sim       US    Tech           4          Senior

      Name Location    Team  Experience experience_type
0  Shubham    India    Tech           5          Senior
1     Riti    India    Tech           7          Senior
2   Shanky    India     PMO           2          Junior
3   Shreya    India  Design           2          Junior
4     Aadi       US    Tech          11          Leader
5      Sim       US    Tech           4          Senior

      Name Location    Team  Experience experience_type
0  Shubham    India    Tech           5          Senior
1     Riti    India    Tech           7          Senior
2   Shanky    India     PMO           2          Junior
3   Shreya    India  Design           2          Junior
4     Aadi       US    Tech          11          Senior
5      Sim       US    Tech           4          Senior

Summary

In this article, we have discussed how to create a column based on conditions in Pandas.

Preparing DataSet

Method 1: Using numpy.where function

Frequently Asked:

Method 2: Using numpy.select() function

Method 3: Using custom function

Method 4: Using List Comprehension

Summary

Related posts:

Share your love

Leave a Comment Cancel Reply