Python: Add column to dataframe in Pandas (based on other column or list or default value)

In this article we will discuss different ways to add a new column to a dataframe in pandas: using the [] operator, the assign() function, the insert() function, or a dictionary. We will also see how to populate the new column from a list, with the same value at every index, or with values calculated from other columns.

Let’s create a Dataframe object i.e.

import pandas as pd

# List of Tuples
students = [('jack', 34, 'Sydeny', 'Australia'),
            ('Riti', 30, 'Delhi', 'India'),
            ('Vikas', 31, 'Mumbai', 'India'),
            ('Neelu', 32, 'Bangalore', 'India'),
            ('John', 16, 'New York', 'US'),
            ('Mike', 17, 'las vegas', 'US')]

# Create a DataFrame object
df_obj = pd.DataFrame(students,
                      columns=['Name', 'Age', 'City', 'Country'],
                      index=['a', 'b', 'c', 'd', 'e', 'f'])

Contents of the dataframe df_obj are,

    Name  Age       City    Country
a   jack   34     Sydeny  Australia
b   Riti   30      Delhi      India
c  Vikas   31     Mumbai      India
d  Neelu   32  Bangalore      India
e   John   16   New York         US
f   Mike   17  las vegas         US

Now let’s discuss different ways to add new columns to this dataframe in pandas.

Add column to dataframe in pandas using [] operator

Pandas: Add new column to Dataframe with Values in list

Suppose we want to add a new column ‘Marks’ with values taken from a list. Let’s see how to do this,

# Add a column named 'Marks'
df_obj['Marks'] = [10, 20, 45, 33, 22, 11]

df_obj

Output:

    Name  Age       City    Country  Marks
a   jack   34     Sydeny  Australia     10
b   Riti   30      Delhi      India     20
c  Vikas   31     Mumbai      India     45
d  Neelu   32  Bangalore      India     33
e   John   16   New York         US     22
f   Mike   17  las vegas         US     11

As the dataframe df_obj didn’t have any column named ‘Marks’, a new column was added to it.

But we need to keep these things in mind:

  • If the number of values in the list does not match the number of rows in the dataframe, it raises a ValueError.
  • If the column already exists, all of its values are replaced.
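
Both points can be checked with a minimal sketch. The small dataframe df and the variable length_error below are illustrative names, not part of the example above:

```python
import pandas as pd

# Small illustrative dataframe (not the df_obj used in this article)
df = pd.DataFrame({'Name': ['jack', 'Riti', 'Vikas']}, index=['a', 'b', 'c'])

# A list whose length differs from the number of rows is rejected
try:
    df['Marks'] = [10, 20]
    length_error = None
except ValueError as err:
    length_error = err

# Assigning to an existing column name replaces all of its values
df['Marks'] = [10, 20, 45]
df['Marks'] = [11, 21, 46]
```

After the failed assignment the dataframe is unchanged, and the second successful assignment overwrites the first.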

Pandas: Add new column to DataFrame with same default value

Now add a new column ‘Total’ with the same value 50 at each index, i.e. every item in this column will have the same default value 50,

df_obj['Total'] = 50

df_obj

Output:

    Name  Age       City    Country  Marks  Total
a   jack   34     Sydeny  Australia     10     50
b   Riti   30      Delhi      India     20     50
c  Vikas   31     Mumbai      India     45     50
d  Neelu   32  Bangalore      India     33     50
e   John   16   New York         US     22     50
f   Mike   17  las vegas         US     11     50

It added a new column ‘Total’ and set the value 50 for each item in that column.

Pandas: Add column based on another column

Let’s add a new column ‘Percentage’ where the entry at each index is calculated from the values in other columns at that index i.e.

df_obj['Percentage'] = (df_obj['Marks'] / df_obj['Total']) * 100

df_obj

Output:

    Name  Age       City    Country  Marks  Total  Percentage
a   jack   34     Sydeny  Australia     10     50        20.0
b   Riti   30      Delhi      India     20     50        40.0
c  Vikas   31     Mumbai      India     45     50        90.0
d  Neelu   32  Bangalore      India     33     50        66.0
e   John   16   New York         US     22     50        44.0
f   Mike   17  las vegas         US     11     50        22.0

It added a new column ‘Percentage’, where each entry contains the percentage of that student, calculated from the Marks & Total column values for that index.
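
The same pattern works for any element-wise expression, not only arithmetic. As a rough sketch, a comparison on an existing column produces a boolean column; the column name ‘IsAdult’ and the threshold 18 are assumptions made up for this illustration:

```python
import pandas as pd

# Small illustrative dataframe (a subset of the students above)
df = pd.DataFrame({'Name': ['jack', 'John', 'Mike'],
                   'Age': [34, 16, 17]},
                  index=['a', 'e', 'f'])

# The comparison is evaluated row by row and yields True / False per index
df['IsAdult'] = df['Age'] >= 18
```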

Append column to DataFrame using assign() function

In Python, Pandas Library provides a function to add columns i.e.

DataFrame.assign(**kwargs)

It accepts keyword & value pairs, where each keyword is a column name and each value is either a list / Series or a callable. It returns a new dataframe and doesn’t modify the current dataframe.

Let’s add columns in DataFrame using assign().

First of all, reset the dataframe i.e.

# Create a DataFrame object
df_obj = pd.DataFrame(students,
                      columns=['Name', 'Age', 'City', 'Country'],
                      index=['a', 'b', 'c', 'd', 'e', 'f'])

Contents of the dataframe df_obj are,

    Name  Age       City    Country
a   jack   34     Sydeny  Australia
b   Riti   30      Delhi      India
c  Vikas   31     Mumbai      India
d  Neelu   32  Bangalore      India
e   John   16   New York         US
f   Mike   17  las vegas         US

Add new column to DataFrame in Pandas using assign()

Let’s add a column ‘Marks’ i.e.

mod_fd = df_obj.assign(Marks=[10, 20, 45, 33, 22, 11])

mod_fd

It returns a new dataframe with a new column ‘Marks’. The values provided in the list are used as the column values.

Contents of new dataframe mod_fd are,

    Name  Age       City    Country  Marks
a   jack   34     Sydeny  Australia     10
b   Riti   30      Delhi      India     20
c  Vikas   31     Mumbai      India     45
d  Neelu   32  Bangalore      India     33
e   John   16   New York         US     22
f   Mike   17  las vegas         US     11
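
To confirm that assign() leaves the original dataframe untouched, here is a minimal sketch using a small illustrative dataframe (the names df and new_df are made up for this example):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['jack', 'Riti'], 'Age': [34, 30]},
                  index=['a', 'b'])

# assign() returns a new dataframe that contains the extra column
new_df = df.assign(Marks=[10, 20])

# The original still has only its two columns
original_columns = list(df.columns)
new_columns = list(new_df.columns)
```

To keep the change, assign the result back to the same variable, as the next example does.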

Add multiple columns in DataFrame using assign()

We can also add multiple columns using assign() i.e.

df_obj = df_obj.assign(Marks=[10, 20, 45, 33, 22, 11], Total=[50] * 6)

It added both columns, Marks & Total. Contents of the returned dataframe are,

    Name  Age       City    Country  Marks  Total
a   jack   34     Sydeny  Australia     10     50
b   Riti   30      Delhi      India     20     50
c  Vikas   31     Mumbai      India     45     50
d  Neelu   32  Bangalore      India     33     50
e   John   16   New York         US     22     50
f   Mike   17  las vegas         US     11     50

Add a column in DataFrame based on other columns using lambda function

Add a column ‘Percentage’ to the dataframe, where each value is calculated from the other columns in that row i.e.

df_obj = df_obj.assign(Percentage=lambda x: (x['Marks'] / x['Total']) * 100)

Contents of the returned dataframe are,

    Name  Age       City    Country  Marks  Total  Percentage
a   jack   34     Sydeny  Australia     10     50        20.0
b   Riti   30      Delhi      India     20     50        40.0
c  Vikas   31     Mumbai      India     45     50        90.0
d  Neelu   32  Bangalore      India     33     50        66.0
e   John   16   New York         US     22     50        44.0
f   Mike   17  las vegas         US     11     50        22.0
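
The two steps can also be combined. As a sketch, assuming a reasonably recent pandas (0.23 or later on Python 3.6+), where the keyword arguments of assign() are evaluated in order, a callable can refer to a column created earlier in the same call. The scalar Total=50 is used here for brevity:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['jack', 'Riti'], 'Marks': [10, 20]},
                  index=['a', 'b'])

# 'Total' is created first, so the lambda for 'Percentage' can already use it
result = df.assign(
    Total=50,
    Percentage=lambda x: (x['Marks'] / x['Total']) * 100,
)
```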

Pandas: Insert new column to Dataframe using insert()

First of all, reset the dataframe i.e.

# Create a DataFrame object
df_obj = pd.DataFrame(students,
                      columns=['Name', 'Age', 'City', 'Country'],
                      index=['a', 'b', 'c', 'd', 'e', 'f'])

Contents of the dataframe df_obj are,

    Name  Age       City    Country
a   jack   34     Sydeny  Australia
b   Riti   30      Delhi      India
c  Vikas   31     Mumbai      India
d  Neelu   32  Bangalore      India
e   John   16   New York         US
f   Mike   17  las vegas         US

In all the previous solutions, we added the new column at the end of the dataframe. But suppose we want to insert a new column in between the other columns of the dataframe; then we can use the insert() function i.e.

# Insert column at index 2 (zero-based), i.e. as the 3rd column of the Dataframe
df_obj.insert(2, "Marks", [10, 20, 45, 33, 22, 11], allow_duplicates=True)

df_obj

Output:

    Name  Age  Marks       City    Country
a   jack   34     10     Sydeny  Australia
b   Riti   30     20      Delhi      India
c  Vikas   31     45     Mumbai      India
d  Neelu   32     33  Bangalore      India
e   John   16     22   New York         US
f   Mike   17     11  las vegas         US

It inserted the column ‘Marks’ in between other columns.
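
Two details of insert() are worth a quick sketch: the position is zero-based, and by default (allow_duplicates=False) inserting a column name that already exists raises a ValueError. The dataframe and the variable duplicate_error below are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['jack', 'Riti'], 'City': ['Sydney', 'Delhi']},
                  index=['a', 'b'])

# Position 0 places the new column first; insert() modifies df in place
df.insert(0, 'ID', [11, 12])

# Inserting the same column name again is rejected by default
try:
    df.insert(1, 'ID', [21, 22])
    duplicate_error = None
except ValueError as err:
    duplicate_error = err
```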

Pandas: Add a column to Dataframe by dictionary

Create a dictionary whose keys are the values for the new column and whose values are the values of an existing column i.e.

ids = [11, 12, 13, 14, 15, 16]

# Provide 'ID' as the column name and for values provide dictionary
df_obj['ID'] = dict(zip(ids, df_obj['Name']))

df_obj

Output:

    Name  Age  Marks       City    Country  ID
a   jack   34     10     Sydeny  Australia  11
b   Riti   30     20      Delhi      India  12
c  Vikas   31     45     Mumbai      India  13
d  Neelu   32     33  Bangalore      India  14
e   John   16     22   New York         US  15
f   Mike   17     11  las vegas         US  16

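Here we created a dictionary by zipping a list of values with the existing column ‘Name’, then set this dictionary as the new column ‘ID’ in the dataframe. In the output shown, the dictionary’s keys become the column values. This depends on how pandas treats a dictionary on assignment, and some versions may instead align the dictionary’s keys with the dataframe index, so assigning the list itself (df_obj['ID'] = ids) is the more predictable option.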
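
A different way to use a dictionary, shown here only as a sketch and not the approach above, is to key it by the values of an existing column and look each row up with Series.map(). The dictionary name_to_id is a made-up example:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['jack', 'Riti', 'Vikas']}, index=['a', 'b', 'c'])

# Hypothetical lookup table: existing 'Name' value -> new 'ID' value
name_to_id = {'jack': 11, 'Riti': 12, 'Vikas': 13}

# map() replaces each name with its dictionary value, row by row
df['ID'] = df['Name'].map(name_to_id)
```

With this approach the result does not depend on the order of the dictionary or on the dataframe index; a name missing from the dictionary would get NaN.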
