In this article we will discuss different ways to add a new column to a dataframe in pandas: using the [] operator, the assign() function, the insert() function, or a dictionary. We will also discuss how to add a new column by populating it with values from a list, by using the same value at every index, or by calculating its values from other columns.

Let’s create a Dataframe object i.e.

import pandas as pd

# List of Tuples
students = [('jack', 34, 'Sydeny', 'Australia'),
            ('Riti', 30, 'Delhi', 'India'),
            ('Vikas', 31, 'Mumbai', 'India'),
            ('Neelu', 32, 'Bangalore', 'India'),
            ('John', 16, 'New York', 'US'),
            ('Mike', 17, 'las vegas', 'US')]

# Create a DataFrame object
df_obj = pd.DataFrame(students,
                      columns=['Name', 'Age', 'City', 'Country'],
                      index=['a', 'b', 'c', 'd', 'e', 'f'])

Contents of the dataframe df_obj are,

    Name  Age       City    Country
a   jack   34     Sydeny  Australia
b   Riti   30      Delhi      India
c  Vikas   31     Mumbai      India
d  Neelu   32  Bangalore      India
e   John   16   New York         US
f   Mike   17  las vegas         US

Now let’s discuss different ways to add new columns to this dataframe in pandas.

Add column to dataframe in pandas using [] operator

Pandas: Add new column to Dataframe with Values in list

Suppose we want to add a new column ‘Marks’ with values taken from a list. Let’s see how to do this,

# Add column with Name Marks
df_obj['Marks'] = [10, 20, 45, 33, 22, 11]

df_obj

Output:

    Name  Age       City    Country  Marks
a   jack   34     Sydeny  Australia     10
b   Riti   30      Delhi      India     20
c  Vikas   31     Mumbai      India     45
d  Neelu   32  Bangalore      India     33
e   John   16   New York         US     22
f   Mike   17  las vegas         US     11

As the dataframe df_obj did not have a column named ‘Marks’, a new column was added to it.

But we need to keep these things in mind, i.e.

  • If the number of values in the list does not match the number of rows in the dataframe (fewer or more), it will raise a ValueError.
  • If the column already exists, all of its values will be replaced.
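
Both caveats can be checked with a minimal sketch. It uses a small stand-in dataframe named demo (a hypothetical name, not part of the examples in this article) so that df_obj is left untouched:

```python
import pandas as pd

# 'demo' is a hypothetical stand-in dataframe used only for this sketch
demo = pd.DataFrame({'Name': ['jack', 'Riti', 'Vikas']}, index=['a', 'b', 'c'])

# A list whose length differs from the number of rows raises ValueError
try:
    demo['Marks'] = [10, 20]
    length_error = None
except ValueError as err:
    length_error = err

# Assigning to an existing column name replaces all of its values
demo['Marks'] = [10, 20, 45]
demo['Marks'] = [1, 2, 3]

print(type(length_error).__name__)
print(demo)
```

After the failed assignment the dataframe is unchanged, so the later assignments start from a dataframe without a ‘Marks’ column.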

Pandas: Add new column to DataFrame with same default value

Now add a new column ‘Total’ with the same value 50 at each index, i.e. each item in this column will have the same default value 50,

df_obj['Total'] = 50

df_obj

Output:

    Name  Age       City    Country  Marks  Total
a   jack   34     Sydeny  Australia     10     50
b   Riti   30      Delhi      India     20     50
c  Vikas   31     Mumbai      India     45     50
d  Neelu   32  Bangalore      India     33     50
e   John   16   New York         US     22     50
f   Mike   17  las vegas         US     11     50

It added a new column ‘Total’ and set the value 50 for each item in that column.

Pandas: Add column based on another column

Let’s add a new column ‘Percentage’ where the entry at each index is calculated from the values of other columns at that index, i.e.

df_obj['Percentage'] = (df_obj['Marks'] / df_obj['Total']) * 100

df_obj

Output:

    Name  Age       City    Country  Marks  Total  Percentage
a   jack   34     Sydeny  Australia     10     50        20.0
b   Riti   30      Delhi      India     20     50        40.0
c  Vikas   31     Mumbai      India     45     50        90.0
d  Neelu   32  Bangalore      India     33     50        66.0
e   John   16   New York         US     22     50        44.0
f   Mike   17  las vegas         US     11     50        22.0

It added a new column ‘Percentage’, where each entry contains the percentage of that student, calculated from the Marks and Total column values at that index.

Append column to DataFrame using assign() function

In Python, the pandas library provides a function to add columns, i.e.

DataFrame.assign(**kwargs)

It accepts keyword and value pairs, where each keyword is a column name and each value is a list, a Series, a scalar or a callable. It returns a new dataframe and doesn’t modify the current dataframe.
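
The fact that assign() leaves the original dataframe unchanged is easy to overlook. A minimal sketch, using a hypothetical stand-in dataframe named demo and a hypothetical result name with_marks:

```python
import pandas as pd

# 'demo' is a hypothetical stand-in dataframe used only for this sketch
demo = pd.DataFrame({'Name': ['jack', 'Riti'], 'Age': [34, 30]})

# assign() returns a new dataframe that contains the extra column
with_marks = demo.assign(Marks=[10, 20])

# The original keeps its columns unless the result is assigned back to it
original_columns = list(demo.columns)
new_columns = list(with_marks.columns)

print(original_columns)
print(new_columns)
```

To keep the new column on the same variable, assign the result back, e.g. demo = demo.assign(...).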

Let’s add columns in DataFrame using assign().

First of all, reset the dataframe, i.e.

# Create a DataFrame object
df_obj = pd.DataFrame(students,
                      columns=['Name', 'Age', 'City', 'Country'],
                      index=['a', 'b', 'c', 'd', 'e', 'f'])

Contents of the dataframe df_obj are,

    Name  Age       City    Country
a   jack   34     Sydeny  Australia
b   Riti   30      Delhi      India
c  Vikas   31     Mumbai      India
d  Neelu   32  Bangalore      India
e   John   16   New York         US
f   Mike   17  las vegas         US

Add new column to DataFrame in Pandas using assign()

Let’s add a column ‘Marks’ i.e.

mod_fd = df_obj.assign(Marks=[10, 20, 45, 33, 22, 11])

mod_fd

It will return a new dataframe with a new column ‘Marks’. The values provided in the list will be used as the column values.

Contents of new dataframe mod_fd are,

    Name  Age       City    Country  Marks
a   jack   34     Sydeny  Australia     10
b   Riti   30      Delhi      India     20
c  Vikas   31     Mumbai      India     45
d  Neelu   32  Bangalore      India     33
e   John   16   New York         US     22
f   Mike   17  las vegas         US     11

Add multiple columns in DataFrame using assign()

We can also add multiple columns using assign() i.e.

df_obj = df_obj.assign(Marks=[10, 20, 45, 33, 22, 11], Total=[50] * 6)

It added both the Marks and Total columns. Contents of the returned dataframe are,

    Name  Age       City    Country  Marks  Total
a   jack   34     Sydeny  Australia     10     50
b   Riti   30      Delhi      India     20     50
c  Vikas   31     Mumbai      India     45     50
d  Neelu   32  Bangalore      India     33     50
e   John   16   New York         US     22     50
f   Mike   17  las vegas         US     11     50

Add a column in DataFrame based on other columns using lambda function

Add a column ‘Percentage’ to the dataframe, where each value is calculated from other columns in the same row, i.e.

df_obj = df_obj.assign(Percentage=lambda x: (x['Marks'] / x['Total']) * 100)

Contents of the returned dataframe are,

    Name  Age       City    Country  Marks  Total  Percentage
a   jack   34     Sydeny  Australia     10     50        20.0
b   Riti   30      Delhi      India     20     50        40.0
c  Vikas   31     Mumbai      India     45     50        90.0
d  Neelu   32  Bangalore      India     33     50        66.0
e   John   16   New York         US     22     50        44.0
f   Mike   17  las vegas         US     11     50        22.0
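
The callable form is also useful within a single assign() call, because later keyword arguments can refer to columns created by earlier ones in the same call. A minimal sketch, assuming a hypothetical stand-in dataframe named demo and a hypothetical result name:

```python
import pandas as pd

# 'demo' is a hypothetical stand-in dataframe used only for this sketch
demo = pd.DataFrame({'Name': ['jack', 'Riti'], 'Marks': [10, 20]})

# Keyword arguments are applied in order, so the lambda for 'Percentage'
# receives a dataframe that already has the 'Total' column
result = demo.assign(
    Total=50,
    Percentage=lambda x: (x['Marks'] / x['Total']) * 100,
)

print(result)
```

The lambda receives the dataframe being built, not the original demo, which is why x['Total'] is available inside it.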

Pandas: Insert new column to Dataframe using insert()

First of all, reset the dataframe, i.e.

# Create a DataFrame object
df_obj = pd.DataFrame(students,
                      columns=['Name', 'Age', 'City', 'Country'],
                      index=['a', 'b', 'c', 'd', 'e', 'f'])

Contents of the dataframe df_obj are,

    Name  Age       City    Country
a   jack   34     Sydeny  Australia
b   Riti   30      Delhi      India
c  Vikas   31     Mumbai      India
d  Neelu   32  Bangalore      India
e   John   16   New York         US
f   Mike   17  las vegas         US

In all the previous solutions, we added the new column at the end of the dataframe. But suppose we want to insert a new column in between the other columns of the dataframe; then we can use the insert() function, i.e.

# Insert column at index position 2 (positions are counted from 0, so it becomes the 3rd column)
df_obj.insert(2, "Marks", [10, 20, 45, 33, 22, 11], allow_duplicates=True)

df_obj

Output:

    Name  Age  Marks       City    Country
a   jack   34     10     Sydeny  Australia
b   Riti   30     20      Delhi      India
c  Vikas   31     45     Mumbai      India
d  Neelu   32     33  Bangalore      India
e   John   16     22   New York         US
f   Mike   17     11  las vegas         US

It inserted the column ‘Marks’ in between other columns.
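
Two behaviors of insert() are worth checking: it modifies the dataframe in place and returns None, and by default it raises a ValueError when the column name already exists (the allow_duplicates parameter controls this). A minimal sketch, using a hypothetical stand-in dataframe named demo:

```python
import pandas as pd

# 'demo' is a hypothetical stand-in dataframe used only for this sketch
demo = pd.DataFrame({'Name': ['jack', 'Riti'], 'City': ['Sydney', 'Delhi']})

# insert() changes 'demo' itself; there is no new dataframe to capture
returned = demo.insert(1, 'Marks', [10, 20])

# Inserting a name that already exists fails unless allow_duplicates=True
try:
    demo.insert(0, 'Marks', [1, 2])
    duplicate_error = None
except ValueError as err:
    duplicate_error = err

column_order = list(demo.columns)

print(returned)
print(column_order)
```

Because insert() returns None, writing demo = demo.insert(...) would discard the dataframe.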

Pandas: Add a column to Dataframe by dictionary

Create a dictionary whose keys are the values for the new column and whose values are the values of an existing column, i.e.

ids = [11, 12, 13, 14, 15, 16]

# Provide 'ID' as the column name and for values provide dictionary
df_obj['ID'] = dict(zip(ids, df_obj['Name']))

df_obj

Output:

    Name  Age  Marks       City    Country  ID
a   jack   34     10     Sydeny  Australia  11
b   Riti   30     20      Delhi      India  12
c  Vikas   31     45     Mumbai      India  13
d  Neelu   32     33  Bangalore      India  14
e   John   16     22   New York         US  15
f   Mike   17     11  las vegas         US  16

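Here we created a dictionary by zipping a list of values with the existing column ‘Name’, then set this dictionary as the new column ‘ID’ in the dataframe. The dictionary keys become the column values. Note that this depends on how pandas handles a dictionary on assignment, which may differ between pandas versions, so check the result on your installation.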
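
A different technique, shown here only as an alternative and not as the method used above, is to look up each row’s value in a dictionary with Series.map(). A minimal sketch, assuming a hypothetical stand-in dataframe named demo and a hypothetical lookup dictionary name_to_id:

```python
import pandas as pd

# 'demo' is a hypothetical stand-in dataframe used only for this sketch
demo = pd.DataFrame({'Name': ['jack', 'Riti', 'Vikas']}, index=['a', 'b', 'c'])

# Hypothetical lookup table from student name to ID
name_to_id = {'jack': 11, 'Riti': 12, 'Vikas': 13}

# map() replaces each name with the dictionary value stored under that name
demo['ID'] = demo['Name'].map(name_to_id)

print(demo)
```

Here the dictionary is keyed by an existing column’s values, so each ID is tied to a name rather than to the row order.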