Pandas: Add Column to Dataframe

In this article, we will discuss different ways to add a new column to a DataFrame in pandas, i.e. using the [] operator, the assign() function, the insert() function, or a dictionary. We will also discuss adding a new column by populating values from a list, using the same value in all rows, or calculating the values of a new column based on other columns.


Let’s create a DataFrame object:

import pandas as pd

# List of Tuples
students = [('jack', 34, 'Sydeny', 'Australia'),
            ('Riti', 30, 'Delhi', 'India'),
            ('Vikas', 31, 'Mumbai', 'India'),
            ('Neelu', 32, 'Bangalore', 'India'),
            ('John', 16, 'New York', 'US'),
            ('Mike', 17, 'las vegas', 'US')]

# Create a DataFrame object
df_obj = pd.DataFrame(students,
                      columns=['Name', 'Age', 'City', 'Country'],
                      index=['a', 'b', 'c', 'd', 'e', 'f'])

print(df_obj)

Contents of the DataFrame df_obj are,

    Name  Age       City    Country
a   jack   34     Sydeny  Australia
b   Riti   30      Delhi      India
c  Vikas   31     Mumbai      India
d  Neelu   32  Bangalore      India
e   John   16   New York         US
f   Mike   17  las vegas         US

Now let’s discuss different ways to add new columns to this DataFrame in pandas.


Add column to Pandas Dataframe using [] operator

Pandas: Add Column from List

Suppose we want to add a new column ‘Marks’ with values taken from a list. Let’s see how to do this:

# Add column with Name Marks
df_obj['Marks'] = [10, 20, 45, 33, 22, 11]

print(df_obj)

Output:

    Name  Age       City    Country  Marks
a   jack   34     Sydeny  Australia     10
b   Riti   30      Delhi      India     20
c  Vikas   31     Mumbai      India     45
d  Neelu   32  Bangalore      India     33
e   John   16   New York         US     22
f   Mike   17  las vegas         US     11

As DataFrame df_obj didn’t have any column named ‘Marks’, a new column was added to it.

But we need to keep these things in mind:

  • If the number of values in the list does not match the number of rows (the length of the index), pandas raises a ValueError.
  • If the column already exists, all of its values will be replaced.
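Both rules can be checked with a short sketch. It uses a trimmed-down, three-row version of the students data (an assumption made only to keep the example short), tries to assign a list that is one value too short, and then overwrites an existing column.

```python
import pandas as pd

# Three rows from the students data used above (trimmed for brevity)
df = pd.DataFrame([('jack', 34), ('Riti', 30), ('Vikas', 31)],
                  columns=['Name', 'Age'],
                  index=['a', 'b', 'c'])

# A list whose length differs from the number of rows is rejected
try:
    df['Marks'] = [10, 20]          # only 2 values for 3 rows
    length_error = None
except ValueError as err:
    length_error = err
print('ValueError raised:', length_error is not None)

# Assigning to an existing column replaces all of its values
df['Age'] = [35, 31, 32]
print(df)
```

The failed assignment leaves the DataFrame untouched, so no partial ‘Marks’ column is created.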

Pandas: Add column to DataFrame with same value

Now add a new column ‘Total’ with the same value 50 in every row, i.e. each item in this column will have the same value 50:

# Add column with same default value
df_obj['Total'] = 50

print(df_obj)

Output:

    Name  Age       City    Country  Marks  Total
a   jack   34     Sydeny  Australia     10     50
b   Riti   30      Delhi      India     20     50
c  Vikas   31     Mumbai      India     45     50
d  Neelu   32  Bangalore      India     33     50
e   John   16   New York         US     22     50
f   Mike   17  las vegas         US     11     50

It added a new column ‘Total’ and set the value 50 for every row in that column.
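A scalar such as 50 is broadcast to every row. A pandas Series behaves differently: it is matched to the DataFrame by index label rather than by position, and any label missing from the Series becomes NaN. The sketch below uses hypothetical bonus values to show the difference.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['jack', 'Riti', 'Vikas']}, index=['a', 'b', 'c'])

# Scalar: the same value is copied into every row
df['Total'] = 50

# Series: values are aligned by index label, not by position.
# 'c' has no entry in the Series, so it gets NaN.
bonus = pd.Series([5, 7], index=['b', 'a'])   # hypothetical bonus values
df['Bonus'] = bonus
print(df)
```

Here row ‘a’ receives 7 and row ‘b’ receives 5, even though 5 comes first in the Series.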

Pandas: Add column based on another column

Let’s add a new column ‘Percentage’ where the entry at each index is calculated from the values of the other columns at that index:

# Add column to Dataframe based on another column
df_obj['Percentage'] = (df_obj['Marks'] / df_obj['Total']) * 100

print(df_obj)

Output:

    Name  Age       City    Country  Marks  Total  Percentage
a   jack   34     Sydeny  Australia     10     50        20.0
b   Riti   30      Delhi      India     20     50        40.0
c  Vikas   31     Mumbai      India     45     50        90.0
d  Neelu   32  Bangalore      India     33     50        66.0
e   John   16   New York         US     22     50        44.0
f   Mike   17  las vegas         US     11     50        22.0

It added a new column ‘Percentage’, where each entry contains the percentage of that student, calculated from the Marks and Total column values for that index.
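The expression above works on whole columns at once: pandas divides the two Series element by element, matching rows by index label. For comparison, the sketch below also computes the same percentages row by row with DataFrame.apply(axis=1). Both give identical values, but the vectorized form is typically much faster on large DataFrames.

```python
import pandas as pd

df = pd.DataFrame({'Marks': [10, 20, 45], 'Total': [50, 50, 50]},
                  index=['a', 'b', 'c'])

# Row-by-row: the lambda is called once per row (axis=1)
row_wise = df.apply(lambda row: row['Marks'] / row['Total'] * 100, axis=1)

# Vectorized: a single operation over entire columns, aligned by index label
df['Percentage'] = (df['Marks'] / df['Total']) * 100

print(df)
print(row_wise.tolist() == df['Percentage'].tolist())   # True
```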

Append column to DataFrame using assign() function

Pandas provides a DataFrame method to add columns:

DataFrame.assign(**kwargs)

It accepts keyword arguments, where each keyword is a column name and each value is a scalar, a list / Series, or a callable. It returns a new DataFrame containing the original columns plus the new ones, and doesn’t modify the current DataFrame.
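A quick way to confirm that assign() leaves the original untouched is to compare the columns before and after. This sketch uses a two-row DataFrame for brevity and passes both a scalar and a list.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['jack', 'Riti']}, index=['a', 'b'])

# assign() returns a new DataFrame; df itself is left unchanged
new_df = df.assign(Total=50,           # a scalar is broadcast to every row
                   Marks=[10, 20])     # a list must match the number of rows

print(list(df.columns))       # ['Name']
print(list(new_df.columns))   # ['Name', 'Total', 'Marks']
```

To keep the new columns, assign the result back to a variable, as the examples below do.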

Let’s add columns to a DataFrame using assign().

First of all, reset the DataFrame:

import pandas as pd

# List of Tuples
students = [('jack', 34, 'Sydeny', 'Australia'),
            ('Riti', 30, 'Delhi', 'India'),
            ('Vikas', 31, 'Mumbai', 'India'),
            ('Neelu', 32, 'Bangalore', 'India'),
            ('John', 16, 'New York', 'US'),
            ('Mike', 17, 'las vegas', 'US')]

# Create a DataFrame object
df_obj = pd.DataFrame(students,
                      columns=['Name', 'Age', 'City', 'Country'],
                      index=['a', 'b', 'c', 'd', 'e', 'f'])

print(df_obj)

Contents of the DataFrame df_obj are,

    Name  Age       City    Country
a   jack   34     Sydeny  Australia
b   Riti   30      Delhi      India
c  Vikas   31     Mumbai      India
d  Neelu   32  Bangalore      India
e   John   16   New York         US
f   Mike   17  las vegas         US

Add column to DataFrame in Pandas using assign()

Let’s add a column ‘Marks’:

# Add new column to DataFrame in Pandas using assign()
mod_df = df_obj.assign(Marks=[10, 20, 45, 33, 22, 11])

print(mod_df)

It will return a new DataFrame with a new column ‘Marks’. The values provided in the list will be used as the column values.

Contents of the new DataFrame mod_df are,

    Name  Age       City    Country  Marks
a   jack   34     Sydeny  Australia     10
b   Riti   30      Delhi      India     20
c  Vikas   31     Mumbai      India     45
d  Neelu   32  Bangalore      India     33
e   John   16   New York         US     22
f   Mike   17  las vegas         US     11

Add multiple columns in DataFrame using assign()

We can also add multiple columns using assign():

# Add two columns in the DataFrame
df_obj = df_obj.assign(Marks=[10, 20, 45, 33, 22, 11],
                       Total=[50] * 6)

print(df_obj)

It added both columns, Marks and Total. Contents of the returned DataFrame are,

    Name  Age       City    Country  Marks  Total
a   jack   34     Sydeny  Australia     10     50
b   Riti   30      Delhi      India     20     50
c  Vikas   31     Mumbai      India     45     50
d  Neelu   32  Bangalore      India     33     50
e   John   16   New York         US     22     50
f   Mike   17  las vegas         US     11     50
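Because keyword arguments are processed in order (pandas 0.23 and later), a callable passed later in the same assign() call can use a column created earlier in that call. The sketch below builds Marks, Total and Percentage in a single step on a three-row DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['jack', 'Riti', 'Vikas']}, index=['a', 'b', 'c'])

# Columns are created left to right, so the lambda can already see
# the 'Marks' and 'Total' columns added by the earlier keywords
result = df.assign(Marks=[10, 20, 45],
                   Total=50,
                   Percentage=lambda x: x['Marks'] / x['Total'] * 100)
print(result)
```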

Add a column to DataFrame based on other columns using a lambda function

Add a column ‘Percentage’ to the DataFrame, where each value is calculated from the other columns in the same row:

# Add a column Percentage based on columns Marks & Total
df_obj = df_obj.assign(Percentage=lambda x: (x['Marks'] / x['Total']) * 100)

print(df_obj)

Contents of the returned dataframe are,

    Name  Age       City    Country  Marks  Total  Percentage
a   jack   34     Sydeny  Australia     10     50        20.0
b   Riti   30      Delhi      India     20     50        40.0
c  Vikas   31     Mumbai      India     45     50        90.0
d  Neelu   32  Bangalore      India     33     50        66.0
e   John   16   New York         US     22     50        44.0
f   Mike   17  las vegas         US     11     50        22.0
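The callable passed to assign() is called once with the whole DataFrame, not once per row, so its body should use column-wise (vectorized) operations. The sketch below records what the callable receives and chains two assign() calls; the ‘Adult’ and ‘AgeNextYear’ columns and the threshold of 18 are hypothetical, chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['jack', 'John', 'Mike'],
                   'Age': [34, 16, 17]},
                  index=['a', 'e', 'f'])

seen_types = []

def is_adult(frame):
    # Record what the callable receives: the entire DataFrame
    seen_types.append(type(frame))
    return frame['Age'] >= 18          # hypothetical rule, vectorized comparison

result = (df.assign(Adult=is_adult)
            .assign(AgeNextYear=lambda x: x['Age'] + 1))
print(result)
print(seen_types)
```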

Pandas: Insert column to Dataframe using insert()

First of all, reset the DataFrame:

import pandas as pd

# List of Tuples
students = [('jack', 34, 'Sydeny', 'Australia'),
            ('Riti', 30, 'Delhi', 'India'),
            ('Vikas', 31, 'Mumbai', 'India'),
            ('Neelu', 32, 'Bangalore', 'India'),
            ('John', 16, 'New York', 'US'),
            ('Mike', 17, 'las vegas', 'US')]

# Create a DataFrame object
df_obj = pd.DataFrame(students,
                      columns=['Name', 'Age', 'City', 'Country'],
                      index=['a', 'b', 'c', 'd', 'e', 'f'])

print(df_obj)

Contents of the DataFrame df_obj are,

    Name  Age       City    Country
a   jack   34     Sydeny  Australia
b   Riti   30      Delhi      India
c  Vikas   31     Mumbai      India
d  Neelu   32  Bangalore      India
e   John   16   New York         US
f   Mike   17  las vegas         US

In all the previous solutions, we added the new column at the end of the DataFrame. To add or insert a new column in between the other columns of the DataFrame, we can use the insert() function:

# Insert column at position 2 (0-based), i.e. as the third column
df_obj.insert(2,                         # column position
              "Marks",                   # column name
              [10, 20, 45, 33, 22, 11],  # column values
              allow_duplicates=True)     # allow a duplicate column name

print(df_obj)

Output:

    Name  Age  Marks       City    Country
a   jack   34     10     Sydeny  Australia
b   Riti   30     20      Delhi      India
c  Vikas   31     45     Mumbai      India
d  Neelu   32     33  Bangalore      India
e   John   16     22   New York         US
f   Mike   17     11  las vegas         US

It inserted the column ‘Marks’ between the ‘Age’ and ‘City’ columns. Note that insert() modifies the DataFrame in place and returns None.
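Two details of insert() are easy to miss: the position is 0-based and the call changes the DataFrame in place, and inserting a column name that already exists raises a ValueError unless allow_duplicates=True is passed. The sketch below shows both on a two-row DataFrame; the ‘ID’ values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['jack', 'Riti'], 'Age': [34, 30]},
                  index=['a', 'b'])

# insert() works in place and returns None
returned = df.insert(0, 'ID', [11, 12])    # position 0 = first column
print(returned)            # None
print(list(df.columns))    # ['ID', 'Name', 'Age']

# Inserting a name that already exists raises ValueError by default
try:
    df.insert(1, 'Name', ['x', 'y'])
    duplicate_error = None
except ValueError as err:
    duplicate_error = err
print(duplicate_error)
```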

Pandas: Add a column to Dataframe using dictionary

Create a dictionary that maps the values of an existing column to the values of the new column, then use Series.map() to look up each row’s value in that dictionary:

ids = [11, 12, 13, 14, 15, 16]

# Map each student's name to an ID
name_to_id = dict(zip(df_obj['Name'], ids))

# Look up every value of the 'Name' column in the dictionary
df_obj['ID'] = df_obj['Name'].map(name_to_id)

print(df_obj)

Output:

    Name  Age  Marks       City    Country  ID
a   jack   34     10     Sydeny  Australia  11
b   Riti   30     20      Delhi      India  12
c  Vikas   31     45     Mumbai      India  13
d  Neelu   32     33  Bangalore      India  14
e   John   16     22   New York         US  15
f   Mike   17     11  las vegas         US  16

Here we created a dictionary by zipping the existing column ‘Name’ with the list of IDs, and then used it to map every name to its ID in the new column ‘ID’. Assigning a plain dictionary directly to a column is not handled consistently across pandas versions, so mapping through an existing column is the safer choice.
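When a value of the existing column has no key in the dictionary, Series.map() returns NaN for that row, which can then be filled with a default. In the sketch below the currency lookup table is hypothetical and deliberately omits ‘Australia’.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['jack', 'Riti', 'John'],
                   'Country': ['Australia', 'India', 'US']},
                  index=['a', 'b', 'e'])

# Hypothetical lookup table; 'Australia' is deliberately missing
currency = {'India': 'INR', 'US': 'USD'}

# Each Country value is looked up in the dictionary; unmatched values become NaN
df['Currency'] = df['Country'].map(currency)

# Optionally replace the missing entries with a default
df['Currency'] = df['Currency'].fillna('unknown')
print(df)
```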
