Add a new column to an existing DataFrame in Pandas

We often encounter scenarios in which we need to add new information to an existing DataFrame as an extra column. In this article, we will discuss different ways to achieve that.


To quickly get started, let’s create a sample DataFrame to experiment with. We’ll use the pandas library with some sample employee data.

import pandas as pd

# List of Tuples
employees = [('Shubham', 'Data Scientist', 'Sydney',   5),
            ('Riti', 'Data Analyst', 'Delhi' ,   7),
            ('Shanky', 'Program Manager', 'Delhi' ,   2),
            ('Shreya', 'Graphic Designer', 'Mumbai' ,   2),
            ('Aadi', 'Data Engineering', 'New York', 11)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'City', 'Experience'],
                  index=[0, 1, 2, 3, 4])
print(df)

Contents of the created dataframe are,

      Name       Designation      City  Experience
0  Shubham    Data Scientist    Sydney           5
1     Riti      Data Analyst     Delhi           7
2   Shanky   Program Manager     Delhi           2
3   Shreya  Graphic Designer    Mumbai           2
4     Aadi  Data Engineering  New York          11

Now, let’s look at different ways in which we could add a new column to this DataFrame.


Add new Column in DataFrame using direct assignment

This is the simplest way to add a new column to an existing DataFrame. We assign to a new column label with df['column_name'] = value, where the value can be a constant or a sequence of predefined values. For instance, let’s add a new column with a constant value.

# adding a column with a constant value
df['Company'] = 'thisPointer'

print(df)

Output

      Name       Designation      City  Experience      Company
0  Shubham    Data Scientist    Sydney           5  thisPointer
1     Riti      Data Analyst     Delhi           7  thisPointer
2   Shanky   Program Manager     Delhi           2  thisPointer
3   Shreya  Graphic Designer    Mumbai           2  thisPointer
4     Aadi  Data Engineering  New York          11  thisPointer
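As a small sketch of the two cases mentioned above, the snippet below assumes a trimmed-down version of the sample DataFrame (only Name and Experience) and a hypothetical ExperienceMonths column. It shows that a scalar is broadcast to every row, and that an expression over an existing column produces one value per row.

```python
import pandas as pd

# Trimmed-down sample DataFrame (assumption: only two of the original columns)
df = pd.DataFrame(
    {
        'Name': ['Shubham', 'Riti', 'Shanky', 'Shreya', 'Aadi'],
        'Experience': [5, 7, 2, 2, 11],
    }
)

# A scalar on the right-hand side is broadcast to every row
df['Company'] = 'thisPointer'

# A vectorized expression over an existing column yields one value per row;
# 'ExperienceMonths' is a hypothetical column name used only for illustration
df['ExperienceMonths'] = df['Experience'] * 12

print(df)
```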

We can also add a new column that contains some specific values as below.

# adding a column with specific values
country = pd.Series(['Australia', 'India', 'India', 'India', 'USA'])
df['Country'] = country

print(df)

Output

      Name       Designation      City  Experience      Company    Country
0  Shubham    Data Scientist    Sydney           5  thisPointer  Australia
1     Riti      Data Analyst     Delhi           7  thisPointer      India
2   Shanky   Program Manager     Delhi           2  thisPointer      India
3   Shreya  Graphic Designer    Mumbai           2  thisPointer      India
4     Aadi  Data Engineering  New York          11  thisPointer        USA

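Please note that when we assign a Series, pandas aligns it on the index labels rather than by position. So the index of the Series (or any other labelled data structure) should match the DataFrame index; rows whose labels are missing from the Series get NaN, and Series labels that don’t exist in the DataFrame are dropped, as shown below.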

# adding a column with specific values
country = pd.Series(['Australia', 'India', 'India', 'India', 'USA'], index=[3, 4, 5, 6, 7])
df['Country'] = country

print(df)

Output

      Name       Designation      City  Experience      Company    Country
0  Shubham    Data Scientist    Sydney           5  thisPointer        NaN
1     Riti      Data Analyst     Delhi           7  thisPointer        NaN
2   Shanky   Program Manager     Delhi           2  thisPointer        NaN
3   Shreya  Graphic Designer    Mumbai           2  thisPointer  Australia
4     Aadi  Data Engineering  New York          11  thisPointer      India

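The following minimal sketch, assuming a one-column version of the sample DataFrame and hypothetical column names (Shifted, Positional, Bad), contrasts label-based alignment of a Series with positional assignment of a plain list or NumPy array. It also shows that a list whose length differs from the number of rows is rejected with a ValueError.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Shubham', 'Riti', 'Shanky', 'Shreya', 'Aadi']})

# A plain list has no index, so it is assigned by position (length must match)
df['Country'] = ['Australia', 'India', 'India', 'India', 'USA']

# A Series is aligned on index labels: labels 0-2 are missing, so those rows get NaN,
# and labels 5-7 do not exist in df, so those values are dropped
shifted = pd.Series(['Australia', 'India', 'India', 'India', 'USA'], index=[3, 4, 5, 6, 7])
df['Shifted'] = shifted

# to_numpy() drops the index, so the values are assigned positionally again
df['Positional'] = shifted.to_numpy()

print(df)

# A list whose length differs from the number of rows raises ValueError
try:
    df['Bad'] = ['Australia', 'India']
except ValueError as err:
    print('ValueError:', err)
```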
The complete example is as follows,

import pandas as pd

# List of Tuples
employees = [('Shubham', 'Data Scientist', 'Sydney',   5),
            ('Riti', 'Data Analyst', 'Delhi' ,   7),
            ('Shanky', 'Program Manager', 'Delhi' ,   2),
            ('Shreya', 'Graphic Designer', 'Mumbai' ,   2),
            ('Aadi', 'Data Engineering', 'New York', 11)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'City', 'Experience'],
                  index=[0, 1, 2, 3, 4])
print(df)

# adding a column with a constant value
df['Company'] = 'thisPointer'

print(df)

# adding a column with specific values
country = pd.Series(['Australia', 'India', 'India', 'India', 'USA'])
df['Country'] = country

print(df)

# adding a column with specific values
country = pd.Series(['Australia', 'India', 'India', 'India', 'USA'],
                    index=[3, 4, 5, 6, 7])

df['Country'] = country

print(df)

Output:

      Name       Designation      City  Experience
0  Shubham    Data Scientist    Sydney           5
1     Riti      Data Analyst     Delhi           7
2   Shanky   Program Manager     Delhi           2
3   Shreya  Graphic Designer    Mumbai           2
4     Aadi  Data Engineering  New York          11

      Name       Designation      City  Experience      Company
0  Shubham    Data Scientist    Sydney           5  thisPointer
1     Riti      Data Analyst     Delhi           7  thisPointer
2   Shanky   Program Manager     Delhi           2  thisPointer
3   Shreya  Graphic Designer    Mumbai           2  thisPointer
4     Aadi  Data Engineering  New York          11  thisPointer

      Name       Designation      City  Experience      Company    Country
0  Shubham    Data Scientist    Sydney           5  thisPointer  Australia
1     Riti      Data Analyst     Delhi           7  thisPointer      India
2   Shanky   Program Manager     Delhi           2  thisPointer      India
3   Shreya  Graphic Designer    Mumbai           2  thisPointer      India
4     Aadi  Data Engineering  New York          11  thisPointer        USA

      Name       Designation      City  Experience      Company    Country
0  Shubham    Data Scientist    Sydney           5  thisPointer        NaN
1     Riti      Data Analyst     Delhi           7  thisPointer        NaN
2   Shanky   Program Manager     Delhi           2  thisPointer        NaN
3   Shreya  Graphic Designer    Mumbai           2  thisPointer  Australia
4     Aadi  Data Engineering  New York          11  thisPointer      India

Add multiple columns to DataFrame using assign() function

The assign() method comes in handy whenever we need to add multiple columns in a single call. Let’s add two columns built from Series with different indexes using assign(). Here we pass their underlying values with .values, so the index mismatch we saw in the previous method does not matter.

# using the assign function
country1 = pd.Series(['Australia', 'India', 'India', 'India', 'USA'], index=[0, 1, 2, 3, 4])
country2 = pd.Series(['Australia', 'India', 'India', 'India', 'USA'], index=[3, 4, 5, 6, 7])

print(df.assign(Country1=country1.values, Country2=country2.values))

Output

      Name       Designation      City  Experience   Country1   Country2
0  Shubham    Data Scientist    Sydney           5  Australia  Australia
1     Riti      Data Analyst     Delhi           7      India      India
2   Shanky   Program Manager     Delhi           2      India      India
3   Shreya  Graphic Designer    Mumbai           2      India      India
4     Aadi  Data Engineering  New York          11        USA        USA
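It is worth stressing that the absence of NaN values above comes from .values, not from assign() itself. The sketch below, which assumes a one-column version of the sample DataFrame, passes country2 both as a Series and as a NumPy array (via .to_numpy(), the accessor pandas documentation recommends over .values).

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Shubham', 'Riti', 'Shanky', 'Shreya', 'Aadi']})
country2 = pd.Series(['Australia', 'India', 'India', 'India', 'USA'], index=[3, 4, 5, 6, 7])

# Passing the Series itself: assign() aligns it on the index, so rows 0-2 get NaN
aligned = df.assign(Country2=country2)

# Passing a NumPy array (no index): the values are used positionally
positional = df.assign(Country2=country2.to_numpy())

print(aligned)
print(positional)
```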

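Because we passed .values (plain NumPy arrays without an index), no index alignment takes place, so the result contains no NaN values. If we passed the Series objects themselves, assign() would align them on the index just like direct assignment does. We could also use the assign() method to overwrite an existing column.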

# overwrite existing column using the assign() function
city = pd.Series(['Bangalore', 'Delhi', 'Delhi', 'Mumbai', 'Seattle'], index=[0, 1, 2, 3, 4])

print(df.assign(City=city.values))

Output

      Name       Designation       City  Experience
0  Shubham    Data Scientist  Bangalore           5
1     Riti      Data Analyst      Delhi           7
2   Shanky   Program Manager      Delhi           2
3   Shreya  Graphic Designer     Mumbai           2
4     Aadi  Data Engineering    Seattle          11

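However, we need to be a little cautious: if a keyword argument matches an existing column name, assign() silently replaces that column instead of raising an error. Also note that assign() returns a new DataFrame and leaves df unchanged, which is why the output above no longer shows the Country1 and Country2 columns. To keep the changes, reassign the result, for example df = df.assign(...).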

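Here is a short sketch of two more assign() behaviors, using a trimmed sample DataFrame and hypothetical ExperienceMonths and Senior columns: the original DataFrame is left untouched, and keyword arguments may be callables that receive the DataFrame being built, so a later keyword can use a column created by an earlier one in the same call.

```python
import pandas as pd

# Trimmed sample DataFrame (assumption: only City and Experience)
df = pd.DataFrame({'City': ['Sydney', 'Delhi', 'Delhi', 'Mumbai', 'New York'],
                   'Experience': [5, 7, 2, 2, 11]})

# assign() returns a new DataFrame; df itself is not modified
updated = df.assign(City=['Bangalore', 'Delhi', 'Delhi', 'Mumbai', 'Seattle'])

# Callables receive the DataFrame being built, and keywords are applied in order,
# so 'Senior' can refer to 'ExperienceMonths' created just before it
updated = updated.assign(
    ExperienceMonths=lambda d: d['Experience'] * 12,
    Senior=lambda d: d['ExperienceMonths'] >= 60,
)

print(df)       # original City values are still here
print(updated)  # City overwritten, two new columns added
```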
The complete example is as follows,

import pandas as pd

# List of Tuples
employees = [('Shubham', 'Data Scientist', 'Sydney',   5),
            ('Riti', 'Data Analyst', 'Delhi' ,   7),
            ('Shanky', 'Program Manager', 'Delhi' ,   2),
            ('Shreya', 'Graphic Designer', 'Mumbai' ,   2),
            ('Aadi', 'Data Engineering', 'New York', 11)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'City', 'Experience'],
                  index=[0, 1, 2, 3, 4])
print(df)

# using the assign function
country1 = pd.Series(['Australia', 'India', 'India', 'India', 'USA'], index=[0, 1, 2, 3, 4])
country2 = pd.Series(['Australia', 'India', 'India', 'India', 'USA'], index=[3, 4, 5, 6, 7])

print(df.assign(Country1=country1.values, Country2=country2.values))

# overwrite existing column using the assign() function
city = pd.Series(['Bangalore', 'Delhi', 'Delhi', 'Mumbai', 'Seattle'], index=[0, 1, 2, 3, 4])

print(df.assign(City=city.values))

Output:

      Name       Designation      City  Experience
0  Shubham    Data Scientist    Sydney           5
1     Riti      Data Analyst     Delhi           7
2   Shanky   Program Manager     Delhi           2
3   Shreya  Graphic Designer    Mumbai           2
4     Aadi  Data Engineering  New York          11

      Name       Designation      City  Experience   Country1   Country2
0  Shubham    Data Scientist    Sydney           5  Australia  Australia
1     Riti      Data Analyst     Delhi           7      India      India
2   Shanky   Program Manager     Delhi           2      India      India
3   Shreya  Graphic Designer    Mumbai           2      India      India
4     Aadi  Data Engineering  New York          11        USA        USA

      Name       Designation       City  Experience
0  Shubham    Data Scientist  Bangalore           5
1     Riti      Data Analyst      Delhi           7
2   Shanky   Program Manager      Delhi           2
3   Shreya  Graphic Designer     Mumbai           2
4     Aadi  Data Engineering    Seattle          11

Insert new Column in DataFrame using insert() function

As the name suggests, the insert() method is mainly used to insert a new column at a specific position in the DataFrame. Unlike assign(), it modifies the DataFrame in place. It takes three required arguments:

1) loc: the (0-based) column position where we want to place our new column
2) column: the column name
3) value: the column values

For example, let’s insert the Country column right after the City column. City is at position 2, so we insert the new column at position 3.

# using the insert function
country = pd.Series(['Australia', 'India', 'India', 'India', 'USA'], index=[0, 1, 2, 3, 4])
df.insert(3, 'Country', country.values)

print(df)

Output

      Name       Designation      City    Country  Experience
0  Shubham    Data Scientist    Sydney  Australia           5
1     Riti      Data Analyst     Delhi      India           7
2   Shanky   Program Manager     Delhi      India           2
3   Shreya  Graphic Designer    Mumbai      India           2
4     Aadi  Data Engineering  New York        USA          11
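Hard-coding the position 3 works for this DataFrame, but the position can also be looked up from the column name. This sketch, assuming a trimmed sample DataFrame without the Designation column, uses df.columns.get_loc() to place Country right after City and passes the arguments by keyword. Note that insert() changes df in place and returns None.

```python
import pandas as pd

# Trimmed sample DataFrame (assumption: no Designation column)
df = pd.DataFrame(
    {
        'Name': ['Shubham', 'Riti', 'Shanky', 'Shreya', 'Aadi'],
        'City': ['Sydney', 'Delhi', 'Delhi', 'Mumbai', 'New York'],
        'Experience': [5, 7, 2, 2, 11],
    }
)

# Compute the position right after 'City' instead of hard-coding it
position = df.columns.get_loc('City') + 1

# insert() modifies df in place and returns None, so don't reassign its result to df
result = df.insert(
    loc=position,
    column='Country',
    value=['Australia', 'India', 'India', 'India', 'USA'],
)

print(result)  # None
print(df.columns.tolist())
```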

If we try to insert a column whose name already exists in the DataFrame, insert() raises a ValueError.

# using the insert function
city = pd.Series(['Bangalore', 'Delhi', 'Delhi', 'Mumbai', 'Seattle'], index=[0, 1, 2, 3, 4])

df.insert(3, 'City', city.values)

print(df)

Output

ValueError: cannot insert City, already exists
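If a script may run against a DataFrame that already has the column, one option is to catch the ValueError, and another is to check df.columns first and fall back to direct assignment, which overwrites the existing column. A sketch, using a two-row sample and a hypothetical new_city list:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Shubham', 'Riti'], 'City': ['Sydney', 'Delhi']})
new_city = ['Bangalore', 'Delhi']  # hypothetical replacement values

# Option 1: catch the error that insert() raises for an existing column name
try:
    df.insert(1, 'City', new_city)
except ValueError as err:
    print('insert() refused:', err)

# Option 2: check first, and overwrite with direct assignment if the column exists
if 'City' in df.columns:
    df['City'] = new_city
else:
    df.insert(1, 'City', new_city)

print(df)
```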

To insert a column with a duplicate name anyway, we need to pass the additional argument allow_duplicates=True.

# using the insert function with allow_duplicates=True
city = pd.Series(['Bangalore', 'Delhi', 'Delhi', 'Mumbai', 'Seattle'], index=[0, 1, 2, 3, 4])
df.insert(3, 'City', city.values, allow_duplicates=True)

print(df)

Output

      Name       Designation      City       City    Country  Experience
0  Shubham    Data Scientist    Sydney  Bangalore  Australia           5
1     Riti      Data Analyst     Delhi      Delhi      India           7
2   Shanky   Program Manager     Delhi      Delhi      India           2
3   Shreya  Graphic Designer    Mumbai     Mumbai      India           2
4     Aadi  Data Engineering  New York    Seattle        USA          11

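Duplicate column names are allowed but change how selection behaves. In this sketch (a two-row sample DataFrame, for illustration only), df['City'] returns a DataFrame holding both City columns, so iloc is used to select one of them by position, and columns.duplicated() flags the repeated label.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Shubham', 'Riti'], 'City': ['Sydney', 'Delhi']})
df.insert(2, 'City', ['Bangalore', 'Delhi'], allow_duplicates=True)

# With duplicate labels, df['City'] returns a DataFrame holding both columns
both = df['City']
print(type(both).__name__)  # DataFrame

# Position-based selection with iloc picks out one specific column
second_city = df.iloc[:, 2]

# columns.duplicated() flags every repeat of an earlier label
print(df.columns.duplicated())
```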
The complete example is as follows,

import pandas as pd

# List of Tuples
employees = [('Shubham', 'Data Scientist', 'Sydney',   5),
            ('Riti', 'Data Analyst', 'Delhi' ,   7),
            ('Shanky', 'Program Manager', 'Delhi' ,   2),
            ('Shreya', 'Graphic Designer', 'Mumbai' ,   2),
            ('Aadi', 'Data Engineering', 'New York', 11)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'City', 'Experience'],
                  index=[0, 1, 2, 3, 4])
print(df)

# using the insert function
country = pd.Series(['Australia', 'India', 'India', 'India', 'USA'], index=[0, 1, 2, 3, 4])
df.insert(3, 'Country', country.values)

print(df)

# using the insert function with allow_duplicates=True
city = pd.Series(['Bangalore', 'Delhi', 'Delhi', 'Mumbai', 'Seattle'], index=[0, 1, 2, 3, 4])
df.insert(3, 'City', city.values, allow_duplicates=True)

print(df)

Output:

      Name       Designation      City  Experience
0  Shubham    Data Scientist    Sydney           5
1     Riti      Data Analyst     Delhi           7
2   Shanky   Program Manager     Delhi           2
3   Shreya  Graphic Designer    Mumbai           2
4     Aadi  Data Engineering  New York          11

      Name       Designation      City    Country  Experience
0  Shubham    Data Scientist    Sydney  Australia           5
1     Riti      Data Analyst     Delhi      India           7
2   Shanky   Program Manager     Delhi      India           2
3   Shreya  Graphic Designer    Mumbai      India           2
4     Aadi  Data Engineering  New York        USA          11

      Name       Designation      City       City    Country  Experience
0  Shubham    Data Scientist    Sydney  Bangalore  Australia           5
1     Riti      Data Analyst     Delhi      Delhi      India           7
2   Shanky   Program Manager     Delhi      Delhi      India           2
3   Shreya  Graphic Designer    Mumbai     Mumbai      India           2
4     Aadi  Data Engineering  New York    Seattle        USA          11

Add new Column to DataFrame using concat() function

We can also add new columns with the pd.concat() function, although it is more commonly used to concatenate two or more DataFrames. Passing axis=1 concatenates along the columns, and the name of the Series (set here with rename()) becomes the new column label.

# using the concat function
country = pd.Series(['Australia', 'India', 'India', 'India', 'USA'], index=[0, 1, 2, 3, 4])

df = pd.concat([df, country.rename("Country")], axis=1)

print(df)

Output

      Name       Designation      City  Experience    Country
0  Shubham    Data Scientist    Sydney           5  Australia
1     Riti      Data Analyst     Delhi           7      India
2   Shanky   Program Manager     Delhi           2      India
3   Shreya  Graphic Designer    Mumbai           2      India
4     Aadi  Data Engineering  New York          11        USA
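Since pd.concat() accepts a list, several columns can be added in one call. This sketch assumes Series that share the DataFrame’s default 0 to 4 index; each Series is given a name through the constructor, which pd.concat() uses as the column label, much like rename() above.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Shubham', 'Riti', 'Shanky', 'Shreya', 'Aadi']})

country = pd.Series(['Australia', 'India', 'India', 'India', 'USA'], name='Country')
experience = pd.Series([5, 7, 2, 2, 11], name='Experience')

# Each Series becomes one column; its name attribute becomes the column label
df = pd.concat([df, country, experience], axis=1)

print(df)
```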

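Here, we need to take care of the indices: with axis=1, concat() performs an outer join on the row index by default, so the output contains every index label present in either object. Missing cells are filled with NaN, which also turns the integer Experience column into floats.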

# using the concat function
country = pd.Series(['Australia', 'India', 'India', 'India', 'USA'], index=[3, 4, 5, 6, 7])

df = pd.concat([df, country.rename("Country")], axis=1)

print(df)

Output

      Name       Designation      City  Experience    Country    Country
0  Shubham    Data Scientist    Sydney         5.0  Australia        NaN
1     Riti      Data Analyst     Delhi         7.0      India        NaN
2   Shanky   Program Manager     Delhi         2.0      India        NaN
3   Shreya  Graphic Designer    Mumbai         2.0      India  Australia
4     Aadi  Data Engineering  New York        11.0        USA      India
5      NaN               NaN       NaN         NaN        NaN      India
6      NaN               NaN       NaN         NaN        NaN      India
7      NaN               NaN       NaN         NaN        NaN        USA

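If the outer join is not what we want, two common options are sketched below using a trimmed sample DataFrame: join='inner' keeps only the index labels shared by all objects, while reindexing the Series to df.index first keeps every row of df and drops the extra labels. Because no NaN values are introduced into Experience in either case, it stays an integer column.

```python
import pandas as pd

# Trimmed sample DataFrame (assumption: only Name and Experience)
df = pd.DataFrame({'Name': ['Shubham', 'Riti', 'Shanky', 'Shreya', 'Aadi'],
                   'Experience': [5, 7, 2, 2, 11]})
country = pd.Series(['Australia', 'India', 'India', 'India', 'USA'],
                    index=[3, 4, 5, 6, 7], name='Country')

# join='inner' keeps only index labels present in every object (here 3 and 4)
inner = pd.concat([df, country], axis=1, join='inner')

# Reindexing the Series to df.index keeps all of df's rows and drops labels 5-7
kept = pd.concat([df, country.reindex(df.index)], axis=1)

print(inner)
print(kept)
```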
The complete example is as follows,

import pandas as pd

# List of Tuples
employees = [('Shubham', 'Data Scientist', 'Sydney',   5),
            ('Riti', 'Data Analyst', 'Delhi' ,   7),
            ('Shanky', 'Program Manager', 'Delhi' ,   2),
            ('Shreya', 'Graphic Designer', 'Mumbai' ,   2),
            ('Aadi', 'Data Engineering', 'New York', 11)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'City', 'Experience'],
                  index=[0, 1, 2, 3, 4])
print(df)

# using the concat function
country = pd.Series(['Australia', 'India', 'India', 'India', 'USA'], index=[0, 1, 2, 3, 4])

df = pd.concat([df, country.rename("Country")], axis=1)

print(df)

# using the concat function
country = pd.Series(['Australia', 'India', 'India', 'India', 'USA'], index=[3, 4, 5, 6, 7])

df = pd.concat([df, country.rename("Country")], axis=1)

print(df)

Output:

      Name       Designation      City  Experience
0  Shubham    Data Scientist    Sydney           5
1     Riti      Data Analyst     Delhi           7
2   Shanky   Program Manager     Delhi           2
3   Shreya  Graphic Designer    Mumbai           2
4     Aadi  Data Engineering  New York          11

      Name       Designation      City  Experience    Country
0  Shubham    Data Scientist    Sydney           5  Australia
1     Riti      Data Analyst     Delhi           7      India
2   Shanky   Program Manager     Delhi           2      India
3   Shreya  Graphic Designer    Mumbai           2      India
4     Aadi  Data Engineering  New York          11        USA

      Name       Designation      City  Experience    Country    Country
0  Shubham    Data Scientist    Sydney         5.0  Australia        NaN
1     Riti      Data Analyst     Delhi         7.0      India        NaN
2   Shanky   Program Manager     Delhi         2.0      India        NaN
3   Shreya  Graphic Designer    Mumbai         2.0      India  Australia
4     Aadi  Data Engineering  New York        11.0        USA      India
5      NaN               NaN       NaN         NaN        NaN      India
6      NaN               NaN       NaN         NaN        NaN      India
7      NaN               NaN       NaN         NaN        NaN        USA

Summary

Great, you made it! In this article, we discussed four ways to add a new column to a pandas DataFrame: direct assignment, the assign() method, the insert() method, and the pd.concat() function.
