We often encounter scenarios in which we need to add new information to an existing DataFrame. In this article, we will discuss different ways to add a new column to a pandas DataFrame.
To quickly get started, let’s create a sample DataFrame to experiment with. We’ll use the pandas library with some sample data.
import pandas as pd

# List of Tuples
employees = [('Shubham', 'Data Scientist', 'Sydney', 5),
             ('Riti', 'Data Analyst', 'Delhi', 7),
             ('Shanky', 'Program Manager', 'Delhi', 2),
             ('Shreya', 'Graphic Designer', 'Mumbai', 2),
             ('Aadi', 'Data Engineering', 'New York', 11)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'City', 'Experience'],
                  index=[0, 1, 2, 3, 4])

print(df)

Contents of the created DataFrame are,

      Name       Designation      City  Experience
0  Shubham    Data Scientist    Sydney           5
1     Riti      Data Analyst     Delhi           7
2   Shanky   Program Manager     Delhi           2
3   Shreya  Graphic Designer    Mumbai           2
4     Aadi  Data Engineering  New York          11
Now, let’s look at different ways in which we could add a new column in this DataFrame.
Add new Column in DataFrame using direct assignment
This is the simplest way to add a new column to an existing DataFrame. We can assign either a single constant value, which is repeated for every row, or a collection of predefined values. For instance, let’s add a new column with a constant value.
# adding a column with a constant value
df['Company'] = 'thisPointer'

print(df)

Output

      Name       Designation      City  Experience      Company
0  Shubham    Data Scientist    Sydney           5  thisPointer
1     Riti      Data Analyst     Delhi           7  thisPointer
2   Shanky   Program Manager     Delhi           2  thisPointer
3   Shreya  Graphic Designer    Mumbai           2  thisPointer
4     Aadi  Data Engineering  New York          11  thisPointer
We can also add a new column that contains some specific values as below.
# adding a column with specific values
country = pd.Series(['Australia', 'India', 'India', 'India', 'USA'])
df['Country'] = country

print(df)

Output

      Name       Designation      City  Experience    Country
0  Shubham    Data Scientist    Sydney           5  Australia
1     Riti      Data Analyst     Delhi           7      India
2   Shanky   Program Manager     Delhi           2      India
3   Shreya  Graphic Designer    Mumbai           2      India
4     Aadi  Data Engineering  New York          11        USA
Please note that when the values come from a Series, pandas aligns them with the DataFrame by index label, not by position. If the index of the Series does not match the index of the DataFrame, the unmatched rows are filled with NaN, as shown below.
# adding a column with specific values
country = pd.Series(['Australia', 'India', 'India', 'India', 'USA'],
                    index=[3, 4, 5, 6, 7])
df['Country'] = country

print(df)

Output

      Name       Designation      City  Experience    Country
0  Shubham    Data Scientist    Sydney           5        NaN
1     Riti      Data Analyst     Delhi           7        NaN
2   Shanky   Program Manager     Delhi           2        NaN
3   Shreya  Graphic Designer    Mumbai           2  Australia
4     Aadi  Data Engineering  New York          11      India
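If the Series labels are not meaningful and we simply want the values placed row by row, one possible workaround is to drop the index before assigning. The following is a minimal sketch on a reduced two-column version of the sample data; the column names Country_by_position and Country_by_label are made up for this illustration.

```python
import pandas as pd

# Reduced version of the sample data (assumption: only two columns are needed here)
df = pd.DataFrame({
    "Name": ["Shubham", "Riti", "Shanky", "Shreya", "Aadi"],
    "City": ["Sydney", "Delhi", "Delhi", "Mumbai", "New York"],
})

# A Series whose labels (3..7) only partly overlap the DataFrame's labels (0..4)
country = pd.Series(["Australia", "India", "India", "India", "USA"],
                    index=[3, 4, 5, 6, 7])

# Option 1: convert to a NumPy array, so values are assigned by position
df["Country_by_position"] = country.to_numpy()

# Option 2: assign the Series itself, so values are aligned by label
# (labels 0, 1 and 2 are missing from the Series and become NaN)
df["Country_by_label"] = country

print(df)
```

Which option is right depends on the data: positional assignment assumes both objects are already in the same row order, while label alignment assumes the index values carry meaning.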
The complete example is as follows,
import pandas as pd

# List of Tuples
employees = [('Shubham', 'Data Scientist', 'Sydney', 5),
             ('Riti', 'Data Analyst', 'Delhi', 7),
             ('Shanky', 'Program Manager', 'Delhi', 2),
             ('Shreya', 'Graphic Designer', 'Mumbai', 2),
             ('Aadi', 'Data Engineering', 'New York', 11)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'City', 'Experience'],
                  index=[0, 1, 2, 3, 4])

print(df)

# adding a column with a constant value
df['Company'] = 'thisPointer'

print(df)

# adding a column with specific values
country = pd.Series(['Australia', 'India', 'India', 'India', 'USA'])
df['Country'] = country

print(df)

# adding a column with specific values
country = pd.Series(['Australia', 'India', 'India', 'India', 'USA'],
                    index=[3, 4, 5, 6, 7])
df['Country'] = country

print(df)

Output:

      Name       Designation      City  Experience
0  Shubham    Data Scientist    Sydney           5
1     Riti      Data Analyst     Delhi           7
2   Shanky   Program Manager     Delhi           2
3   Shreya  Graphic Designer    Mumbai           2
4     Aadi  Data Engineering  New York          11
      Name       Designation      City  Experience      Company
0  Shubham    Data Scientist    Sydney           5  thisPointer
1     Riti      Data Analyst     Delhi           7  thisPointer
2   Shanky   Program Manager     Delhi           2  thisPointer
3   Shreya  Graphic Designer    Mumbai           2  thisPointer
4     Aadi  Data Engineering  New York          11  thisPointer
      Name       Designation      City  Experience      Company    Country
0  Shubham    Data Scientist    Sydney           5  thisPointer  Australia
1     Riti      Data Analyst     Delhi           7  thisPointer      India
2   Shanky   Program Manager     Delhi           2  thisPointer      India
3   Shreya  Graphic Designer    Mumbai           2  thisPointer      India
4     Aadi  Data Engineering  New York          11  thisPointer        USA
      Name       Designation      City  Experience      Company    Country
0  Shubham    Data Scientist    Sydney           5  thisPointer        NaN
1     Riti      Data Analyst     Delhi           7  thisPointer        NaN
2   Shanky   Program Manager     Delhi           2  thisPointer        NaN
3   Shreya  Graphic Designer    Mumbai           2  thisPointer  Australia
4     Aadi  Data Engineering  New York          11  thisPointer      India
Add multiple columns to DataFrame using assign() function
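The assign() function comes in handy whenever you need to add multiple columns in a single call. It returns a new DataFrame and leaves the original one unchanged. Note that assign() also aligns a Series by index, so to avoid the index issue that we saw in the above method, we pass the underlying values (using .values) instead of the Series itself. Let’s try to add two columns from Series with different indexes using the assign() method.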
# using the assign function
country1 = pd.Series(['Australia', 'India', 'India', 'India', 'USA'],
                     index=[0, 1, 2, 3, 4])
country2 = pd.Series(['Australia', 'India', 'India', 'India', 'USA'],
                     index=[3, 4, 5, 6, 7])

print(df.assign(Country1=country1.values, Country2=country2.values))

Output

      Name       Designation      City  Experience   Country1   Country2
0  Shubham    Data Scientist    Sydney           5  Australia  Australia
1     Riti      Data Analyst     Delhi           7      India      India
2   Shanky   Program Manager     Delhi           2      India      India
3   Shreya  Graphic Designer    Mumbai           2      India      India
4     Aadi  Data Engineering  New York          11        USA        USA
Hence, passing the values to assign() doesn’t result in NaN values, because they are assigned by position rather than by index label. We could also use the assign() method to overwrite any existing column.
# overwrite existing column using the assign() function
City = pd.Series(['Bangalore', 'Delhi', 'Delhi', 'Mumbai', 'Seattle'],
                 index=[0, 1, 2, 3, 4])

print(df.assign(City=City.values))

Output

      Name       Designation       City  Experience
0  Shubham    Data Scientist  Bangalore           5
1     Riti      Data Analyst      Delhi           7
2   Shanky   Program Manager      Delhi           2
3   Shreya  Graphic Designer     Mumbai           2
4     Aadi  Data Engineering    Seattle          11
However, we need to be a little cautious while using the assign() method, as it overwrites an existing column with the same name without any warning, even when we didn’t intend to do so.
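Two properties of assign() are easy to miss: it returns a new DataFrame rather than modifying the original, and it aligns a Series by index label unless raw values are passed. The sketch below tries to show both on a small hypothetical three-row frame; the Grade column and the shifted Series are invented for this illustration.

```python
import pandas as pd

# Small hypothetical frame (not the article's full sample data)
df = pd.DataFrame({"Name": ["Shubham", "Riti", "Shanky"],
                   "Experience": [5, 7, 2]})

# A Series whose labels (1..3) are shifted relative to the DataFrame's (0..2)
shifted = pd.Series(["A", "B", "C"], index=[1, 2, 3])

# Passing the Series itself: assign() aligns on labels, so row 0 gets NaN
aligned = df.assign(Grade=shifted)

# Passing the raw values: assigned by position, so no NaN appears
positional = df.assign(Grade=shifted.to_numpy())

# The original DataFrame is untouched in both cases
print(list(df.columns))
print(aligned)
print(positional)
```

To keep the result, the returned DataFrame has to be bound to a name, for example df = df.assign(...).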
The complete example is as follows
import pandas as pd

# List of Tuples
employees = [('Shubham', 'Data Scientist', 'Sydney', 5),
             ('Riti', 'Data Analyst', 'Delhi', 7),
             ('Shanky', 'Program Manager', 'Delhi', 2),
             ('Shreya', 'Graphic Designer', 'Mumbai', 2),
             ('Aadi', 'Data Engineering', 'New York', 11)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'City', 'Experience'],
                  index=[0, 1, 2, 3, 4])

print(df)

# using the assign function
country1 = pd.Series(['Australia', 'India', 'India', 'India', 'USA'],
                     index=[0, 1, 2, 3, 4])
country2 = pd.Series(['Australia', 'India', 'India', 'India', 'USA'],
                     index=[3, 4, 5, 6, 7])

print(df.assign(Country1=country1.values, Country2=country2.values))

# overwrite existing column using the assign() function
City = pd.Series(['Bangalore', 'Delhi', 'Delhi', 'Mumbai', 'Seattle'],
                 index=[0, 1, 2, 3, 4])

print(df.assign(City=City.values))

Output:

      Name       Designation      City  Experience
0  Shubham    Data Scientist    Sydney           5
1     Riti      Data Analyst     Delhi           7
2   Shanky   Program Manager     Delhi           2
3   Shreya  Graphic Designer    Mumbai           2
4     Aadi  Data Engineering  New York          11
      Name       Designation      City  Experience   Country1   Country2
0  Shubham    Data Scientist    Sydney           5  Australia  Australia
1     Riti      Data Analyst     Delhi           7      India      India
2   Shanky   Program Manager     Delhi           2      India      India
3   Shreya  Graphic Designer    Mumbai           2      India      India
4     Aadi  Data Engineering  New York          11        USA        USA
      Name       Designation       City  Experience
0  Shubham    Data Scientist  Bangalore           5
1     Riti      Data Analyst      Delhi           7
2   Shanky   Program Manager      Delhi           2
3   Shreya  Graphic Designer     Mumbai           2
4     Aadi  Data Engineering    Seattle          11
Insert new Column in DataFrame using insert() function
As the name suggests, the insert() method is mainly used to insert a new column at a specific place in the DataFrame. Unlike assign(), it modifies the DataFrame in place. The insert() method takes three arguments:
1) Column index where we want to place our new column
2) Column name
3) Column values
For example, we need to insert the Country column right next to the City column.
# using the insert function
country = pd.Series(['Australia', 'India', 'India', 'India', 'USA'],
                    index=[0, 1, 2, 3, 4])
df.insert(3, 'Country', country.values)

print(df)

Output

      Name       Designation      City    Country  Experience
0  Shubham    Data Scientist    Sydney  Australia           5
1     Riti      Data Analyst     Delhi      India           7
2   Shanky   Program Manager     Delhi      India           2
3   Shreya  Graphic Designer    Mumbai      India           2
4     Aadi  Data Engineering  New York        USA          11
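The position 3 above is hard-coded, which only works while the column order stays the same. A possible alternative, sketched here on a hypothetical two-row version of the data, is to look up the position of the City column with Index.get_loc() and insert right after it.

```python
import pandas as pd

# Hypothetical two-row version of the sample data
df = pd.DataFrame({"Name": ["Shubham", "Riti"],
                   "Designation": ["Data Scientist", "Data Analyst"],
                   "City": ["Sydney", "Delhi"],
                   "Experience": [5, 7]})

# get_loc() returns an integer position when the column label is unique
position = df.columns.get_loc("City") + 1

# Insert 'Country' immediately after 'City' (modifies df in place)
df.insert(position, "Country", ["Australia", "India"])

print(list(df.columns))
```

This assumes the City label occurs exactly once; with duplicate column names, get_loc() does not return a single integer.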
In case we try to add a new column with a column name already existing in the DataFrame, it would result in a ValueError.
# using the insert function
city = pd.Series(['Bangalore', 'Delhi', 'Delhi', 'Mumbai', 'Seattle'],
                 index=[0, 1, 2, 3, 4])
df.insert(3, 'City', city.values)

print(df)
Output
ValueError: cannot insert City, already exists
To insert a duplicate column with the same name, we need to pass an additional argument, allow_duplicates, as True.
# using the insert function
city = pd.Series(['Bangalore', 'Delhi', 'Delhi', 'Mumbai', 'Seattle'],
                 index=[0, 1, 2, 3, 4])
# pass the new city values (not the country values) for the duplicate column
df.insert(3, 'City', city.values, allow_duplicates=True)

print(df)

Output

      Name       Designation      City       City  Experience
0  Shubham    Data Scientist    Sydney  Bangalore           5
1     Riti      Data Analyst     Delhi      Delhi           7
2   Shanky   Program Manager     Delhi      Delhi           2
3   Shreya  Graphic Designer    Mumbai     Mumbai           2
4     Aadi  Data Engineering  New York    Seattle          11
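Duplicate column names have a side effect that is worth seeing in code: selecting the duplicated label no longer returns a single Series. The sketch below uses a small hypothetical two-row frame, not the article's data, to illustrate this.

```python
import pandas as pd

# Hypothetical two-row frame with a single 'City' column
df = pd.DataFrame({"Name": ["Shubham", "Riti"],
                   "City": ["Sydney", "Delhi"]})

# Append a second column that is also named 'City'
df.insert(2, "City", ["Bangalore", "Delhi"], allow_duplicates=True)

# Selecting a duplicated label returns a DataFrame holding both columns
both = df["City"]

print(type(both).__name__)
print(df.columns.is_unique)
```

For this reason, duplicate names are usually best avoided unless there is a specific need for them.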
The complete example is as follows,
import pandas as pd

# List of Tuples
employees = [('Shubham', 'Data Scientist', 'Sydney', 5),
             ('Riti', 'Data Analyst', 'Delhi', 7),
             ('Shanky', 'Program Manager', 'Delhi', 2),
             ('Shreya', 'Graphic Designer', 'Mumbai', 2),
             ('Aadi', 'Data Engineering', 'New York', 11)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'City', 'Experience'],
                  index=[0, 1, 2, 3, 4])

print(df)

# using the insert function
country = pd.Series(['Australia', 'India', 'India', 'India', 'USA'],
                    index=[0, 1, 2, 3, 4])
df.insert(3, 'Country', country.values)

print(df)

# using the insert function
city = pd.Series(['Bangalore', 'Delhi', 'Delhi', 'Mumbai', 'Seattle'],
                 index=[0, 1, 2, 3, 4])
# pass the new city values (not the country values) for the duplicate column
df.insert(3, 'City', city.values, allow_duplicates=True)

print(df)

Output:

      Name       Designation      City  Experience
0  Shubham    Data Scientist    Sydney           5
1     Riti      Data Analyst     Delhi           7
2   Shanky   Program Manager     Delhi           2
3   Shreya  Graphic Designer    Mumbai           2
4     Aadi  Data Engineering  New York          11
      Name       Designation      City    Country  Experience
0  Shubham    Data Scientist    Sydney  Australia           5
1     Riti      Data Analyst     Delhi      India           7
2   Shanky   Program Manager     Delhi      India           2
3   Shreya  Graphic Designer    Mumbai      India           2
4     Aadi  Data Engineering  New York        USA          11
      Name       Designation      City       City    Country  Experience
0  Shubham    Data Scientist    Sydney  Bangalore  Australia           5
1     Riti      Data Analyst     Delhi      Delhi      India           7
2   Shanky   Program Manager     Delhi      Delhi      India           2
3   Shreya  Graphic Designer    Mumbai     Mumbai      India           2
4     Aadi  Data Engineering  New York    Seattle        USA          11
Add new Column to DataFrame using concat() method
We can also add new columns using the concat() function, although it is more commonly used for concatenating two or more DataFrames. For now, let’s try to add a new column using concat().
# using the concat function
country = pd.Series(['Australia', 'India', 'India', 'India', 'USA'],
                    index=[0, 1, 2, 3, 4])
df = pd.concat([df, country.rename("Country")], axis=1)

print(df)

Output

      Name       Designation      City  Experience    Country
0  Shubham    Data Scientist    Sydney           5  Australia
1     Riti      Data Analyst     Delhi           7      India
2   Shanky   Program Manager     Delhi           2      India
3   Shreya  Graphic Designer    Mumbai           2      India
4     Aadi  Data Engineering  New York          11        USA
Here, we need to take care of the indices. By default, concat() performs an outer join, so the output contains all the index labels present in either object, with NaN wherever a label is missing from one of them.
# using the concat function
country = pd.Series(['Australia', 'India', 'India', 'India', 'USA'],
                    index=[3, 4, 5, 6, 7])
df = pd.concat([df, country.rename("Country")], axis=1)

print(df)

Output

      Name       Designation      City  Experience    Country
0  Shubham    Data Scientist    Sydney         5.0        NaN
1     Riti      Data Analyst     Delhi         7.0        NaN
2   Shanky   Program Manager     Delhi         2.0        NaN
3   Shreya  Graphic Designer    Mumbai         2.0  Australia
4     Aadi  Data Engineering  New York        11.0      India
5      NaN               NaN       NaN         NaN      India
6      NaN               NaN       NaN         NaN      India
7      NaN               NaN       NaN         NaN        USA
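If the extra rows 5 to 7 are not wanted, there are at least two ways to control which index labels survive. The sketch below, on a reduced two-column version of the sample data, shows reindexing the Series to the DataFrame's index first, and alternatively passing join="inner" to concat(). The variable names kept_rows and common_rows are made up for this illustration.

```python
import pandas as pd

# Reduced version of the sample data (assumption: two columns are enough here)
df = pd.DataFrame({"Name": ["Shubham", "Riti", "Shanky", "Shreya", "Aadi"],
                   "Experience": [5, 7, 2, 2, 11]})

country = pd.Series(["Australia", "India", "India", "India", "USA"],
                    index=[3, 4, 5, 6, 7], name="Country")

# Option 1: align the Series to the DataFrame's index before concatenating.
# All five original rows are kept; labels missing from the Series become NaN.
kept_rows = pd.concat([df, country.reindex(df.index)], axis=1)

# Option 2: an inner join keeps only the labels present in both objects (3 and 4)
common_rows = pd.concat([df, country], axis=1, join="inner")

print(kept_rows)
print(common_rows)
```

With the first option no new rows are introduced, so the Experience column is not forced to hold NaN and stays an integer column.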
The complete example is as follows,
import pandas as pd

# List of Tuples
employees = [('Shubham', 'Data Scientist', 'Sydney', 5),
             ('Riti', 'Data Analyst', 'Delhi', 7),
             ('Shanky', 'Program Manager', 'Delhi', 2),
             ('Shreya', 'Graphic Designer', 'Mumbai', 2),
             ('Aadi', 'Data Engineering', 'New York', 11)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'City', 'Experience'],
                  index=[0, 1, 2, 3, 4])

print(df)

# using the concat function
country = pd.Series(['Australia', 'India', 'India', 'India', 'USA'],
                    index=[0, 1, 2, 3, 4])
df = pd.concat([df, country.rename("Country")], axis=1)

print(df)

# using the concat function
country = pd.Series(['Australia', 'India', 'India', 'India', 'USA'],
                    index=[3, 4, 5, 6, 7])
df = pd.concat([df, country.rename("Country")], axis=1)

print(df)

Output:

      Name       Designation      City  Experience
0  Shubham    Data Scientist    Sydney           5
1     Riti      Data Analyst     Delhi           7
2   Shanky   Program Manager     Delhi           2
3   Shreya  Graphic Designer    Mumbai           2
4     Aadi  Data Engineering  New York          11
      Name       Designation      City  Experience    Country
0  Shubham    Data Scientist    Sydney           5  Australia
1     Riti      Data Analyst     Delhi           7      India
2   Shanky   Program Manager     Delhi           2      India
3   Shreya  Graphic Designer    Mumbai           2      India
4     Aadi  Data Engineering  New York          11        USA
      Name       Designation      City  Experience    Country    Country
0  Shubham    Data Scientist    Sydney         5.0  Australia        NaN
1     Riti      Data Analyst     Delhi         7.0      India        NaN
2   Shanky   Program Manager     Delhi         2.0      India        NaN
3   Shreya  Graphic Designer    Mumbai         2.0      India  Australia
4     Aadi  Data Engineering  New York        11.0        USA      India
5      NaN               NaN       NaN         NaN        NaN      India
6      NaN               NaN       NaN         NaN        NaN      India
7      NaN               NaN       NaN         NaN        NaN        USA
Summary
Great, you made it! In this article, we have discussed multiple ways to add a new column to a pandas DataFrame: direct assignment, assign(), insert() and concat(). Thanks.