In this article, we will discuss different ways to add a new column to a DataFrame in pandas, i.e. using the [] operator, the assign() function, the insert() function, or a dictionary. We will also discuss adding a new column by populating values from a list, using the same value at every index, or calculating the values of a new column from other columns.
Table of Contents
- Add column to Pandas Dataframe using [] operator
- Append column to Dataframe using assign() function
- Add multiple columns in DataFrame
- Add columns to DataFrame using Lambda Function
- Insert column to Dataframe using insert()
- Add column to Dataframe using dictionary
Let’s create a DataFrame object,
import pandas as pd

# List of Tuples
students = [('jack', 34, 'Sydeny', 'Australia'),
            ('Riti', 30, 'Delhi', 'India'),
            ('Vikas', 31, 'Mumbai', 'India'),
            ('Neelu', 32, 'Bangalore', 'India'),
            ('John', 16, 'New York', 'US'),
            ('Mike', 17, 'las vegas', 'US')]

# Create a DataFrame object
df_obj = pd.DataFrame(students,
                      columns=['Name', 'Age', 'City', 'Country'],
                      index=['a', 'b', 'c', 'd', 'e', 'f'])

print(df_obj)
Contents of the DataFrame df_obj are,
    Name  Age       City    Country
a   jack   34     Sydeny  Australia
b   Riti   30      Delhi      India
c  Vikas   31     Mumbai      India
d  Neelu   32  Bangalore      India
e   John   16   New York         US
f   Mike   17  las vegas         US
Now let’s discuss different ways to add new columns to this DataFrame in pandas.
Add column to Pandas Dataframe using [] operator
Pandas: Add Column from List
Suppose we want to add a new column ‘Marks’ with values taken from a list. Let’s see how to do this,
# Add column with Name Marks
df_obj['Marks'] = [10, 20, 45, 33, 22, 11]

print(df_obj)
Output:
    Name  Age       City    Country  Marks
a   jack   34     Sydeny  Australia     10
b   Riti   30      Delhi      India     20
c  Vikas   31     Mumbai      India     45
d  Neelu   32  Bangalore      India     33
e   John   16   New York         US     22
f   Mike   17  las vegas         US     11
As DataFrame df_obj didn’t have any column named ‘Marks’, a new column was added to it.
But we need to keep these things in mind:
- If the number of values in the list does not match the number of rows (fewer or more), pandas raises a ValueError.
- If the column already exists, all of its values are replaced.
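Both rules can be checked with a small sketch. It uses a hypothetical 3-row subset of the article's data rather than df_obj, so that the length mismatch is easy to see.

```python
import pandas as pd

# Small hypothetical DataFrame (a 3-row subset of the article's data)
df = pd.DataFrame({'Name': ['jack', 'Riti', 'Vikas']}, index=['a', 'b', 'c'])

# 1. A list whose length differs from the number of rows raises ValueError
length_error = None
try:
    df['Marks'] = [10, 20]  # only 2 values for 3 rows
except ValueError as err:
    length_error = err
    print('ValueError:', err)

# 2. Assigning to an existing column name overwrites all of its values
df['Marks'] = [10, 20, 45]
df['Marks'] = [1, 2, 3]  # replaces 10, 20, 45
print(df)
```

The column is not created by the failed assignment; only the later, correctly sized list adds ‘Marks’.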
Pandas: Add column to DataFrame with same value
Now add a new column ‘Total’ with the same value, 50, at every index, i.e. each item in this column will have the same default value 50,
# Add column with same default value
df_obj['Total'] = 50

print(df_obj)
Output:
    Name  Age       City    Country  Marks  Total
a   jack   34     Sydeny  Australia     10     50
b   Riti   30      Delhi      India     20     50
c  Vikas   31     Mumbai      India     45     50
d  Neelu   32  Bangalore      India     33     50
e   John   16   New York         US     22     50
f   Mike   17  las vegas         US     11     50
It added a new column ‘Total’ and set the value 50 for every item in that column.
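Assigning a scalar works because pandas broadcasts it to every row. The sketch below, on a hypothetical 3-row DataFrame, compares it with the explicit list form; the column name ‘Total_list’ is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Marks': [10, 20, 45]}, index=['a', 'b', 'c'])

# A scalar on the right-hand side is broadcast to every row
df['Total'] = 50

# Explicit equivalent: one value per row (hypothetical column name)
df['Total_list'] = [50] * len(df)

print(df['Total'].tolist() == df['Total_list'].tolist())  # True
```

The scalar form is shorter and cannot run into the length-mismatch ValueError described earlier.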
Pandas: Add column based on another column
Let’s add a new column ‘Percentage’ where the entry at each index is calculated from the values of other columns at that index, i.e.
# Add column to Dataframe based on another column
df_obj['Percentage'] = (df_obj['Marks'] / df_obj['Total']) * 100

print(df_obj)
Output:
    Name  Age       City    Country  Marks  Total  Percentage
a   jack   34     Sydeny  Australia     10     50        20.0
b   Riti   30      Delhi      India     20     50        40.0
c  Vikas   31     Mumbai      India     45     50        90.0
d  Neelu   32  Bangalore      India     33     50        66.0
e   John   16   New York         US     22     50        44.0
f   Mike   17  las vegas         US     11     50        22.0
It added a new column ‘Percentage’, where each entry contains the percentage of that student, calculated from the ‘Marks’ and ‘Total’ values at that index.
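The expression above is vectorized: the two columns are combined element-wise, row by row, aligned on the index. As a sketch on a hypothetical 3-row DataFrame, it gives the same values as an explicit row-wise apply(axis=1), which is usually much slower on large data.

```python
import pandas as pd

df = pd.DataFrame({'Marks': [10, 20, 45], 'Total': [50, 50, 50]},
                  index=['a', 'b', 'c'])

# Vectorized: whole columns are divided element-wise, matched by index label
vectorized = (df['Marks'] / df['Total']) * 100

# Row-wise equivalent: the lambda is called once per row
row_wise = df.apply(lambda row: row['Marks'] / row['Total'] * 100, axis=1)

df['Percentage'] = vectorized
print(df)
print(vectorized.tolist() == row_wise.tolist())  # True
```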
Append column to DataFrame using assign() function
Pandas provides a DataFrame method for adding columns, i.e.
DataFrame.assign(**kwargs)
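It accepts keyword-value pairs, where each keyword is a column name and each value is a scalar, a list-like (list, array or Series) or a callable. A callable is called with the DataFrame, and its return value becomes the column. assign() returns a new DataFrame and does not modify the original one.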
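A quick sketch of both points, using a hypothetical 3-row DataFrame and a hypothetical ‘Double’ column: the original is left unchanged, and a callable receives the DataFrame it is being assigned into.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['jack', 'Riti', 'Vikas']}, index=['a', 'b', 'c'])

# assign() returns a NEW DataFrame; df itself keeps only its original column
new_df = df.assign(Marks=[10, 20, 45])
print(list(df.columns))      # ['Name']
print(list(new_df.columns))  # ['Name', 'Marks']

# A callable value is called with the DataFrame; its result becomes the column
new_df = new_df.assign(Double=lambda d: d['Marks'] * 2)
print(new_df['Double'].tolist())  # [20, 40, 90]
```

Since assign() has no inplace option, keep the result by rebinding the name, e.g. df = df.assign(...).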
Let’s add columns in DataFrame using assign().
First of all, reset the DataFrame, i.e.
import pandas as pd

# List of Tuples
students = [('jack', 34, 'Sydeny', 'Australia'),
            ('Riti', 30, 'Delhi', 'India'),
            ('Vikas', 31, 'Mumbai', 'India'),
            ('Neelu', 32, 'Bangalore', 'India'),
            ('John', 16, 'New York', 'US'),
            ('Mike', 17, 'las vegas', 'US')]

# Create a DataFrame object
df_obj = pd.DataFrame(students,
                      columns=['Name', 'Age', 'City', 'Country'],
                      index=['a', 'b', 'c', 'd', 'e', 'f'])

print(df_obj)
Contents of the DataFrame df_obj are,
    Name  Age       City    Country
a   jack   34     Sydeny  Australia
b   Riti   30      Delhi      India
c  Vikas   31     Mumbai      India
d  Neelu   32  Bangalore      India
e   John   16   New York         US
f   Mike   17  las vegas         US
Add column to DataFrame in Pandas using assign()
Let’s add a column ‘Marks’ i.e.
# Add new column to DataFrame in Pandas using assign()
mod_fd = df_obj.assign(Marks=[10, 20, 45, 33, 22, 11])

print(mod_fd)
It returns a new DataFrame with a new column ‘Marks’. The values provided in the list are used as the column values.
Contents of new dataframe mod_fd are,
    Name  Age       City    Country  Marks
a   jack   34     Sydeny  Australia     10
b   Riti   30      Delhi      India     20
c  Vikas   31     Mumbai      India     45
d  Neelu   32  Bangalore      India     33
e   John   16   New York         US     22
f   Mike   17  las vegas         US     11
Add multiple columns in DataFrame using assign()
We can also add multiple columns using assign() i.e.
# Add two columns in the Dataframe
df_obj = df_obj.assign(Marks=[10, 20, 45, 33, 22, 11],
                       Total=[50] * 6)

print(df_obj)
It added both columns, ‘Marks’ and ‘Total’. Contents of the returned DataFrame are,
    Name  Age       City    Country  Marks  Total
a   jack   34     Sydeny  Australia     10     50
b   Riti   30      Delhi      India     20     50
c  Vikas   31     Mumbai      India     45     50
d  Neelu   32  Bangalore      India     33     50
e   John   16   New York         US     22     50
f   Mike   17  las vegas         US     11     50
Add a column in DataFrame based on other columns using lambda function
Add a column ‘Percentage’ to the DataFrame, where each value is calculated from other columns in the same row, i.e.
# Add a column Percentage based on columns Marks & Total
df_obj = df_obj.assign(Percentage=lambda x: (x['Marks'] / x['Total']) * 100)

print(df_obj)
Contents of the returned dataframe are,
    Name  Age       City    Country  Marks  Total  Percentage
a   jack   34     Sydeny  Australia     10     50        20.0
b   Riti   30      Delhi      India     20     50        40.0
c  Vikas   31     Mumbai      India     45     50        90.0
d  Neelu   32  Bangalore      India     33     50        66.0
e   John   16   New York         US     22     50        44.0
f   Mike   17  las vegas         US     11     50        22.0
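The two assign() calls can also be combined. The pandas documentation for assign() states that keyword arguments are computed and assigned in order, so a later lambda may use columns created earlier in the same call. A minimal sketch on a hypothetical 3-row DataFrame, also passing ‘Total’ as a scalar that is broadcast to every row:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['jack', 'Riti', 'Vikas']}, index=['a', 'b', 'c'])

# 'Percentage' is evaluated after 'Marks' and 'Total' exist in the new frame
result = df.assign(
    Marks=[10, 20, 45],
    Total=50,  # scalar, broadcast to every row
    Percentage=lambda x: (x['Marks'] / x['Total']) * 100,
)
print(result)
```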
Pandas: Insert column to Dataframe using insert()
First of all, reset the DataFrame, i.e.
import pandas as pd

# List of Tuples
students = [('jack', 34, 'Sydeny', 'Australia'),
            ('Riti', 30, 'Delhi', 'India'),
            ('Vikas', 31, 'Mumbai', 'India'),
            ('Neelu', 32, 'Bangalore', 'India'),
            ('John', 16, 'New York', 'US'),
            ('Mike', 17, 'las vegas', 'US')]

# Create a DataFrame object
df_obj = pd.DataFrame(students,
                      columns=['Name', 'Age', 'City', 'Country'],
                      index=['a', 'b', 'c', 'd', 'e', 'f'])

print(df_obj)
Contents of the DataFrame df_obj are,
    Name  Age       City    Country
a   jack   34     Sydeny  Australia
b   Riti   30      Delhi      India
c  Vikas   31     Mumbai      India
d  Neelu   32  Bangalore      India
e   John   16   New York         US
f   Mike   17  las vegas         US
In all the previous solutions, we added the new column at the end of the DataFrame. To add or insert a new column between existing columns instead, we can use the insert() function, i.e.
# Insert column at position 2 (0-based), i.e. as the third column
df_obj.insert(2,                          # column position
              "Marks",                    # column name
              [10, 20, 45, 33, 22, 11],   # column values
              allow_duplicates=True)      # allow duplicate column names

print(df_obj)
Output:
    Name  Age  Marks       City    Country
a   jack   34     10     Sydeny  Australia
b   Riti   30     20      Delhi      India
c  Vikas   31     45     Mumbai      India
d  Neelu   32     33  Bangalore      India
e   John   16     22   New York         US
f   Mike   17     11  las vegas         US
It inserted the column ‘Marks’ between the ‘Age’ and ‘City’ columns.
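Unlike assign(), insert() modifies the DataFrame in place and returns None. The sketch below, on a hypothetical 2-row DataFrame, computes the position with columns.get_loc() instead of hard-coding it, and shows that re-inserting an existing column name fails when allow_duplicates is left at its default.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['jack', 'Riti'],
                   'Age': [34, 30],
                   'City': ['Sydeny', 'Delhi']},
                  index=['a', 'b'])

# Place 'Marks' just before 'City'; insert() changes df in place
ret = df.insert(df.columns.get_loc('City'), 'Marks', [10, 20])
print(ret)               # None
print(list(df.columns))  # ['Name', 'Age', 'Marks', 'City']

# Inserting an existing name raises ValueError unless allow_duplicates=True
dup_error = None
try:
    df.insert(0, 'Marks', [1, 2])
except ValueError as err:
    dup_error = err
    print('ValueError:', err)
```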
Pandas: Add a column to Dataframe using dictionary
Create a dictionary whose keys are the values of the new column and whose values are the values of an existing column, then assign it to the new column, i.e.
ids = [11, 12, 13, 14, 15, 16]

# Provide 'ID' as the column name and for values provide dictionary
df_obj['ID'] = dict(zip(ids, df_obj['Name']))

print(df_obj)
Output:
    Name  Age  Marks       City    Country  ID
a   jack   34     10     Sydeny  Australia  11
b   Riti   30     20      Delhi      India  12
c  Vikas   31     45     Mumbai      India  13
d  Neelu   32     33  Bangalore      India  14
e   John   16     22   New York         US  15
f   Mike   17     11  las vegas         US  16
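Here we created a dictionary by zipping a list of values with the existing column ‘Name’, then assigned this dictionary to the new column ‘ID’. The output above relies on pandas using the dictionary’s keys, in order, as the column values. How a plain dict is treated on assignment has varied between pandas versions: newer versions may instead align the keys against the row index, which here (keys 11 to 16, index ‘a’ to ‘f’) would fill ‘ID’ with NaN. A more explicit approach is therefore safer.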
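One explicit alternative, sketched here as a different technique from the article’s code, is to build the dictionary the other way round (existing value to new value) and pass it to Series.map(). This assumes the ‘Name’ values are unique. The 3-row DataFrame is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['jack', 'Riti', 'Vikas']}, index=['a', 'b', 'c'])
ids = [11, 12, 13]

# Dictionary keyed by the existing column's values: Name -> ID
name_to_id = dict(zip(df['Name'], ids))

# map() looks up each Name in the dictionary; names not in it would become NaN
df['ID'] = df['Name'].map(name_to_id)
print(df)
```

Because the lookup is by value rather than by position or index label, the result does not depend on how a given pandas version interprets a dict assigned directly to a column.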