In this article, we will discuss multiple ways to add an empty column to a pandas DataFrame.
Preparing Dataset for solution
To get started quickly, let’s create a sample DataFrame to experiment with. We’ll use the pandas library and a small, hand-written employee dataset.
import pandas as pd
import numpy as np

# List of tuples
employees = [('Shubham', 'Data Scientist', 'Tech', 5),
             ('Riti', 'Data Engineer', 'Tech', 7),
             ('Shanky', 'Program Manager', 'PMO', 2),
             ('Shreya', 'Graphic Designer', 'Design', 2),
             ('Aadi', 'Backend Developer', 'Tech', 11),
             ('Sim', 'Data Engineer', 'Tech', 4)]

# Create a DataFrame object from the list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'Team', 'Experience'],
                  index=[0, 1, 2, 3, 4, 5])

print(df)
The contents of the created DataFrame are:
      Name        Designation    Team  Experience
0  Shubham     Data Scientist    Tech           5
1     Riti      Data Engineer    Tech           7
2   Shanky    Program Manager     PMO           2
3   Shreya   Graphic Designer  Design           2
4     Aadi  Backend Developer    Tech          11
5      Sim      Data Engineer    Tech           4
Using Assignment operator
The simplest way to add an empty column is to use the assignment operator. In the code below, we create a new column and assign None to it, which leaves every row empty.
# Add a new empty column to the DataFrame
df['new_col'] = None

print(df)
Output
      Name        Designation    Team  Experience  new_col
0  Shubham     Data Scientist    Tech           5     None
1     Riti      Data Engineer    Tech           7     None
2   Shanky    Program Manager     PMO           2     None
3   Shreya   Graphic Designer  Design           2     None
4     Aadi  Backend Developer    Tech          11     None
5      Sim      Data Engineer    Tech           4     None
Now the DataFrame contains a new column (new_col) that holds no values. Alternatively, we can assign numpy.nan (the numpy.NaN alias was removed in NumPy 2.0) or an empty Series such as pandas.Series(dtype='float64') instead of None.
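To make the difference between these placeholders concrete, here is a minimal sketch. It uses a two-row stand-in DataFrame and the column names with_none, with_nan and with_series, all of which are made up for illustration and are not part of the article's dataset. The main thing it shows is that the placeholder you pick affects the dtype of the new column.

```python
import numpy as np
import pandas as pd

# Small stand-in DataFrame (hypothetical data, not the article's full table)
df = pd.DataFrame({'Name': ['Shubham', 'Riti'], 'Experience': [5, 7]})

# Python None is broadcast to every row; the column is typically object dtype
df['with_none'] = None

# A float NaN is broadcast to every row; the column is float64
df['with_nan'] = np.nan

# An empty Series is aligned on the index, so every row becomes NaN
df['with_series'] = pd.Series(dtype='float64')

print(df.dtypes)
print(df.isna().sum())
```

If the column will later hold numbers, numpy.nan is usually the more convenient placeholder, since the column already has a floating-point dtype.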
Using assign() function
The DataFrame.assign() function can also create a new column in an existing DataFrame. Let’s create the empty column again, this time with assign().
# Add a new empty column to the DataFrame
# using the assign() function
df = df.assign(new_col=pd.Series(dtype='int'))

print(df)
Output
      Name        Designation    Team  Experience  new_col
0  Shubham     Data Scientist    Tech           5      NaN
1     Riti      Data Engineer    Tech           7      NaN
2   Shanky    Program Manager     PMO           2      NaN
3   Shreya   Graphic Designer  Design           2      NaN
4     Aadi  Backend Developer    Tech          11      NaN
5      Sim      Data Engineer    Tech           4      NaN
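The assign() function takes column names as keyword arguments, each paired with the values for that column. Because an integer column cannot hold NaN, the empty Series is upcast and the new column is stored as float64. Note that we could also have passed None or numpy.nan here instead of an empty Series to create the empty column.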
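One detail worth seeing in code is that assign() returns a new DataFrame rather than changing the one it is called on. The sketch below assumes a small stand-in DataFrame; the names original, extended and notes are hypothetical and chosen only for this example.

```python
import numpy as np
import pandas as pd

# Small stand-in DataFrame (hypothetical data)
original = pd.DataFrame({'Name': ['Shubham', 'Riti'], 'Team': ['Tech', 'Tech']})

# assign() returns a new DataFrame; several empty columns can be added at once
extended = original.assign(new_col=np.nan, notes=None)

print(original.columns.tolist())   # the original keeps its two columns
print(extended.columns.tolist())   # the copy has the two extra columns
```

This is why the article's example reassigns the result back to df.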
Using insert() function
The pandas.DataFrame.insert() function is another useful way to add a new column to an existing DataFrame. Its advantage is that we can also choose the position of the new column. Starting again from the original DataFrame, let’s add the new empty column between the Designation and Team columns.
# Add an empty column at a given location
# using the insert() function
df.insert(2, "new_col", np.nan)

print(df)
Output
      Name        Designation  new_col    Team  Experience
0  Shubham     Data Scientist      NaN    Tech           5
1     Riti      Data Engineer      NaN    Tech           7
2   Shanky    Program Manager      NaN     PMO           2
3   Shreya   Graphic Designer      NaN  Design           2
4     Aadi  Backend Developer      NaN    Tech          11
5      Sim      Data Engineer      NaN    Tech           4
The function takes three required arguments: the integer position at which the new column should be placed, the new column's name, and its value(s). Unlike assign(), it modifies the DataFrame in place.
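Two behaviours of insert() are easy to trip over: it returns None, and it refuses to add a column whose name already exists unless allow_duplicates=True is passed. The following sketch illustrates both on a small stand-in DataFrame; the variable names result and duplicate_rejected are made up for this example.

```python
import numpy as np
import pandas as pd

# Small stand-in DataFrame (hypothetical data)
df = pd.DataFrame({'Name': ['Shubham', 'Riti'],
                   'Designation': ['Data Scientist', 'Data Engineer'],
                   'Team': ['Tech', 'Tech']})

# insert() works in place and returns None, so do not reassign its result
result = df.insert(2, 'new_col', np.nan)
print(result)
print(df.columns.tolist())

# Inserting the same column name again raises ValueError by default
try:
    df.insert(0, 'new_col', np.nan)
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
print(duplicate_rejected)
```

So if new_col was already added by one of the earlier methods, drop it or recreate the DataFrame before calling insert().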
Using reindex() function
The last method uses the reindex() function. By passing the existing column names plus any additional ones, we get the additional columns as empty (NaN) columns. Let’s look at the code below.
# Add an empty column using the reindex() function
df = df.reindex(columns=df.columns.tolist() + ['new_col'])

print(df)
Output
      Name        Designation    Team  Experience  new_col
0  Shubham     Data Scientist    Tech           5      NaN
1     Riti      Data Engineer    Tech           7      NaN
2   Shanky    Program Manager     PMO           2      NaN
3   Shreya   Graphic Designer  Design           2      NaN
4     Aadi  Backend Developer    Tech          11      NaN
5      Sim      Data Engineer    Tech           4      NaN
Here, we have reindexed the existing DataFrame with an additional column named “new_col” (the list can contain several new column names as well).
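As a rough illustration of the multiple-column case, the sketch below appends two columns in one call and also shows the fill_value parameter of reindex(), which replaces the default NaN in newly created columns. The DataFrame and the names col_a, col_b, df_nan and df_blank are hypothetical.

```python
import pandas as pd

# Small stand-in DataFrame (hypothetical data)
df = pd.DataFrame({'Name': ['Shubham', 'Riti'], 'Experience': [5, 7]})

# Append two empty columns at once; labels that did not exist are filled with NaN
df_nan = df.reindex(columns=df.columns.tolist() + ['col_a', 'col_b'])

# fill_value replaces the default NaN in the newly created column
df_blank = df.reindex(columns=df.columns.tolist() + ['col_a'], fill_value='')

print(df_nan)
print(df_blank)
```

Existing columns keep their values in both cases; only the labels that were missing from the original DataFrame receive the fill value.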
Summary
In this article, we discussed four ways to add an empty column to a pandas DataFrame: the assignment operator, assign(), insert(), and reindex().