Pandas: Add two columns into a new column in Dataframe

In this article we will discuss different techniques to sum values of two columns in a dataframe and assign summed values as a new column. We will also cover the scenarios where columns contain NaN values.

Suppose we have a Dataframe which contains employee name as an index and each column contains their monthly salaries,

# List of Tuples
empSalary = [('jack', 2000, 2010, 2050, 2134, 2111),
             ('Riti', 3000, 3022, 3456, 3111, 2109),
             ('Aadi', 4022, np.NaN, 2077, 2134, 3122),
             ('Mohit', 3012, 3050, 2010, 2122, 1111),
             ('Veena', 2023, 2232, np.NaN, 2112, 1099),
             ('Shaun', 2123, 2510, 3050, 3134, 2122),
             ('Mark', 4000, 2000, 2050, 2122, 2111)
             ]

# Create a DataFrame object
df = pd.DataFrame(empSalary,
                  columns=['Name', 'Jan', 'Feb', 'March', 'April', 'May'])

df.set_index('Name',
             inplace=True)

print(df)

Output:

        Jan     Feb   March  April   May
Name                                    
jack   2000  2010.0  2050.0   2134  2111
Riti   3000  3022.0  3456.0   3111  2109
Aadi   4022     NaN  2077.0   2134  3122
Mohit  3012  3050.0  2010.0   2122  1111
Veena  2023  2232.0     NaN   2112  1099
Shaun  2123  2510.0  3050.0   3134  2122
Mark   4000  2000.0  2050.0   2122  2111

Now we want to add the values of two columns altogether and create a new column out of those summed values. Let’s see how to do that,

Pandas: Sum two columns together to make a new series

We can select individual columns by column names using [] operator and then we can add values in those columns using + operator. For example,

# Add two columns together to make a new series
total = df['Jan'] + df['May']

print(total)

Output:

Name
jack     4111
Riti     5109
Aadi     7144
Mohit    4123
Veena    3122
Shaun    4245
Mark     6111
dtype: int64

It returned a Series and each value in the series contains the sum of values from columns ‘Jan’ & ‘May’ for respective indexes.

Frequently Asked:

Add two columns to make a new column

In the previous example we got a Series by adding values of 2 columns. We can also set this new series object as a new column in the dataframe i.e.

# Add two columns to make a new column
df['Jan_May'] = df['Jan'] + df['May']

print('Updated DataFrame:')
print(df)

Output:

Updated DataFrame:
        Jan     Feb   March  April   May  Jan_May
Name                                             
jack   2000  2010.0  2050.0   2134  2111     4111
Riti   3000  3022.0  3456.0   3111  2109     5109
Aadi   4022     NaN  2077.0   2134  3122     7144
Mohit  3012  3050.0  2010.0   2122  1111     4123
Veena  2023  2232.0     NaN   2112  1099     3122
Shaun  2123  2510.0  3050.0   3134  2122     4245
Mark   4000  2000.0  2050.0   2122  2111     6111

Column ‘Jan_May’ contains the sum of values in column ‘Jan’ & column ‘May’.

Pandas: Sum two columns containing NaN values

Suppose we are adding the values of two columns and some entries in any of the columns are NaN, then in the final Series object values of those indexes will be NaN. For example, in our dataframe column ‘Feb’ has some NaN values.

Let’s try to add the column ‘Jan’ & ‘Feb’,

# pandas sum two columns with NaN
total = df['Jan'] + df['Feb']

print(total)

Output:

Name
jack     4010.0
Riti     6022.0
Aadi        NaN
Mohit    6062.0
Veena    4255.0
Shaun    4633.0
Mark     6000.0
dtype: float64

We got a Series object by adding values in column ‘Jan’ & ‘Feb’. But for indexes where column ‘Feb’ has NaNs, values, the Series object ‘total’ also have NaN values for those indexes.

This might not be the required behavior for some. If we want then we can replace NaN values in a column by 0 before adding them to other column’s values. This way we will not have any NaN in the returned Series object. For example,

# Replace NaN by 0 and then add values in two columns
total = df['Jan'] + df['Feb'].fillna(0)

print(total)

Output:

Name
jack     4010.0
Riti     6022.0
Aadi     4022.0
Mohit    6062.0
Veena    4255.0
Shaun    4633.0
Mark     6000.0
dtype: float64

We replaced all NaNs in the column ‘Feb’ by 0 using fillna() function and then added the values in columns ‘Jan’ & ‘Feb’.

Pandas: Sum values in two different columns using loc[] as assign as a new column

We can select the two columns from the dataframe as a mini Dataframe and then we can call the sum() function on this mini Dataframe to get the sum of values in two columns. For example,

# Pandas: Sum values in two different columns using loc[] as assign as a new column
# Get a mini dataframe by selecting column 'Jan' & 'Feb'
mini_df = df.loc[: , ['Jan', 'Feb']]

print('Mini Dataframe:')
print(mini_df)

# Get sum of values of all the columns of Mini Dataframe
total = mini_df.sum(axis=1)

print('Sum of columns Jan and Feb:')
print(total)

Output:

Mini Dataframe:
        Jan     Feb
Name               
jack   2000  2010.0
Riti   3000  3022.0
Aadi   4022     NaN
Mohit  3012  3050.0
Veena  2023  2232.0
Shaun  2123  2510.0
Mark   4000  2000.0

Sum of columns Jan and Feb:
Name
jack     4010.0
Riti     6022.0
Aadi     4022.0
Mohit    6062.0
Veena    4255.0
Shaun    4633.0
Mark     6000.0
dtype: float64

We selected the columns ‘Jan’ & ‘Feb’ using loc[] and got a mini dataframe which contains only these two columns. Then called the sum() with axis=1, which added the values in all the columns and returned a Series object.

We assign this Series object to the original dataframe to have a column containing the sum of two columns ‘Jan’ & ‘Feb’. For examples,

# Get sum of 2 columns by column numbers
df['Jan_Feb'] = df.loc[: , ['Jan', 'Feb']].sum(axis=1)

print('Updated DataFrame:')
print(df)

Output:

        Jan     Feb   March  April   May  Jan_May  Jan_Feb
Name                                                      
jack   2000  2010.0  2050.0   2134  2111     4111   4010.0
Riti   3000  3022.0  3456.0   3111  2109     5109   6022.0
Aadi   4022     NaN  2077.0   2134  3122     7144   4022.0
Mohit  3012  3050.0  2010.0   2122  1111     4123   6062.0
Veena  2023  2232.0     NaN   2112  1099     3122   4255.0
Shaun  2123  2510.0  3050.0   3134  2122     4245   4633.0
Mark   4000  2000.0  2050.0   2122  2111     6111   6000.0

Pandas: Sum values in two columns using column numbers & iloc[]

Instead of column labels, we can also use column numbers to select the two columns from dataframe and then we will add values in those columns,

# Get sum of 2 columns by column numbers
df['Jan_Feb'] = df.iloc[: , [0, 3]].sum(axis=1)

print(total)

Output:

Name
jack     4010.0
Riti     6022.0
Aadi     4022.0
Mohit    6062.0
Veena    4255.0
Shaun    4633.0
Mark     6000.0
dtype: float64

We added the values in the first & third columns of the dataframe and assigned the summed values as a new column in the dataframe.

Conclusion:

This is how we can add the values in two columns to add a new column in the dataframe.

Pandas: Sum two columns together to make a new series

Frequently Asked:

Add two columns to make a new column

Pandas: Sum two columns containing NaN values

Pandas: Sum values in two different columns using loc[] as assign as a new column

Pandas: Sum values in two columns using column numbers & iloc[]

Related posts:

Share your love

Leave a Comment Cancel Reply