In this article we will discuss different techniques to sum values of two columns in a dataframe and assign summed values as a new column. We will also cover the scenarios where columns contain NaN values.
Suppose we have a Dataframe which contains employee name as an index and each column contains their monthly salaries,
# List of Tuples empSalary = [('jack', 2000, 2010, 2050, 2134, 2111), ('Riti', 3000, 3022, 3456, 3111, 2109), ('Aadi', 4022, np.NaN, 2077, 2134, 3122), ('Mohit', 3012, 3050, 2010, 2122, 1111), ('Veena', 2023, 2232, np.NaN, 2112, 1099), ('Shaun', 2123, 2510, 3050, 3134, 2122), ('Mark', 4000, 2000, 2050, 2122, 2111) ] # Create a DataFrame object df = pd.DataFrame(empSalary, columns=['Name', 'Jan', 'Feb', 'March', 'April', 'May']) df.set_index('Name', inplace=True) print(df)
Output:
Jan Feb March April May Name jack 2000 2010.0 2050.0 2134 2111 Riti 3000 3022.0 3456.0 3111 2109 Aadi 4022 NaN 2077.0 2134 3122 Mohit 3012 3050.0 2010.0 2122 1111 Veena 2023 2232.0 NaN 2112 1099 Shaun 2123 2510.0 3050.0 3134 2122 Mark 4000 2000.0 2050.0 2122 2111
Now we want to add the values of two columns altogether and create a new column out of those summed values. Let’s see how to do that,
Pandas: Sum two columns together to make a new series
We can select individual columns by column names using [] operator and then we can add values in those columns using + operator. For example,
Frequently Asked:
# Add two columns together to make a new series total = df['Jan'] + df['May'] print(total)
Output:
Name jack 4111 Riti 5109 Aadi 7144 Mohit 4123 Veena 3122 Shaun 4245 Mark 6111 dtype: int64
It returned a Series and each value in the series contains the sum of values from columns ‘Jan’ & ‘May’ for respective indexes.
Add two columns to make a new column
In the previous example we got a Series by adding values of 2 columns. We can also set this new series object as a new column in the dataframe i.e.
# Add two columns to make a new column df['Jan_May'] = df['Jan'] + df['May'] print('Updated DataFrame:') print(df)
Output:
Latest Python - Video Tutorial
Updated DataFrame: Jan Feb March April May Jan_May Name jack 2000 2010.0 2050.0 2134 2111 4111 Riti 3000 3022.0 3456.0 3111 2109 5109 Aadi 4022 NaN 2077.0 2134 3122 7144 Mohit 3012 3050.0 2010.0 2122 1111 4123 Veena 2023 2232.0 NaN 2112 1099 3122 Shaun 2123 2510.0 3050.0 3134 2122 4245 Mark 4000 2000.0 2050.0 2122 2111 6111
Column ‘Jan_May’ contains the sum of values in column ‘Jan’ & column ‘May’.
Pandas: Sum two columns containing NaN values
Suppose we are adding the values of two columns and some entries in any of the columns are NaN, then in the final Series object values of those indexes will be NaN. For example, in our dataframe column ‘Feb’ has some NaN values.
Let’s try to add the column ‘Jan’ & ‘Feb’,
# pandas sum two columns with NaN total = df['Jan'] + df['Feb'] print(total)
Output:
Name jack 4010.0 Riti 6022.0 Aadi NaN Mohit 6062.0 Veena 4255.0 Shaun 4633.0 Mark 6000.0 dtype: float64
We got a Series object by adding values in column ‘Jan’ & ‘Feb’. But for indexes where column ‘Feb’ has NaNs, values, the Series object ‘total’ also have NaN values for those indexes.
This might not be the required behavior for some. If we want then we can replace NaN values in a column by 0 before adding them to other column’s values. This way we will not have any NaN in the returned Series object. For example,
# Replace NaN by 0 and then add values in two columns total = df['Jan'] + df['Feb'].fillna(0) print(total)
Output:
Name jack 4010.0 Riti 6022.0 Aadi 4022.0 Mohit 6062.0 Veena 4255.0 Shaun 4633.0 Mark 6000.0 dtype: float64
We replaced all NaNs in the column ‘Feb’ by 0 using fillna() function and then added the values in columns ‘Jan’ & ‘Feb’.
Pandas: Sum values in two different columns using loc[] as assign as a new column
We can select the two columns from the dataframe as a mini Dataframe and then we can call the sum() function on this mini Dataframe to get the sum of values in two columns. For example,
# Pandas: Sum values in two different columns using loc[] as assign as a new column # Get a mini dataframe by selecting column 'Jan' & 'Feb' mini_df = df.loc[: , ['Jan', 'Feb']] print('Mini Dataframe:') print(mini_df) # Get sum of values of all the columns of Mini Dataframe total = mini_df.sum(axis=1) print('Sum of columns Jan and Feb:') print(total)
Output:
Mini Dataframe: Jan Feb Name jack 2000 2010.0 Riti 3000 3022.0 Aadi 4022 NaN Mohit 3012 3050.0 Veena 2023 2232.0 Shaun 2123 2510.0 Mark 4000 2000.0 Sum of columns Jan and Feb: Name jack 4010.0 Riti 6022.0 Aadi 4022.0 Mohit 6062.0 Veena 4255.0 Shaun 4633.0 Mark 6000.0 dtype: float64
We selected the columns ‘Jan’ & ‘Feb’ using loc[] and got a mini dataframe which contains only these two columns. Then called the sum() with axis=1, which added the values in all the columns and returned a Series object.
We assign this Series object to the original dataframe to have a column containing the sum of two columns ‘Jan’ & ‘Feb’. For examples,
# Get sum of 2 columns by column numbers df['Jan_Feb'] = df.loc[: , ['Jan', 'Feb']].sum(axis=1) print('Updated DataFrame:') print(df)
Output:
Jan Feb March April May Jan_May Jan_Feb Name jack 2000 2010.0 2050.0 2134 2111 4111 4010.0 Riti 3000 3022.0 3456.0 3111 2109 5109 6022.0 Aadi 4022 NaN 2077.0 2134 3122 7144 4022.0 Mohit 3012 3050.0 2010.0 2122 1111 4123 6062.0 Veena 2023 2232.0 NaN 2112 1099 3122 4255.0 Shaun 2123 2510.0 3050.0 3134 2122 4245 4633.0 Mark 4000 2000.0 2050.0 2122 2111 6111 6000.0
Pandas: Sum values in two columns using column numbers & iloc[]
Instead of column labels, we can also use column numbers to select the two columns from dataframe and then we will add values in those columns,
# Get sum of 2 columns by column numbers df['Jan_Feb'] = df.iloc[: , [0, 3]].sum(axis=1) print(total)
Output:
Name jack 4010.0 Riti 6022.0 Aadi 4022.0 Mohit 6062.0 Veena 4255.0 Shaun 4633.0 Mark 6000.0 dtype: float64
We added the values in the first & third columns of the dataframe and assigned the summed values as a new column in the dataframe.
Conclusion:
This is how we can add the values in two columns to add a new column in the dataframe.
Latest Video Tutorials