In this article we will discuss how to merge two dataframes in index of both the dataframes or index of one dataframe and some column of any other dataframe. Also, we will see how to keep the similar index in merged dataframe.
In previous two articles we have discussed about many features of Dataframe.merge(). Therefore here just a small intro of API i.e.
LINK
DataFrame.merge()
Frequently Asked:
DataFrame.merge(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)
It’s used to merge dataframes.
Here we will focus on a few arguments only i.e.
- on : Column name on which merge will be done. If not provided then merged on indexes.
- left_on : Specific column names in left dataframe, on which merge will be done.
- right_on : Specific column names in right dataframe, on which merge will be done.
- left_index : bool (default False)
- If True will choose index from left dataframe as join key.
- right_index : bool (default False)
- If True will choose index from right dataframe as join key.
Let’s see some examples to see how to merge dataframes on index.
First of all, let’s create two dataframes to be merged.
Latest Python - Video Tutorial
Dataframe 1:
This dataframe contains the details of the employees like, name, city, experience & Age. Index of the dataframe contains the IDs i.e.
# List of Tuples empoyees = [ (11, 'jack', 34, 'Sydney', 5) , (12, 'Riti', 31, 'Delhi' , 7) , (13, 'Aadi', 16, 'New York', 11) , (14, 'Mohit', 32,'Delhi' , 15) , (15, 'Veena', 33, 'Delhi' , 4) , (16, 'Shaunak', 35, 'Mumbai', 5 ), (17, 'Shaun', 35, 'Colombo', 11) ] # Create a DataFrame object empDfObj = pd.DataFrame(empoyees, columns=['ID', 'Name', 'Age', 'City', 'Experience']) empDfObj = empDfObj.set_index('ID')
Contents of the first dataframe created are,
Name Age City Experience ID 11 jack 34 Sydney 5 12 Riti 31 Delhi 7 13 Aadi 16 New York 11 14 Mohit 32 Delhi 15 15 Veena 33 Delhi 4 16 Shaunak 35 Mumbai 5 17 Shaun 35 Colombo 11
Dataframe 2:
This dataframe contains the details of the employees like, salary, bonus and experience. . Index of the dataframe contains the IDs i.e.
# List of Tuples salaries = [(11, 'Junior', 70000, 1000) , (12, 'Senior', 72200, 1100) , (13, 'Expert', 84999, 1000) , (14, 'Expert', 90000, 2000) , (15, 'Junior', 61000, 1500) , (16, 'Junior', 71000, 1000), (21, 'Senior',81000, 2000) ] # Create a DataFrame object salaryDfObj = pd.DataFrame(salaries, columns=['ID', 'Experience', 'Salary', 'Bonus'], index=['a', 'b', 'c', 'd', 'e', 'f', 'g']) salaryDfObj = salaryDfObj.set_index('ID')
Contents of the second dataframe created are,
Experience Salary Bonus ID 11 Junior 70000 1000 12 Senior 72200 1100 13 Expert 84999 1000 14 Expert 90000 2000 15 Junior 61000 1500 16 Junior 71000 1000 21 Senior 81000 2000
Merging Dataframes by index of both the dataframes
As both the dataframe contains similar IDs on the index. So, to merge the dataframe on indices pass the left_index & right_index arguments as True i.e.
# Merge two Dataframes on index of both the dataframes mergedDf = empDfObj.merge(salaryDfObj, left_index=True, right_index=True)
Contents of the merged dataframe are,
Name Age City Experience_x Experience_y Salary Bonus ID 11 jack 34 Sydney 5 Junior 70000 1000 12 Riti 31 Delhi 7 Senior 72200 1100 13 Aadi 16 New York 11 Expert 84999 1000 14 Mohit 32 Delhi 15 Expert 90000 2000 15 Veena 33 Delhi 4 Junior 61000 1500 16 Shaunak 35 Mumbai 5 Junior 71000 1000
Both the dataframes are merged on index using default Inner Join. By this way we basically merged the dataframes by index and also kept the index as it is in merged dataframe.
Merge two Dataframes on index of one dataframe and some column of other dataframe
What if we want to merge two dataframe by index of first dataframe and on some column of second dataframe ?
For example let’s change the dataframe salaryDfObj by adding a new column ‘EmpID‘ and also reset it’s index i.e.
# Modify Dataframe 2 by reseting the Index and adding a new column EmpID salaryDfObj['EmpID'] = salaryDfObj.index salaryDfObj.reset_index(inplace=True) del salaryDfObj['ID']
Contents of the dataframe 2 salaryDfObj are,
Experience Salary Bonus EmpID 0 Junior 70000 1000 11 1 Senior 72200 1100 12 2 Expert 84999 1000 13 3 Expert 90000 2000 14 4 Junior 61000 1500 15 5 Junior 71000 1000 16 6 Senior 81000 2000 21
Now to merge dataframe 1 i.e. empDFObj by index with dataframe 2 i.e. salaryDfObj on some column pass following arguments,
- left_index=True
- right_on=’EmpID’
For example,
# Merge two Dataframes on index of both the dataframes mergedDf = empDfObj.merge(salaryDfObj, left_index=True, right_on='EmpID') mergedDf = mergedDf.set_index('EmpID')
Contents of the merged dataframe are,
Name Age City Experience_x Experience_y Salary Bonus EmpID 11 jack 34 Sydney 5 Junior 70000 1000 12 Riti 31 Delhi 7 Senior 72200 1100 13 Aadi 16 New York 11 Expert 84999 1000 14 Mohit 32 Delhi 15 Expert 90000 2000 15 Veena 33 Delhi 4 Junior 61000 1500 16 Shaunak 35 Mumbai 5 Junior 71000 1000
Here, in merged dataframe we also set the column ‘EmpID‘ as the index of dataframe.
In another scenario we can also do the vice versa i.e. merge two dataframe on some column of first dataframe and by index of second dataframe by passing following arguments right_index=True and left_on=<Column_Name>.
By this we also kept the index as it is in merged dataframe.
Complete example is as follows,
import pandas as pd def main(): pd.set_option('display.max_rows', 100) pd.set_option('display.max_columns', 100) pd.set_option('display.width', 1500) print('*** Creating Dataframe 1 ***') # List of Tuples empoyees = [ (11, 'jack', 34, 'Sydney', 5) , (12, 'Riti', 31, 'Delhi' , 7) , (13, 'Aadi', 16, 'New York', 11) , (14, 'Mohit', 32,'Delhi' , 15) , (15, 'Veena', 33, 'Delhi' , 4) , (16, 'Shaunak', 35, 'Mumbai', 5 ), (17, 'Shaun', 35, 'Colombo', 11) ] # Create a DataFrame object empDfObj = pd.DataFrame(empoyees, columns=['ID', 'Name', 'Age', 'City', 'Experience']) empDfObj = empDfObj.set_index('ID') print("Dataframe 1 : ") print(empDfObj) print('*** Creating Dataframe 2 ***') # List of Tuples salaries = [(11, 'Junior', 70000, 1000) , (12, 'Senior', 72200, 1100) , (13, 'Expert', 84999, 1000) , (14, 'Expert', 90000, 2000) , (15, 'Junior', 61000, 1500) , (16, 'Junior', 71000, 1000), (21, 'Senior',81000, 2000) ] # Create a DataFrame object salaryDfObj = pd.DataFrame(salaries, columns=['ID', 'Experience', 'Salary', 'Bonus'], index=['a', 'b', 'c', 'd', 'e', 'f', 'g']) salaryDfObj = salaryDfObj.set_index('ID') print("Dataframe 2 : ") print(salaryDfObj) print('**** Merge two Dataframes on index of both dataframes ****') # Merge two Dataframes on index of both the dataframes mergedDf = empDfObj.merge(salaryDfObj, left_index=True, right_index=True) print('Contents of the Merged Dataframe :') print(mergedDf) print('Change the index of dataframe 2') # Modify Dataframe 2 by reseting the Index and adding a new column EmpID salaryDfObj['EmpID'] = salaryDfObj.index salaryDfObj.reset_index(inplace=True) del salaryDfObj['ID'] print("Dataframe 2 : ") print(salaryDfObj) print('**** Merge two Dataframes on index of one dataframe and some column of other dataframe ****') # Merge two Dataframes on index of both the dataframes mergedDf = empDfObj.merge(salaryDfObj, left_index=True, right_on='EmpID') mergedDf = mergedDf.set_index('EmpID') print('Contents of the Merged Dataframe :') print(mergedDf) if __name__ == '__main__': main()
Output:
*** Creating Dataframe 1 *** Dataframe 1 : Name Age City Experience ID 11 jack 34 Sydney 5 12 Riti 31 Delhi 7 13 Aadi 16 New York 11 14 Mohit 32 Delhi 15 15 Veena 33 Delhi 4 16 Shaunak 35 Mumbai 5 17 Shaun 35 Colombo 11 *** Creating Dataframe 2 *** Dataframe 2 : Experience Salary Bonus ID 11 Junior 70000 1000 12 Senior 72200 1100 13 Expert 84999 1000 14 Expert 90000 2000 15 Junior 61000 1500 16 Junior 71000 1000 21 Senior 81000 2000 **** Merge two Dataframes on index of both dataframes **** Contents of the Merged Dataframe : Name Age City Experience_x Experience_y Salary Bonus ID 11 jack 34 Sydney 5 Junior 70000 1000 12 Riti 31 Delhi 7 Senior 72200 1100 13 Aadi 16 New York 11 Expert 84999 1000 14 Mohit 32 Delhi 15 Expert 90000 2000 15 Veena 33 Delhi 4 Junior 61000 1500 16 Shaunak 35 Mumbai 5 Junior 71000 1000 Change the index of dataframe 2 Dataframe 2 : Experience Salary Bonus EmpID 0 Junior 70000 1000 11 1 Senior 72200 1100 12 2 Expert 84999 1000 13 3 Expert 90000 2000 14 4 Junior 61000 1500 15 5 Junior 71000 1000 16 6 Senior 81000 2000 21 **** Merge two Dataframes on index of one dataframe and some column of other dataframe **** Contents of the Merged Dataframe : Name Age City Experience_x Experience_y Salary Bonus EmpID 11 jack 34 Sydney 5 Junior 70000 1000 12 Riti 31 Delhi 7 Senior 72200 1100 13 Aadi 16 New York 11 Expert 84999 1000 14 Mohit 32 Delhi 15 Expert 90000 2000 15 Veena 33 Delhi 4 Junior 61000 1500 16 Shaunak 35 Mumbai 5 Junior 71000 1000
Latest Video Tutorials