Pandas : How to merge Dataframes by index using Dataframe.merge() – Part 3

In this article we will discuss how to merge two dataframes in index of both the dataframes or index of one dataframe and some column of any other dataframe. Also, we will see how to keep the similar index in merged dataframe.

In previous two articles we have discussed about many features of Dataframe.merge(). Therefore here just a small intro of API i.e.

LINK

DataFrame.merge()

DataFrame.merge(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)

It’s used to merge dataframes.

Here we will focus on a few arguments only i.e.

  • on : Column name on which merge will be done. If not provided then merged on indexes.
  • left_on : Specific column names in left dataframe, on which merge will be done.
  • right_on : Specific column names in right dataframe, on which merge will be done.
  • left_index : bool (default False)
    • If True will choose index from left dataframe as join key.
  • right_index : bool (default False)
    • If True will choose index from right dataframe as join key.

Let’s see some examples to see how to merge dataframes on index.

First of all, let’s create two dataframes to be merged.

Dataframe 1:
This dataframe contains the details of the employees like, name, city, experience & Age. Index of the dataframe contains the IDs i.e.

# List of Tuples
empoyees = [ (11, 'jack', 34, 'Sydney', 5) ,
           (12, 'Riti', 31, 'Delhi' , 7) ,
           (13, 'Aadi', 16, 'New York', 11) ,
           (14, 'Mohit', 32,'Delhi' , 15) ,
           (15, 'Veena', 33, 'Delhi' , 4) ,
           (16, 'Shaunak', 35, 'Mumbai', 5 ),
            (17, 'Shaun', 35, 'Colombo', 11)
            ]

# Create a DataFrame object
empDfObj = pd.DataFrame(empoyees, columns=['ID', 'Name', 'Age', 'City', 'Experience'])
empDfObj = empDfObj.set_index('ID')

Contents of the first dataframe created are,

       Name  Age      City  Experience
ID                                    
11     jack   34    Sydney           5
12     Riti   31     Delhi           7
13     Aadi   16  New York          11
14    Mohit   32     Delhi          15
15    Veena   33     Delhi           4
16  Shaunak   35    Mumbai           5
17    Shaun   35   Colombo          11

Dataframe 2:
This dataframe contains the details of the employees like, salary, bonus and experience. . Index of the dataframe contains the IDs i.e.

# List of Tuples
salaries = [(11, 'Junior', 70000, 1000) ,
           (12, 'Senior', 72200, 1100) ,
           (13, 'Expert', 84999, 1000) ,
           (14, 'Expert', 90000, 2000) ,
           (15, 'Junior', 61000, 1500) ,
           (16, 'Junior', 71000, 1000),
           (21, 'Senior',81000, 2000)
            ]

# Create a DataFrame object
salaryDfObj = pd.DataFrame(salaries, columns=['ID', 'Experience', 'Salary', 'Bonus'], index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])
salaryDfObj = salaryDfObj.set_index('ID')

Contents of the second dataframe created are,

   Experience  Salary  Bonus
ID                          
11     Junior   70000   1000
12     Senior   72200   1100
13     Expert   84999   1000
14     Expert   90000   2000
15     Junior   61000   1500
16     Junior   71000   1000
21     Senior   81000   2000

Merging Dataframes by index of both the dataframes

As both the dataframe contains similar IDs on the index. So, to merge the dataframe on indices pass the left_index & right_index arguments as True i.e.

# Merge two Dataframes on index of both the dataframes
mergedDf = empDfObj.merge(salaryDfObj, left_index=True, right_index=True)

Contents of the merged dataframe are,

       Name  Age      City  Experience_x Experience_y  Salary  Bonus
ID                                                                  
11     jack   34    Sydney             5       Junior   70000   1000
12     Riti   31     Delhi             7       Senior   72200   1100
13     Aadi   16  New York            11       Expert   84999   1000
14    Mohit   32     Delhi            15       Expert   90000   2000
15    Veena   33     Delhi             4       Junior   61000   1500
16  Shaunak   35    Mumbai             5       Junior   71000   1000

Both the dataframes are merged on index using default Inner Join. By this way we basically merged the dataframes by index and also kept the index as it is in merged dataframe.

Merge two Dataframes on index of one dataframe and some column of other dataframe

What if we want to merge two dataframe by index of first dataframe and on some column of second dataframe ?

For example let’s change the dataframe salaryDfObj by adding a new column ‘EmpID‘ and also reset it’s index i.e.

# Modify Dataframe 2 by reseting the Index and adding a new column EmpID
salaryDfObj['EmpID'] = salaryDfObj.index
salaryDfObj.reset_index(inplace=True)
del salaryDfObj['ID']

Contents of the dataframe 2 salaryDfObj are,

  Experience  Salary  Bonus  EmpID
0     Junior   70000   1000     11
1     Senior   72200   1100     12
2     Expert   84999   1000     13
3     Expert   90000   2000     14
4     Junior   61000   1500     15
5     Junior   71000   1000     16
6     Senior   81000   2000     21

Now to merge dataframe 1 i.e. empDFObj by index with dataframe 2 i.e. salaryDfObj on some column pass following arguments,

  • left_index=True
  • right_on=’EmpID’

For example,

# Merge two Dataframes on index of both the dataframes
mergedDf = empDfObj.merge(salaryDfObj, left_index=True, right_on='EmpID')
mergedDf = mergedDf.set_index('EmpID')

Contents of the merged dataframe are,

          Name  Age      City  Experience_x Experience_y  Salary  Bonus
EmpID                                                                  
11        jack   34    Sydney             5       Junior   70000   1000
12        Riti   31     Delhi             7       Senior   72200   1100
13        Aadi   16  New York            11       Expert   84999   1000
14       Mohit   32     Delhi            15       Expert   90000   2000
15       Veena   33     Delhi             4       Junior   61000   1500
16     Shaunak   35    Mumbai             5       Junior   71000   1000

Here, in merged dataframe we also set the column ‘EmpID‘ as the index of dataframe.

In another scenario we can also do the vice versa i.e. merge two dataframe on some column of first dataframe and by index of second dataframe by passing following arguments right_index=True and left_on=<Column_Name>.

By this we also kept the index as it is in merged dataframe.

Complete example is as follows,

import pandas as pd

def main():
    pd.set_option('display.max_rows', 100)
    pd.set_option('display.max_columns', 100)
    pd.set_option('display.width', 1500)

    print('*** Creating Dataframe 1 ***')
    # List of Tuples
    empoyees = [ (11, 'jack', 34, 'Sydney', 5) ,
               (12, 'Riti', 31, 'Delhi' , 7) ,
               (13, 'Aadi', 16, 'New York', 11) ,
               (14, 'Mohit', 32,'Delhi' , 15) ,
               (15, 'Veena', 33, 'Delhi' , 4) ,
               (16, 'Shaunak', 35, 'Mumbai', 5 ),
                (17, 'Shaun', 35, 'Colombo', 11)
                ]

    # Create a DataFrame object
    empDfObj = pd.DataFrame(empoyees, columns=['ID', 'Name', 'Age', 'City', 'Experience'])
    empDfObj = empDfObj.set_index('ID')
    print("Dataframe 1 : ")
    print(empDfObj)

    print('*** Creating Dataframe 2 ***')

    # List of Tuples
    salaries = [(11, 'Junior', 70000, 1000) ,
               (12, 'Senior', 72200, 1100) ,
               (13, 'Expert', 84999, 1000) ,
               (14, 'Expert', 90000, 2000) ,
               (15, 'Junior', 61000, 1500) ,
               (16, 'Junior', 71000, 1000),
               (21, 'Senior',81000, 2000)
                ]

    # Create a DataFrame object
    salaryDfObj = pd.DataFrame(salaries, columns=['ID', 'Experience', 'Salary', 'Bonus'], index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])
    salaryDfObj = salaryDfObj.set_index('ID')
    print("Dataframe 2 : ")
    print(salaryDfObj)


    print('**** Merge two Dataframes on index of both dataframes ****')

    # Merge two Dataframes on index of both the dataframes
    mergedDf = empDfObj.merge(salaryDfObj, left_index=True, right_index=True)
    print('Contents of the Merged Dataframe :')
    print(mergedDf)


    print('Change the index of dataframe 2')
    # Modify Dataframe 2 by reseting the Index and adding a new column EmpID
    salaryDfObj['EmpID'] = salaryDfObj.index
    salaryDfObj.reset_index(inplace=True)
    del salaryDfObj['ID']
    print("Dataframe 2 : ")
    print(salaryDfObj)

    print('**** Merge two Dataframes on index of one dataframe and some column of other dataframe ****')

    # Merge two Dataframes on index of both the dataframes
    mergedDf = empDfObj.merge(salaryDfObj, left_index=True, right_on='EmpID')
    mergedDf = mergedDf.set_index('EmpID')
    print('Contents of the Merged Dataframe :')
    print(mergedDf)

if __name__ == '__main__':
  main()

Output:

*** Creating Dataframe 1 ***
Dataframe 1 : 
       Name  Age      City  Experience
ID                                    
11     jack   34    Sydney           5
12     Riti   31     Delhi           7
13     Aadi   16  New York          11
14    Mohit   32     Delhi          15
15    Veena   33     Delhi           4
16  Shaunak   35    Mumbai           5
17    Shaun   35   Colombo          11
*** Creating Dataframe 2 ***
Dataframe 2 : 
   Experience  Salary  Bonus
ID                          
11     Junior   70000   1000
12     Senior   72200   1100
13     Expert   84999   1000
14     Expert   90000   2000
15     Junior   61000   1500
16     Junior   71000   1000
21     Senior   81000   2000
**** Merge two Dataframes on index of both dataframes ****
Contents of the Merged Dataframe :
       Name  Age      City  Experience_x Experience_y  Salary  Bonus
ID                                                                  
11     jack   34    Sydney             5       Junior   70000   1000
12     Riti   31     Delhi             7       Senior   72200   1100
13     Aadi   16  New York            11       Expert   84999   1000
14    Mohit   32     Delhi            15       Expert   90000   2000
15    Veena   33     Delhi             4       Junior   61000   1500
16  Shaunak   35    Mumbai             5       Junior   71000   1000
Change the index of dataframe 2
Dataframe 2 : 
  Experience  Salary  Bonus  EmpID
0     Junior   70000   1000     11
1     Senior   72200   1100     12
2     Expert   84999   1000     13
3     Expert   90000   2000     14
4     Junior   61000   1500     15
5     Junior   71000   1000     16
6     Senior   81000   2000     21
**** Merge two Dataframes on index of one dataframe and some column of other dataframe ****
Contents of the Merged Dataframe :
          Name  Age      City  Experience_x Experience_y  Salary  Bonus
EmpID                                                                  
11        jack   34    Sydney             5       Junior   70000   1000
12        Riti   31     Delhi             7       Senior   72200   1100
13        Aadi   16  New York            11       Expert   84999   1000
14       Mohit   32     Delhi            15       Expert   90000   2000
15       Veena   33     Delhi             4       Junior   61000   1500
16     Shaunak   35    Mumbai             5       Junior   71000   1000

 

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Scroll to Top