In this article we will discuss how to merge different Dataframes into a single Dataframe using Pandas Dataframe.merge() function. Merging is a big topic, so in this part we will focus on merging dataframes using common columns as Join Key and joining using Inner Join, Right Join, Left Join and Outer Join.

Dataframe.merge()

In Python’s Pandas Library Dataframe class provides a function to merge Dataframes i.e.

It accepts a hell lot of arguments. Let’s discuss some of them,
Imp Arguments :

  • right : A dataframe or series to be merged with calling dataframe
  • how : Merge type, values are : left, right, outer, inner. Default is ‘inner’. If both dataframes has some different columns, then based on this value, it will be decided which columns will be in the merged dataframe.
  • on : Column name on which merge will be done. If not provided then merged on indexes.
  • left_on : Specific column names in left dataframe, on which merge will be done.
  • right_on : Specific column names in right dataframe, on which merge will be done.
  • left_index : bool (default False)
    • If True will choose index from left dataframe as join key.
  • right_index : bool (default False)
    • If True will choose index from right dataframe as join key.
  • suffixes : tuple of (str, str), default (‘_x’, ‘_y’)
    • Suffex to be applied on overlapping columns in left & right dataframes respectively.

Well these are a lot of arguments and things seems over engineered here. So, let’s discuss each details be small examples one by one.

First of all, let’s create two dataframes to be merged.

Dataframe 1:
This dataframe contains the details of the employees like, ID, name, city, experience & Age i.e.

Contents of the first dataframe empDfObj created are,

Dataframe 2:
This dataframe contains the details of the employees like, ID, salary, bonus and experience i.e.

Contents of the second dataframe created are,

Now let’s see different ways to merge these two dataframes,

Merge DataFrames on common columns (Default Inner Join)

In both the Dataframes we have 2 common column names i.e. ‘ID’ & ‘Experience’. If we directly call Dataframe.merge() on these two Dataframes, without any additional arguments, then it will merge the columns of the both the dataframes  by considering common columns as Join Keys i.e. ‘ID’ & ‘Experience’ in our case. So, basically columns from both the dataframes will be merged for the rows in which values of ‘ID’ & ‘Experience’ are same i.e.

Merged Dataframe mergedDf  contents are:

It merged the contents of the unique columns (salary & bonus) from dataframe 2 with the columns of dataframe 1 based on ‘ID’ & ‘Experience’ columns. Because if we don’t provide the column names on which we want to merge the two dataframes then it by defaults merge on columns with common names. Like, in our case it was ‘ID’ & ‘Experience’.

Also, we didn’t provided the ‘how’ argument in merge() call. Default value of ‘how’ is ‘inner’. It means dataframes are merged like INNER JOIN in Databases.

What is Inner Join ?

While Merging or Joining on columns (keys) in two Dataframes. Include only rows from Left & Right dataframes which have same values in key columns.

In above example key columns on which inner join happened were ‘ID’ & ‘Experience’ columns. So, during inner join only those rows are picked in merged dataframe for which values of ‘ID’ & ‘Experience’ columns are same in 2 dataframes. So basically by default Inner Join was done by using intersection of keys in both the dataframes.

Results will be same if we explicitly pass ‘how’ argument with value ‘inner’ i.e.

Merge Dataframes using Left Join

What is left join ?

While Merging or Joining on columns (keys) in two Dataframes. Include all rows from Left dataframe and add NaN for values which are
missing in right dataframe for those keys.

In above example if we will pass how argument with value ‘left’ then it will merge two dataframes using left join i.e.

Contents of the merged dataframe :

We can see that it picked all rows from left dataframe and there is no row with ‘ID’ 17 and ‘Experience’ 11 in right dataframe. Therefore for that row values of unique Columns from right dataframe (Salary and Bonus) are NaN in merged dataframe.

Merge DataFrames using Right Join

What is Right join ?

While Merging or Joining on columns (keys) in two Dataframes. Include all rows from Right dataframe and add NaN for values which are
missing in Left dataframe for those keys.

In above example if we will pass how argument with value ‘right’ then it will merge two dataframes using Right Join i.e.

Contents of the merged dataframe :

We can see that it picked all rows from right dataframe and there is no row with ID 21 and Experience 10 in left dataframe. Therefore for that row values of unique Columns from left dataframe (i.e. Name, Age, City) are NaN in merged dataframe.

Merge DataFrames using Outer Join

What is Outer join ?

While Merging or Joining on columns (keys) in two Dataframes. Include all rows from Right and Left dataframes and add NaN for values which are missing in either Left or Right dataframe for any key.

In above example if we will pass how argument with value ‘outer’ then it will merge two dataframes using Outer Join i.e.

Contents of the merged dataframe :

We can see that it picked all rows from right & left dataframes and there is no row with,

  • ID 21 and Experience 10 in left dataframe
  • ID 17 and Experience 11 in right dataframe

Therefore for that row NaN is added for missing values in merged dataframe.

Complete example is as follows,

Output:

 

 

Join LinkedIn Group of Python Professional Developers who wish to expand their network and share ideas.

You can also follow us On Twitter :

Click Here to Subscribe for more Articles / Tutorials like this.