In this article we will discuss how to merge dataframes on given columns or index as Join keys.

First let’s get a little intro about Dataframe.merge() again,

Dataframe.merge()

In Python’s Pandas Library Dataframe class provides a function to merge Dataframes i.e.

It accepts a hell lot of arguments. Let’s discuss some of them,
Imp Arguments :

  • right : A dataframe or series to be merged with calling dataframe
  • how : Merge type, values are : left, right, outer, inner. Default is ‘inner’. If both dataframes has some different columns, then based on this value, it will be decided which columns will be in the merged dataframe.
  • on : Column name on which merge will be done. If not provided then merged on indexes.
  • left_on : Specific column names in left dataframe, on which merge will be done.
  • right_on : Specific column names in right dataframe, on which merge will be done.
  • left_index : bool (default False)
    • If True will choose index from left dataframe as join key.
  • right_index : bool (default False)
    • If True will choose index from right dataframe as join key.
  • suffixes : tuple of (str, str), default (‘_x’, ‘_y’)
    • Suffex to be applied on overlapping columns in left & right dataframes respectively.

In our previous article our focus was on merging using ‘how’ argument i.e. basically merging Dataframes by default on common columns using different join types. But in this article we will mainly focus on other arguments like what if don’t want to join an all common columns ? What if we want to join on some selected columns only? Let’s see some examples to understand this,

First of all, let’s create two dataframes to be merged.

Dataframe 1:
This dataframe contains the details of the employees like, ID, name, city, experience & Age i.e.

Contents of the first dataframe created are,

Dataframe 2:
This dataframe contains the details of the employees like, ID, salary, bonus and experience i.e.

Contents of the second dataframe created are,

Merging Dataframe on a given column name as join key

In both the above dataframes two column names are common i.e. ID & Experience. But contents of Experience column in both the dataframes are of different types, one is int and other is string. There is no point in merging based on that column. By default if we don’t pass the on argument then Dataframe.merge() will merge it on both the columns ID & Experience as we saw in previous post i.e.
https://thispointer.com/pandas-how-to-merge-dataframes-using-dataframe-merge-in-python-part-1/

Which will not work here. Therefore, here we need to merge these two dataframes on a single column i.e. ID. To do that pass the ‘on’ argument in the Datfarame.merge() with column name on which we want to join / merge these 2 dataframes i.e.

Contents of the merged dataframe,

It merged both the above two dataframes on ‘ID’ column. As both the dataframes had a columns with name ‘Experience’, so both the columns were added with default suffix to differentiate between them i.e. Experience_x for column from Left Dataframe and Experience_y for column from Right Dataframe.

In you want to join on multiple columns instead of  a single column, then you can pass a list of column names to Dataframe.merge() instead of single column name. Also, as we didn’t specified the value of ‘how’ argument, therefore by default Dataframe.merge() uses inner join. You can also specify the join type using ‘how’ argument as explained in previous article i.e.

Pandas : How to Merge Dataframes using Dataframe.merge() in Python – Part 1

 

Merging Dataframe on a given column with suffix for similar column names

If there are some similar column names in both the dataframes which are not in join key then by default x & y is added as suffix to them.
Like in previous example merged dataframe contains Experience_x & Experience_y. Instead of default suffix, we can pass our custom suffix too i.e.

Contents of the merged dataframe,

We passed a tuple (‘_In_Years’, ‘_Levels’) in suffixes argument and ‘_In_Years’ is added to column from left dataframe and ‘_Levels’ is added to columns from right dataframe.

Merging Dataframe different columns

What if both the dataframes was completely different column names. For example let’s rename column ‘ID’ in dataframe 2 i.e.

Now out second dataframe salaryDFObj contents are,

Now let’s see how to merge these two dataframes on ‘ID‘ column from Dataframe 1 and ‘EmpID‘ column from dataframe 2 i.e.

Contents of the merged dataframe,

Till now we have seen merging on columns either by default on specifically given columns. But what if we want to merge in index of both the dataframe ?

We will discuss that in next article.

Complete example is as follows,

Output:

 

If you didn't find what you were looking, then do suggest us in the comments below. We will be more than happy to add that.

Do Subscribe with us for more Articles / Tutorials like this,