In this article we will discuss how to find duplicate columns in a Pandas DataFrame and drop them.

Python’s pandas library has direct APIs for finding duplicate rows, such as DataFrame.duplicated() and DataFrame.drop_duplicates(), but there is no direct API for finding duplicate columns.
So we have to build our own function for that.
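For reference, here is a minimal sketch of the row-level API mentioned above, run on a small made-up DataFrame. DataFrame.duplicated() flags each row that repeats an earlier row, but it has no option to compare columns instead of rows.

```python
import pandas as pd

# Made-up data: the third row repeats the first one
df = pd.DataFrame({'Name': ['jack', 'Riti', 'jack'],
                   'Age': [34, 30, 34]})

# duplicated() returns a boolean Series: True for rows that repeat an earlier row
rowFlags = df.duplicated()
print(rowFlags.tolist())  # [False, False, True]

# drop_duplicates() removes the repeated rows and returns a new DataFrame
uniqueRows = df.drop_duplicates()
print(uniqueRows)
```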

First of all, create a DataFrame that contains duplicate columns.
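A minimal sketch of such a DataFrame, assuming made-up student data: the values are illustrative only, and the rows are arranged so that Marks and Pin repeat the Age column while Address repeats the City column.

```python
import pandas as pd

# Illustrative data only: Marks and Pin repeat Age, Address repeats City
students = [('jack', 34, 'Sydney', 34, 'Sydney', 34),
            ('Riti', 30, 'Delhi', 30, 'Delhi', 30),
            ('Aadi', 16, 'New York', 16, 'New York', 16),
            ('Mohit', 31, 'Delhi', 31, 'Delhi', 31)]

# Create a DataFrame object with six columns, three of which are duplicates
dfObj = pd.DataFrame(students,
                     columns=['Name', 'Age', 'City', 'Marks', 'Address', 'Pin'])
print(dfObj)
```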

Printing the DataFrame with print(dfObj) shows its contents.

Now, as we can observe, there are 3 duplicate columns in this DataFrame, i.e. Marks, Address and Pin. Let’s see how to find them.

Find duplicate columns in a DataFrame

To find these duplicate columns, we iterate over the DataFrame column by column and, for every column, check whether any later column has the same contents. If one does, the name of that later column is added to the list of duplicate columns. In the end, the function returns the list of names of the duplicate columns.
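A minimal sketch of this approach follows. The function name getDuplicateColumns is hypothetical (it is not part of pandas). It relies on Series.equals(), which compares the values and dtypes of two columns but not their names.

```python
import pandas as pd


def getDuplicateColumns(df):
    """Return the names of columns whose contents duplicate an earlier column.

    getDuplicateColumns is an illustrative helper, not a pandas API.
    Each column is compared with every column to its right; when two
    columns hold equal values, the right-hand (later) column is recorded.
    """
    duplicateColumnNames = set()
    numCols = df.shape[1]
    for x in range(numCols):
        # Select the column at position x
        col = df.iloc[:, x]
        # Compare it with every column after it
        for y in range(x + 1, numCols):
            otherCol = df.iloc[:, y]
            # Series.equals() compares values and dtype, not column names
            if col.equals(otherCol):
                duplicateColumnNames.add(df.columns[y])
    # Report the duplicates in the DataFrame's own column order
    return [name for name in df.columns if name in duplicateColumnNames]
```

Because only the later copy in each group of identical columns is recorded, dropping the returned names keeps one column from every group.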

Now let’s use this function to find the duplicate columns in the DataFrame object dfObj created above.

It reports Marks, Address and Pin as the duplicate columns.

Drop duplicate columns in a DataFrame

To remove the duplicate columns, we can pass the list of duplicate column names returned by our function to DataFrame.drop() as the columns to drop.
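A minimal, self-contained sketch of the drop step on a small illustrative DataFrame. To keep it runnable on its own, the list of duplicate names is written out by hand; in the article it would be the list returned by the duplicate-finding function.

```python
import pandas as pd

# Small illustrative DataFrame: Marks repeats Age and Address repeats City
dfObj = pd.DataFrame({'Name': ['jack', 'Riti'],
                      'Age': [34, 30],
                      'City': ['Sydney', 'Delhi'],
                      'Marks': [34, 30],
                      'Address': ['Sydney', 'Delhi']})

# Written out by hand here; normally this comes from the duplicate finder
duplicateColumnNames = ['Marks', 'Address']

# drop() with columns= removes those columns and returns a new DataFrame
newDf = dfObj.drop(columns=duplicateColumnNames)
print(list(newDf.columns))  # ['Name', 'Age', 'City']
print(list(dfObj.columns))  # the original still has all five columns
```

dfObj.drop(duplicateColumnNames, axis=1) is an equivalent spelling; the columns= keyword just makes the intent explicit.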

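The resulting DataFrame no longer has the Marks, Address and Pin columns.

Note that drop() returns a copy of the existing DataFrame without the duplicate columns; the original dfObj is left unchanged unless inplace=True is passed.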

The complete example is as follows:
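The sketch below combines the steps into one script. As before, the student data is made up for illustration, and getDuplicateColumns is a hypothetical helper name rather than a pandas API.

```python
import pandas as pd


def getDuplicateColumns(df):
    """Return names of columns whose contents duplicate an earlier column."""
    duplicateColumnNames = set()
    numCols = df.shape[1]
    for x in range(numCols):
        col = df.iloc[:, x]
        # Compare column x with every column to its right
        for y in range(x + 1, numCols):
            # Series.equals() compares values and dtype, not column names
            if col.equals(df.iloc[:, y]):
                duplicateColumnNames.add(df.columns[y])
    # Keep the DataFrame's column order in the result
    return [name for name in df.columns if name in duplicateColumnNames]


# Illustrative data: Marks and Pin repeat Age, Address repeats City
students = [('jack', 34, 'Sydney', 34, 'Sydney', 34),
            ('Riti', 30, 'Delhi', 30, 'Delhi', 30),
            ('Aadi', 16, 'New York', 16, 'New York', 16),
            ('Mohit', 31, 'Delhi', 31, 'Delhi', 31)]
dfObj = pd.DataFrame(students,
                     columns=['Name', 'Age', 'City', 'Marks', 'Address', 'Pin'])
print('Original DataFrame:')
print(dfObj)

# Find the duplicate columns
duplicateColumnNames = getDuplicateColumns(dfObj)
print('Duplicate columns:', duplicateColumnNames)  # ['Marks', 'Address', 'Pin']

# drop() returns a new DataFrame; dfObj itself is not modified
newDf = dfObj.drop(columns=duplicateColumnNames)
print('DataFrame without duplicate columns:')
print(newDf)
```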
