Extract specific columns to new DataFrame as copy in Pandas

Multiple times while processing data, it is important to extract specific columns and store them in a new DataFrame. In this article, we will cover multiple ways to achieve that.

Table of Contents

Preparing Dataset

To quickly get started, let’s create a sample dataframe to experiment. We’ll use the pandas library with some random data.

import pandas as pd
import numpy as np

# List of Tuples
employees= [('Shubham', 'Data Scientist', 'Tech',   5),
            ('Riti', 'Data Engineer', 'Tech' ,   7),
            ('Shanky', 'Program Manager', 'PMO' ,   2),
            ('Shreya', 'Graphic Designer', 'Design' ,   2),
            ('Aadi', 'Backend Developer', 'Tech', 11),
            ('Sim', 'Data Engineer', 'Tech', 4)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'Team', 'Experience'],
                  index=[0, 1, 2, 3, 4, 5])
print(df)

Contents of the created dataframe are,

      Name        Designation    Team  Experience
0  Shubham     Data Scientist    Tech           5
1     Riti      Data Engineer    Tech           7
2   Shanky    Program Manager     PMO           2
3   Shreya   Graphic Designer  Design           2
4     Aadi  Backend Developer    Tech          11
5      Sim      Data Engineer    Tech           4

Now let’s see how to select specific columns from this, to a new DataFrame as copy.

Advertisements

Select subset of columns using copy() function

The simplest way to extract columns is to select the columns from the original DataFrame using [] operator and then copy it using the pandas.DataFrame.copy() function. Let’s take a look at code using an example, say, we need to select the columns “Name” and “Team” from the above DataFrame and save it as a new DataFrame.

# saving columns Name and Team into df_new
df_new = df[['Name', 'Team']].copy()

print (df_new)

Output

      Name    Team
0  Shubham    Tech
1     Riti    Tech
2   Shanky     PMO
3   Shreya  Design
4     Aadi    Tech
5      Sim    Tech

As observed, the new DataFrame (df_new) contains the two selected columns (Name and Team) from the original DataFrame.

Select subset of columns using filter() function

The filter() function also works similarly to the above method. Here, we don’t need to explicitly copy the DataFrame, as the filter() function automatically creates one with the selected columns. Let’s execute the above task using the filter function.

# Select Name and Team columns from DataFrame as copy
# using filter() function
df_new = df.filter(['Name', 'Team'])

print (df_new)

Output

      Name    Team
0  Shubham    Tech
1     Riti    Tech
2   Shanky     PMO
3   Shreya  Design
4     Aadi    Tech
5      Sim    Tech

Here you go, we have similar output, i.e., the columns Name and Team saved in a new DataFrame (df_new).

Select subset of columns using drop() function

The drop() function is used to drop the non-required columns from the DataFrame. It comes in handy, in cases where we need to select the majority of the columns from a large dataset. So instead of listing down the column names to be selected, we can mention the column names that need to be dropped.

In the above example, we need the columns Name and Team, meaning that we will have to drop the other columns (Designation and Experience). Here, since the columns are very few, it is not making much sense, but below is the code to execute this in case of ideal situations.

# Drop few columns from DataFrame and select remaining
df_new = df.drop(['Designation', 'Experience'], axis=1)

print (df_new)

Output

      Name    Team
0  Shubham    Tech
1     Riti    Tech
2   Shanky     PMO
3   Shreya  Design
4     Aadi    Tech
5      Sim    Tech

Select subset of columns using assign() function

Another interesting method is to use the DataFrame.assign() method to create a new DataFrame and then assign different columns, finally returning a new DataFrame with the new columns created. Let’s look at the code below which is actually simple to understand.

# Assign few columns from a DataFrame to a new DataFrame
df_new = pd.DataFrame().assign(employee_name=df['Name'], team_name=df['Team'])

print (df_new)

Output

  employee_name team_name
0       Shubham      Tech
1          Riti      Tech
2        Shanky       PMO
3        Shreya    Design
4          Aadi      Tech
5           Sim      Tech

The real advantage of this function is that you can rename the columns as well at the same time. Here, we have renamed column “Name” as “employee_name” and “Team” as “team_name”.

The complete example is as follows,

import pandas as pd

# List of Tuples
employees= [('Shubham', 'Data Scientist', 'Tech',   5),
            ('Riti', 'Data Engineer', 'Tech' ,   7),
            ('Shanky', 'Program Manager', 'PMO' ,   2),
            ('Shreya', 'Graphic Designer', 'Design' ,   2),
            ('Aadi', 'Backend Developer', 'Tech', 11),
            ('Sim', 'Data Engineer', 'Tech', 4)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'Team', 'Experience'],
                  index=[0, 1, 2, 3, 4, 5])
print(df)

# saving columns Name and Team into df_new
df_new = df[['Name', 'Team']].copy()

print (df_new)

# Select Name and Team columns from DataFrame as copy
# using filter() function
df_new = df.filter(['Name', 'Team'])

print (df_new)

# Drop few columns from DataFrame and select remaining
df_new = df.drop(['Designation', 'Experience'], axis=1)

print (df_new)

# Assign few columns from a DataFrame to a new DataFrame
df_new = pd.DataFrame().assign(employee_name=df['Name'], team_name=df['Team'])

print (df_new)

Output:

      Name        Designation    Team  Experience
0  Shubham     Data Scientist    Tech           5
1     Riti      Data Engineer    Tech           7
2   Shanky    Program Manager     PMO           2
3   Shreya   Graphic Designer  Design           2
4     Aadi  Backend Developer    Tech          11
5      Sim      Data Engineer    Tech           4

      Name    Team
0  Shubham    Tech
1     Riti    Tech
2   Shanky     PMO
3   Shreya  Design
4     Aadi    Tech
5      Sim    Tech

      Name    Team
0  Shubham    Tech
1     Riti    Tech
2   Shanky     PMO
3   Shreya  Design
4     Aadi    Tech
5      Sim    Tech

      Name    Team
0  Shubham    Tech
1     Riti    Tech
2   Shanky     PMO
3   Shreya  Design
4     Aadi    Tech
5      Sim    Tech

  employee_name team_name
0       Shubham      Tech
1          Riti      Tech
2        Shanky       PMO
3        Shreya    Design
4          Aadi      Tech
5           Sim      Tech

Summary

In this article, we have discussed multiple ways to extract specific columns to a new DataFrame as a copy in pandas. Thanks.

Pandas Tutorials -Learn Data Analysis with Python

   

Are you looking to make a career in Data Science with Python?

Data Science is the future, and the future is here now. Data Scientists are now the most sought-after professionals today. To become a good Data Scientist or to make a career switch in Data Science one must possess the right skill set. We have curated a list of Best Professional Certificate in Data Science with Python. These courses will teach you the programming tools for Data Science like Pandas, NumPy, Matplotlib, Seaborn and how to use these libraries to implement Machine learning models.

Checkout the Detailed Review of Best Professional Certificate in Data Science with Python.

Remember, Data Science requires a lot of patience, persistence, and practice. So, start learning today.

Join a LinkedIn Community of Python Developers

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Scroll to Top