Select multiple columns from a DataFrame in Pandas

In this article, we will discuss different ways to select multiple columns from a Pandas DataFrame in Python.

Table of Contents

We will also cover a general idea of which method should be used based on the situation. To quickly get started, let’s create a sample dataframe to experiment. We’ll use the pandas library with some random data.

import pandas as pd

# List of Tuples
employees = [('Shubham', 'Data Scientist', 'Sydney',   5),
            ('Riti', 'Data Analyst', 'Delhi' ,   7),
            ('Shanky', 'Program Manager', 'Delhi' ,   2),
            ('Shreya', 'Graphic Designer', 'Mumbai' ,   2),
            ('Aadi', 'Data Engineering', 'New York', 11)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'City', 'Experience'],
                  index=[0, 1, 2, 3, 4])
print(df)

Contents of the created dataframe are,

      Name       Designation      City  Experience
0  Shubham    Data Scientist    Sydney           5
1     Riti      Data Analyst     Delhi           7
2   Shanky   Program Manager     Delhi           2
3   Shreya  Graphic Designer    Mumbai           2
4     Aadi  Data Engineering  New York          11

Now, let’s look at different ways in which we could select multiple columns from the above dataframe.

Advertisements

Select multiple columns based on column names

The first scenario is useful when you precisely know the column names you want to select from the dataframe. For example, we need to select the columns “Designation” and “Experience” from our sample dataframe.

# select "Designation" and "Experience" columns
filterDf = df[["Designation", "Experience"]]

print (filterDf)

Output

        Designation  Experience
0    Data Scientist           5
1      Data Analyst           7
2   Program Manager           2
3  Graphic Designer           2
4  Data Engineering          11

This method creates a copy of the original dataframe with the selected columns.

The complete example is as follows,

import pandas as pd

# List of Tuples
employees = [('Shubham', 'Data Scientist', 'Sydney',   5),
            ('Riti', 'Data Analyst', 'Delhi' ,   7),
            ('Shanky', 'Program Manager', 'Delhi' ,   2),
            ('Shreya', 'Graphic Designer', 'Mumbai' ,   2),
            ('Aadi', 'Data Engineering', 'New York', 11)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'City', 'Experience'],
                  index=[0, 1, 2, 3, 4])
print(df)

# select "Designation" and "Experience" columns
filterDf = df[["Designation", "Experience"]]

print (filterDf)

Output

      Name       Designation      City  Experience
0  Shubham    Data Scientist    Sydney           5
1     Riti      Data Analyst     Delhi           7
2   Shanky   Program Manager     Delhi           2
3   Shreya  Graphic Designer    Mumbai           2
4     Aadi  Data Engineering  New York          11


        Designation  Experience
0    Data Scientist           5
1      Data Analyst           7
2   Program Manager           2
3  Graphic Designer           2
4  Data Engineering          11

Select multiple columns using loc method

In this scenario, we slice the dataframe by the column names using the loc method. For instance, we again need to select the columns “Designation” and “Experience” from our sample dataframe.

# select "Designation" and "Experience" columns using loc method
filterDf = df.loc[:, ["Designation", "Experience"]]

print (filterDf)

Output

        Designation  Experience
0    Data Scientist           5
1      Data Analyst           7
2   Program Manager           2
3  Graphic Designer           2
4  Data Engineering          11

You might notice that this approach is similar to the above method. The key difference between these two is that the loc method will return a view of the original dataframe. This means that the view (filterDf) will contain only a part of the original dataframe (df) but it will still point to the same locations in memory. Therefore, if you modify the original dataframe (df), then the filtered object may also reflect the changes (filterDf).

This method comes in handy when we want to slice and modify any specific values.

# change the designation of first row
df.loc[0, "Designation"] = "Senior Data Scientist"

print (df)

Output

      Name            Designation      City  Experience
0  Shubham  Senior Data Scientist    Sydney           5
1     Riti           Data Analyst     Delhi           7
2   Shanky        Program Manager     Delhi           2
3   Shreya       Graphic Designer    Mumbai           2
4     Aadi       Data Engineering  New York          11

However, in cases, where you want to copy the selected columns into another dataframe, it is recommended to use copy statement to avoid any changes to the parent dataframe.

# copy the selected records into a new dataframe
filterDf = df.loc[:, ["Designation", "Experience"]].copy()

The complete example is as follows,

import pandas as pd

# List of Tuples
employees = [('Shubham', 'Data Scientist', 'Sydney',   5),
            ('Riti', 'Data Analyst', 'Delhi' ,   7),
            ('Shanky', 'Program Manager', 'Delhi' ,   2),
            ('Shreya', 'Graphic Designer', 'Mumbai' ,   2),
            ('Aadi', 'Data Engineering', 'New York', 11)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'City', 'Experience'],
                  index=[0, 1, 2, 3, 4])
print(df)

# select "Designation" and "Experience" columns using loc method
filterDf = df.loc[:, ["Designation", "Experience"]]

print (filterDf)

# change the designation of first row
df.loc[0, "Designation"] = "Senior Data Scientist"

print (df)

Output

      Name       Designation      City  Experience
0  Shubham    Data Scientist    Sydney           5
1     Riti      Data Analyst     Delhi           7
2   Shanky   Program Manager     Delhi           2
3   Shreya  Graphic Designer    Mumbai           2
4     Aadi  Data Engineering  New York          11

        Designation  Experience
0    Data Scientist           5
1      Data Analyst           7
2   Program Manager           2
3  Graphic Designer           2
4  Data Engineering          11

      Name            Designation      City  Experience
0  Shubham  Senior Data Scientist    Sydney           5
1     Riti           Data Analyst     Delhi           7
2   Shanky        Program Manager     Delhi           2
3   Shreya       Graphic Designer    Mumbai           2
4     Aadi       Data Engineering  New York          11

Select multiple columns using iloc

Using iloc for selecting multiple columns from a dataframe is quite similar to using loc statements. The only difference here being we have to use the column index instead of the column names.

# select "Designation" and "Experience" columns using loc method
filterDf = df.iloc[:, [1,3]]

print (filterDf)

Output

        Designation  Experience
0    Data Scientist           5
1      Data Analyst           7
2   Program Manager           2
3  Graphic Designer           2
4  Data Engineering          11

Here, also the output is the view of the original DataFrame, therefore, it is recommended to use “.copy” in case we want to use it in further analysis.

The complete example is as follows,

import pandas as pd

# List of Tuples
employees = [('Shubham', 'Data Scientist', 'Sydney',   5),
            ('Riti', 'Data Analyst', 'Delhi' ,   7),
            ('Shanky', 'Program Manager', 'Delhi' ,   2),
            ('Shreya', 'Graphic Designer', 'Mumbai' ,   2),
            ('Aadi', 'Data Engineering', 'New York', 11)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'City', 'Experience'],
                  index=[0, 1, 2, 3, 4])
print(df)

# select "Designation" and "Experience" columns using loc method
filterDf = df.iloc[:, [1,3]]

print (filterDf)

Output:

      Name       Designation      City  Experience
0  Shubham    Data Scientist    Sydney           5
1     Riti      Data Analyst     Delhi           7
2   Shanky   Program Manager     Delhi           2
3   Shreya  Graphic Designer    Mumbai           2
4     Aadi  Data Engineering  New York          11


        Designation  Experience
0    Data Scientist           5
1      Data Analyst           7
2   Program Manager           2
3  Graphic Designer           2
4  Data Engineering          11

Select multiple columns based on list of columns

In this scenario, let’s say, we have a defined column list. Out of the list, a few exist in the dataframe and the remaining are missing. If we use the same indexing method (approach 1) that we discussed, it would rest into an error.

# list of columns
colList = ["Name", "City", "Contact", "Address"]

# indexing using the names
print (df[colList])

Output

KeyError: "['Address', 'Contact'] not in index"

It would result in a KeyError since the columns Address and Contact are not present in the dataframe. Here, the “isin” operator comes in handy.

# list of columns
colList = ["Name", "City", "Contact", "Address"]

# indexing using the isin operator
filterDf = df.loc[:, df.columns.isin(colList)]

print (filterDf)

Output

      Name      City
0  Shubham    Sydney
1     Riti     Delhi
2   Shanky     Delhi
3   Shreya    Mumbai
4     Aadi  New York

The output contains the columns which were present in the DataFrame, it will ignore rest of the columns present in the list.

The complete example is as follows,

import pandas as pd

# List of Tuples
employees = [('Shubham', 'Data Scientist', 'Sydney',   5),
            ('Riti', 'Data Analyst', 'Delhi' ,   7),
            ('Shanky', 'Program Manager', 'Delhi' ,   2),
            ('Shreya', 'Graphic Designer', 'Mumbai' ,   2),
            ('Aadi', 'Data Engineering', 'New York', 11)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'City', 'Experience'],
                  index=[0, 1, 2, 3, 4])
print(df)

# list of columns
colList = ["Name", "City", "Contact", "Address"]

# indexing using the isin operator
filterDf = df.loc[:, df.columns.isin(colList)]

print (filterDf)

Output

      Name       Designation      City  Experience
0  Shubham    Data Scientist    Sydney           5
1     Riti      Data Analyst     Delhi           7
2   Shanky   Program Manager     Delhi           2
3   Shreya  Graphic Designer    Mumbai           2
4     Aadi  Data Engineering  New York          11
      Name      City
0  Shubham    Sydney
1     Riti     Delhi
2   Shanky     Delhi
3   Shreya    Mumbai
4     Aadi  New York

Select multiple columns using filter() method

The filter method to select columns is extremely useful when we need to extract the columns using a regex-based pattern.

# select using filter method
filterDf = df.filter(regex = 'Name|t')

print (filterDf)

Output

      Name            Designation      City
0  Shubham  Senior Data Scientist    Sydney
1     Riti           Data Analyst     Delhi
2   Shanky        Program Manager     Delhi
3   Shreya       Graphic Designer    Mumbai
4     Aadi       Data Engineering  New York

Here, we selected all the columns containing “Name” and “t” in their names. It is very handy when there are too many columns that can’t be viewed at once.

The complete example is as follows,

import pandas as pd

# List of Tuples
employees = [('Shubham', 'Data Scientist', 'Sydney',   5),
            ('Riti', 'Data Analyst', 'Delhi' ,   7),
            ('Shanky', 'Program Manager', 'Delhi' ,   2),
            ('Shreya', 'Graphic Designer', 'Mumbai' ,   2),
            ('Aadi', 'Data Engineering', 'New York', 11)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Designation', 'City', 'Experience'],
                  index=[0, 1, 2, 3, 4])
print(df)

# select using filter method
filterDf = df.filter(regex = 'Name|t')

print (filterDf)

Output

      Name       Designation      City  Experience
0  Shubham    Data Scientist    Sydney           5
1     Riti      Data Analyst     Delhi           7
2   Shanky   Program Manager     Delhi           2
3   Shreya  Graphic Designer    Mumbai           2
4     Aadi  Data Engineering  New York          11


      Name       Designation      City
0  Shubham    Data Scientist    Sydney
1     Riti      Data Analyst     Delhi
2   Shanky   Program Manager     Delhi
3   Shreya  Graphic Designer    Mumbai
4     Aadi  Data Engineering  New York

Summary

Great, you made it! In this article, we have discussed multiple ways to select columns from a pandas DataFrame. Thanks.

Pandas Tutorials -Learn Data Analysis with Python

   

Are you looking to make a career in Data Science with Python?

Data Science is the future, and the future is here now. Data Scientists are now the most sought-after professionals today. To become a good Data Scientist or to make a career switch in Data Science one must possess the right skill set. We have curated a list of Best Professional Certificate in Data Science with Python. These courses will teach you the programming tools for Data Science like Pandas, NumPy, Matplotlib, Seaborn and how to use these libraries to implement Machine learning models.

Checkout the Detailed Review of Best Professional Certificate in Data Science with Python.

Remember, Data Science requires a lot of patience, persistence, and practice. So, start learning today.

Join a LinkedIn Community of Python Developers

Leave a Comment

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Scroll to Top