In this article, we will discuss different ways to select multiple columns from a Pandas DataFrame in Python.
Table of Contents
We will also cover a general idea of which method should be used based on the situation. To quickly get started, let’s create a sample dataframe to experiment. We’ll use the pandas library with some random data.
import pandas as pd # List of Tuples employees = [('Shubham', 'Data Scientist', 'Sydney', 5), ('Riti', 'Data Analyst', 'Delhi' , 7), ('Shanky', 'Program Manager', 'Delhi' , 2), ('Shreya', 'Graphic Designer', 'Mumbai' , 2), ('Aadi', 'Data Engineering', 'New York', 11)] # Create a DataFrame object from list of tuples df = pd.DataFrame(employees, columns=['Name', 'Designation', 'City', 'Experience'], index=[0, 1, 2, 3, 4]) print(df)
Contents of the created dataframe are,
Name Designation City Experience 0 Shubham Data Scientist Sydney 5 1 Riti Data Analyst Delhi 7 2 Shanky Program Manager Delhi 2 3 Shreya Graphic Designer Mumbai 2 4 Aadi Data Engineering New York 11
Now, let’s look at different ways in which we could select multiple columns from the above dataframe.
Select multiple columns based on column names
The first scenario is useful when you precisely know the column names you want to select from the dataframe. For example, we need to select the columns “Designation” and “Experience” from our sample dataframe.
# select "Designation" and "Experience" columns filterDf = df[["Designation", "Experience"]] print (filterDf)
Output
Designation Experience 0 Data Scientist 5 1 Data Analyst 7 2 Program Manager 2 3 Graphic Designer 2 4 Data Engineering 11
This method creates a copy of the original dataframe with the selected columns.
The complete example is as follows,
import pandas as pd # List of Tuples employees = [('Shubham', 'Data Scientist', 'Sydney', 5), ('Riti', 'Data Analyst', 'Delhi' , 7), ('Shanky', 'Program Manager', 'Delhi' , 2), ('Shreya', 'Graphic Designer', 'Mumbai' , 2), ('Aadi', 'Data Engineering', 'New York', 11)] # Create a DataFrame object from list of tuples df = pd.DataFrame(employees, columns=['Name', 'Designation', 'City', 'Experience'], index=[0, 1, 2, 3, 4]) print(df) # select "Designation" and "Experience" columns filterDf = df[["Designation", "Experience"]] print (filterDf)
Output
Name Designation City Experience 0 Shubham Data Scientist Sydney 5 1 Riti Data Analyst Delhi 7 2 Shanky Program Manager Delhi 2 3 Shreya Graphic Designer Mumbai 2 4 Aadi Data Engineering New York 11 Designation Experience 0 Data Scientist 5 1 Data Analyst 7 2 Program Manager 2 3 Graphic Designer 2 4 Data Engineering 11
Select multiple columns using loc method
In this scenario, we slice the dataframe by the column names using the loc method. For instance, we again need to select the columns “Designation” and “Experience” from our sample dataframe.
# select "Designation" and "Experience" columns using loc method filterDf = df.loc[:, ["Designation", "Experience"]] print (filterDf)
Output
Designation Experience 0 Data Scientist 5 1 Data Analyst 7 2 Program Manager 2 3 Graphic Designer 2 4 Data Engineering 11
You might notice that this approach is similar to the above method. The key difference between these two is that the loc method will return a view of the original dataframe. This means that the view (filterDf) will contain only a part of the original dataframe (df) but it will still point to the same locations in memory. Therefore, if you modify the original dataframe (df), then the filtered object may also reflect the changes (filterDf).
This method comes in handy when we want to slice and modify any specific values.
# change the designation of first row df.loc[0, "Designation"] = "Senior Data Scientist" print (df)
Output
Name Designation City Experience 0 Shubham Senior Data Scientist Sydney 5 1 Riti Data Analyst Delhi 7 2 Shanky Program Manager Delhi 2 3 Shreya Graphic Designer Mumbai 2 4 Aadi Data Engineering New York 11
However, in cases, where you want to copy the selected columns into another dataframe, it is recommended to use copy statement to avoid any changes to the parent dataframe.
# copy the selected records into a new dataframe filterDf = df.loc[:, ["Designation", "Experience"]].copy()
The complete example is as follows,
import pandas as pd # List of Tuples employees = [('Shubham', 'Data Scientist', 'Sydney', 5), ('Riti', 'Data Analyst', 'Delhi' , 7), ('Shanky', 'Program Manager', 'Delhi' , 2), ('Shreya', 'Graphic Designer', 'Mumbai' , 2), ('Aadi', 'Data Engineering', 'New York', 11)] # Create a DataFrame object from list of tuples df = pd.DataFrame(employees, columns=['Name', 'Designation', 'City', 'Experience'], index=[0, 1, 2, 3, 4]) print(df) # select "Designation" and "Experience" columns using loc method filterDf = df.loc[:, ["Designation", "Experience"]] print (filterDf) # change the designation of first row df.loc[0, "Designation"] = "Senior Data Scientist" print (df)
Output
Name Designation City Experience 0 Shubham Data Scientist Sydney 5 1 Riti Data Analyst Delhi 7 2 Shanky Program Manager Delhi 2 3 Shreya Graphic Designer Mumbai 2 4 Aadi Data Engineering New York 11 Designation Experience 0 Data Scientist 5 1 Data Analyst 7 2 Program Manager 2 3 Graphic Designer 2 4 Data Engineering 11 Name Designation City Experience 0 Shubham Senior Data Scientist Sydney 5 1 Riti Data Analyst Delhi 7 2 Shanky Program Manager Delhi 2 3 Shreya Graphic Designer Mumbai 2 4 Aadi Data Engineering New York 11
Select multiple columns using iloc
Using iloc for selecting multiple columns from a dataframe is quite similar to using loc statements. The only difference here being we have to use the column index instead of the column names.
# select "Designation" and "Experience" columns using loc method filterDf = df.iloc[:, [1,3]] print (filterDf)
Output
Designation Experience 0 Data Scientist 5 1 Data Analyst 7 2 Program Manager 2 3 Graphic Designer 2 4 Data Engineering 11
Here, also the output is the view of the original DataFrame, therefore, it is recommended to use “.copy” in case we want to use it in further analysis.
The complete example is as follows,
import pandas as pd # List of Tuples employees = [('Shubham', 'Data Scientist', 'Sydney', 5), ('Riti', 'Data Analyst', 'Delhi' , 7), ('Shanky', 'Program Manager', 'Delhi' , 2), ('Shreya', 'Graphic Designer', 'Mumbai' , 2), ('Aadi', 'Data Engineering', 'New York', 11)] # Create a DataFrame object from list of tuples df = pd.DataFrame(employees, columns=['Name', 'Designation', 'City', 'Experience'], index=[0, 1, 2, 3, 4]) print(df) # select "Designation" and "Experience" columns using loc method filterDf = df.iloc[:, [1,3]] print (filterDf)
Output:
Name Designation City Experience 0 Shubham Data Scientist Sydney 5 1 Riti Data Analyst Delhi 7 2 Shanky Program Manager Delhi 2 3 Shreya Graphic Designer Mumbai 2 4 Aadi Data Engineering New York 11 Designation Experience 0 Data Scientist 5 1 Data Analyst 7 2 Program Manager 2 3 Graphic Designer 2 4 Data Engineering 11
Select multiple columns based on list of columns
In this scenario, let’s say, we have a defined column list. Out of the list, a few exist in the dataframe and the remaining are missing. If we use the same indexing method (approach 1) that we discussed, it would rest into an error.
# list of columns colList = ["Name", "City", "Contact", "Address"] # indexing using the names print (df[colList])
Output
KeyError: "['Address', 'Contact'] not in index"
It would result in a KeyError since the columns Address and Contact are not present in the dataframe. Here, the “isin” operator comes in handy.
# list of columns colList = ["Name", "City", "Contact", "Address"] # indexing using the isin operator filterDf = df.loc[:, df.columns.isin(colList)] print (filterDf)
Output
Name City 0 Shubham Sydney 1 Riti Delhi 2 Shanky Delhi 3 Shreya Mumbai 4 Aadi New York
The output contains the columns which were present in the DataFrame, it will ignore rest of the columns present in the list.
The complete example is as follows,
import pandas as pd # List of Tuples employees = [('Shubham', 'Data Scientist', 'Sydney', 5), ('Riti', 'Data Analyst', 'Delhi' , 7), ('Shanky', 'Program Manager', 'Delhi' , 2), ('Shreya', 'Graphic Designer', 'Mumbai' , 2), ('Aadi', 'Data Engineering', 'New York', 11)] # Create a DataFrame object from list of tuples df = pd.DataFrame(employees, columns=['Name', 'Designation', 'City', 'Experience'], index=[0, 1, 2, 3, 4]) print(df) # list of columns colList = ["Name", "City", "Contact", "Address"] # indexing using the isin operator filterDf = df.loc[:, df.columns.isin(colList)] print (filterDf)
Output
Name Designation City Experience 0 Shubham Data Scientist Sydney 5 1 Riti Data Analyst Delhi 7 2 Shanky Program Manager Delhi 2 3 Shreya Graphic Designer Mumbai 2 4 Aadi Data Engineering New York 11 Name City 0 Shubham Sydney 1 Riti Delhi 2 Shanky Delhi 3 Shreya Mumbai 4 Aadi New York
Select multiple columns using filter() method
The filter method to select columns is extremely useful when we need to extract the columns using a regex-based pattern.
# select using filter method filterDf = df.filter(regex = 'Name|t') print (filterDf)
Output
Name Designation City 0 Shubham Senior Data Scientist Sydney 1 Riti Data Analyst Delhi 2 Shanky Program Manager Delhi 3 Shreya Graphic Designer Mumbai 4 Aadi Data Engineering New York
Here, we selected all the columns containing “Name” and “t” in their names. It is very handy when there are too many columns that can’t be viewed at once.
The complete example is as follows,
import pandas as pd # List of Tuples employees = [('Shubham', 'Data Scientist', 'Sydney', 5), ('Riti', 'Data Analyst', 'Delhi' , 7), ('Shanky', 'Program Manager', 'Delhi' , 2), ('Shreya', 'Graphic Designer', 'Mumbai' , 2), ('Aadi', 'Data Engineering', 'New York', 11)] # Create a DataFrame object from list of tuples df = pd.DataFrame(employees, columns=['Name', 'Designation', 'City', 'Experience'], index=[0, 1, 2, 3, 4]) print(df) # select using filter method filterDf = df.filter(regex = 'Name|t') print (filterDf)
Output
Name Designation City Experience 0 Shubham Data Scientist Sydney 5 1 Riti Data Analyst Delhi 7 2 Shanky Program Manager Delhi 2 3 Shreya Graphic Designer Mumbai 2 4 Aadi Data Engineering New York 11 Name Designation City 0 Shubham Data Scientist Sydney 1 Riti Data Analyst Delhi 2 Shanky Program Manager Delhi 3 Shreya Graphic Designer Mumbai 4 Aadi Data Engineering New York
Summary
Great, you made it! In this article, we have discussed multiple ways to select columns from a pandas DataFrame. Thanks.