In this article, we will discuss different ways to select multiple columns of a dataframe by name in pandas.
Table of Contents
- Select multiple columns by name in pandas dataframe using []
- Select multiple columns by name in pandas dataframe using loc[]
Suppose we have a dataframe df with the following contents,
   Name  Age    City  Experience
0  Jack   34  Sydney           5
1  Riti   31   Delhi           7
2  Aadi   16  London          11
3  Mark   41   Delhi          12
We want to select multiple columns from this dataframe. Let’s see how to do that,
Select multiple columns of pandas dataframe using []
To select multiple columns of a dataframe, pass a list of column names to the [] (subscript operator) of the dataframe, i.e.
col_names = ['City', 'Age']

# Select multiple columns of dataframe by names in list
multiple_columns = df[col_names]

print(multiple_columns)
Output:
     City  Age
0  Sydney   34
1   Delhi   31
2  London   16
3   Delhi   41
When we passed a list containing two column names to the [] operator of the dataframe, it returned a subset of the dataframe as a new dataframe object with only those two columns, i.e. ‘City’ and ‘Age’, in the order given in the list. The returned subset is a copy of the data, not a view of the dataframe. Modifications made to it are not reflected in the original dataframe.
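How the selected subset relates to the original dataframe can be checked directly. The following is a minimal sketch, assuming a shortened two-row version of the sample data; the variable name subset is only illustrative. It modifies the selected subset and then prints the same cell from both objects.

```python
import warnings

import pandas as pd

# Shortened version of the sample data, used only for this illustration
df = pd.DataFrame(
    [('Jack', 34, 'Sydney', 5),
     ('Riti', 31, 'Delhi', 7)],
    columns=['Name', 'Age', 'City', 'Experience'])

# Select two columns by passing a list of names
subset = df[['City', 'Age']]

with warnings.catch_warnings():
    # Some pandas versions emit a SettingWithCopyWarning for this assignment;
    # it is silenced here only to keep the output clean
    warnings.simplefilter("ignore")
    subset.loc[0, 'Age'] = 99

print(subset.loc[0, 'Age'])  # value in the selected subset
print(df.loc[0, 'Age'])      # value in the original dataframe
```

The subset holds the new value while the original dataframe keeps 34, which is consistent with the selection being a copy. To change the original dataframe, assign to it directly, for example with df.loc[0, 'Age'] = 99.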
Let’s check out a complete example, where we will select the two columns named ‘City’ and ‘Age’ from the dataframe,
import pandas as pd

# List of tuples
employees = [('Jack', 34, 'Sydney', 5),
             ('Riti', 31, 'Delhi', 7),
             ('Aadi', 16, 'London', 11),
             ('Mark', 41, 'Delhi', 12)]

# Create a DataFrame object
df = pd.DataFrame(employees,
                  columns=['Name', 'Age', 'City', 'Experience'])

print("Contents of the Dataframe : ")
print(df)

col_names = ['City', 'Age']

# Select multiple columns of dataframe by names in list
multiple_columns = df[col_names]

print("Selected Columns of Dataframe : ")
print(multiple_columns)
Output:
Contents of the Dataframe : 
   Name  Age    City  Experience
0  Jack   34  Sydney           5
1  Riti   31   Delhi           7
2  Aadi   16  London          11
3  Mark   41   Delhi          12
Selected Columns of Dataframe : 
     City  Age
0  Sydney   34
1   Delhi   31
2  London   16
3   Delhi   41
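Two details of list-based selection are easy to miss in the output above: the columns come back in the order of the list, and passing a list always gives a dataframe, even for a single name. A small sketch, assuming the same sample data; the variable names are only illustrative.

```python
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame(
    [('Jack', 34, 'Sydney', 5),
     ('Riti', 31, 'Delhi', 7),
     ('Aadi', 16, 'London', 11),
     ('Mark', 41, 'Delhi', 12)],
    columns=['Name', 'Age', 'City', 'Experience'])

# Column order in the result follows the list, not the original dataframe
reordered = df[['Experience', 'Name']]
print(list(reordered.columns))

# A list with one name returns a DataFrame; a bare name returns a Series
one_col_frame = df[['City']]
one_col_series = df['City']
print(type(one_col_frame).__name__)
print(type(one_col_series).__name__)
```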
Select multiple columns of pandas dataframe using loc[]
We can also select multiple columns of the dataframe using its loc[] attribute. But before that let’s have a little overview of the loc[] attribute,
Overview of dataframe.loc[]
In pandas, a dataframe provides the attribute loc[] to select rows or columns of the dataframe based on their labels. Its syntax is as follows,
df.loc[rows_section, columns_section]
Arguments:
- rows_section: It can be any of the following,
  - Single row index label.
    - If provided, then it will select that row only.
  - A list / sequence of multiple row index labels.
    - If provided, then it will select the rows with index labels in the given list.
  - A range of row index labels i.e. start:end.
    - If start:end is provided, then it will select rows from start to end, with both labels included.
    - If “:” is provided, then it will select all rows.
- columns_section: It can be any of the following,
  - Single column name.
    - If provided, then loc[] will select the column with the given name.
  - A list / sequence of multiple column names.
    - If provided, then loc[] will select the columns with the given names in the list.
  - A range of column names i.e. start:end.
    - If start:end is provided, then it will select columns from start to end, with both names included.
    - If “:” is provided, then it will select all columns.
Returns:
- Based on the row & column names provided in the arguments, it returns a sub-set of the dataframe.
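The argument forms listed above can be tried out on the sample dataframe. The following is an illustrative sketch, assuming the default integer row labels 0 to 3; the variable names are hypothetical. Note that label ranges in loc[] include both endpoints.

```python
import pandas as pd

# Same sample data as in the rest of the article
df = pd.DataFrame(
    [('Jack', 34, 'Sydney', 5),
     ('Riti', 31, 'Delhi', 7),
     ('Aadi', 16, 'London', 11),
     ('Mark', 41, 'Delhi', 12)],
    columns=['Name', 'Age', 'City', 'Experience'])

single_row = df.loc[1]                  # one row label, returned as a Series
some_rows = df.loc[[0, 2]]              # list of row labels
row_range = df.loc[1:2]                 # label range, includes both 1 and 2
single_col = df.loc[:, 'City']          # all rows, one column (a Series)
col_range = df.loc[:, 'Age':'City']     # column range, includes 'Age' and 'City'
block = df.loc[1:2, ['Name', 'City']]   # row range combined with a column list

print(list(row_range.index))
print(list(col_range.columns))
print(block)
```

When only one section is given, as in df.loc[1], it is treated as the rows_section and all columns are returned.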
Example of selecting multiple columns of dataframe by name using loc[]
We can select multiple columns of a dataframe by passing a list of column names in the columns_section of loc[] and passing “:” in the rows_section, to select all the values of these columns. For example,
col_names = ['City', 'Age']

# Select multiple columns of dataframe by name
multiple_columns = df.loc[:, col_names]

print(multiple_columns)
Output:
     City  Age
0  Sydney   34
1   Delhi   31
2  London   16
3   Delhi   41
In the rows_section we passed “:”, whereas in the columns_section we passed the list of column names. Therefore it returned all the values of those columns from the dataframe as a new dataframe object. Because a list of column names was passed, this subset dataframe is a copy of the data rather than a view of the original dataframe. Modifications made to it are not reflected in the original dataframe.
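As a quick check, the result of loc[] can be compared with the result of the [] operator, and its independence from the original dataframe can be observed. This is a minimal sketch, assuming the same sample data; by_loc and by_brackets are illustrative names.

```python
import pandas as pd

# Same sample data as in the rest of the article
df = pd.DataFrame(
    [('Jack', 34, 'Sydney', 5),
     ('Riti', 31, 'Delhi', 7),
     ('Aadi', 16, 'London', 11),
     ('Mark', 41, 'Delhi', 12)],
    columns=['Name', 'Age', 'City', 'Experience'])

col_names = ['City', 'Age']

by_loc = df.loc[:, col_names]   # selection using loc[]
by_brackets = df[col_names]     # selection using []

# Both approaches give the same columns and values
print(by_loc.equals(by_brackets))

# Change the original dataframe after the selection was made
df.loc[0, 'Age'] = 35

print(by_loc.loc[0, 'Age'])  # the earlier selection still holds the old value
print(df.loc[0, 'Age'])      # the original dataframe holds the new value
```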
A complete example of selecting multiple columns of a dataframe using loc[] is as follows,
import pandas as pd

# List of tuples
employees = [('Jack', 34, 'Sydney', 5),
             ('Riti', 31, 'Delhi', 7),
             ('Aadi', 16, 'London', 11),
             ('Mark', 41, 'Delhi', 12)]

# Create a DataFrame object
df = pd.DataFrame(employees,
                  columns=['Name', 'Age', 'City', 'Experience'])

print("Contents of the Dataframe : ")
print(df)

col_names = ['City', 'Age']

# Select multiple columns of dataframe by name
multiple_columns = df.loc[:, col_names]

print("Selected Columns of Dataframe : ")
print(multiple_columns)
Output:
Contents of the Dataframe : 
   Name  Age    City  Experience
0  Jack   34  Sydney           5
1  Riti   31   Delhi           7
2  Aadi   16  London          11
3  Mark   41   Delhi          12
Selected Columns of Dataframe : 
     City  Age
0  Sydney   34
1   Delhi   31
2  London   16
3   Delhi   41
Summary:
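We learned two different ways to select multiple columns of a dataframe by name: passing a list of column names to the [] operator, and passing the same list as the columns_section of loc[].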