In this article, we will discuss different ways to select multiple columns of a dataframe by name in pandas.
Table of Contents
- Select multiple columns by name in pandas dataframe using []
- Select multiple columns by name in pandas dataframe using loc[]
Suppose we have a dataframe df with the following contents:
   Name  Age    City  Experience
0  Jack   34  Sydney           5
1  Riti   31   Delhi           7
2  Aadi   16  London          11
3  Mark   41   Delhi          12
We want to select multiple columns from this dataframe. Let’s see how to do that.
Select multiple columns of pandas dataframe using []
To select multiple columns of a dataframe, pass a list of column names to the [] (subscript operator) of the dataframe, i.e.
col_names = ['City', 'Age']

# Select multiple columns of dataframe by names in list
multiple_columns = df[col_names]
print(multiple_columns)
Output
     City  Age
0  Sydney   34
1   Delhi   31
2  London   16
3   Delhi   41
When we passed a list containing two column names to the [] operator of the dataframe, it returned a subset of the dataframe as a new DataFrame object containing only those two columns, i.e. ‘City’ and ‘Age’, in the order given in the list. The returned subset is a copy, not a view: modifications made to it will not be reflected in the original dataframe.
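A quick way to check how the selected subset relates to the original dataframe is to change a value in the subset and then inspect df. The sketch below re-creates the sample data from this article; the value 100 is an arbitrary choice for illustration. It was written with recent pandas versions in mind, and older versions without Copy-on-Write may print a SettingWithCopyWarning on the assignment line.

```python
import pandas as pd

# Same sample data as used throughout this article
employees = [('Jack', 34, 'Sydney', 5),
             ('Riti', 31, 'Delhi', 7),
             ('Aadi', 16, 'London', 11),
             ('Mark', 41, 'Delhi', 12)]
df = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'])

col_names = ['City', 'Age']
multiple_columns = df[col_names]

# The result's columns follow the order of the list, not the order in df
print(list(multiple_columns.columns))   # ['City', 'Age']

# Change a value in the subset (100 is an arbitrary example value).
# Older pandas versions may print a SettingWithCopyWarning here;
# it is only a warning, not an error.
multiple_columns.loc[0, 'Age'] = 100

print(multiple_columns.loc[0, 'Age'])   # 100
print(df.loc[0, 'Age'])                 # 34, the original is unchanged
```

If you intend to modify the subset on purpose, calling df[col_names].copy() makes the independence explicit and avoids the warning on older versions.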
Let’s look at a complete example, where we select the two columns ‘City’ and ‘Age’ from the dataframe,
import pandas as pd

# List of Tuples
employees = [('Jack', 34, 'Sydney', 5),
             ('Riti', 31, 'Delhi', 7),
             ('Aadi', 16, 'London', 11),
             ('Mark', 41, 'Delhi', 12)]

# Create a DataFrame object
df = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'])

print("Contents of the Dataframe : ")
print(df)

col_names = ['City', 'Age']

# Select multiple columns of dataframe by names in list
multiple_columns = df[col_names]

print("Selected Columns of Dataframe : ")
print(multiple_columns)
Output:
Contents of the Dataframe :
   Name  Age    City  Experience
0  Jack   34  Sydney           5
1  Riti   31   Delhi           7
2  Aadi   16  London          11
3  Mark   41   Delhi          12
Selected Columns of Dataframe :
     City  Age
0  Sydney   34
1   Delhi   31
2  London   16
3   Delhi   41
Select multiple columns of pandas dataframe using loc[]
We can also select multiple columns of the dataframe using its loc[] indexer. But first, let’s take a brief look at how loc[] works.
Overview of dataframe.loc[]
In pandas, a dataframe provides the loc[] indexer to select rows and columns of a dataframe by their labels (row index labels and column names). Its syntax is as follows,
df.loc[rows_section, columns_section]
Arguments:
- rows_section: It can be any of the following,
  - A single row index label.
    - If provided, then it will select that row only.
  - A list / sequence of multiple row index labels.
    - If provided, then it will select the rows with the index labels in the given list.
  - A range of row index labels, i.e. start:end.
    - If start:end is provided, then it will select rows from start to end, with both start and end included (unlike regular Python slicing, label-based slicing with loc[] includes the end label).
    - If “:” is provided, then it will select all rows.
- columns_section: It can be any of the following,
  - A single column name.
    - If provided, then loc[] will select the column with the given name.
  - A list / sequence of multiple column names.
    - If provided, then loc[] will select the columns with the given names in the list.
  - A range of column names, i.e. start:end.
    - If start:end is provided, then it will select columns from start to end, with both start and end included.
    - If “:” is provided, then it will select all columns.
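The following sketch tries each rows_section and columns_section form listed above against the same sample data. The variable names (one_row, row_range, and so on) are only illustrative. Pay attention to the two label slices: the end label is part of the result in both cases.

```python
import pandas as pd

# Same sample data as used throughout this article
employees = [('Jack', 34, 'Sydney', 5),
             ('Riti', 31, 'Delhi', 7),
             ('Aadi', 16, 'London', 11),
             ('Mark', 41, 'Delhi', 12)]
df = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'])

# rows_section forms (":" keeps all columns)
one_row = df.loc[1, :]          # single label: the row with index label 1
some_rows = df.loc[[0, 2], :]   # list of labels: rows 0 and 2
row_range = df.loc[1:2, :]      # label slice: rows 1 through 2, end included

# columns_section forms (":" keeps all rows)
one_col = df.loc[:, 'City']               # single column name
some_cols = df.loc[:, ['City', 'Age']]    # list of column names
col_range = df.loc[:, 'Age':'City']       # label slice: 'Age' through 'City'

print(one_row['Name'])            # Riti
print(list(some_rows.index))      # [0, 2]
print(list(row_range.index))      # [1, 2]
print(list(col_range.columns))    # ['Age', 'City']
```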
Returns:
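- Based on the row labels and column names provided in the arguments, it returns a subset of the dataframe. Depending on whether single labels or lists / ranges are passed, the result is a DataFrame, a Series, or a single value.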
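The kind of object loc[] hands back depends on what is passed in each section. This sketch, again on the sample data, prints the type for a few combinations; the variable names are illustrative only.

```python
import pandas as pd

# Same sample data as used throughout this article
employees = [('Jack', 34, 'Sydney', 5),
             ('Riti', 31, 'Delhi', 7),
             ('Aadi', 16, 'London', 11),
             ('Mark', 41, 'Delhi', 12)]
df = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'])

# Single row label and single column name: one value
value = df.loc[1, 'City']
print(value)                          # Delhi

# A single label on one axis, a list or ":" on the other: a Series
column = df.loc[:, 'City']
row_part = df.loc[1, ['City', 'Age']]
print(type(column).__name__)          # Series
print(type(row_part).__name__)        # Series

# Lists, ranges or ":" on both axes: a DataFrame
subset = df.loc[:, ['City', 'Age']]
print(type(subset).__name__)          # DataFrame
```

This is why passing a list, even a list with one name such as ['City'], is the usual way to guarantee a DataFrame back.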
Example of selecting multiple columns of dataframe by name using loc[]
We can select multiple columns of the dataframe by passing a list of column names in the columns_section of loc[] and passing “:” in the rows_section, to select all values of these columns. For example,
col_names = ['City', 'Age']

# Select multiple columns of dataframe by name
multiple_columns = df.loc[:, col_names]
print(multiple_columns)
Output:
     City  Age
0  Sydney   34
1   Delhi   31
2  London   16
3   Delhi   41
In the rows_section we passed “:”, whereas in the columns_section we passed the list of column names. Therefore, it returned all the values of those columns as a new DataFrame object. This subset is a copy of the original dataframe, not a view, so modifications made to it will not be reflected in the original dataframe.
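When the goal is to change the original dataframe, the assignment has to go through loc[] on df itself rather than through a previously selected subset. A minimal sketch on the sample data; adding 1 to every age is just an arbitrary example update, and older pandas versions without Copy-on-Write may print a SettingWithCopyWarning on the subset assignment.

```python
import pandas as pd

# Same sample data as used throughout this article
employees = [('Jack', 34, 'Sydney', 5),
             ('Riti', 31, 'Delhi', 7),
             ('Aadi', 16, 'London', 11),
             ('Mark', 41, 'Delhi', 12)]
df = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'])

col_names = ['City', 'Age']
multiple_columns = df.loc[:, col_names]

# Editing the selected subset does not touch df
multiple_columns.loc[0, 'Age'] = 100
age_in_df = df.loc[0, 'Age']
print(age_in_df)              # 34

# To update the original, assign through loc[] on df itself
# (here: add 1 to every age, an arbitrary example change)
df.loc[:, 'Age'] = df['Age'] + 1
print(df['Age'].tolist())     # [35, 32, 17, 42]
```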
The complete example to select multiple columns of a dataframe using loc[] is as follows,
import pandas as pd

# List of Tuples
employees = [('Jack', 34, 'Sydney', 5),
             ('Riti', 31, 'Delhi', 7),
             ('Aadi', 16, 'London', 11),
             ('Mark', 41, 'Delhi', 12)]

# Create a DataFrame object
df = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'])

print("Contents of the Dataframe : ")
print(df)

col_names = ['City', 'Age']

# Select multiple columns of dataframe by name
multiple_columns = df.loc[:, col_names]

print("Selected Columns of Dataframe : ")
print(multiple_columns)
Output:
Contents of the Dataframe :
   Name  Age    City  Experience
0  Jack   34  Sydney           5
1  Riti   31   Delhi           7
2  Aadi   16  London          11
3  Mark   41   Delhi          12
Selected Columns of Dataframe :
     City  Age
0  Sydney   34
1   Delhi   31
2  London   16
3   Delhi   41
Summary:
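We learned two different ways to select multiple columns of a dataframe by name: passing a list of column names to the [] operator, and passing the same list in the columns_section of loc[].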
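As a final check, the sketch below confirms on the sample data that both approaches give the same result when all rows are kept. It also shows one practical difference: loc[] can restrict rows and columns in a single step, here using the row labels 1 and 3 picked arbitrarily for illustration.

```python
import pandas as pd

# Same sample data as used throughout this article
employees = [('Jack', 34, 'Sydney', 5),
             ('Riti', 31, 'Delhi', 7),
             ('Aadi', 16, 'London', 11),
             ('Mark', 41, 'Delhi', 12)]
df = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'])

col_names = ['City', 'Age']

using_brackets = df[col_names]
using_loc = df.loc[:, col_names]

# Same labels, same values, same dtypes
print(using_brackets.equals(using_loc))    # True

# loc[] can also pick rows at the same time (labels 1 and 3 chosen arbitrarily)
rows_and_cols = df.loc[[1, 3], col_names]
print(rows_and_cols['Age'].tolist())       # [31, 41]
```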