In this article, we will discuss different ways to select multiple columns of a dataframe by name in pandas.

Suppose we have a dataframe df with the following contents,

   Name  Age    City  Experience
0  Jack   34  Sydney           5
1  Riti   31   Delhi           7
2  Aadi   16  London          11
3  Mark   41   Delhi          12

We want to select multiple columns from this dataframe. Let’s see how to do that,

Select multiple columns of pandas dataframe using []

To select multiple columns of a dataframe, pass a list of column names to the [] (subscript operator) of the dataframe i.e.

col_names = ['City', 'Age']

# Select multiple columns of dataframe by names in list
multiple_columns = df[col_names]

print(multiple_columns)

Output

     City  Age
0  Sydney   34
1   Delhi   31
2  London   16
3   Delhi   41

When we passed a list containing two column names to the [] operator of the dataframe, it returned a new dataframe object containing only those two columns, i.e. ‘City’ and ‘Age’, in the order given in the list. The returned subset is a copy, not a view of the original dataframe, so modifications made to it are not reflected in the original dataframe.
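
Whether the selection shares data with the original can be checked directly. The following is a minimal sketch, assuming the same sample data as in this article; the names subset and editable are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [('Jack', 34, 'Sydney', 5),
     ('Riti', 31, 'Delhi', 7),
     ('Aadi', 16, 'London', 11),
     ('Mark', 41, 'Delhi', 12)],
    columns=['Name', 'Age', 'City', 'Experience'])

col_names = ['City', 'Age']

# Take the selection first, then change the original dataframe
subset = df[col_names]
df.loc[0, 'Age'] = 99

# The selection still holds the old value, so it does not share data with df
print(subset.loc[0, 'Age'])   # 34

# To edit the selection itself, take an explicit copy
editable = df[col_names].copy()
editable.loc[0, 'City'] = 'Perth'

# The original dataframe is unaffected by the edit
print(df.loc[0, 'City'])      # Sydney
```

Calling copy() is optional here, but it makes the intent explicit and avoids the SettingWithCopyWarning that some pandas versions emit when a selection is edited.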

Let’s check out a complete example, where we will select the two columns named ‘City’ and ‘Age’ from the dataframe,

import pandas as pd

# List of Tuples
employees = [('Jack',    34, 'Sydney',   5) ,
             ('Riti',    31, 'Delhi' ,   7) ,
             ('Aadi',    16, 'London',   11) ,
             ('Mark',    41, 'Delhi' ,   12)]

# Create a DataFrame object
df = pd.DataFrame(  employees,
                    columns=['Name', 'Age', 'City', 'Experience'])

print("Contents of the Dataframe : ")
print(df)

col_names = ['City', 'Age']

# Select multiple columns of dataframe by names in list
multiple_columns = df[col_names]

print("Selected Columns of Dataframe : ")
print(multiple_columns)

Output:

Contents of the Dataframe : 
   Name  Age    City  Experience
0  Jack   34  Sydney           5
1  Riti   31   Delhi           7
2  Aadi   16  London          11
3  Mark   41   Delhi          12

Selected Columns of Dataframe : 
     City  Age
0  Sydney   34
1   Delhi   31
2  London   16
3   Delhi   41

Select multiple columns of pandas dataframe using loc[]

We can also select multiple columns of the dataframe using its loc[] attribute. But before that let’s have a little overview of the loc[] attribute,

Overview of dataframe.loc[]

In pandas, a dataframe provides the loc[] attribute to select rows or columns by label (name). Its syntax is as follows,

df.loc[rows_section, columns_section]

Arguments:

  • rows_section: It can be any of the following,
    • Single row index label.
      • If provided then it will select that row only.
    • A list / sequence of multiple row index labels.
      • If provided then it will select the rows with index labels in given list.
    • A range of row index labels i.e. start:end.
      • If start:end is provided, then it will select rows from start to end, with both labels included.
      • If “:” is provided, then it will select all rows.
  • columns_section: It can be any of the following,
    • Single column name.
      • If provided, then loc[] will select the column with given name.
    • A list / sequence of multiple column names.
      • If provided, then loc[] will select the columns with given names in the list.
    • A range of column names i.e. start:end.
      • If start:end is provided, then it will select columns from start to end, with both names included.
      • If “:” is provided, then it will select all columns.

Returns:

  • Based on the row & column names provided in the arguments, it returns a sub-set of the dataframe.
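
The argument forms listed above can be tried on the sample dataframe. The following is a small sketch, assuming the same data as in this article; the variable name block is only illustrative. Note that a label range passed to loc[] includes its end label.

```python
import pandas as pd

df = pd.DataFrame(
    [('Jack', 34, 'Sydney', 5),
     ('Riti', 31, 'Delhi', 7),
     ('Aadi', 16, 'London', 11),
     ('Mark', 41, 'Delhi', 12)],
    columns=['Name', 'Age', 'City', 'Experience'])

# Single row label and single column name: returns one value
print(df.loc[1, 'City'])   # Delhi

# List of row labels and list of column names: returns a dataframe
print(df.loc[[0, 2], ['Name', 'City']])

# Ranges of labels: both the start and the end label are included
block = df.loc[1:2, 'Age':'City']
print(block)
```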

Example of selecting multiple columns of dataframe by name using loc[]

We can select multiple columns of a dataframe by passing a list of column names in the columns_section of loc[] and passing “:” in the rows_section to select all rows of these columns. For example,

col_names = ['City', 'Age']
# Select multiple columns of dataframe by name
multiple_columns = df.loc[: , col_names]

print(multiple_columns)

Output:

     City  Age
0  Sydney   34
1   Delhi   31
2  London   16
3   Delhi   41

In the rows_section we passed “:”, whereas in the columns_section we passed the list of column names. Therefore it returned all the values of those columns as a different dataframe object. This subset is a copy, not a view of the original dataframe; modifications made to it are not reflected in the original dataframe.
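
To change these columns in the original dataframe, one option is to assign through loc[] on df itself rather than on the returned object. The sketch below assumes the sample data from this article; the variable name selected is only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [('Jack', 34, 'Sydney', 5),
     ('Riti', 31, 'Delhi', 7),
     ('Aadi', 16, 'London', 11),
     ('Mark', 41, 'Delhi', 12)],
    columns=['Name', 'Age', 'City', 'Experience'])

col_names = ['City', 'Age']
selected = df.loc[:, col_names]

# Assign through loc[] on the original dataframe to update it
df.loc[:, 'Age'] = df['Age'] + 1

print(df['Age'].tolist())        # [35, 32, 17, 42]
print(selected['Age'].tolist())  # [34, 31, 16, 41]
```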

A complete example of selecting multiple columns of a dataframe using loc[] is as follows,

import pandas as pd

# List of Tuples
employees = [('Jack',    34, 'Sydney',   5) ,
             ('Riti',    31, 'Delhi' ,   7) ,
             ('Aadi',    16, 'London',   11) ,
             ('Mark',    41, 'Delhi' ,   12)]

# Create a DataFrame object
df = pd.DataFrame(  employees,
                    columns=['Name', 'Age', 'City', 'Experience'])

print("Contents of the Dataframe : ")
print(df)

col_names = ['City', 'Age']
# Select multiple columns of dataframe by name
multiple_columns = df.loc[: , col_names]

print("Selected Columns of Dataframe : ")
print(multiple_columns)

Output:

Contents of the Dataframe : 
   Name  Age    City  Experience
0  Jack   34  Sydney           5
1  Riti   31   Delhi           7
2  Aadi   16  London          11
3  Mark   41   Delhi          12

Selected Columns of Dataframe : 
     City  Age
0  Sydney   34
1   Delhi   31
2  London   16
3   Delhi   41

Summary:

We learned about two different ways to select multiple columns of a dataframe by name: the [] operator and the loc[] attribute.
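
As a quick check that the two approaches agree, the results can be compared with the dataframe's equals() method. This is a minimal sketch on the sample data from this article; the names by_brackets and by_loc are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [('Jack', 34, 'Sydney', 5),
     ('Riti', 31, 'Delhi', 7),
     ('Aadi', 16, 'London', 11),
     ('Mark', 41, 'Delhi', 12)],
    columns=['Name', 'Age', 'City', 'Experience'])

col_names = ['City', 'Age']

by_brackets = df[col_names]       # selection using []
by_loc = df.loc[:, col_names]     # selection using loc[]

# equals() compares labels, dtypes and values of both dataframes
print(by_brackets.equals(by_loc))  # True
```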