Pandas – Select Column by Name

In this article, we will discuss how to select a Dataframe column by name in pandas.

Suppose we have a dataframe df with the following contents:

   Name  Age    City  Experience
0  Jack   34  Sydney           5
1  Riti   31   Delhi           7
2  Aadi   16  London          11
3  Mark   41   Delhi          12

We want to select one column from this dataframe by name. Let's see how to do that.

Pandas – Select Dataframe Column by Name using []

To select a single column from a dataframe, pass the column name to the [] operator (the subscript operator) of the dataframe:

# Select single dataframe column by name
col = df['Age']

print(col)

Output:

0    34
1    31
2    16
3    41
Name: Age, dtype: int64

It will return the column ‘Age’ of the dataframe (df) as a series object.
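
As a related sketch, the snippet below contrasts passing a single column name with passing a list that contains one name. The small dataframe is a stand-in built only for this illustration; the variable names as_series and as_frame are hypothetical.

```python
import pandas as pd

# Small illustrative dataframe (a stand-in, not the full example data)
df = pd.DataFrame({'Name': ['Jack', 'Riti'],
                   'Age': [34, 31],
                   'City': ['Sydney', 'Delhi']})

# A single column name returns a Series
as_series = df['Age']

# A list containing one column name returns a one-column DataFrame
as_frame = df[['Age']]

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_frame.shape)            # (2, 1)
```

Use the single-name form when a Series is wanted, and the list form when the result should stay two-dimensional.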

Let’s check out a complete example, where we select the dataframe column named ‘Age’:

import pandas as pd

# List of tuples, one tuple per row
employees = [('Jack', 34, 'Sydney', 5),
             ('Riti', 31, 'Delhi', 7),
             ('Aadi', 16, 'London', 11),
             ('Mark', 41, 'Delhi', 12)]

# Create a DataFrame object
df = pd.DataFrame(employees,
                  columns=['Name', 'Age', 'City', 'Experience'])

print("Contents of the Dataframe : ")
print(df)

# Select single dataframe column by name
col = df['Age']

print("Selected column 'Age' of Dataframe : ")
print(col)

print('Type of Column: ', type(col))

Output:

Contents of the Dataframe : 
   Name  Age    City  Experience
0  Jack   34  Sydney           5
1  Riti   31   Delhi           7
2  Aadi   16  London          11
3  Mark   41   Delhi          12

Selected column 'Age' of Dataframe : 
0    34
1    31
2    16
3    41
Name: Age, dtype: int64
Type of Column:  <class 'pandas.core.series.Series'>

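We selected the column named ‘Age’ from the dataframe and confirmed that its type is Series. Whether this Series shares data with the dataframe depends on the pandas version and settings: in older versions, a modification made through the selected column could be reflected in the original dataframe, whereas with Copy-on-Write enabled (the default from pandas 3.0) the original dataframe is never changed through the selected column. To modify the column reliably, assign to the dataframe directly, e.g. df['Age'] = new_values.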
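
The following is a minimal sketch of a pattern that does not rely on view-or-copy behaviour: take an explicit copy when the column should be independent, and assign to the dataframe itself when the dataframe should change. The variable name ages is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jack', 'Riti', 'Aadi', 'Mark'],
                   'Age': [34, 31, 16, 41]})

# Take an explicit copy when the column should be independent of df
ages = df['Age'].copy()
ages.iloc[0] = 99            # changes only the copy

print(df.loc[0, 'Age'])      # 34, the dataframe is unchanged

# To change the dataframe itself, assign to it directly
df['Age'] = df['Age'] + 1

print(df['Age'].tolist())    # [35, 32, 17, 42]
```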

Pandas – Select Dataframe Column by Name using loc[]

We can also select a single column of the dataframe using its loc[] attribute. But before that, let’s have a brief overview of loc[].

Overview of dataframe.loc[]

In pandas, the dataframe provides the loc[] attribute to select rows or columns by label. Its syntax is as follows:

df.loc[rows_section, columns_section]

Arguments:

  • rows_section: It can be any of the following,
    • Single row index label.
      • If provided, then it will select that row only.
    • A list / sequence of multiple row index labels.
      • If provided, then it will select the rows with the index labels in the given list.
    • A range of row index labels i.e. start:end.
      • If start:end is provided, then it will select rows from start to end, with both labels included.
      • If “:” is provided, then it will select all rows.
  • columns_section: It can be any of the following,
    • Single column name.
      • If provided, then loc[] will select the column with the given name.
    • A list / sequence of multiple column names.
      • If provided, then loc[] will select the columns with the names given in the list.
    • A range of column names i.e. start:end.
      • If start:end is provided, then it will select columns from start to end, with both names included.
      • If “:” is provided, then it will select all columns.

Returns:

  • Based on the row & column names provided in the arguments, it returns a sub-set of the dataframe.
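
The argument forms listed above can be sketched as follows, using the same sample data as this article. Note that label slices in loc[] include both endpoints, unlike ordinary Python slices. The variable names value, subset and sliced are hypothetical.

```python
import pandas as pd

employees = [('Jack', 34, 'Sydney', 5),
             ('Riti', 31, 'Delhi', 7),
             ('Aadi', 16, 'London', 11),
             ('Mark', 41, 'Delhi', 12)]
df = pd.DataFrame(employees,
                  columns=['Name', 'Age', 'City', 'Experience'])

# Single row label and single column name: returns a scalar value
value = df.loc[1, 'City']

# List of row labels and list of column names: returns a DataFrame
subset = df.loc[[0, 2], ['Name', 'Experience']]

# Label slices include BOTH endpoints: rows 0, 1, 2 and columns Name, Age, City
sliced = df.loc[0:2, 'Name':'City']

print(value)          # Delhi
print(subset.shape)   # (2, 2)
print(sliced.shape)   # (3, 3)
```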

Example of selecting a Dataframe Column by name using loc[]

We can select a single column of the dataframe by passing the column name as the columns_section of loc[] and “:” as the rows_section, which selects all values of that column. For example,

# Select column 'Age' of the dataframe
col = df.loc[:, 'Age']

print(col)

Output:

0    34
1    31
2    16
3    41
Name: Age, dtype: int64

It returns the column ‘Age’ of the dataframe as a Series object. In the rows_section we passed “:” to select all rows, and in the columns_section we passed only the column name ‘Age’. Therefore, it returned all the values of the single column ‘Age’ as a Series object.

A complete example of selecting a single column of a dataframe using loc[] is as follows:

import pandas as pd

# List of tuples, one tuple per row
employees = [('Jack', 34, 'Sydney', 5),
             ('Riti', 31, 'Delhi', 7),
             ('Aadi', 16, 'London', 11),
             ('Mark', 41, 'Delhi', 12)]

# Create a DataFrame object
df = pd.DataFrame(employees,
                  columns=['Name', 'Age', 'City', 'Experience'])

print("Contents of the Dataframe : ")
print(df)

column_name = 'Age'

# Select column 'Age' of the dataframe
col = df.loc[:, column_name]

print("Selected column 'Age' of Dataframe : ")
print(col)

print('Type: ', type(col))

Output:

Contents of the Dataframe :
   Name  Age    City  Experience
0  Jack   34  Sydney           5
1  Riti   31   Delhi           7
2  Aadi   16  London          11
3  Mark   41   Delhi          12

Selected column 'Age' of Dataframe :
0    34
1    31
2    16
3    41
Name: Age, dtype: int64
Type:  <class 'pandas.core.series.Series'>

Summary:

We learned two different ways to select a single dataframe column by name: the [] operator and the loc[] attribute.
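
As a closing sketch, the snippet below compares the two approaches side by side and shows what happens when the requested name is not a column. The column name 'Salary' and the variable names are hypothetical, chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jack', 'Riti', 'Aadi', 'Mark'],
                   'Age': [34, 31, 16, 41]})

by_brackets = df['Age']
by_loc = df.loc[:, 'Age']

# Both approaches give a Series with the same values and index
print(by_brackets.equals(by_loc))   # True

# A name that is not a column raises KeyError with either approach
missing = False
try:
    df['Salary']
except KeyError:
    missing = True

print(missing)                      # True
```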
