In this article, we will discuss how to select a Dataframe column by name in pandas.
Suppose we have a dataframe df with the following contents,
       Name  Age    City  Experience
    0  Jack   34  Sydney           5
    1  Riti   31   Delhi           7
    2  Aadi   16  London          11
    3  Mark   41   Delhi          12
We want to select one column from this dataframe by name. Let's see how to do that,
Pandas – Select Dataframe Column by Name using []
To select a single column from a dataframe, pass the column name to the [] operator, i.e. the subscript operator of the dataframe,
    # Select single dataframe column by name
    col = df['Age']
    print(col)

Output:

    0    34
    1    31
    2    16
    3    41
    Name: Age, dtype: int64
It will return the column ‘Age’ of the dataframe (df) as a series object.
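If the name passed to [] does not match any column, pandas raises a KeyError. The sketch below shows one way to guard against that, using the sample data from this article; the helper name select_column is hypothetical and not part of pandas.

```python
import pandas as pd

df = pd.DataFrame(
    [('Jack', 34, 'Sydney', 5),
     ('Riti', 31, 'Delhi', 7),
     ('Aadi', 16, 'London', 11),
     ('Mark', 41, 'Delhi', 12)],
    columns=['Name', 'Age', 'City', 'Experience'])


def select_column(frame, name):
    """Hypothetical helper: return the named column as a Series, or None if absent."""
    if name in frame.columns:
        return frame[name]
    return None


age = select_column(df, 'Age')         # column exists
missing = select_column(df, 'Salary')  # no such column in this dataframe

print(age.tolist())  # [34, 31, 16, 41]
print(missing)       # None
```

Calling df['Salary'] directly on this dataframe would raise a KeyError instead of returning None.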
Let’s check out a complete example, where we select the dataframe column named ‘Age’,
    import pandas as pd

    # List of Tuples
    employees = [('Jack', 34, 'Sydney', 5),
                 ('Riti', 31, 'Delhi', 7),
                 ('Aadi', 16, 'London', 11),
                 ('Mark', 41, 'Delhi', 12)]

    # Create a DataFrame object
    df = pd.DataFrame(employees,
                      columns=['Name', 'Age', 'City', 'Experience'])

    print("Contents of the Dataframe : ")
    print(df)

    # Select single dataframe column by name
    col = df['Age']

    print("Selected column 'Age' of Dataframe : ")
    print(col)

    # Confirm that the selected column is a Series
    print('Type of Column: ', type(col))
Output:
    Contents of the Dataframe : 
       Name  Age    City  Experience
    0  Jack   34  Sydney           5
    1  Riti   31   Delhi           7
    2  Aadi   16  London          11
    3  Mark   41   Delhi          12
    Selected column 'Age' of Dataframe : 
    0    34
    1    31
    2    16
    3    41
    Name: Age, dtype: int64
    Type of Column:  <class 'pandas.core.series.Series'>
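We selected the column named ‘Age’ from the dataframe and confirmed that its type is Series. Whether this Series shares data with the dataframe depends on the pandas version: in older versions without Copy-on-Write, it can be a view, so modifications made to it may be reflected in the original dataframe. With Copy-on-Write enabled (the default from pandas 3.0), modifying the Series leaves the original dataframe unchanged.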
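When the selected column must be modified without touching the original dataframe, an explicit copy() avoids relying on view semantics. A small sketch with the same sample data (the variable name age_copy is only illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    [('Jack', 34, 'Sydney', 5),
     ('Riti', 31, 'Delhi', 7),
     ('Aadi', 16, 'London', 11),
     ('Mark', 41, 'Delhi', 12)],
    columns=['Name', 'Age', 'City', 'Experience'])

# Take an independent copy of the column
age_copy = df['Age'].copy()

# Change the first value in the copy only
age_copy.iloc[0] = 99

print(age_copy.iloc[0])   # 99
print(df.loc[0, 'Age'])   # 34, the dataframe is unchanged
```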
Pandas – Select Dataframe Column by Name using loc[]
We can also select a single column of the dataframe using its loc[] attribute. But before that, let’s have a brief overview of the loc[] attribute,
Overview of dataframe.loc[]
In pandas, the dataframe provides an attribute loc[] to select rows or columns of a dataframe based on their labels. Its syntax is as follows,
df.loc[rows_section, columns_section]
Arguments:
- rows_section: It can be any of the following,
  - A single row index label.
    - If provided, then it will select that row only.
  - A list / sequence of multiple row index labels.
    - If provided, then it will select the rows with the index labels in the given list.
  - A range of row index labels i.e. start:end.
    - If start:end is provided, then it will select rows from start to end, with both labels included (unlike positional slicing, the end label is not excluded).
    - If “:” is provided, then it will select all rows.
- columns_section: It can be any of the following,
  - A single column name.
    - If provided, then loc[] will select the column with the given name.
  - A list / sequence of multiple column names.
    - If provided, then loc[] will select the columns with the given names in the list.
  - A range of column names i.e. start:end.
    - If start:end is provided, then it will select columns from start to end, with both names included.
    - If “:” is provided, then it will select all columns.
Returns:
- Based on the row & column names provided in the arguments, it returns a sub-set of the dataframe.
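The selection forms listed above can be sketched on the sample dataframe from this article. Note that label slices passed to loc[] include both endpoints. The variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [('Jack', 34, 'Sydney', 5),
     ('Riti', 31, 'Delhi', 7),
     ('Aadi', 16, 'London', 11),
     ('Mark', 41, 'Delhi', 12)],
    columns=['Name', 'Age', 'City', 'Experience'])

# Single row label: returns that row as a Series
single_row = df.loc[2]

# List of row labels and list of column names: returns a DataFrame
subset = df.loc[[0, 3], ['Name', 'City']]

# Label slices include BOTH the start and the end label
row_slice = df.loc[1:2]                     # rows 1 and 2
col_slice = df.loc[:, 'Age':'Experience']   # columns Age, City, Experience

print(single_row['Name'])           # Aadi
print(subset.shape)                 # (2, 2)
print(row_slice.index.tolist())     # [1, 2]
print(col_slice.columns.tolist())   # ['Age', 'City', 'Experience']
```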
Example of selecting a Dataframe Column by name using loc[]
We can select a single column of the dataframe by passing the column name in the columns_section of loc[] and the value “:” in the rows_section, to select all values of the column. For example,

    # Select column 'Age' of the dataframe
    col = df.loc[:, 'Age']
    print(col)

Output:

    0    34
    1    31
    2    16
    3    41
    Name: Age, dtype: int64
It will return the column ‘Age’ of the dataframe as a Series object. In the rows_section we passed “:” to select all rows, whereas in the columns_section we passed only the column name, i.e. ‘Age’. Therefore it returned all the values of the single column ‘Age’ from the dataframe as a Series object.
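As a quick check, the sketch below compares the loc[] selection with the [] operator, and shows that passing a one-element list of names instead of a bare name yields a one-column DataFrame rather than a Series. It uses the same sample data; the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [('Jack', 34, 'Sydney', 5),
     ('Riti', 31, 'Delhi', 7),
     ('Aadi', 16, 'London', 11),
     ('Mark', 41, 'Delhi', 12)],
    columns=['Name', 'Age', 'City', 'Experience'])

# A bare column name gives a Series
as_series = df.loc[:, 'Age']

# A list containing one column name gives a DataFrame with one column
as_frame = df.loc[:, ['Age']]

print(type(as_series).__name__)      # Series
print(type(as_frame).__name__)       # DataFrame
print(as_frame.shape)                # (4, 1)

# Both selection styles return the same values
print(as_series.equals(df['Age']))   # True
```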
A complete example of selecting a single column of the dataframe using loc[] is as follows,
    import pandas as pd

    # List of Tuples
    employees = [('Jack', 34, 'Sydney', 5),
                 ('Riti', 31, 'Delhi', 7),
                 ('Aadi', 16, 'London', 11),
                 ('Mark', 41, 'Delhi', 12)]

    # Create a DataFrame object
    df = pd.DataFrame(employees,
                      columns=['Name', 'Age', 'City', 'Experience'])

    print("Contents of the Dataframe : ")
    print(df)

    column_name = 'Age'

    # Select column 'Age' of the dataframe (all rows, one column)
    col = df.loc[:, column_name]

    print("Selected column 'Age' of Dataframe : ")
    print(col)

    # Confirm that the selected column is a Series
    print('Type: ', type(col))
Output:
    Contents of the Dataframe : 
       Name  Age    City  Experience
    0  Jack   34  Sydney           5
    1  Riti   31   Delhi           7
    2  Aadi   16  London          11
    3  Mark   41   Delhi          12
    Selected column 'Age' of Dataframe : 
    0    34
    1    31
    2    16
    3    41
    Name: Age, dtype: int64
    Type:  <class 'pandas.core.series.Series'>
Summary:
We learned two different ways to select a single column of a dataframe by name: the [] operator and the loc[] attribute.