In this article, we will discuss how to select a Dataframe column by name in pandas.
Suppose we have a dataframe df with the following contents,
   Name  Age    City  Experience
0  Jack   34  Sydney           5
1  Riti   31   Delhi           7
2  Aadi   16  London          11
3  Mark   41   Delhi          12
We want to select one column from this dataframe by name. Let's see how to do that,
Pandas – Select Dataframe Column by Name using []
To select a single column from a dataframe, pass the column name to the [] operator, i.e. the subscript operator of the dataframe:
# Select single dataframe column by name
col = df['Age']
print(col)
Output:
0    34
1    31
2    16
3    41
Name: Age, dtype: int64
It will return the column ‘Age’ of the dataframe (df) as a series object.
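The [] operator expects a column name that actually exists in the dataframe. The following is a minimal sketch, using the same sample data, of what happens when the name does not match; the lowercase name 'age' is only an illustrative misspelling and is not part of the original example.

```python
import pandas as pd

df = pd.DataFrame(
    [('Jack', 34, 'Sydney', 5),
     ('Riti', 31, 'Delhi', 7),
     ('Aadi', 16, 'London', 11),
     ('Mark', 41, 'Delhi', 12)],
    columns=['Name', 'Age', 'City', 'Experience'])

# Column names are case-sensitive: 'age' is not the same as 'Age'.
has_age = 'Age' in df.columns
has_lower_age = 'age' in df.columns

# Selecting a missing column with [] raises a KeyError.
try:
    df['age']
    missing_raised = False
except KeyError:
    missing_raised = True

# DataFrame.get() returns a default (None here) instead of raising.
maybe_col = df.get('age')

print(has_age, has_lower_age, missing_raised, maybe_col)
```

Checking membership in df.columns first, or using get(), is one way to avoid the exception when the column name comes from user input.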
Let’s check out a complete example, where we select the dataframe column named ‘Age’,
import pandas as pd

# List of Tuples
employees = [('Jack', 34, 'Sydney', 5),
             ('Riti', 31, 'Delhi', 7),
             ('Aadi', 16, 'London', 11),
             ('Mark', 41, 'Delhi', 12)]

# Create a DataFrame object
df = pd.DataFrame(employees,
                  columns=['Name', 'Age', 'City', 'Experience'])

print("Contents of the Dataframe : ")
print(df)

# Select single dataframe column by name
col = df['Age']

print("Selected column 'Age' of Dataframe : ")
print(col)
print('Type of Column: ', type(col))
Output:
Contents of the Dataframe : 
   Name  Age    City  Experience
0  Jack   34  Sydney           5
1  Riti   31   Delhi           7
2  Aadi   16  London          11
3  Mark   41   Delhi          12
Selected column 'Age' of Dataframe : 
0    34
1    31
2    16
3    41
Name: Age, dtype: int64
Type of Column:  <class 'pandas.core.series.Series'>
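We selected the column named ‘Age’ from the dataframe and confirmed that its type is Series. Whether this series shares data with the dataframe depends on the pandas version and settings. In older versions without Copy-on-Write, the series can be a view of the dataframe, so modifications made to it may be reflected in the original dataframe. With Copy-on-Write enabled (which the pandas documentation describes as the default from pandas 3.0), modifying the selected series does not change the original dataframe.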
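Because the view-or-copy behaviour of a selected column varies across pandas versions and the Copy-on-Write setting, it can be safer not to rely on it. The sketch below, on a shortened two-row version of the sample data, shows one approach: take an explicit copy when the column should be independent, and assign through loc[] when the dataframe itself should change.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jack', 'Riti'], 'Age': [34, 31]})

# An explicit copy is independent of df in every pandas version.
age_copy = df['Age'].copy()
age_copy.iloc[0] = 99

# The dataframe still holds the original values.
ages_after_copy_edit = df['Age'].tolist()

# To change the dataframe itself, assign through loc[] on df.
df.loc[0, 'Age'] = 35
ages_after_loc_edit = df['Age'].tolist()

print(ages_after_copy_edit)
print(ages_after_loc_edit)
```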
Pandas – Select Dataframe Column by Name using loc[]
We can also select a single column of the dataframe using its loc[] attribute. But before that, let’s have a brief overview of the loc[] attribute,
Overview of dataframe.loc[]
In pandas, the dataframe provides an attribute loc[] to select rows or columns of a dataframe based on their labels. Its syntax is as follows,
df.loc[rows_section, columns_section]
Arguments:
- rows_section: It can be one of the following,
  - A single row index label.
    - If provided, it selects that row only.
  - A list / sequence of multiple row index labels.
    - If provided, it selects the rows with the index labels in the given list.
  - A range of row index labels, i.e. start:end.
    - If start:end is provided, it selects rows from start to end, with both labels included (unlike positional slicing).
    - If “:” is provided, it selects all rows.
- columns_section: It can be one of the following,
  - A single column name.
    - If provided, loc[] selects the column with the given name.
  - A list / sequence of multiple column names.
    - If provided, loc[] selects the columns with the names in the given list.
  - A range of column names, i.e. start:end.
    - If start:end is provided, it selects columns from start to end, with both names included.
    - If “:” is provided, it selects all columns.
Returns:
- Based on the row & column names provided in the arguments, it returns a sub-set of the dataframe.
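The argument forms listed above can be sketched on the sample dataframe as follows. The variable names are illustrative only. Note that label slices in loc[] include both endpoints.

```python
import pandas as pd

df = pd.DataFrame(
    [('Jack', 34, 'Sydney', 5),
     ('Riti', 31, 'Delhi', 7),
     ('Aadi', 16, 'London', 11),
     ('Mark', 41, 'Delhi', 12)],
    columns=['Name', 'Age', 'City', 'Experience'])

# Single row label: returns that row as a Series.
one_row = df.loc[1]

# List of row labels: returns a DataFrame with those rows.
some_rows = df.loc[[0, 2]]

# Range of row labels: both 1 and 2 are included.
row_range = df.loc[1:2]

# Range of column names: 'Age', 'City' and 'Experience' are all included.
col_range = df.loc[:, 'Age':'Experience']

# Rows and columns together.
subset = df.loc[[0, 3], ['Name', 'City']]

print(row_range)
print(col_range)
print(subset)
```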
Example of selecting a Dataframe Column by name using loc[]
We can select a single column of the dataframe by passing the column name in the columns_section of loc[] and passing “:” in the rows_section to select all values of that column. For example,
# Select column 'Age' of the dataframe
col = df.loc[:, 'Age']
print(col)
Output:
0    34
1    31
2    16
3    41
Name: Age, dtype: int64
It will return the column ‘Age’ of the dataframe as a series object. In the rows_section we passed “:”, and in the columns_section we passed only the column name, i.e. ‘Age’. Therefore, it returned all the values of the single column ‘Age’ from the dataframe as a series object.
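The return type depends on how the column name is passed. As a short sketch on the same sample data: a single name gives a Series, while a one-element list of names gives a DataFrame with one column.

```python
import pandas as pd

df = pd.DataFrame(
    [('Jack', 34, 'Sydney', 5),
     ('Riti', 31, 'Delhi', 7),
     ('Aadi', 16, 'London', 11),
     ('Mark', 41, 'Delhi', 12)],
    columns=['Name', 'Age', 'City', 'Experience'])

# A single column name returns a Series.
as_series = df.loc[:, 'Age']

# A list containing one column name returns a DataFrame.
as_frame = df.loc[:, ['Age']]

print(type(as_series).__name__)
print(type(as_frame).__name__)
print(as_frame.shape)
```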
Complete example to select a single column of dataframe using loc[] is as follows,
import pandas as pd

# List of Tuples
employees = [('Jack', 34, 'Sydney', 5),
             ('Riti', 31, 'Delhi', 7),
             ('Aadi', 16, 'London', 11),
             ('Mark', 41, 'Delhi', 12)]

# Create a DataFrame object
df = pd.DataFrame(employees,
                  columns=['Name', 'Age', 'City', 'Experience'])

print("Contents of the Dataframe : ")
print(df)

column_name = 'Age'

# Select column 'Age' of the dataframe
col = df.loc[:, column_name]

print("Selected column 'Age' of Dataframe : ")
print(col)
print('Type: ', type(col))
Output:
Contents of the Dataframe : 
   Name  Age    City  Experience
0  Jack   34  Sydney           5
1  Riti   31   Delhi           7
2  Aadi   16  London          11
3  Mark   41   Delhi          12
Selected column 'Age' of Dataframe : 
0    34
1    31
2    16
3    41
Name: Age, dtype: int64
Type:  <class 'pandas.core.series.Series'>
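As a quick check, the two approaches can be compared directly. This is a small sketch on the same sample data; it uses Series.equals() to confirm that [] and loc[] produce the same column.

```python
import pandas as pd

df = pd.DataFrame(
    [('Jack', 34, 'Sydney', 5),
     ('Riti', 31, 'Delhi', 7),
     ('Aadi', 16, 'London', 11),
     ('Mark', 41, 'Delhi', 12)],
    columns=['Name', 'Age', 'City', 'Experience'])

by_brackets = df['Age']
by_loc = df.loc[:, 'Age']

# equals() compares both the values and the index of the two Series.
same = by_brackets.equals(by_loc)
print(same)
```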
Summary:
We learned about two different ways to select one column of a dataframe by name: the [] operator and the loc[] attribute.