In this article we will discuss different ways to select rows and columns in DataFrame.
DataFrame provides indexing labels loc & iloc for accessing the column and rows. Also, operator [] can be used to select columns. Let’s discuss them one by one,
First create a DataFrame object i.e.
students = [ ('jack', 34, 'Sydeny') , ('Riti', 30, 'Delhi' ) , ('Aadi', 16, 'New York') ] # Create a DataFrame object dfObj = pd.DataFrame(students, columns = ['Name' , 'Age', 'City'], index=['a', 'b', 'c'])
Contents of DataFrame object dfObj are,
Frequently Asked:
Original DataFrame : Name Age City a jack 34 Sydeny b Riti 30 Delhi c Aadi 16 New York
DataFrame.loc | Select Column & Rows by Name
DataFrame provides indexing label loc for selecting columns and rows by names i.e.
dataFrame.loc[<ROWS RANGE> , <COLUMNS RANGE>]
It selects the specified columns and rows from the given DataFrame.
ROWS OR COLUMN RANGE can be also be ‘:’Â and if given in rows or column Range parameter then the all entries will be included for corresponding row or column.
Let’s see how to use it,
Select a Column by Name in DataFrame using loc[]
As we want selection on column only, it means all rows should be included for selected column i.e.
Latest Python - Video Tutorial
''' Selecting a Single Column by Column Names ''' columnsData = dfObj.loc[ : , 'Age' ]
It will return a Series object with same indexes as DataFrame.
a 34 b 30 c 16 Name: Age, dtype: int64
Select multiple Columns by Name in DataFrame using loc[]
Pass column names as list,
# Select only 2 columns from dataFrame and create a new subset DataFrame columnsData = dfObj.loc[ : , ['Age', 'Name'] ]
It will return a subset DataFrame with same indexes but selected columns only i.e.
Age Name a 34 jack b 30 Riti c 16 Aadi
Select a single row by Index Label in DataFrame using loc[]
Now we will pass argument ‘:’ in Column range of loc, so that all columns should be included. But for Row Indexes we will pass a label only,
rowData = dfObj.loc[ 'b' , : ]
It will return a series object with same indexes equal to DataFrame columns names i.e.
Name Riti Age 30 City Delhi Name: b, dtype: object
Select multiple rows by Index labels in DataFrame using loc[]
Pass row index labels as list,
rowData = dfObj.loc[ ['c' , 'b'] , : ]
It will return a subset DataFrame with same columns as DataFrame but selected indexes only i.e.
Name Age City c Aadi 16 New York b Riti 30 Delhi
Only Rows with index label ‘b’ & ‘c’ are in returned DataFrame object.
Select multiple row & columns by Labels in DataFrame using loc[]
To select multiple rows & column, pass lists containing index labels and column names i.e.
subset = dfObj.loc[ ['c' , 'b'] ,['Age', 'Name'] ]
It will return a subset DataFrame with given rows and columns i.e.
Age Name c 16 Aadi b 30 Riti
Only Rows with index label ‘b’ & ‘c’ and Columns with names ‘Age’, ‘Name’ are in returned DataFrame object.
Instead of passing all the names in index or column list we can pass range also i.e.
subset = dfObj.loc[ 'a' : 'c' ,'Age' : 'City' ]
It will return a subset DataFrame with rows from a to c & columns from Age to City i.e.
Age City a 34 Sydeny b 30 Delhi c 16 New York
DataFrame.iloc | Select Column Indexes & Rows Index Positions
DataFrame provides indexing label iloc for accessing the column and rows by index positions i.e.
dataFrame.iloc[<ROWS INDEX RANGE> , <COLUMNS INDEX RANGE>]
It selects the columns and rows from DataFrame by index position specified in range. If ‘:’ is given in rows or column Index Range then all entries will be included for corresponding row or column.
Let’s see how to use it.
Our DataFrame object dfObj is,
Name Age City a jack 34 Sydeny b Riti 30 Delhi c Aadi 16 New York
Select a single column by Index position
Select column at index 2 i.e.
dfObj.iloc[ : , 2 ]
It will return a Series object i.e,
a Sydeny b Delhi c New York Name: City, dtype: object
Select multiple columns by Index range
Select columns in column index range [0 to 2),
dfObj.iloc[: , [0, 2]]
It will return a DataFrame object i.e,
Name Age a jack 34 b Riti 30 c Aadi 16
Select multiple columns by Indexes in a list
Select columns at column index 0 and 2,
dfObj.iloc[: , [0, 2]]
It will return a DataFrame object i.e,
Name City a jack Sydeny b Riti Delhi c Aadi New York
Select single row by Index Position
Select row at index 2 i.e.
dfObj.iloc[ 1 , : ]
It will return a Series object i.e,
Name Riti Age 30 City Delhi Name: b, dtype: object
Select multiple rows by Index range
Select rows in row index range 0 to 2,
dfObj.iloc[ 0:2 , : ]
It will return a DataFrame object i.e,
Name Age City a jack 34 Sydeny b Riti 30 Delhi
Select multiple rows by Index positions in a list
Select rows in row index range 0 to 2,
dfObj.iloc[ 0:2 , : ]
It will return a DataFrame object i.e,
Name Age City c Aadi 16 New York a jack 34 Sydeny
Select multiple rows & columns by Index positions
Select rows at row index 0 and 2,
dfObj.iloc[[2 ,0 ] , : ]
It will return a DataFrame object i.e,
Age City a 34 Sydeny c 16 New York
Select multiple rows & columns by Index positions
Select rows at index 0 & 2 . Also columns at row 1 and 2,
dfObj.iloc[[0 , 2] , [1 , 2] ]
It will return following DataFrame object,
Age City a 34 Sydeny c 16 New York
Select multiple rows & columns by Indexes in a range
Select rows at index 0 to 2 (2nd index not included) . Also columns at row 0 to 2 (2nd index not included),
dfObj.iloc[ 0 : 2 , 1 : 2 ]
It will return following DataFrame object,
Age a 34 b 30
If we try to select an index out of range then it will IndexError.
Selecting Columns in DataFrame using [] operator
To access a single or multiple columns from DataFrame by name we can use dictionary like notation on DataFrame i.e.
Select a Column by Name
column2 = dfObj['Age']
It will return a Series object with same indexes as dataFrame i.e.
a 34 b 30 c 16 Name: Age, dtype: int64
Select multiple columns by Name
Instead of passing a single name in [] we can pass a list of column names i.e.
column2 = dfObj[ ['Age', 'Name'] ]
It will return a DataFrame object containing only specified columns from given DataFrame object i.e.
Age Name a 34 jack b 30 Riti c 16 Aadi
On accessing a column name that doesn’t exists it will throw ‘KeyError‘.
Complete example is as follows,
import pandas as pd def main(): students = [ ('jack', 34, 'Sydeny') , ('Riti', 30, 'Delhi' ) , ('Aadi', 16, 'New York') ] # Create a DataFrame object dfObj = pd.DataFrame(students, columns = ['Name' , 'Age', 'City'], index=['a', 'b', 'c']) print("Original DataFrame : ", dfObj, sep="\n") print("***** Select Columns in DataFrame by [] *********") ''' Select a Column by Name using [] ''' column2 = dfObj['Age'] print("Select column By Name using [] " , column2 , sep='\n') print("Type : " , type(column2)) ''' Select Multiple Column by Name using [] ''' column2 = dfObj[ ['Age', 'Name'] ] print("Select multiple columns By Name using [] " , column2 , sep='\n') print("Type : " , type(column2)) print("**** Selecting by Column Names & Rows Index Labels Using df.loc ******") ''' Selecting a Single Column by Column Names ''' columnsData = dfObj.loc[ : , 'Age' ] print("Select a column By Name using loc " , columnsData , sep='\n') print("Type : " , type(columnsData)) ''' Selecting multiple Columns by Column Names ''' # Select only 2 columns from dataFrame and create a new subset DataFrame columnsData = dfObj.loc[ : , ['Age', 'Name'] ] print("Select multiple columns By Name using loc " , columnsData , sep='\n') print("Type : " , type(columnsData)) ''' Selecting a Single Row by Index label ''' rowData = dfObj.loc[ 'b' , : ] print("Select a Single Row " , rowData , sep='\n') print("Type : " , type(rowData)) ''' Selecting multiple Rows by Index labels ''' rowData = dfObj.loc[ ['c' , 'b'] , : ] print("Select multiple Rows" , rowData , sep='\n') ''' Select both Rows & Columns by Index labels ''' subset = dfObj.loc[ ['c' , 'b'] ,['Age', 'Name'] ] print("Select both columns & Rows" , subset , sep='\n') subset = dfObj.loc[ 'a' : 'c' ,'Age' : 'City' ] print("Select both columns & Rows with selection range " , subset , sep='\n') print("**** Selecting by Column Indexes & Rows Index Positions Using df.iloc ******") ''' Select a single column by Index Position ''' print(" Select column at index 2 ") print( dfObj.iloc[ : , 2 ] ) ''' Select multiple columns by Index range ''' print(" Select columns in column index range 0 to 2") print(dfObj.iloc[:, 0:2]) ''' Select multiple columns by Indexes in a list ''' print(" Select columns at column index 0 and 2") print(dfObj.iloc[: , [0, 2]]) ''' Select single row by Index Position ''' print(" Select row at index 2 ") print( dfObj.iloc[ 1 , : ] ) ''' Select multiple rows by Index range ''' print(" Select rows in row index range 0 to 2") print(dfObj.iloc[ 0:2 , : ]) ''' Select multiple rows by Index positions in a list ''' print(" Select rows at row index 0 and 2") print(dfObj.iloc[[2 ,0 ] , : ]) ''' Select multiple rows & columns by Index positions ''' print(" Select rows at index 0 & 2 . Also columns at row 1 and 2") print(dfObj.iloc[[0 , 2] , [1 , 2] ]) ''' Select multiple rows & columns by Indexes in a range ''' print(" Select rows at index 0 to 2 (2nd index not included) . Also columns at row 0 to 2 (2nd index not included)") print(dfObj.iloc[ 0 : 2 , 1 : 2 ]) if __name__ == '__main__': main()
Output:
Original DataFrame : Name Age City a jack 34 Sydeny b Riti 30 Delhi c Aadi 16 New York ***** Select Columns in DataFrame by [] ********* Select column By Name using [] a 34 b 30 c 16 Name: Age, dtype: int64 Type : <class 'pandas.core.series.Series'> Select multiple columns By Name using [] Age Name a 34 jack b 30 Riti c 16 Aadi Type : <class 'pandas.core.frame.DataFrame'> **** Selecting by Column Names & Rows Index Labels Using df.loc ****** Select a column By Name using loc a 34 b 30 c 16 Name: Age, dtype: int64 Type : <class 'pandas.core.series.Series'> Select multiple columns By Name using loc Age Name a 34 jack b 30 Riti c 16 Aadi Type : <class 'pandas.core.frame.DataFrame'> Select a Single Row Name Riti Age 30 City Delhi Name: b, dtype: object Type : <class 'pandas.core.series.Series'> Select multiple Rows Name Age City c Aadi 16 New York b Riti 30 Delhi Select both columns & Rows Age Name c 16 Aadi b 30 Riti Select both columns & Rows with selection range Age City a 34 Sydeny b 30 Delhi c 16 New York **** Selecting by Column Indexes & Rows Index Positions Using df.iloc ****** Select column at index 2 a Sydeny b Delhi c New York Name: City, dtype: object Select columns in column index range 0 to 2 Name Age a jack 34 b Riti 30 c Aadi 16 Select columns at column index 0 and 2 Name City a jack Sydeny b Riti Delhi c Aadi New York Select row at index 2 Name Riti Age 30 City Delhi Name: b, dtype: object Select rows in row index range 0 to 2 Name Age City a jack 34 Sydeny b Riti 30 Delhi Select rows at row index 0 and 2 Name Age City c Aadi 16 New York a jack 34 Sydeny Select rows at index 0 & 2 . Also columns at row 1 and 2 Age City a 34 Sydeny c 16 New York Select rows at index 0 to 2 (2nd index not included) . Also columns at row 0 to 2 (2nd index not included) Age a 34 b 30
This is great! thank you sir!!