Pandas Tutorial #7 - DataFrame.loc[]

In this tutorial, we will discuss how to use the loc property of the Dataframe and select rows, columns, or a subset of DataFrame based on the label names. Then we will also discuss the way to change the selected values.

DataFrame.loc[]

In Pandas, the Dataframe provides a property loc[] to select a subset of the Dataframe based on row and column names/labels. We can select single or multiple rows and columns using it. Let’s learn more about it.

Syntax:

Dataframe.loc[row_segment , column_segment]
Dataframe.loc[row_segment]

The column_segment argument is optional. If column_segment is not provided, loc[] selects the subset of the Dataframe based on the row_segment argument only and includes all columns.

Arguments:

row_segment:
- It contains information about the rows to be selected. Its value can be,
  - A single label like ‘A’ or 7 etc.
    - In this case, it selects the single row with given label name.
    - For example, if ‘B’ only is given, then only the row with label ‘B’ is selected from Dataframe.
  - A list/array of label names like, [‘B’, ‘E’, ‘H’]
    - In this case, multiple rows will be selected based on row labels given in the list.
    - For example, if [‘B’, ‘E’, ‘H’] is given as argument in row segment, then the rows with label name ‘B’, ‘E’ and ‘H’ will be selected.
  - A slice of labels like -> ‘a’:’e’ .
    - This case will select multiple rows, i.e. from the row with label a up to and including the row with label e. Unlike positional slices in Python, a label slice in loc[] includes both the start and the end label.
    - For example, if ‘B’:’E’ is provided in the row segment of loc[], it will select the range of rows from label ‘B’ to label ‘E’, both included.
    - For selecting all rows, provide the value ( : )
  - A boolean sequence of same size as number of rows.
    - In this case, it will select only those rows for which the corresponding value in boolean array/list is True.
  - A callable function :
    - It can be a lambda function or a regular function, which accepts the calling dataframe as an argument and returns a valid selector in any one of the formats mentioned above (a label, a list of labels, a slice, or a boolean sequence).

column_segment:
- It is optional.
- It contains information about the columns to be selected. Its value can be,
  - A single label like ‘A’ or 7 etc.
    - In this case, it selects the single column with given label name.
    - For example, if ‘Age’ only is given, then only the column with label ‘Age’ is selected from Dataframe.
  - A list/array of label names like, [‘Name’, ‘Age’, ‘City’]
    - In this case, multiple columns will be selected based on column labels given in the list.
    - For example, if [‘Name’, ‘Age’, ‘City’] is given as argument in column segment, then the columns with label names ‘Name’, ‘Age’, and ‘City’ will be selected.
  - A slice of labels like -> ‘a’:’e’ .
    - This case will select multiple columns, i.e. from the column with label a up to and including the column with label e.
    - For example, if ‘Name’:’City’ is provided in the column segment of loc[], it will select the range of columns from label ‘Name’ to label ‘City’, both included.
    - For selecting all columns, provide the value ( : )
  - A boolean sequence of same size as number of columns.
    - In this case, it will select only those columns for which the corresponding value in boolean array/list is True.
  - A callable function :
    - It can be a lambda function or a regular function that accepts the calling dataframe as an argument and returns a valid selector in any one of the formats mentioned above (a label, a list of labels, a slice, or a boolean sequence).
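
The argument forms listed above can be sketched on a small frame as follows. The dataframe and the variable names here are illustrative assumptions, not part of the examples later in this tutorial.

```python
import pandas as pd

# Small illustrative frame; labels and values are assumptions for this sketch
df = pd.DataFrame(
    {"Name": ["jack", "Riti", "Vikas"],
     "Age": [34, 30, 31],
     "City": ["Sydney", "Delhi", "Mumbai"]},
    index=["a", "b", "c"],
)

single = df.loc["b"]                            # single label -> Series
by_list = df.loc[["a", "c"]]                    # list of labels -> DataFrame
by_slice = df.loc["a":"b"]                      # label slice, both ends included
by_mask = df.loc[[True, False, True]]           # boolean sequence, one entry per row
by_callable = df.loc[lambda d: d["Age"] > 30]   # callable returning a boolean Series
cols = df.loc[:, "Name":"Age"]                  # all rows, column label slice

print(by_slice)
print(cols)
```

Note that the label slice "a":"b" keeps row b, and the column slice "Name":"Age" keeps the Age column.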

Returns :

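It returns the selected subset of the dataframe based on the provided row and column names. A single row or column label gives a Series, a list or range of labels gives a Dataframe, and a single row label combined with a single column label gives the cell value.
Also, if column_segment is not provided, it returns the subset of the Dataframe containing only the selected rows based on the row_segment argument.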
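
The kind of object that comes back depends on the form of the arguments. A minimal sketch, assuming a two-row frame made up for illustration:

```python
import pandas as pd

# Illustrative frame; names and values are assumptions for this sketch
df = pd.DataFrame({"Name": ["jack", "Riti"], "Age": [34, 30]}, index=["a", "b"])

row = df.loc["a"]          # single row label -> Series
frame = df.loc[["a"]]      # one-element list -> DataFrame with one row
cell = df.loc["a", "Age"]  # row label and column label -> scalar value

print(type(row).__name__)    # Series
print(type(frame).__name__)  # DataFrame
print(cell)                  # 34
```

Wrapping a single label in a list is a simple way to keep the result a Dataframe when later code expects one.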

Error scenarios:

Dataframe.loc[row_segment, column_segment] will raise a KeyError if any of the provided label names is not present in the Dataframe.
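
One possible way to handle a missing label is sketched below. The label 'z' is a hypothetical value chosen because it is not in the index of this illustrative frame.

```python
import pandas as pd

# Illustrative frame; values are assumptions for this sketch
df = pd.DataFrame({"Age": [34, 30]}, index=["a", "b"])

label = "z"  # hypothetical label that is not in the index
try:
    value = df.loc[label]
except KeyError:
    value = None
    print(f"Row label {label!r} not found")

# Checking membership first avoids the exception
if "a" in df.index:
    print(df.loc["a", "Age"])
```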

Let’s understand more about it with some examples,

Pandas Dataframe.loc[] – Examples

We have divided the examples into four parts, i.e.

- Select a few rows from Dataframe, but include all column values.
- Select a few columns from Dataframe, but include all row values for those columns.
- Select a subset of Dataframe with few rows and columns.
- Change values of Dataframe by loc[].

Let’s look at these examples one by one. But before that we will create a Dataframe from list of tuples,

import pandas as pd

# List of Tuples
students = [('jack',  34, 'Sydeny',    'Australia'),
            ('Riti',  30, 'Delhi',     'India'),
            ('Vikas', 31, 'Mumbai',    'India'),
            ('Neelu', 32, 'Bangalore', 'India'),
            ('John',  16, 'New York',   'US'),
            ('Mike',  17, 'las vegas',  'US')]

# Create a DataFrame from list of tuples
df = pd.DataFrame( students,
                   columns=['Name', 'Age', 'City', 'Country'],
                   index=['a', 'b', 'c', 'd', 'e', 'f'])

print(df)

Output:

    Name  Age       City    Country
a   jack   34     Sydeny  Australia
b   Riti   30      Delhi      India
c  Vikas   31     Mumbai      India
d  Neelu   32  Bangalore      India
e   John   16   New York         US
f   Mike   17  las vegas         US

Select a few rows from Dataframe

Here we will provide only the row segment argument to Dataframe.loc[]. Therefore, it will select the rows with the given names, along with all columns.

Select a single row of Dataframe

To select a row from the dataframe, pass the row name to the loc[]. For example,

# Select the row with label name 'c'
row = df.loc['c']

print(row)

Output:

Name        Vikas
Age            31
City       Mumbai
Country     India
Name: c, dtype: object

It returned the row with label name ‘c’ from the Dataframe, as a Series object.

Select multiple rows from Dataframe based on list of names

Pass a list of row label names to the row_segment of loc[]. It will return a subset of the Dataframe containing only mentioned rows. For example,

# Select multiple rows from Dataframe by label names
subsetDf = df.loc[ ['c', 'f', 'a'] ]

print(subsetDf)

Output:

    Name  Age       City    Country
c  Vikas   31     Mumbai      India
f   Mike   17  las vegas         US
a   jack   34     Sydeny  Australia

It returned a subset of the Dataframe containing only three rows with labels ‘c’, ‘f’ and ‘a’.

Select multiple rows from Dataframe based on name range

Pass a name range -> start:end in the row segment of loc[]. It will return a subset of the Dataframe containing only the rows from name start to name end of the original dataframe, with both start and end included. For example,

# Select rows of Dataframe based on row label range
subsetDf = df.loc[ 'b' : 'f' ]

print(subsetDf)

Output:

    Name  Age       City Country
b   Riti   30      Delhi   India
c  Vikas   31     Mumbai   India
d  Neelu   32  Bangalore   India
e   John   16   New York      US
f   Mike   17  las vegas      US

It returned a subset of the Dataframe containing only five rows from the original dataframe i.e. rows from label ‘b’ to label ‘f’.
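
As a sketch of how this differs from positional slicing, the following compares the label range with iloc[]. It assumes the same six row labels as above; the single-column frame is an illustrative stand-in.

```python
import pandas as pd

# Stand-in frame with the same row labels as the tutorial's dataframe
df = pd.DataFrame({"Age": [34, 30, 31, 32, 16, 17]},
                  index=["a", "b", "c", "d", "e", "f"])

by_label = df.loc["b":"f"]    # label slice: 'f' is included
by_position = df.iloc[1:5]    # positional slice: position 5 ('f') is excluded

print(list(by_label.index))     # ['b', 'c', 'd', 'e', 'f']
print(list(by_position.index))  # ['b', 'c', 'd', 'e']
```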

Select rows of Dataframe based on bool array

Pass a boolean array/list in the row segment of loc[]. It will return a subset of the Dataframe containing only the rows for which the corresponding value in the boolean array/list is True. For example,

# Select rows of Dataframe based on bool array
subsetDf = df.loc[ [True, False, True, False, True, False] ]

print(subsetDf)

Output:

    Name  Age      City    Country
a   jack   34    Sydeny  Australia
c  Vikas   31    Mumbai      India
e   John   16  New York         US
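
In practice the boolean sequence is usually computed from the data rather than typed by hand. A minimal sketch, assuming a reduced version of the dataframe above and an arbitrary condition chosen for illustration:

```python
import pandas as pd

# Reduced illustrative frame; the condition below is an arbitrary example
df = pd.DataFrame(
    {"Name": ["jack", "Riti", "Vikas", "John"],
     "Age": [34, 30, 31, 16],
     "Country": ["Australia", "India", "India", "US"]},
    index=["a", "b", "c", "e"],
)

# Each comparison yields a boolean Series; & combines them element-wise
mask = (df["Country"] == "India") & (df["Age"] >= 30)
subset = df.loc[mask]

print(subset)
```

The parentheses around each comparison are needed because & binds more tightly than == and >=.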

Select rows of Dataframe based on Callable function

Create a lambda function that accepts a dataframe as an argument, applies a condition on a column, and returns a bool list. This bool list will contain True only for those rows where the condition is True. Pass that lambda function to loc[], and only those rows will be selected for which the list contains True.

For example, select only those rows where column ‘Age’ has a value of more than 25,

# Select rows of Dataframe based on callable function
subsetDf = df.loc[ lambda x : (x['Age'] > 25).tolist() ]

print(subsetDf)

Output:

    Name  Age       City    Country
a   jack   34     Sydeny  Australia
b   Riti   30      Delhi      India
c  Vikas   31     Mumbai      India
d  Neelu   32  Bangalore      India
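
A callable is mostly useful in a chain of operations, where the intermediate dataframe has no variable name to refer to. The sketch below assumes a small illustrative frame, and AgeNextYear is a hypothetical column added only for this example. The callable here returns a boolean Series directly, which loc[] also accepts.

```python
import pandas as pd

# Illustrative frame; values are assumptions for this sketch
df = pd.DataFrame({"Name": ["jack", "Riti", "John"], "Age": [34, 30, 16]},
                  index=["a", "b", "e"])

adults_next_year = (
    df.assign(AgeNextYear=lambda d: d["Age"] + 1)   # hypothetical derived column
      .loc[lambda d: d["AgeNextYear"] >= 18]        # filter on the new column
)

print(adults_next_year)
```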

Select a few Columns from Dataframe

Here we will provide the (:) in the row segment argument of the Dataframe.loc[]. Therefore it will select all rows, but only a few columns based on the names provided in column_segment.

Select a single column of Dataframe

To select a column from the dataframe, pass (:) in the row segment and the column name in the column segment of loc[]. For example,

# Select single column from Dataframe by column name
column = df.loc[:, 'Age']

print(column)

Output:

a    34
b    30
c    31
d    32
e    16
f    17
Name: Age, dtype: int64

It returned the column ‘Age’ from Dataframe, as a Series object.

Select multiple columns from Dataframe based on list of names

Pass a list of column names to the column_segment of loc[]. It will return a subset of the Dataframe containing only mentioned columns. For example,

# Select multiple columns from Dataframe based on list of names
subsetDf = df.loc[:, ['Age', 'City', 'Name']]

print(subsetDf)

Output:

   Age       City   Name
a   34     Sydeny   jack
b   30      Delhi   Riti
c   31     Mumbai  Vikas
d   32  Bangalore  Neelu
e   16   New York   John
f   17  las vegas   Mike

It returned a subset of the Dataframe containing only three columns.

Select multiple columns from Dataframe based on name range

Pass a name range -> start:end in the column segment of loc[]. It will return a subset of the Dataframe containing only the columns from name start to name end of the original dataframe, with both start and end included. For example,

# Select multiple columns from Dataframe by name range
subsetDf = df.loc[:, 'Name' : 'City']

print(subsetDf)

Output:

    Name  Age       City
a   jack   34     Sydeny
b   Riti   30      Delhi
c  Vikas   31     Mumbai
d  Neelu   32  Bangalore
e   John   16   New York
f   Mike   17  las vegas

It returned a subset of the Dataframe containing only three columns, i.e., ‘Name’ to ‘City’.

Select columns of Dataframe based on bool array

Pass a boolean array/list in the column segment of loc[]. It will return a subset of the Dataframe containing only the columns for which the corresponding value in the boolean array/list is True. For example,

# Select columns of Dataframe based on bool array
subsetDf = df.loc[:, [True, True, False, False]]

print(subsetDf)

Output:

    Name  Age
a   jack   34
b   Riti   30
c  Vikas   31
d  Neelu   32
e   John   16
f   Mike   17
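
The column mask can also be derived from the column names instead of being written out. A minimal sketch, assuming a two-row frame with the same column names as above and an arbitrary rule (names starting with "C"):

```python
import pandas as pd

# Illustrative frame with the tutorial's column names
df = pd.DataFrame(
    {"Name": ["jack", "Riti"], "Age": [34, 30],
     "City": ["Sydney", "Delhi"], "Country": ["Australia", "India"]},
    index=["a", "b"],
)

# One boolean per column: True where the column name starts with "C"
col_mask = df.columns.str.startswith("C")
subset = df.loc[:, col_mask]

print(subset)
```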

Select a subset of Dataframe

Here we will provide the row and column segment arguments of the Dataframe.loc[]. It will return a subset of Dataframe based on the row and column names provided in row and column segments of loc[].

Select a Cell value from Dataframe

To select a single cell value from the dataframe, just pass the row and column name in the row and column segment of loc[]. For example,

# Select a Cell value from Dataframe by row and column name
cellValue = df.loc['c','Name']

print(cellValue)

Output:

Vikas

It returned the cell value at (‘c’,’Name’).

Select subset of Dataframe based on row/column names in list

Select a subset of the dataframe. This subset should include the following rows and columns,

Rows with names ‘b’, ‘d’ and ‘f’
Columns with name ‘Name’ and ‘City’

# Select subset of Dataframe based on row/column names in list
subsetDf = df.loc[['b', 'd', 'f'],['Name', 'City']]

print(subsetDf)

Output:

    Name       City
b   Riti      Delhi
d  Neelu  Bangalore
f   Mike  las vegas

It returned a subset of the calling dataframe object containing only the rows ‘b’, ‘d’ and ‘f’ and the columns ‘Name’ and ‘City’.

Select subset of Dataframe based on row/column name range

Select a subset of the dataframe. This subset should include the following rows and columns,

Rows from name ‘b’ to ‘e’
Columns from name ‘Name’ to ‘City’

# Select subset of Dataframe based on row and column label name range.
subsetDf = df.loc['b':'e', 'Name':'City']

print(subsetDf)

Output:

    Name  Age       City
b   Riti   30      Delhi
c  Vikas   31     Mumbai
d  Neelu   32  Bangalore
e   John   16   New York

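It returned a subset of the calling dataframe object containing the rows ‘b’ to ‘e’ and the columns ‘Name’ to ‘City’, with both ends of each range included.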
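
The row and column segments can also mix different forms, for example a boolean condition for the rows and a list of names for the columns. A minimal sketch, assuming a reduced illustrative frame:

```python
import pandas as pd

# Reduced illustrative frame; values are assumptions for this sketch
df = pd.DataFrame(
    {"Name": ["jack", "John", "Mike"],
     "Age": [34, 16, 17],
     "City": ["Sydney", "New York", "Las Vegas"]},
    index=["a", "e", "f"],
)

# Rows where Age is below 18, restricted to two columns
minors = df.loc[df["Age"] < 18, ["Name", "City"]]

print(minors)
```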

Pro Tip: Changing the values of Dataframe using loc[]

Assigning a value through loc[] writes directly into the original Dataframe object, so the change is visible in the dataframe itself. For example, let’s select the row with label ‘c’ from the dataframe using loc[] and change its content,

print(df)

# Change the contents of row 'c' to 0
df.loc['c'] = 0

print(df)

Output:

    Name  Age       City    Country
a   jack   34     Sydeny  Australia
b   Riti   30      Delhi      India
c  Vikas   31     Mumbai      India
d  Neelu   32  Bangalore      India
e   John   16   New York         US
f   Mike   17  las vegas         US


    Name  Age       City    Country
a   jack   34     Sydeny  Australia
b   Riti   30      Delhi      India
c      0    0          0          0
d  Neelu   32  Bangalore      India
e   John   16   New York         US
f   Mike   17  las vegas         US

Values assigned through loc[] change the content of the original dataframe in place. Here, every column of the row ‘c’ was set to 0.
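
The same kind of assignment works with any of the selection forms, so it can update only the cells that match a condition. The sketch below assumes a reduced illustrative frame; the replacement value "USA" and the snapshot variable are made up for this example.

```python
import pandas as pd

# Reduced illustrative frame; values are assumptions for this sketch
df = pd.DataFrame(
    {"Name": ["jack", "John", "Mike"], "Age": [34, 16, 17],
     "Country": ["Australia", "US", "US"]},
    index=["a", "e", "f"],
)

snapshot = df.copy()  # independent copy, not affected by later assignments

# Update the Country column only for rows where Age is below 18
df.loc[df["Age"] < 18, "Country"] = "USA"

print(df)
print(snapshot)
```

Taking an explicit copy() first is one way to keep the original values around for comparison.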

Summary:

We learned about how to use the Dataframe.loc[] with several examples and discussed how to access rows, columns or a subset of DataFrame by label names.
