Pandas Tutorial Part #8 – DataFrame.iloc[]

In this tutorial, we will discuss how to use the iloc property of the DataFrame to select rows, columns, or a subset of a DataFrame based on index positions or ranges of index positions. Then we will discuss how to change the selected values.

DataFrame.iloc[]

In Pandas, the DataFrame provides a property, iloc[], to select a subset of the DataFrame based on position indexing. The extent of this subset is determined by the index positions of the rows and columns you provide. We can select single or multiple rows and columns with it. Let's learn more about it.

Syntax:

Dataframe.iloc[row_segment , column_segment]
Dataframe.iloc[row_segment]

The column_segment argument is optional. If column_segment is not provided, iloc[] selects the subset of the DataFrame based on the row_segment argument only, keeping all columns.
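To see that the column segment really is optional, the minimal sketch below (assuming pandas is installed) rebuilds the sample DataFrame used later in this tutorial and compares a row-only selection with the same selection plus `:` as the column segment.

```python
import pandas as pd

# Sample data: the same list of tuples used in this tutorial
students = [('jack',  34, 'Sydeny',    'Australia'),
            ('Riti',  30, 'Delhi',     'India'),
            ('Vikas', 31, 'Mumbai',    'India'),
            ('Neelu', 32, 'Bangalore', 'India'),
            ('John',  16, 'New York',  'US'),
            ('Mike',  17, 'las vegas', 'US')]
df = pd.DataFrame(students,
                  columns=['Name', 'Age', 'City', 'Country'],
                  index=['a', 'b', 'c', 'd', 'e', 'f'])

# Row segment only: rows at positions 0 and 1, all columns
rows_only = df.iloc[0:2]

# Row segment plus ':' as the column segment selects the same rows and columns
rows_all_cols = df.iloc[0:2, :]

print(rows_only.equals(rows_all_cols))  # True
```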

Arguments:

  • row_segment:
    • It contains the index positions of the rows to be selected. Its value can be:
      • An integer like N.
        • In this case, it selects the single row at index position N.
        • For example, if only 2 is given, only the 3rd row of the DataFrame will be selected, because indexing starts from 0.
      • A list/array of integers like [a, b, c].
        • In this case, multiple rows will be selected based on the index positions in the given list.
        • For example, if [2, 4, 0] is given as the row segment, the 3rd, 5th and 1st rows of the DataFrame will be selected, in that order.
      • A slice object with ints like a:e.
        • In this case, it selects multiple rows from index position a to e-1.
        • For example, if 2:5 is provided in the row segment of iloc[], it will select the rows at index positions 2 to 4.
        • To select all rows, provide ( : ).
      • A boolean sequence of the same size as the number of rows.
        • In this case, it selects only those rows for which the corresponding value in the boolean array/list is True.
      • A callable function:
        • It can be a lambda function or a regular function that accepts the calling DataFrame as an argument and returns valid output for indexing. The returned output must be one of the indexing arguments mentioned above.
  • column_segment:
    • It is optional.
    • It contains the index positions of the columns to be selected. Its value can be:
      • An integer like N.
        • In this case, the single column at index position N will be selected.
        • For example, if 3 is given, only the 4th column of the DataFrame will be selected, because indexing starts from 0.
      • A list/array of integers like [a, b, c].
        • In this case, multiple columns will be selected, i.e. the columns at the index positions given in the list.
        • For example, if [2, 4, 0] is given as the column segment, the 3rd, 5th and 1st columns of the DataFrame will be selected, in that order.
      • A slice object with ints like a:e.
        • In this case, it selects multiple columns from index position a to e-1.
        • For example, if 2:5 is given in the column segment of iloc[], it will select the columns at index positions 2 to 4.
        • To select all columns, provide ( : ).
      • A boolean sequence of the same size as the number of columns.
        • In this case, it selects only those columns for which the corresponding value in the boolean array/list is True.
      • A callable function:
        • It can be a lambda function or a regular function that accepts the calling DataFrame as an argument and returns valid output for indexing. The returned output must be one of the indexing arguments mentioned above.

Returns:

It returns the part of the DataFrame selected by the index positions in the row and column segments: a DataFrame in general, a Series when one of the segments is a single integer, or a single scalar value when both segments are single integers.
If column_segment is not provided, it returns only the rows selected by the row_segment argument, with all columns.
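The type of the result depends on what is passed in each segment. A minimal sketch, assuming pandas is installed and rebuilding the tutorial's sample DataFrame, shows a scalar, a Series, and two DataFrames coming back from iloc[]:

```python
import pandas as pd

# Sample data: the same list of tuples used in this tutorial
students = [('jack',  34, 'Sydeny',    'Australia'),
            ('Riti',  30, 'Delhi',     'India'),
            ('Vikas', 31, 'Mumbai',    'India'),
            ('Neelu', 32, 'Bangalore', 'India'),
            ('John',  16, 'New York',  'US'),
            ('Mike',  17, 'las vegas', 'US')]
df = pd.DataFrame(students,
                  columns=['Name', 'Age', 'City', 'Country'],
                  index=['a', 'b', 'c', 'd', 'e', 'f'])

cell = df.iloc[3, 2]         # both segments are integers -> a single value
row = df.iloc[3]             # a single integer row -> Series
one_row_df = df.iloc[[3]]    # a list with one position -> DataFrame
block = df.iloc[1:3, 0:2]    # slices in both segments -> DataFrame

print(cell)                        # Bangalore
print(type(row).__name__)          # Series
print(type(one_row_df).__name__)   # DataFrame
print(block.shape)                 # (2, 2)
```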

Error scenarios:

DataFrame.iloc[row_segment, column_segment] raises an IndexError if any requested index position is out of bounds. Slices are the exception: a slice that runs past the end is truncated instead of raising an error.
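On the sample DataFrame (six rows, positions 0 to 5), the minimal sketch below, assuming pandas is installed, triggers an out-of-bounds integer and an out-of-bounds list and contrasts them with a slice that runs past the end.

```python
import pandas as pd

# Sample data: the same list of tuples used in this tutorial
students = [('jack',  34, 'Sydeny',    'Australia'),
            ('Riti',  30, 'Delhi',     'India'),
            ('Vikas', 31, 'Mumbai',    'India'),
            ('Neelu', 32, 'Bangalore', 'India'),
            ('John',  16, 'New York',  'US'),
            ('Mike',  17, 'las vegas', 'US')]
df = pd.DataFrame(students,
                  columns=['Name', 'Age', 'City', 'Country'],
                  index=['a', 'b', 'c', 'd', 'e', 'f'])

# Record which indexers raised IndexError
errors = []
for bad in (10, [0, 10]):          # a single position and a list, both out of bounds
    try:
        df.iloc[bad]
    except IndexError as err:
        errors.append(bad)
        print('IndexError for', bad, '->', err)

# A slice that runs past the end is truncated instead of raising
tail = df.iloc[4:100]
print(tail.index.tolist())          # ['e', 'f']
```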

Let’s understand more about it with some examples,

Pandas Dataframe.iloc[] – Examples

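We have divided the examples into three parts:

  • Select a few rows from the DataFrame
  • Select a few columns from the DataFrame
  • Select a subset of the DataFrame

Let's look at these examples one by one. First, we will create a DataFrame from a list of tuples: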

import pandas as pd

# List of Tuples
students = [('jack',  34, 'Sydeny',    'Australia'),
            ('Riti',  30, 'Delhi',     'India'),
            ('Vikas', 31, 'Mumbai',    'India'),
            ('Neelu', 32, 'Bangalore', 'India'),
            ('John',  16, 'New York',   'US'),
            ('Mike',  17, 'las vegas',  'US')]

# Create a DataFrame from list of tuples
df = pd.DataFrame( students,
                   columns=['Name', 'Age', 'City', 'Country'],
                   index=['a', 'b', 'c', 'd', 'e', 'f'])

print(df)

Output:

    Name  Age       City    Country
a   jack   34     Sydeny  Australia
b   Riti   30      Delhi      India
c  Vikas   31     Mumbai      India
d  Neelu   32  Bangalore      India
e   John   16   New York         US
f   Mike   17  las vegas         US

Select a few rows from Dataframe

Here we will provide only the row segment argument to Dataframe.iloc[]. Therefore, it will select rows based on the given indices, along with all columns.

Select a single row of Dataframe

To select a row from the dataframe, pass the row index position to the iloc[]. For example,

# Select row at index position 2 i.e. the 3rd row of Dataframe
row = df.iloc[2]

print(row)

Output:

Name        Vikas
Age            31
City       Mumbai
Country     India
Name: c, dtype: object

It returned the 3rd row of the Dataframe as a Series object. Because indexing starts from 0, the row at index position 2 is the 3rd row of the Dataframe.
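If you need the single row as a one-row DataFrame rather than a Series, wrap the position in a list. A minimal sketch, assuming pandas is installed, compares the two forms on the rebuilt sample DataFrame:

```python
import pandas as pd

# Sample data: the same list of tuples used in this tutorial
students = [('jack',  34, 'Sydeny',    'Australia'),
            ('Riti',  30, 'Delhi',     'India'),
            ('Vikas', 31, 'Mumbai',    'India'),
            ('Neelu', 32, 'Bangalore', 'India'),
            ('John',  16, 'New York',  'US'),
            ('Mike',  17, 'las vegas', 'US')]
df = pd.DataFrame(students,
                  columns=['Name', 'Age', 'City', 'Country'],
                  index=['a', 'b', 'c', 'd', 'e', 'f'])

row_series = df.iloc[2]      # Series; its name is the row label 'c'
row_frame = df.iloc[[2]]     # DataFrame holding just that one row

print(row_series.name)       # c
print(row_frame.shape)       # (1, 4)
```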

Select multiple rows from Dataframe based on a list of indices

Pass a list of row index positions as the row_segment of iloc[]. It will return a subset of the Dataframe containing only the rows at the given positions. For example,

# Select rows of Dataframe based on row indices in list
subsetDf = df.iloc[ [2,4,1] ]

print(subsetDf)

Output:

    Name  Age      City Country
c  Vikas   31    Mumbai   India
e   John   16  New York      US
b   Riti   30     Delhi   India

It returned a subset of the Dataframe containing only three rows from the original dataframe i.e. rows at index positions 2, 4, and 1.
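The rows come back in the order of the list, and the same position may appear more than once. The text also allows an array of integers; the sketch below, assuming pandas and NumPy are installed, checks both points on the rebuilt sample DataFrame.

```python
import numpy as np
import pandas as pd

# Sample data: the same list of tuples used in this tutorial
students = [('jack',  34, 'Sydeny',    'Australia'),
            ('Riti',  30, 'Delhi',     'India'),
            ('Vikas', 31, 'Mumbai',    'India'),
            ('Neelu', 32, 'Bangalore', 'India'),
            ('John',  16, 'New York',  'US'),
            ('Mike',  17, 'las vegas', 'US')]
df = pd.DataFrame(students,
                  columns=['Name', 'Age', 'City', 'Country'],
                  index=['a', 'b', 'c', 'd', 'e', 'f'])

# Order follows the list; position 0 is repeated
picked = df.iloc[[4, 0, 0]]
print(picked.index.tolist())     # ['e', 'a', 'a']

# A NumPy integer array works the same way as a list
from_list = df.iloc[[2, 4, 1]]
from_array = df.iloc[np.array([2, 4, 1])]
print(from_list.equals(from_array))   # True
```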

Select multiple rows from Dataframe based on index range

Pass an index range start:end in the row segment of iloc[]. It will return a subset of the Dataframe containing only the rows from index position start to end-1 of the original dataframe. For example,

# Select rows of Dataframe based on row index range
subsetDf = df.iloc[ 1:4 ]

print(subsetDf)

Output:

    Name  Age       City Country
b   Riti   30      Delhi   India
c  Vikas   31     Mumbai   India
d  Neelu   32  Bangalore   India

It returned a subset of the Dataframe containing only three rows from the original dataframe i.e. rows at index positions 1 to 3.
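The `1:4` written inside the brackets is an ordinary Python slice, so an explicit `slice(1, 4)` object gives the same rows, and a step (`start:stop:step`) is honoured too. A minimal sketch, assuming pandas is installed:

```python
import pandas as pd

# Sample data: the same list of tuples used in this tutorial
students = [('jack',  34, 'Sydeny',    'Australia'),
            ('Riti',  30, 'Delhi',     'India'),
            ('Vikas', 31, 'Mumbai',    'India'),
            ('Neelu', 32, 'Bangalore', 'India'),
            ('John',  16, 'New York',  'US'),
            ('Mike',  17, 'las vegas', 'US')]
df = pd.DataFrame(students,
                  columns=['Name', 'Age', 'City', 'Country'],
                  index=['a', 'b', 'c', 'd', 'e', 'f'])

by_literal = df.iloc[1:4]
by_object = df.iloc[slice(1, 4)]   # the same range written as a slice object
every_other = df.iloc[::2]         # step 2 -> positions 0, 2, 4

print(by_literal.equals(by_object))    # True
print(every_other.index.tolist())      # ['a', 'c', 'e']
```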

Select rows of Dataframe based on bool array

Pass a boolean array/list in the row segment of iloc[]. It will return a subset of the Dataframe containing only the rows for which the corresponding value in the boolean array/list is True. For example,

# Select rows of Dataframe based on bool array
subsetDf = df.iloc[ [True, False, True, False, True, False] ]

print(subsetDf)

Output:

    Name  Age      City    Country
a   jack   34    Sydeny  Australia
c  Vikas   31    Mumbai      India
e   John   16  New York         US
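Typing the boolean list by hand is rarely practical. A minimal sketch, assuming pandas is installed, builds the mask from a condition instead; the comparison gives a boolean Series, and `.to_numpy()` turns it into a plain array because iloc[] does not accept a boolean Series as a mask.

```python
import pandas as pd

# Sample data: the same list of tuples used in this tutorial
students = [('jack',  34, 'Sydeny',    'Australia'),
            ('Riti',  30, 'Delhi',     'India'),
            ('Vikas', 31, 'Mumbai',    'India'),
            ('Neelu', 32, 'Bangalore', 'India'),
            ('John',  16, 'New York',  'US'),
            ('Mike',  17, 'las vegas', 'US')]
df = pd.DataFrame(students,
                  columns=['Name', 'Age', 'City', 'Country'],
                  index=['a', 'b', 'c', 'd', 'e', 'f'])

# One boolean per row: True where Country is 'India'
mask = (df['Country'] == 'India').to_numpy()

indians = df.iloc[mask]
print(indians['Name'].tolist())   # ['Riti', 'Vikas', 'Neelu']
```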

Select rows of Dataframe based on Callable function

Create a lambda function that accepts a DataFrame as an argument, applies a condition to a column, and returns a bool list. This list contains True only for the rows where the condition holds. Pass this lambda function to iloc[], and only the rows marked True will be selected. The result of the condition is converted with tolist() because iloc[] does not accept a boolean Series as a mask.

For example, select only those rows where column ‘Age’ has a value of more than 25,

# Select rows of Dataframe based on callable function
subsetDf = df.iloc[ lambda x : (x['Age'] > 25).tolist() ]

print(subsetDf)

Output:

    Name  Age       City    Country
a   jack   34     Sydeny  Australia
b   Riti   30      Delhi      India
c  Vikas   31     Mumbai      India
d  Neelu   32  Bangalore      India
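The callable does not have to be a lambda, and it does not have to return booleans. In the minimal sketch below (assuming pandas is installed), `adults` is a hypothetical helper written for this illustration, and the second callable returns a list of positions instead of a mask.

```python
import pandas as pd

# Sample data: the same list of tuples used in this tutorial
students = [('jack',  34, 'Sydeny',    'Australia'),
            ('Riti',  30, 'Delhi',     'India'),
            ('Vikas', 31, 'Mumbai',    'India'),
            ('Neelu', 32, 'Bangalore', 'India'),
            ('John',  16, 'New York',  'US'),
            ('Mike',  17, 'las vegas', 'US')]
df = pd.DataFrame(students,
                  columns=['Name', 'Age', 'City', 'Country'],
                  index=['a', 'b', 'c', 'd', 'e', 'f'])


def adults(frame):
    """Hypothetical helper: a plain list of booleans, True where Age is above 25."""
    return (frame['Age'] > 25).tolist()


grown_ups = df.iloc[adults]                 # a named function instead of a lambda
print(grown_ups.index.tolist())             # ['a', 'b', 'c', 'd']

# A callable may also return integer positions: here, the last two rows
last_two = df.iloc[lambda frame: [len(frame) - 2, len(frame) - 1]]
print(last_two['Name'].tolist())            # ['John', 'Mike']
```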

Select a few Columns from Dataframe

Here we will provide ( : ) in the row segment argument of Dataframe.iloc[]. Therefore, it will select all rows, but only the columns at the positions provided in column_segment.

Select a single column of Dataframe

To select a column from the dataframe, pass the column index number to the iloc[]. For example,

# Select single column by index position
column = df.iloc[:, 2]

print(column)

Output:

a       Sydeny
b        Delhi
c       Mumbai
d    Bangalore
e     New York
f    las vegas
Name: City, dtype: object

It returned the 3rd column of the Dataframe as a Series object. Because indexing starts from 0, the column at index position 2 is the 3rd column of the Dataframe.
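When you know a column's name but not its position, `df.columns.get_loc()` returns the position to pass to iloc[]. A minimal sketch, assuming pandas is installed and a unique column name:

```python
import pandas as pd

# Sample data: the same list of tuples used in this tutorial
students = [('jack',  34, 'Sydeny',    'Australia'),
            ('Riti',  30, 'Delhi',     'India'),
            ('Vikas', 31, 'Mumbai',    'India'),
            ('Neelu', 32, 'Bangalore', 'India'),
            ('John',  16, 'New York',  'US'),
            ('Mike',  17, 'las vegas', 'US')]
df = pd.DataFrame(students,
                  columns=['Name', 'Age', 'City', 'Country'],
                  index=['a', 'b', 'c', 'd', 'e', 'f'])

pos = df.columns.get_loc('City')   # position of the 'City' column
city = df.iloc[:, pos]             # Series, same as selecting by name
city_df = df.iloc[:, [pos]]        # a list with one position -> DataFrame

print(pos)                         # 2
print(city.equals(df['City']))     # True
print(city_df.shape)               # (6, 1)
```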

Select multiple columns from Dataframe based on a list of indices

Pass a list of column index numbers to the column_segment of iloc[]. It will return a subset of the Dataframe containing only the columns mentioned at given indexes. For example,

# Select multiple columns by indices
subsetDf = df.iloc[:, [2, 3, 1]]
print(subsetDf)

Output:

        City    Country  Age
a     Sydeny  Australia   34
b      Delhi      India   30
c     Mumbai      India   31
d  Bangalore      India   32
e   New York         US   16
f  las vegas         US   17

It returned a subset of the Dataframe containing only three columns from the original dataframe i.e. columns at index numbers 2, 3, and 1.
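For several column names at once, `df.columns.get_indexer()` converts names to positions in the same order (it returns -1 for a name that is missing). A minimal sketch, assuming pandas is installed, reproduces the [2, 3, 1] selection above from names:

```python
import pandas as pd

# Sample data: the same list of tuples used in this tutorial
students = [('jack',  34, 'Sydeny',    'Australia'),
            ('Riti',  30, 'Delhi',     'India'),
            ('Vikas', 31, 'Mumbai',    'India'),
            ('Neelu', 32, 'Bangalore', 'India'),
            ('John',  16, 'New York',  'US'),
            ('Mike',  17, 'las vegas', 'US')]
df = pd.DataFrame(students,
                  columns=['Name', 'Age', 'City', 'Country'],
                  index=['a', 'b', 'c', 'd', 'e', 'f'])

positions = df.columns.get_indexer(['City', 'Country', 'Age'])
print(positions.tolist())            # [2, 3, 1]

subset = df.iloc[:, positions]
print(subset.columns.tolist())       # ['City', 'Country', 'Age']
```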

Select multiple columns from Dataframe based on index range

Pass an index range start:end in the column segment of iloc[]. It will return a subset of the Dataframe containing only the columns from index position start to end-1 of the original dataframe. For example,

# Select multiple columns by index range
subsetDf = df.iloc[:, 1 : 4]

print(subsetDf)

Output:

   Age       City    Country
a   34     Sydeny  Australia
b   30      Delhi      India
c   31     Mumbai      India
d   32  Bangalore      India
e   16   New York         US
f   17  las vegas         US

It returned a subset of the Dataframe containing only three columns from the original dataframe i.e. columns at index numbers 1 to 3.

Select columns of Dataframe based on bool array

Pass a boolean array/list in the column segment of iloc[]. It will return a subset of the Dataframe containing only the columns for which the corresponding value in the boolean array/list is True. For example,

# Select columns of Dataframe based on bool array
subsetDf = df.iloc[ : , [True, True, False, False] ]

print(subsetDf)

Output:

    Name  Age
a   jack   34
b   Riti   30
c  Vikas   31
d  Neelu   32
e   John   16
f   Mike   17
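The column mask can also be computed. `df.columns.isin()` returns a NumPy boolean array with one entry per column, which iloc[] accepts in the column segment. A minimal sketch, assuming pandas is installed:

```python
import pandas as pd

# Sample data: the same list of tuples used in this tutorial
students = [('jack',  34, 'Sydeny',    'Australia'),
            ('Riti',  30, 'Delhi',     'India'),
            ('Vikas', 31, 'Mumbai',    'India'),
            ('Neelu', 32, 'Bangalore', 'India'),
            ('John',  16, 'New York',  'US'),
            ('Mike',  17, 'las vegas', 'US')]
df = pd.DataFrame(students,
                  columns=['Name', 'Age', 'City', 'Country'],
                  index=['a', 'b', 'c', 'd', 'e', 'f'])

# True for the columns whose names are in the list
col_mask = df.columns.isin(['Name', 'Age'])
print(col_mask.tolist())            # [True, True, False, False]

subset = df.iloc[:, col_mask]
print(subset.columns.tolist())      # ['Name', 'Age']
```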

Select a subset of Dataframe

Here we will provide the row and column segment arguments of the Dataframe.iloc[]. It will return a subset of Dataframe based on the row and column indices provided in row and column segments of iloc[].

Select a Cell value from Dataframe

To select a single cell value from the dataframe, just pass the row and column number in the row and column segment of iloc[]. For example,

# Select a Cell value from Dataframe
cellValue = df.iloc[3,2]

print(cellValue)

Output:

Bangalore

It returned the cell value at position (3,2) i.e. in the 4th row and 3rd column, because indexing starts from 0.
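pandas also has `iat[]`, an accessor dedicated to reading or writing one cell by position. A minimal sketch, assuming pandas is installed, shows that both return the same value here:

```python
import pandas as pd

# Sample data: the same list of tuples used in this tutorial
students = [('jack',  34, 'Sydeny',    'Australia'),
            ('Riti',  30, 'Delhi',     'India'),
            ('Vikas', 31, 'Mumbai',    'India'),
            ('Neelu', 32, 'Bangalore', 'India'),
            ('John',  16, 'New York',  'US'),
            ('Mike',  17, 'las vegas', 'US')]
df = pd.DataFrame(students,
                  columns=['Name', 'Age', 'City', 'Country'],
                  index=['a', 'b', 'c', 'd', 'e', 'f'])

via_iloc = df.iloc[3, 2]   # row position 3, column position 2
via_iat = df.iat[3, 2]     # same cell through the scalar-only accessor

print(via_iloc, via_iat)   # Bangalore Bangalore
```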

Select subset of Dataframe based on row/column indices in list

Select a subset of the dataframe. This subset should include the following rows and columns,

  • Rows at index positions 1 and 3.
  • Columns at index positions 2 and 1.
# Select a subset of Dataframe based on row/column indices in lists
subsetDf = df.iloc[[1,3],[2,1]]

print(subsetDf)

Output:

        City  Age
b      Delhi   30
d  Bangalore   32

It returned only the rows at positions 1 and 3 ('b' and 'd') and the columns at positions 2 and 1 ('City' and 'Age'), in the order given in the lists.
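The two segments do not have to use the same kind of argument. A minimal sketch, assuming pandas is installed, combines a list of row positions with a slice of column positions:

```python
import pandas as pd

# Sample data: the same list of tuples used in this tutorial
students = [('jack',  34, 'Sydeny',    'Australia'),
            ('Riti',  30, 'Delhi',     'India'),
            ('Vikas', 31, 'Mumbai',    'India'),
            ('Neelu', 32, 'Bangalore', 'India'),
            ('John',  16, 'New York',  'US'),
            ('Mike',  17, 'las vegas', 'US')]
df = pd.DataFrame(students,
                  columns=['Name', 'Age', 'City', 'Country'],
                  index=['a', 'b', 'c', 'd', 'e', 'f'])

# Rows at positions 0 and 5 (list), columns at positions 0 and 1 (slice 0:2)
mixed = df.iloc[[0, 5], 0:2]

print(mixed.index.tolist())     # ['a', 'f']
print(mixed.columns.tolist())   # ['Name', 'Age']
```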

Select subset of Dataframe based on row/column index range

Select a subset of the dataframe. This subset should include the following rows and columns,

  • Rows from index position 1 to 3
  • Columns from index position 1 to 3
# Select subset of Dataframe based on row and column index range.
subsetDf = df.iloc[1:4, 1:4]

print(subsetDf)

Output:

   Age       City Country
b   30      Delhi   India
c   31     Mumbai   India
d   32  Bangalore   India

It returned the rows at positions 1 to 3 and the columns at positions 1 to 3 of the calling dataframe; in both slices, the end position 4 is excluded.

Pro Tip: Changing the values of Dataframe using iloc[]

Assigning a value through iloc[] changes the original DataFrame in place. For example, let's select the 3rd row of the DataFrame using iloc[] and overwrite its contents:

print(df)

# change the value of 3rd row of Dataframe
df.iloc[2] = 0

print(df)

Output:

    Name  Age       City    Country
a   jack   34     Sydeny  Australia
b   Riti   30      Delhi      India
c  Vikas   31     Mumbai      India
d  Neelu   32  Bangalore      India
e   John   16   New York         US
f   Mike   17  las vegas         US


    Name  Age       City    Country
a   jack   34     Sydeny  Australia
b   Riti   30      Delhi      India
c      0    0          0          0
d  Neelu   32  Bangalore      India
e   John   16   New York         US
f   Mike   17  las vegas         US

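Because the assignment was made through df.iloc[] itself, the original DataFrame was modified. Do not rely on a subset returned by iloc[] being a view, though: depending on the selection and the pandas version it may be a copy, and when Copy-on-Write is enabled, changes made to a stored subset never reach the original DataFrame.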
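The minimal sketch below, assuming pandas is installed, contrasts the two situations on a freshly built sample DataFrame: assigning through `df.iloc[]` updates `df`, while an explicit `.copy()` of a subset can be changed without touching `df`. Calling `.copy()` is one common way to make that intent explicit, not the only one.

```python
import pandas as pd

# Sample data: the same list of tuples used in this tutorial
students = [('jack',  34, 'Sydeny',    'Australia'),
            ('Riti',  30, 'Delhi',     'India'),
            ('Vikas', 31, 'Mumbai',    'India'),
            ('Neelu', 32, 'Bangalore', 'India'),
            ('John',  16, 'New York',  'US'),
            ('Mike',  17, 'las vegas', 'US')]
df = pd.DataFrame(students,
                  columns=['Name', 'Age', 'City', 'Country'],
                  index=['a', 'b', 'c', 'd', 'e', 'f'])

# Assigning through iloc[] on df updates df in place
df.iloc[0, 1] = 35                 # row 0 ('a'), column 1 ('Age')
print(df.loc['a', 'Age'])          # 35

# An explicit copy can be modified without affecting df
subset = df.iloc[1:3].copy()
subset.iloc[0, 1] = 0              # changes only the copy
print(df.loc['b', 'Age'])          # 30 (unchanged)
```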

Summary:

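We learned how to use DataFrame.iloc[] to select rows, columns, single cells, and subsets of a DataFrame by index position, and how to change values in place by assigning through iloc[].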
