Pandas Tutorial #11 – DataFrame attributes & methods

This tutorial will discuss some of the most used attributes and methods of the DataFrame in Pandas.

First, we will create a DataFrame from a list of tuples:

import pandas as pd

# List of tuples, one tuple per employee
employees = [(11, 'jack', 34, 'Sydney', 5),
             (12, 'Riti', 31, 'Delhi', 7),
             (13, 'Aadi', 16, 'New York', 11),
             (14, 'Mohit', 32, 'Delhi', 15),
             (15, 'Veena', 33, 'Delhi', 4),
             (16, 'Shaunak', 35, 'Mumbai', 5),
             (17, 'Shaun', 35, 'Colombo', 11)]

# Create a DataFrame object with named columns and custom row labels
df = pd.DataFrame(employees,
                  columns=['ID', 'Name', 'Age', 'City', 'Experience'],
                  index=['a', 'b', 'c', 'd', 'e', 'f', 'h'])

# Display the DataFrame
print(df)

Output:

   ID     Name  Age      City  Experience
a  11     jack   34    Sydney           5
b  12     Riti   31     Delhi           7
c  13     Aadi   16  New York          11
d  14    Mohit   32     Delhi          15
e  15    Veena   33     Delhi           4
f  16  Shaunak   35    Mumbai           5
h  17    Shaun   35   Colombo          11

This DataFrame contains seven rows and five columns. Now let’s look at some of the basic operations that we can perform on this DataFrame.

Get the Row Index Labels of a DataFrame

In Pandas, the DataFrame provides an attribute index, and it gives an Index object containing all the row index labels of the DataFrame. For example,

# Get row index labels of DataFrame
# as an Index object
rowIndex = df.index

print(rowIndex)

Output:

Index(['a', 'b', 'c', 'd', 'e', 'f', 'h'], dtype='object')

It gave an Index object populated with the row labels. We can also select a single label from it by index position, i.e.

# Select the label name
# of the 2nd row of DataFrame
print(df.index[1])

Output:

b

As index positions start from 0, to select the 2nd-row label name, we passed the value 1 in the subscript operator of the Index object.
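The Index object can also be converted to a plain Python list or measured with len(). The snippet below is a minimal sketch on a small three-row DataFrame built only for this illustration (it is not the employee DataFrame above):

```python
import pandas as pd

# Small illustrative DataFrame with custom row labels
df = pd.DataFrame(
    {'ID': [11, 12, 13], 'Name': ['jack', 'Riti', 'Aadi']},
    index=['a', 'b', 'c'],
)

# Convert the row labels to a plain Python list
labels = df.index.tolist()

# Negative positions count from the end, as with lists
last_label = df.index[-1]

# Number of row labels
row_count = len(df.index)

print(labels)
print(last_label)
print(row_count)
```

The same calls should work on the larger DataFrame used in this tutorial.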

Get the Column Names of a DataFrame

In Pandas, the DataFrame provides an attribute columns, and it gives an Index object containing all the column names of the DataFrame. For example,

# Get column names of DataFrame
# as an Index object
columnNames = df.columns

print(columnNames)

Output:

Index(['ID', 'Name', 'Age', 'City', 'Experience'], dtype='object')

It gave an Index object populated with the column names. We can also select a column name from it by the column’s index position, i.e.

# Select the 2nd column name
# from the Column Index
print(df.columns[1])

Output:

Name

As index positions start from 0, to select the 2nd column name by index position, we passed the value 1 in the subscript operator of the Index object.
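Because the column names are held in an Index object, they can be turned into a list or checked for membership with the in operator. A minimal sketch, using a small DataFrame created just for this example (the 'Salary' name is only a hypothetical column that does not exist):

```python
import pandas as pd

# Small illustrative DataFrame
df = pd.DataFrame({
    'ID': [11, 12],
    'Name': ['jack', 'Riti'],
    'City': ['Sydney', 'Delhi'],
})

# Column names as a plain Python list
column_list = list(df.columns)

# Check whether a column name exists
has_city = 'City' in df.columns
has_salary = 'Salary' in df.columns  # hypothetical name, not present

print(column_list)
print(has_city, has_salary)
```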

Get the Data Types of each column in DataFrame

In Pandas, the DataFrame provides an attribute dtypes, and it returns a Series with the data type of each column. For example,

# Get the Data Types of all columns
dataTypes = df.dtypes

print(dataTypes)

Output:

ID             int64
Name          object
Age            int64
City          object
Experience     int64
dtype: object

It returned a Series object, where the index contains the column names of DataFrame and the corresponding value contains that column’s data type information in the DataFrame. String values are stored as an object data type in the DataFrame.
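To inspect one column only, the Series for that column has its own dtype attribute, and select_dtypes() can filter columns by type. The following is a small sketch on a DataFrame made up for this example:

```python
import pandas as pd

# Small illustrative DataFrame with numeric and string columns
df = pd.DataFrame({
    'ID': [11, 12],
    'Name': ['jack', 'Riti'],
    'Age': [34, 31],
})

# Data type of a single column
age_dtype = df['Age'].dtype

# Names of the columns that hold numeric data
numeric_columns = df.select_dtypes(include='number').columns.tolist()

print(age_dtype)
print(numeric_columns)
```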

Get all values of DataFrame as NumPy Array

In Pandas, the DataFrame provides an attribute values, and it returns a NumPy representation of the DataFrame. The values will not contain the row index labels or column names. For example,

# Get DataFrame values
# as 2D NumPy Array
arr = df.values

print(arr)

Output:

[[11 'jack' 34 'Sydney' 5]
 [12 'Riti' 31 'Delhi' 7]
 [13 'Aadi' 16 'New York' 11]
 [14 'Mohit' 32 'Delhi' 15]
 [15 'Veena' 33 'Delhi' 4]
 [16 'Shaunak' 35 'Mumbai' 5]
 [17 'Shaun' 35 'Colombo' 11]]

It returned a 2D NumPy array containing all the values of the DataFrame.
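The DataFrame also has a to_numpy() method that produces the same kind of array. When the columns mix numbers and strings, the array generally has to use a common, generic element type, whereas selecting only numeric columns first keeps a numeric array. A minimal sketch, on a small DataFrame assumed for this example:

```python
import pandas as pd

# Small illustrative DataFrame with mixed column types
df = pd.DataFrame({
    'Name': ['jack', 'Riti', 'Aadi'],
    'Age': [34, 31, 16],
    'Experience': [5, 7, 11],
})

# Array built from all columns (numbers and strings together)
mixed_arr = df.to_numpy()

# Array built from the numeric columns only
numeric_arr = df[['Age', 'Experience']].to_numpy()

print(mixed_arr.shape, mixed_arr.dtype)
print(numeric_arr.shape, numeric_arr.dtype)
```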

Get the Shape of DataFrame

In Pandas, the DataFrame provides an attribute shape, and it returns a tuple representing the dimensions of the DataFrame. For example,

# Get the shape of DataFrame
shape = df.shape

print(shape)

Output:

(7, 5)

It returned a tuple containing two numbers. The first value denotes the number of rows in the DataFrame, and the second value represents the number of columns.

We can use this tuple to get the row count and the column count separately.

Get the total number of rows in the DataFrame

# Get the total number of rows
rowCount = df.shape[0]

print(rowCount)

Output:

7

The first value of the tuple returned by the shape attribute gives us the total rows in the DataFrame.

Get the total number of columns in the DataFrame

# Get the total number of columns
columnCount = df.shape[1]

print(columnCount)

Output:

5

The second value of the tuple returned by the shape attribute gives us the total number of columns in the DataFrame.
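Since shape is an ordinary tuple, both counts can also be read in one step with tuple unpacking, and len() on the DataFrame gives the row count. A short sketch on a small DataFrame created only for illustration:

```python
import pandas as pd

# Small illustrative DataFrame: 3 rows, 2 columns
df = pd.DataFrame({'ID': [11, 12, 13], 'Name': ['jack', 'Riti', 'Aadi']})

# Unpack the shape tuple into two variables
row_count, column_count = df.shape

# len() of a DataFrame is its number of rows
rows_via_len = len(df)

print(row_count, column_count, rows_via_len)
```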

Get count of total values in DataFrame

In Pandas, the DataFrame provides an attribute size, and it returns the total number of elements in the DataFrame. For example,

# Get total number of elements in DataFrame
totalCount = df.size

print(totalCount)

Output:

35
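The size attribute counts every cell, including cells with missing values, so it equals rows multiplied by columns (7 * 5 = 35 above). To count only the non-missing cells, count() can be summed over the columns. A minimal sketch, using a small DataFrame with one missing value invented for this example:

```python
import pandas as pd

# Small illustrative DataFrame with one missing Age
df = pd.DataFrame({
    'Name': ['jack', 'Riti', 'Aadi'],
    'Age': [34, None, 16],
})

# Total number of cells, missing values included
total_cells = df.size

# Number of cells that actually hold a value
non_missing_cells = int(df.count().sum())

print(total_cells)
print(non_missing_cells)
```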

Get the first N rows of the DataFrame

In Pandas, the DataFrame provides a method head(N). It accepts an argument N and returns the first N rows of the DataFrame.

# Get first 3 rows of the DataFrame
subDf = df.head(3)

print(subDf)

Output:

   ID  Name  Age      City  Experience
a  11  jack   34    Sydney           5
b  12  Riti   31     Delhi           7
c  13  Aadi   16  New York          11

Here, it returned the first three rows of the DataFrame. If N is not provided, it returns the first five rows of the DataFrame.
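head() also accepts a negative N, in which case it returns all rows except the last |N| rows. The sketch below shows the default and the negative case on a seven-row DataFrame built just for this illustration:

```python
import pandas as pd

# Seven-row illustrative DataFrame
df = pd.DataFrame({'ID': range(11, 18)},
                  index=['a', 'b', 'c', 'd', 'e', 'f', 'h'])

# No argument: first five rows
default_head = df.head()

# Negative argument: everything except the last two rows
all_but_last_two = df.head(-2)

print(default_head.index.tolist())
print(all_but_last_two.index.tolist())
```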

Get the last N rows of the DataFrame

In Pandas, the DataFrame provides a method tail(N). It accepts an argument N and returns the last N rows of the DataFrame.

# Get last 3 rows of the DataFrame
subDf = df.tail(3)

print(subDf)

Output:

   ID     Name  Age     City  Experience
e  15    Veena   33    Delhi           4
f  16  Shaunak   35   Mumbai           5
h  17    Shaun   35  Colombo          11

Here, it returned the last three rows of the DataFrame. If N is not provided, it returns the last five rows of the DataFrame.
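For a positive N, tail(N) selects the same rows as a positional slice of the last N rows with iloc. A small sketch to illustrate the equivalence, using a seven-row DataFrame assumed for this example:

```python
import pandas as pd

# Seven-row illustrative DataFrame
df = pd.DataFrame({'ID': range(11, 18)},
                  index=['a', 'b', 'c', 'd', 'e', 'f', 'h'])

# Last three rows via tail()
last_three = df.tail(3)

# Same rows via positional slicing
same_as_slice = last_three.equals(df.iloc[-3:])

print(last_three.index.tolist())
print(same_as_slice)
```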

Transpose a DataFrame

In Pandas, the DataFrame provides an attribute T, and it returns the transposed version of the DataFrame. In the transposed DataFrame, rows become columns, and columns become rows. For example, the contents of the original DataFrame df are,

   ID     Name  Age      City  Experience
a  11     jack   34    Sydney           5
b  12     Riti   31     Delhi           7
c  13     Aadi   16  New York          11
d  14    Mohit   32     Delhi          15
e  15    Veena   33     Delhi           4
f  16  Shaunak   35    Mumbai           5
h  17    Shaun   35   Colombo          11

Let’s get a transposed version of this DataFrame,

# Get transpose of DataFrame
transposedDf = df.T

print(transposedDf)

Output:

                 a      b         c      d      e        f        h
ID              11     12        13     14     15       16       17
Name          jack   Riti      Aadi  Mohit  Veena  Shaunak    Shaun
Age             34     31        16     32     33       35       35
City        Sydney  Delhi  New York  Delhi  Delhi   Mumbai  Colombo
Experience       5      7        11     15      4        5       11
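After transposing, the shape is reversed, the old row labels become the column names, and the old column names become the row labels. The sketch below checks this on a small DataFrame made up for the example. Note that with mixed column types the data types of the transposed DataFrame may differ from the original, so the sketch only compares labels and shapes:

```python
import pandas as pd

# Small illustrative DataFrame: 3 rows, 2 columns
df = pd.DataFrame(
    {'ID': [11, 12, 13], 'Name': ['jack', 'Riti', 'Aadi']},
    index=['a', 'b', 'c'],
)

# Transpose: rows and columns swap places
transposed = df.T

# Transposing again restores the original layout
round_trip = transposed.T

print(df.shape, transposed.shape)
print(transposed.columns.tolist())
print(transposed.index.tolist())
```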

Summary:

We learned about some of the primary methods and attributes of the DataFrame in Pandas.
