This tutorial will discuss some of the most used attributes and methods of the DataFrame in Pandas.
Table of Contents
- Get the Row Index Labels of a DataFrame
- Get the Column Names of a DataFrame
- Get the Data Types of each column in DataFrame
- Get all values of DataFrame as NumPy Array
- Get the Shape of DataFrame
- Get count of total values in DataFrame
- Get the first N rows of the DataFrame
- Get the last N rows of the DataFrame
- Transpose a DataFrame
First, we will create a DataFrame using a list of tuples,
import pandas as pd

# List of tuples
employees = [(11, 'jack', 34, 'Sydney', 5),
             (12, 'Riti', 31, 'Delhi', 7),
             (13, 'Aadi', 16, 'New York', 11),
             (14, 'Mohit', 32, 'Delhi', 15),
             (15, 'Veena', 33, 'Delhi', 4),
             (16, 'Shaunak', 35, 'Mumbai', 5),
             (17, 'Shaun', 35, 'Colombo', 11)]

# Create a DataFrame object
df = pd.DataFrame(employees,
                  columns=['ID', 'Name', 'Age', 'City', 'Experience'],
                  index=['a', 'b', 'c', 'd', 'e', 'f', 'h'])

# Display the DataFrame
print(df)
Output:
   ID     Name  Age      City  Experience
a  11     jack   34    Sydney           5
b  12     Riti   31     Delhi           7
c  13     Aadi   16  New York          11
d  14    Mohit   32     Delhi          15
e  15    Veena   33     Delhi           4
f  16  Shaunak   35    Mumbai           5
h  17    Shaun   35   Colombo          11
This DataFrame contains seven rows and five columns. Now let’s look at some of the basic operations that we can perform on this DataFrame.
Get the Row Index Labels of a DataFrame
In Pandas, the DataFrame provides an attribute index, and it gives an Index object containing all the row index labels of the DataFrame. For example,
# Get row index labels of DataFrame
# as an Index object
rowIndex = df.index
print(rowIndex)
Output:
Index(['a', 'b', 'c', 'd', 'e', 'f', 'h'], dtype='object')
It gave an Index class object populated with row labels. We can also select a single label from this by the index position i.e.
# Select the label name
# of the 2nd row of DataFrame
print(df.index[1])
Output:
b
As index positions start from 0, to select the 2nd-row label name, we passed the value 1 in the subscript operator of the Index object.
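Once we have a label, we can use it to fetch the matching row. The following is a minimal sketch on a smaller, made-up DataFrame (not the one above); it uses the standard loc indexer and the tolist() method of the Index object.

```python
import pandas as pd

# Small illustrative DataFrame (hypothetical data, for this sketch only)
df = pd.DataFrame(
    {'Name': ['jack', 'Riti', 'Aadi'], 'Age': [34, 31, 16]},
    index=['a', 'b', 'c'])

# Position 1 holds the label of the 2nd row
label = df.index[1]

# Use that label to fetch the whole row as a Series
row = df.loc[label]
print(label, row['Name'])

# Convert the Index object to a plain Python list
labels = df.index.tolist()
print(labels)
```

Converting to a list is handy when the labels need to be passed to code that does not understand Pandas objects.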
Get the Column Names of a DataFrame
In Pandas, the DataFrame provides attribute columns, and it gives an Index object containing all the column names of the DataFrame. For example,
# Get column names of DataFrame
# as an Index object
columnNames = df.columns
print(columnNames)
Output:
Index(['ID', 'Name', 'Age', 'City', 'Experience'], dtype='object')
It gave an Index class object populated with the column names. We can also select a column name from this by the column’s index position i.e.
# Select the 2nd column name
# from the Column Index
print(df.columns[1])
Output:
Name
As index positions start from 0, to select the 2nd column name by index position, we passed the value 1 in the subscript operator of the Index object.
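The column Index can also be converted to a list or used to check whether a column exists. Here is a short sketch on a small, made-up DataFrame; the variable names are only illustrative.

```python
import pandas as pd

# Small illustrative DataFrame (hypothetical data, for this sketch only)
df = pd.DataFrame({'ID': [11, 12],
                   'Name': ['jack', 'Riti'],
                   'Age': [34, 31]})

# Convert the column Index to a plain Python list
column_list = df.columns.tolist()
print(column_list)

# Check whether a column name exists in the DataFrame
has_age = 'Age' in df.columns
has_city = 'City' in df.columns
print(has_age, has_city)
```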
Get the Data Types of each column in DataFrame
In Pandas, the DataFrame provides an attribute dtypes, and it returns a Series with the data type of each column. For example,
# Get the Data Types of all columns
dataTypes = df.dtypes
print(dataTypes)
Output:
ID             int64
Name          object
Age            int64
City          object
Experience     int64
dtype: object
It returned a Series object, where the index contains the column names of DataFrame and the corresponding value contains that column’s data type information in the DataFrame. String values are stored as an object data type in the DataFrame.
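The data type information can also be used to inspect a single column or to pick columns by type. The sketch below uses a small, made-up DataFrame; the dtype attribute of a Series and the select_dtypes() method of a DataFrame are standard Pandas, while the variable names are only illustrative.

```python
import pandas as pd

# Small illustrative DataFrame (hypothetical data, for this sketch only)
df = pd.DataFrame({'ID': [11, 12],
                   'Name': ['jack', 'Riti'],
                   'Age': [34, 31]})

# Data type of a single column (a Series has a dtype attribute)
age_dtype = df['Age'].dtype
print(age_dtype)

# Keep only the numeric columns
numeric_df = df.select_dtypes(include='number')
numeric_columns = numeric_df.columns.tolist()
print(numeric_columns)
```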
Get all values of DataFrame as NumPy Array
In Pandas, the DataFrame provides an attribute values, and it returns a NumPy representation of the DataFrame. The values will not contain the row index labels or column names. For example,
# Get DataFrame values
# as 2D NumPy Array
arr = df.values
print(arr)
Output:
[[11 'jack' 34 'Sydney' 5]
 [12 'Riti' 31 'Delhi' 7]
 [13 'Aadi' 16 'New York' 11]
 [14 'Mohit' 32 'Delhi' 15]
 [15 'Veena' 33 'Delhi' 4]
 [16 'Shaunak' 35 'Mumbai' 5]
 [17 'Shaun' 35 'Colombo' 11]]
It returned a 2D NumPy array containing all the values of the DataFrame.
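The Pandas documentation recommends the to_numpy() method over the values attribute for this purpose. Below is a minimal sketch on a small, made-up DataFrame. It also shows that a DataFrame with mixed column types produces an array of dtype object, while a numeric-only selection keeps a numeric dtype.

```python
import pandas as pd

# Small illustrative DataFrame (hypothetical data, for this sketch only)
df = pd.DataFrame({'ID': [11, 12],
                   'Name': ['jack', 'Riti'],
                   'Age': [34, 31]})

# Convert the whole DataFrame to a 2D NumPy array
arr = df.to_numpy()
print(arr.shape)
print(arr.dtype)   # mixed integer and string columns give an object array

# Convert only the numeric columns; the array keeps an integer dtype
numeric_arr = df[['ID', 'Age']].to_numpy()
print(numeric_arr.dtype)
```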
Get the Shape of DataFrame
In Pandas, the DataFrame provides an attribute shape, and it returns a tuple representing the dimensions of the DataFrame. For example,
# Get the shape of DataFrame
shape = df.shape
print(shape)
Output:
(7, 5)
It returned a tuple containing two numbers. The first value denotes the number of rows in the DataFrame, and the second value represents the number of columns in the DataFrame.
We can use this to,
Get the total number of rows in the DataFrame
# Get the total number of rows
rowCount = df.shape[0]
print(rowCount)
Output:
7
The first value of the tuple returned by the shape attribute gives us the total rows in the DataFrame.
Get the total number of columns in the DataFrame
# Get the total number of columns
columnCount = df.shape[1]
print(columnCount)
Output:
5
The second value of the tuple returned by the shape attribute gives us the total number of columns in the DataFrame.
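As the shape is a plain tuple, both values can be unpacked in one statement. The sketch below, on a small made-up DataFrame, also shows that the built-in len() gives the same row and column counts.

```python
import pandas as pd

# Small illustrative DataFrame (hypothetical data, for this sketch only)
df = pd.DataFrame({'ID': [11, 12, 13],
                   'Name': ['jack', 'Riti', 'Aadi']})

# Unpack the shape tuple into two variables
row_count, column_count = df.shape
print(row_count, column_count)

# len() of the DataFrame is the number of rows,
# and len() of its columns is the number of columns
rows_by_len = len(df)
columns_by_len = len(df.columns)
print(rows_by_len, columns_by_len)
```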
Get count of total values in DataFrame
In Pandas, the DataFrame provides an attribute size, and it returns the total number of elements in the DataFrame. For example,
# Get total number of elements in DataFrame
totalCount = df.size
print(totalCount)
Output:
35
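The size is simply the number of rows multiplied by the number of columns, so for our DataFrame it is 7 * 5 = 35. It counts missing values too. The following sketch uses a small, made-up DataFrame with one missing value and compares size with the number of non-missing values given by the count() method.

```python
import pandas as pd

# Small illustrative DataFrame with one missing Age (hypothetical data)
df = pd.DataFrame({'Name': ['jack', 'Riti', 'Aadi'],
                   'Age': [34, None, 16]})

# size counts every cell, including the missing one
total = df.size
print(total)

# size is rows multiplied by columns
rows, columns = df.shape
print(rows * columns)

# count() gives non-missing values per column; sum them for the total
non_missing = int(df.count().sum())
print(non_missing)
```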
Get the first N rows of the DataFrame
In Pandas, the DataFrame provides a method head(N). It accepts an argument N and returns the first N rows of the DataFrame.
# Get first 3 rows of the DataFrame
subDf = df.head(3)
print(subDf)
Output:
   ID  Name  Age      City  Experience
a  11  jack   34    Sydney           5
b  12  Riti   31     Delhi           7
c  13  Aadi   16  New York          11
Here, it returned the first three rows of the DataFrame. If N is not provided, it returns the first five rows of the DataFrame.
Get the last N rows of the DataFrame
In Pandas, the DataFrame provides a method tail(N). It accepts an argument N and returns the last N rows of the DataFrame.
# Get last 3 rows of the DataFrame
subDf = df.tail(3)
print(subDf)
Output:
   ID     Name  Age     City  Experience
e  15    Veena   33    Delhi           4
f  16  Shaunak   35   Mumbai           5
h  17    Shaun   35  Colombo          11
Here, it returned the last three rows of the DataFrame. If N is not provided, it returns the last five rows of the DataFrame.
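Both head() and tail() also accept a negative N. In that case, head(-N) returns all rows except the last N, and tail(-N) returns all rows except the first N. Here is a small sketch on a made-up DataFrame with five rows.

```python
import pandas as pd

# Small illustrative DataFrame (hypothetical data, for this sketch only)
df = pd.DataFrame({'ID': [11, 12, 13, 14, 15]},
                  index=['a', 'b', 'c', 'd', 'e'])

# All rows except the last 2
all_but_last_two = df.head(-2)
print(all_but_last_two.index.tolist())

# All rows except the first 2
all_but_first_two = df.tail(-2)
print(all_but_first_two.index.tolist())
```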
Transpose a DataFrame
In Pandas, the DataFrame provides an attribute T, and it returns the transposed version of the DataFrame. In the transposed DataFrame, rows become columns, and columns become rows. For example, the contents of the original DataFrame df are,
   ID     Name  Age      City  Experience
a  11     jack   34    Sydney           5
b  12     Riti   31     Delhi           7
c  13     Aadi   16  New York          11
d  14    Mohit   32     Delhi          15
e  15    Veena   33     Delhi           4
f  16  Shaunak   35    Mumbai           5
h  17    Shaun   35   Colombo          11
Let’s get a transposed version of this DataFrame,
# Get transpose of DataFrame
transposedDf = df.T
print(transposedDf)
Output:
                 a      b         c      d      e        f        h
ID              11     12        13     14     15       16       17
Name          jack   Riti      Aadi  Mohit  Veena  Shaunak    Shaun
Age             34     31        16     32     33       35       35
City        Sydney  Delhi  New York  Delhi  Delhi   Mumbai  Colombo
Experience       5      7        11     15      4        5       11
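One thing to keep in mind is that when a DataFrame has columns of different types, each column of the transposed DataFrame holds a mix of numbers and strings, so its data type becomes object. The sketch below, on a small made-up DataFrame, shows this. It also shows that transposing back does not restore the numeric types on its own; here we use the infer_objects() method for that.

```python
import pandas as pd

# Small illustrative DataFrame (hypothetical data, for this sketch only)
df = pd.DataFrame({'ID': [11, 12, 13],
                   'Name': ['jack', 'Riti', 'Aadi']},
                  index=['a', 'b', 'c'])

# Rows become columns and columns become rows
transposed = df.T
print(transposed.shape)
print(transposed.dtypes)

# Transposing back gives the original layout, but the ID column
# is still of object type until we ask Pandas to infer it again
round_trip = transposed.T
restored = round_trip.infer_objects()
print(restored.dtypes)
```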
Summary:
We learned about some of the primary methods and attributes of the DataFrame in Pandas.