Pandas Tutorial #6 – Introduction to DataFrame

In this tutorial, we will discuss what a Pandas DataFrame is and how to create one from a CSV file or from other Python data structures such as a list or a dictionary.

What is a DataFrame in Pandas?

In Python, the Pandas module provides the DataFrame, a data structure that stores data in tabular format. A DataFrame is two-dimensional: it stores the data in rows and columns. Think of it as an Excel worksheet, where data is organized in rows and columns.

Figure: Pandas DataFrame – Structure

Each row has an index label associated with it, and each column has a column name. We can select and process individual rows, columns, or cells in a DataFrame.
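As a minimal sketch of what selecting a row, a column, or a cell can look like, the snippet below builds a small DataFrame and reads from it. The sample values and the index labels 'a', 'b', 'c' are made up for illustration only. Selection is covered properly later in this series.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame(
    {'Name': ['John', 'Mark', 'Joseph'], 'Age': [29, 24, 28]},
    index=['a', 'b', 'c'],   # explicit index labels for the rows
)

row = df.loc['b']            # one row, selected by its index label (a Series)
column = df['Age']           # one column, selected by its column name (a Series)
cell = df.at['c', 'Name']    # one cell, selected by row label and column name

print(row)
print(column)
print(cell)
```

Here loc[] looks rows up by label, square brackets with a column name return that column, and at[] returns a single value.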

How to create a Pandas DataFrame?

We can create a DataFrame from other Python data structures, or by loading the contents of a CSV or Excel file. Let’s look at some of these ways.

Create DataFrame from a CSV file

Suppose we have a CSV file, employees.csv, in the same folder as our Python file. The contents of employees.csv are as follows:

Name,Age,City,Experience
John,29,London,15
Mark,24,New York,13
Joseph,28,Tokyo,14
Ritika,31,Delhi,11
Vinod,33,Mumbai,13
Saurav,31,Sydney,13
Lucy,32,Paris,13

It contains employee data: name, age, city, and experience. Now we want to create a Pandas DataFrame object from this CSV file. First, we import the pandas module as pd:

import pandas as pd

Here pd is an alias for pandas.

The Pandas module provides the read_csv() function. It takes the path or name of a CSV file as an argument and loads the contents of the file into a DataFrame object. We will use it to create a DataFrame. For example,

import pandas as pd

# Load the csv file and create a DataFrame object
df = pd.read_csv('employees.csv')

# Display the DataFrame
print(df)

Output:

     Name  Age      City  Experience
0    John   29    London          15
1    Mark   24  New York          13
2  Joseph   28     Tokyo          14
3  Ritika   31     Delhi          11
4   Vinod   33    Mumbai          13
5  Saurav   31    Sydney          13
6    Lucy   32     Paris          13

We called the read_csv() function and passed the CSV file name as an argument. The read_csv() function loaded the CSV file and returned a DataFrame object populated with its contents. Then we printed the contents of the DataFrame.

A DataFrame stores content in a tabular format, which means that our data is organized in rows and columns. Because we created the DataFrame object from a CSV file, the first row of the file was used as the column labels, and each data row received a default integer index label starting from 0. DataFrame provides various functions to select content from it. We can select a single row or column, or a subset of the DataFrame, and perform various operations on it. We will discuss that later in this series.
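To check how the header row is treated without needing a file on disk, here is a small sketch that feeds the first lines of employees.csv to read_csv() through io.StringIO. The in-memory text is only a stand-in for the real file and is an assumption made for this illustration.

```python
import io
import pandas as pd

# In-memory stand-in for the first lines of employees.csv
csv_text = """Name,Age,City,Experience
John,29,London,15
Mark,24,New York,13
"""

# read_csv() accepts a file-like object as well as a file path
df = pd.read_csv(io.StringIO(csv_text))

print(list(df.columns))  # ['Name', 'Age', 'City', 'Experience']
print(df.shape)          # (2, 4)
print(list(df.index))    # [0, 1]
```

The first line of the text ends up in df.columns rather than in the data, so the DataFrame has two rows and four columns, with the default index labels 0 and 1.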

There are other ways to create a DataFrame object as well. For example, we can create a DataFrame from a dictionary of lists.

Create DataFrame from dictionary and lists

The Pandas module provides the DataFrame class, and calling DataFrame() creates a new DataFrame object. Among other input types, it accepts a dictionary of lists as an argument. Each key-value pair of this dictionary describes one column: the key acts as the column label, and the value is a list containing the values of that column. All the lists must have the same length. The call returns a DataFrame object populated with the provided values.

Let’s see a practical example.

First, import the pandas module as pd and create a dictionary that contains the column names and their values, in this case information about employees. Then use this dictionary to create a DataFrame object:

import pandas as pd

# Create a dictionary of lists
employees = { 'Name': ['John', 'Mark', 'Joseph', 'Ritika', 'Vinod', 'Saurav', 'Lucy'],
              'Age': [29, 24, 28, 31, 33, 32, 31],
              'City': ['London', 'Tokyo', 'Delhi', 'Mumbai', 'Sydney', 'Paris', 'New York'],
              'Experience': [15, 13, 14, 11, 13, 12, 15]}

# Create a Pandas DataFrame from the dictionary of lists
df = pd.DataFrame(employees)

# Display the DataFrame
print(df)

Output:

     Name  Age      City  Experience
0    John   29    London          15
1    Mark   24     Tokyo          13
2  Joseph   28     Delhi          14
3  Ritika   31    Mumbai          11
4   Vinod   33    Sydney          13
5  Saurav   32     Paris          12
6    Lucy   31  New York          15

We passed the dictionary to DataFrame(), and it returned a DataFrame object filled with the provided values.
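A plain list can be used in a similar way. The sketch below is one possible approach, not something from the example above: it passes a list of lists, where each inner list is one row, and supplies the column names through the columns parameter of DataFrame(). The two rows reuse values from the dictionary example.

```python
import pandas as pd

# Each inner list holds the values of one row
rows = [['John', 29, 'London', 15],
        ['Mark', 24, 'Tokyo', 13]]

# Column names are given separately, in the same order as the row values
df = pd.DataFrame(rows, columns=['Name', 'Age', 'City', 'Experience'])

print(df)
```

With a dictionary of lists the data is supplied column by column. With a list of lists it is supplied row by row, so the column labels have to be passed explicitly.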

Summary

We learned the basics of the DataFrame and how to create a Pandas DataFrame from a CSV file and from a dictionary of lists.
