Pandas Tutorial Part #6 – Introduction to DataFrame

In this tutorial, we will discuss what a Pandas DataFrame is and how to create a DataFrame from a csv file or from other Python data structures like a list or a dictionary.

What is a DataFrame in Pandas?

In Python, the Pandas module provides the DataFrame, a data structure that stores data in tabular format. It is a two-dimensional data structure that organizes the data in rows and columns. Imagine it as an Excel worksheet, where the data is organized in rows and columns. The structure of a DataFrame is illustrated in the following figure:

[Figure: Pandas DataFrame – Structure]

Each row has an index label associated with it, and each column has a column name associated with it. We can select and process individual rows, columns or cells in a DataFrame.
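As a quick preview of what selecting rows, columns and cells can look like, here is a minimal sketch. The three-row sample data is hypothetical and chosen only for illustration; selection is covered properly later in this series.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration
df = pd.DataFrame({'Name': ['John', 'Mark', 'Joseph'],
                   'Age': [29, 24, 28]})

# Select a single column by its column name (returns a Series)
ages = df['Age']

# Select a single row by its index label (returns a Series)
first_row = df.loc[0]

# Select a single cell by its index label and column name
cell = df.loc[1, 'Name']

print(ages.tolist())      # [29, 24, 28]
print(first_row['Name'])  # John
print(cell)               # Mark
```

Here the index labels are the default integers 0, 1 and 2, because no custom index was provided.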

How to create a Pandas DataFrame?

We can create a DataFrame from other Python data structures, or by loading the contents of csv or excel files. Let’s see different ways to create a DataFrame.

Create DataFrame from a CSV file

Suppose we have a CSV file employees.csv, and it is in the same folder as our Python file. The contents of employees.csv are as follows,

Name,Age,City,Experience
John,29,London,15
Mark,24,New York,13
Joseph,28,Tokyo,14
Ritika,31,Delhi,11
Vinod,33,Mumbai,13
Saurav,31,Sydney,13
Lucy,32,Paris,13

It contains employee data: name, age, city, and experience. Now we want to create a Pandas DataFrame object from this CSV file. For that, we first import the pandas module as pd i.e.

import pandas as pd

Here, pd is an alias for the pandas module.

The Pandas module provides a function read_csv(). It takes the csv file path or name as an argument and loads the contents of the csv file into a DataFrame object. We are going to use it to create a DataFrame. For example,

import pandas as pd

# Load the csv file and create a DataFrame object
df = pd.read_csv('employees.csv')

# Display the DataFrame
print(df)

Output:

     Name  Age      City  Experience
0    John   29    London          15
1    Mark   24  New York          13
2  Joseph   28     Tokyo          14
3  Ritika   31     Delhi          11
4   Vinod   33    Mumbai          13
5  Saurav   31    Sydney          13
6    Lucy   32     Paris          13

We called the read_csv() function and passed the CSV file name to it as an argument. The read_csv() function loads the CSV file and returns a DataFrame object populated with that content. Then we printed the contents of the DataFrame.

A DataFrame stores its content in a tabular format, which means that our data is organized in rows and columns. Because we created the DataFrame object from the csv file, the first row of the csv file was used as the column labels, and each data row received a default integer index label starting from 0. DataFrame provides various functions to select content from it. We can select a single row or column, or a subset of the DataFrame, and perform various operations on it. We will discuss that later in this series.

There are other ways to create a DataFrame object as well. For example, we can create a DataFrame from a dictionary of lists.
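To check how the header row and the index labels end up in the DataFrame, here is a minimal sketch. It assumes an in-memory copy of the first three rows of employees.csv (wrapped in io.StringIO, which is not part of the original example) so that it runs without the file on disk.

```python
import io
import pandas as pd

# In-memory stand-in for employees.csv (first three data rows only);
# this is an assumption made so the example is self-contained
csv_text = """Name,Age,City,Experience
John,29,London,15
Mark,24,New York,13
Joseph,28,Tokyo,14
"""

# read_csv() also accepts a file-like object instead of a file name
df = pd.read_csv(io.StringIO(csv_text))

# The first row of the csv content became the column labels
print(list(df.columns))  # ['Name', 'Age', 'City', 'Experience']

# The rows received default integer index labels
print(list(df.index))    # [0, 1, 2]

# Number of rows and columns
print(df.shape)          # (3, 4)
```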

Create DataFrame from dictionary and lists

The Pandas module provides the DataFrame class. Its constructor, DataFrame(), accepts several kinds of input, and one of them is a dictionary of lists. Each key-value pair of this dictionary describes one column: the key acts as the column label, and the value is a list containing the values of that column. The constructor returns a DataFrame object populated with all the provided values.

Let’s see some practical examples,

First of all, import the pandas module as pd and create a dictionary that contains the column names and their values. In this example, the dictionary holds information about employees. Then use this dictionary to create a DataFrame object i.e.

import pandas as pd

# Create a dictionary of lists
employees = { 'Name': ['John', 'Mark', 'Joseph', 'Ritika', 'Vinod', 'Saurav', 'Lucy'],
              'Age': [29, 24, 28, 31, 33, 32, 31],
              'City': ['London', 'Tokyo', 'Delhi', 'Mumbai', 'Sydney', 'Paris', 'New York'],
              'Experience': [15, 13, 14, 11, 13, 12, 15]}

# Create a Pandas DataFrame from the dictionary of lists
df = pd.DataFrame(employees)

# Display the DataFrame
print(df)

Output:

     Name  Age      City  Experience
0    John   29    London          15
1    Mark   24     Tokyo          13
2  Joseph   28     Delhi          14
3  Ritika   31    Mumbai          11
4   Vinod   33    Sydney          13
5  Saurav   32     Paris          12
6    Lucy   31  New York          15

We passed the dictionary to the DataFrame() constructor, and it returned a DataFrame object filled with the provided values.
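One practical point when building a DataFrame this way is that every list in the dictionary should have the same length, since each list fills one column. The sketch below uses a shortened, hypothetical version of the employees dictionary to show a successful call and a failing one. The exact error message may differ between pandas versions, so it is printed rather than assumed.

```python
import pandas as pd

# Shortened, hypothetical version of the employees dictionary
employees = {'Name': ['John', 'Mark', 'Joseph'],
             'Age': [29, 24, 28]}

df = pd.DataFrame(employees)

# Dictionary keys became the column labels
print(list(df.columns))  # ['Name', 'Age']
print(df.shape)          # (3, 2)

# A dictionary whose lists differ in length cannot form a table
bad_employees = {'Name': ['John', 'Mark', 'Joseph'],
                 'Age': [29, 24]}

error_raised = False
try:
    pd.DataFrame(bad_employees)
except ValueError as err:
    error_raised = True
    print('Could not create DataFrame:', err)
```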

Summary

We learned the basics of DataFrame and how to create a Pandas DataFrame from a csv file and from a dictionary of lists.
