Pandas Tutorial #1 – Data Analysis with Python

This is the first part of the Pandas tutorial series. In this tutorial we will learn what Pandas is, why we need it, how to install it, and how to check the installed version.

Data Science and Machine Learning both rely on data, which is why data is often called the new oil. However, we cannot use raw data directly for analysis or for building machine learning models. We first need to load it, process it, and get it ready for analysis. Then we need efficient APIs for analysing it and applying machine learning models to it. The Python ecosystem offers libraries such as NumPy and Pandas for data processing, and Matplotlib for data visualization. These libraries help users manipulate, transform, and visualize data efficiently.

This tutorial series will focus on Pandas; we will cover NumPy and Matplotlib later in separate tutorial series. Let’s start with Pandas.

What is Pandas?

Pandas is an open-source Python library for high-performance data analysis. It is not part of the Python standard library, so it has to be installed separately.

Why do we need Pandas?

It is a fast, flexible, and powerful data manipulation library, and one of the most widely used modules for Data Science in Python. It provides data structures such as Series, Index, and DataFrame for data analysis. It provides support for:

  • Easy import and export of data into a tabular format data structure like DataFrame.
  • Routines for manipulation and complex analyses of data.
  • Handling of Missing Data
  • Dataset merging
  • Reshaping of datasets
  • Time-series based data manipulation and analysis APIs
  • Group-By functionality to perform split-apply-combine operations
  • Integration with other libraries like NumPy and Matplotlib
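
To make a few of the items above concrete, here is a minimal sketch that touches missing-data handling, Group-By, and merging. The data and the names `sales`, `countries`, `units_per_city`, and `merged` are made up for this illustration and do not come from any real dataset.

```python
import pandas as pd

# Hypothetical sales data, invented for this illustration.
sales = pd.DataFrame({
    "city": ["Delhi", "Delhi", "Tokyo", "Tokyo"],
    "units": [10, None, 7, 5],  # None becomes a missing value (NaN)
})

# Handling of missing data: replace the missing value with 0.
sales["units"] = sales["units"].fillna(0)

# Group-By: split the rows by city, sum the units, combine the results.
units_per_city = sales.groupby("city")["units"].sum()

# Dataset merging: attach a country to each row via the shared "city" column.
countries = pd.DataFrame({
    "city": ["Delhi", "Tokyo"],
    "country": ["India", "Japan"],
})
merged = sales.merge(countries, on="city")

print(units_per_city)
print(merged)
```

Each of these operations has many more options than shown here; this is only meant to give a feel for how little code a typical task needs.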

Pandas mainly provides two data structures for data manipulation and analysis. They are:

  • Series
    • A one-dimensional labelled array. It contains a sequence of values that share a single data type; values of mixed types are stored using the generic object data type.
  • DataFrame
    • A two-dimensional labelled data structure whose columns can hold different data types. It stores the data in a tabular format of rows and columns, and both the rows and columns have labels associated with them. It is similar to an Excel spreadsheet, and just like in Excel, we can perform all kinds of operations on the data in a DataFrame through the provided APIs.

We will discuss each of them in detail later.
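
As a quick preview, the following sketch creates one Series and one DataFrame and reads a value from each by label. The names `ages` and `people` and the values in them are hypothetical, chosen only for this example.

```python
import pandas as pd

# A Series: one-dimensional values, each with a label in the index.
ages = pd.Series([25, 31, 47], index=["a", "b", "c"])

# A DataFrame: rows and columns, both labelled; columns may differ in type.
people = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cy"], "age": [25, 31, 47]},
    index=["a", "b", "c"],
)

print(ages["b"])                # read a Series value by its label
print(people.loc["b", "name"])  # read a DataFrame cell by row and column label
print(people.dtypes)            # each column reports its own data type
```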

How to Install Pandas

To install the Pandas module, run the following command:

pip install pandas

If Python and pip are already installed, this command will install the pandas module.

How to check the version of installed Pandas?

To find out which version of pandas is installed, run the following command:

pip show pandas

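It will show details of the installed pandas package, including its version. The output looks like this, although the version and location on your machine will likely differ: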

Name: pandas
Version: 1.0.5
Summary: Powerful data structures for data analysis, time series, and statistics
Home-page: https://pandas.pydata.org
License: BSD
Location: c:\python\python37\lib\site-packages
Requires: numpy, pytz, python-dateutil
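
The version can also be checked from inside Python, which confirms at the same time that the module can be imported. A minimal sketch, assuming pandas is installed in the Python environment you are running:

```python
import pandas as pd

# __version__ holds the version string of the imported pandas package.
print(pd.__version__)
```

The alias `pd` is only a common convention; any other name would work as well.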

In the next part of this series, we will start learning about the robust data structures offered by Pandas.

Summary:

In this part, we introduced the Python libraries used for data analysis and manipulation, looked at what Pandas offers, and learned how to install it and check its version.
