This is the first part of the Pandas tutorial series. In this tutorial, we will learn:
- What is Pandas in Python?
- Why do we need Pandas in Python?
- How to install Pandas?
- How to check the version of installed Pandas?
Data Science and Machine Learning rely on data, which is why data is often called the new oil. We cannot use raw data directly for analysis or for building machine learning models; we first need to load it, process it, and make it ready for analysis. We also need efficient APIs for analysing the data and applying machine learning models to it. The Python ecosystem offers modules such as NumPy and Pandas for data processing, and Matplotlib for data visualization. These modules help users manipulate, transform, and visualize data efficiently.
This tutorial series will focus on Pandas; we will cover NumPy and Matplotlib later in separate tutorial series. Let’s start with Pandas.
What is Pandas?
Pandas is an open-source Python library for high-performance data analysis. It is not part of Python itself and has to be installed separately.
Why do we need Pandas?
It is a fast, flexible, and powerful data manipulation library, and one of the most important modules for doing Data Science in Python. It provides several data structures, such as Series, Index, and DataFrame, for data analysis. It provides support for:
- Easy import of data into, and export of data from, a tabular data structure such as DataFrame.
- Routines for manipulation and complex analyses of data.
- Handling of Missing Data
- Dataset merging
- Reshaping of datasets
- Time-series based data manipulation and analysis APIs
- Group-By functionality to perform split-apply-combine operations
- Integration with other libraries like NumPy and Matplotlib
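To make a few of the features listed above concrete, here is a minimal sketch that touches missing-data handling, group-by, and merging. The city names, column names, and numbers are made up purely for illustration; they are not from any real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sales data, invented for this example only.
sales = pd.DataFrame({
    "city": ["Delhi", "Delhi", "Tokyo", "Tokyo"],
    "units": [10.0, np.nan, 7.0, 3.0],
})

# Handling of missing data: replace the NaN value with 0.
sales["units"] = sales["units"].fillna(0)

# Group-by (split-apply-combine): total units sold per city.
totals = sales.groupby("city")["units"].sum()

# Dataset merging: attach a country to every row using the city column.
countries = pd.DataFrame({
    "city": ["Delhi", "Tokyo"],
    "country": ["India", "Japan"],
})
merged = sales.merge(countries, on="city", how="left")

print(totals)
print(merged)
```

Each of these operations is covered in more depth later in the series; this is only meant to give a feel for how compact they are.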
Pandas mainly provides two data structures for data manipulation and analysis. They are:
- Series: a one-dimensional labelled array. It holds a sequence of values, which can be of any data type.
- DataFrame: a two-dimensional labelled data structure in tabular form. It stores the data in rows and columns, and both the rows and the columns have labels associated with them. It is heterogeneous, meaning different columns can hold different data types. It is similar to an Excel sheet, and just like in Excel, we can perform all kinds of operations on the data in a DataFrame through the provided APIs.
We will discuss each of them in detail later.
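As a quick preview, the two data structures described above can be created as in the sketch below. The names, ages, and index labels are hypothetical sample values chosen for this example.

```python
import pandas as pd

# A Series: a one-dimensional array of values with a label for each value.
ages = pd.Series([31, 25, 42], index=["a", "b", "c"], name="age")

# A DataFrame: rows and columns, both labelled. Columns may differ in type.
people = pd.DataFrame(
    {"name": ["Asha", "Ben", "Cara"], "age": [31, 25, 42]},
    index=["a", "b", "c"],
)

print(ages["b"])                # value stored under label "b": 25
print(people.loc["c", "name"])  # row "c", column "name": Cara
print(people.shape)             # (rows, columns): (3, 2)
```

Notice that values are looked up by label here, not only by position, which is what "labelled" means for both structures.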
How to Install Pandas
To install the Pandas module, run the following command:
pip install pandas
If Python and pip are already installed, this command will install the pandas module.
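If you are not sure whether the installation worked, one possible way to check from Python is to ask the standard library whether it can locate the module. This is a small sketch using importlib; the variable name pandas_installed is just a name chosen for this example.

```python
import importlib.util

# find_spec returns None when the named module cannot be located,
# so this check does not fail even if pandas is missing.
pandas_installed = importlib.util.find_spec("pandas") is not None

print("pandas is installed" if pandas_installed else "pandas is not installed")
```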
How to check the version of installed Pandas?
To find out which version of pandas is installed, run the following command:
pip show pandas
It will show the installed version of pandas, like this:
Name: pandas
Version: 1.0.5
Summary: Powerful data structures for data analysis, time series, and statistics
Home-page: https://pandas.pydata.org
License: BSD
Location: c:\python\python37\lib\site-packages
Requires: numpy, pytz, python-dateutil
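The version can also be read from inside a Python program through the module's __version__ attribute. A minimal sketch, assuming pandas is already installed; the value printed depends on your own installation.

```python
import pandas as pd

# __version__ holds the version of the installed pandas as a string.
installed_version = pd.__version__

print(installed_version)
```

This is handy inside scripts or notebooks, where running pip in a terminal is less convenient.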
In this part, we introduced the Python libraries used for data analysis and manipulation, and saw how to install Pandas and check its version.
In the next part of this series, we will start learning about the data structures offered by Pandas.