How to plot a correlation matrix in pandas?

In this article we will discuss multiple ways to plot a correlation matrix in pandas.

Preparing DataSet

To get started quickly, let's create a sample DataFrame to experiment with. We'll use pandas and NumPy to fill it with random integers.

import pandas as pd
import numpy as np

# DataFrame with some random values
df = pd.DataFrame(np.random.randint(0, 100, size=(100, 6)), columns=list('ABCDEF'))

print(df.head())

The first five rows of the created DataFrame are shown below (your values will differ, because the data is random):

    A   B   C   D   E   F
0   3  38  71  80  71  68
1  80  15  45  51  29  87
2   0  72  35  37  52  49
3  67  21  28  43  53  57
4  44  67  14  47  64  30
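Because the values above are random, every run produces a different DataFrame and a different correlation matrix. If you want the same numbers each time, one option is to use a seeded NumPy generator. This is a minimal sketch; the seed value 42 is an arbitrary choice and is not part of the original example.

```python
import numpy as np
import pandas as pd

# A seeded generator makes the "random" data identical on every run.
# The seed value 42 is arbitrary (an assumption for this sketch).
rng = np.random.default_rng(seed=42)

# integers() excludes the upper bound, so the values fall between 0 and 99
df = pd.DataFrame(rng.integers(0, 100, size=(100, 6)), columns=list("ABCDEF"))

print(df.head())
```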

A correlation matrix shows the correlation coefficient between every pair of features (columns) in a DataFrame. To compute it, we can call the corr() method on the pandas DataFrame. By default, it calculates the Pearson correlation coefficient.

print(df.corr())

Output

          A         B         C         D         E         F
A  1.000000 -0.121004 -0.028870  0.081519 -0.082788  0.007588
B -0.121004  1.000000  0.137948  0.186861  0.072054 -0.042191
C -0.028870  0.137948  1.000000  0.105994 -0.015434  0.010137
D  0.081519  0.186861  0.105994  1.000000  0.027067  0.105773
E -0.082788  0.072054 -0.015434  0.027067  1.000000 -0.003142
F  0.007588 -0.042191  0.010137  0.105773 -0.003142  1.000000

Here we have the correlation coefficients for every pair of features. The matrix is symmetric, and the diagonal is always 1 because each column is perfectly correlated with itself. A table of raw numbers like this is hard to scan, which is why visualizing the matrix makes the patterns easier to spot.
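Before plotting, it can also help to list the pairs in order of strength. The following is a rough sketch, not part of the original example: it keeps only the cells above the diagonal, so each pair appears once, and sorts them by absolute value. The names upper and pairs, and the seeded data, are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Seeded sample data (the seed is an arbitrary assumption)
rng = np.random.default_rng(seed=42)
df = pd.DataFrame(rng.integers(0, 100, size=(100, 6)), columns=list("ABCDEF"))
corr = df.corr()

# Keep only the cells strictly above the diagonal so each pair appears once
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper).stack().dropna()

# Sort by absolute value so strong negative correlations rank high too
pairs = pairs.sort_values(key=lambda s: s.abs(), ascending=False)

print(pairs.head())
```

With six columns this yields 15 unique pairs, indexed by the two column names.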

Styling the correlation matrix directly

The simplest way to visualize the correlation matrix is to color-code the matrix itself. We are going to use the style attribute of the DataFrame to add a background gradient. The styled table is rendered when it is the last expression in a Jupyter notebook cell.

# storing the correlation matrix
corr = df.corr()

# adding background gradient
corr.style.background_gradient(cmap='coolwarm')

Output

Adding a background gradient makes the matrix easier to read. With the 'coolwarm' colormap, blue shades mark the lower (more negative) values and red shades mark the higher (more positive) values. We can try other gradients by changing the cmap parameter.

Using matplotlib plotting library

Matplotlib is the most widely used plotting library in Python. We can use its matshow function to plot the correlation matrix as shown below.

# import
import matplotlib.pyplot as plt

# set figure size
f = plt.figure(figsize=(8, 8))

# using matshow
plt.matshow(df.corr(), fignum=f.number)

# adding color scale
cb = plt.colorbar()
cb.ax.tick_params(labelsize=14)

# display the plot
plt.show()

Output

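As observed, the output is similar to the previous method. The color bar on the side shows how colors map to values: with matplotlib's default colormap (viridis), darker shades indicate lower (more negative) correlations and brighter yellow shades indicate higher (more positive) ones. Note that the axes are labeled with positions 0 to 5 rather than the column names.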
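The matshow plot can be made easier to read by labeling the ticks with the column names and fixing the color range. The following is one way to sketch this with matplotlib's object-oriented interface; the 'coolwarm' colormap and the -1 to 1 limits are choices made for this illustration, not requirements.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Seeded sample data (the seed is an arbitrary assumption)
rng = np.random.default_rng(seed=42)
df = pd.DataFrame(rng.integers(0, 100, size=(100, 6)), columns=list("ABCDEF"))
corr = df.corr()

fig, ax = plt.subplots(figsize=(8, 8))

# vmin/vmax pin the color scale to the full correlation range
im = ax.matshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)

# label each row and column with the feature name instead of its position
positions = np.arange(len(corr.columns))
ax.set_xticks(positions)
ax.set_xticklabels(corr.columns)
ax.set_yticks(positions)
ax.set_yticklabels(corr.index)

# color bar showing how colors map to correlation values
fig.colorbar(im, ax=ax)
```

In a script, finish with plt.show() to display the figure, as in the example above.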

Using Seaborn heatmaps

Another, often easier, way to plot the correlation matrix is to use the heatmap function from the seaborn library. A heatmap, as the name suggests, is a graphical representation of data in which values are depicted by color. Let's plot the correlation matrix below.

# import
import seaborn as sns

# heatmap using seaborn
sns.heatmap(df.corr(), annot=True)

Output

As observed, this gives a similar plot, but seaborn labels the axes with the column names automatically, and annot=True writes the correlation value inside each cell.
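Since the correlation matrix is symmetric, half of the heatmap repeats the other half. A common refinement, sketched below, is to hide the upper triangle with the mask parameter of sns.heatmap. The colormap, the number format, and the color limits are assumptions chosen for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Seeded sample data (the seed is an arbitrary assumption)
rng = np.random.default_rng(seed=42)
df = pd.DataFrame(rng.integers(0, 100, size=(100, 6)), columns=list("ABCDEF"))
corr = df.corr()

# Cells where the mask is True are hidden: here the diagonal and everything
# above it, which only mirrors the lower triangle
mask = np.triu(np.ones(corr.shape, dtype=bool))

fig, ax = plt.subplots(figsize=(8, 6))
sns.heatmap(
    corr,
    mask=mask,
    annot=True,       # write the value inside each visible cell
    fmt=".2f",        # two decimal places
    cmap="coolwarm",
    vmin=-1, vmax=1,  # pin the color scale to the full correlation range
    square=True,
    ax=ax,
)
```

To keep the diagonal visible, pass k=1 to np.triu so that only the cells strictly above the diagonal are masked.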

Summary

In this article, we have discussed three ways to plot a correlation matrix in pandas: styling the DataFrame directly, using matplotlib's matshow, and using a seaborn heatmap.
