Create a Pandas DataFrame from a Numpy Array

NumPy arrays are one of the most common ways to store numerical data in Python. We can convert a NumPy array to a pandas DataFrame with the pandas.DataFrame constructor, as discussed in this article.


Let’s get started on the different scenarios.

Converting NumPy Array to Pandas DataFrame

First, let’s create an array of random values with NumPy. Below, we create an array of shape (4, 5) filled with random values generated by the np.random.rand() function. Setting the seed with np.random.seed(0) makes the values reproducible.

import numpy as np

np.random.seed(0)

# creating an array with random values
array = np.random.rand(4, 5)

print(array)

Output

[[0.5488135  0.71518937 0.60276338 0.54488318 0.4236548 ]
 [0.64589411 0.43758721 0.891773   0.96366276 0.38344152]
 [0.79172504 0.52889492 0.56804456 0.92559664 0.07103606]
 [0.0871293  0.0202184  0.83261985 0.77815675 0.87001215]]
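Before converting, it can be worth checking the array’s shape and dtype, because the DataFrame constructor maps the first axis of a 2-D array to rows and the second axis to columns. A quick check using only standard NumPy attributes:

```python
import numpy as np

np.random.seed(0)
array = np.random.rand(4, 5)

# shape is (rows, columns); the DataFrame will have the same shape
print(array.shape)  # (4, 5)

# np.random.rand returns float64 values drawn from the interval [0, 1)
print(array.dtype)  # float64
print(array.min() >= 0.0, array.max() < 1.0)  # True True
```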

Now, let’s try to convert this NumPy array into a pandas DataFrame using the DataFrame constructor.

import numpy as np
import pandas as pd

np.random.seed(0)

# creating an array with random values
array = np.random.rand(4, 5)

# converting array to DataFrame
pandasDf = pd.DataFrame(array)

print(pandasDf)

Output

          0         1         2         3         4
0  0.548814  0.715189  0.602763  0.544883  0.423655
1  0.645894  0.437587  0.891773  0.963663  0.383442
2  0.791725  0.528895  0.568045  0.925597  0.071036
3  0.087129  0.020218  0.832620  0.778157  0.870012
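The printed DataFrame shows six decimal places only because that is pandas’ default display precision; the stored float64 values are unchanged. The sketch below inspects the default integer labels pandas assigns to rows and columns and checks that the values round-trip back to NumPy with DataFrame.to_numpy():

```python
import numpy as np
import pandas as pd

np.random.seed(0)
array = np.random.rand(4, 5)
pandasDf = pd.DataFrame(array)

# Without explicit labels, pandas numbers both axes from 0
print(list(pandasDf.columns))  # [0, 1, 2, 3, 4]
print(list(pandasDf.index))    # [0, 1, 2, 3]

# The shape carries over from the array, and the values round-trip exactly
print(pandasDf.shape)  # (4, 5)
print(np.array_equal(pandasDf.to_numpy(), array))  # True
```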

That’s it: the NumPy array is now a pandas DataFrame. By default, pandas labels the columns and rows with integers starting at 0. If you want to name the columns, pass the names with the columns argument of the constructor.

import numpy as np
import pandas as pd

np.random.seed(0)

# creating an array with random values
array = np.random.rand(4, 5)

# converting array to DataFrame with column names
pandasDf = pd.DataFrame(array,
                        columns=['col1', 'col2', 'col3', 'col4', 'col5'])
print(pandasDf)

Output

       col1      col2      col3      col4      col5
0  0.548814  0.715189  0.602763  0.544883  0.423655
1  0.645894  0.437587  0.891773  0.963663  0.383442
2  0.791725  0.528895  0.568045  0.925597  0.071036
3  0.087129  0.020218  0.832620  0.778157  0.870012
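Typing the names by hand works for a small array. For wider arrays, a minimal sketch is to generate the names from the array’s shape; the colN naming scheme is simply carried over from the example above. The list must contain exactly one name per column, otherwise pandas raises a ValueError:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
array = np.random.rand(4, 5)

# Build one name per column from the array's second dimension
column_names = [f"col{i}" for i in range(1, array.shape[1] + 1)]
pandasDf = pd.DataFrame(array, columns=column_names)
print(list(pandasDf.columns))  # ['col1', 'col2', 'col3', 'col4', 'col5']

# Passing too few names for a 5-column array is rejected
try:
    pd.DataFrame(array, columns=['col1', 'col2'])
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print("ValueError:", err)
```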

Similarly, we can set the row labels with the index argument of the constructor.

import numpy as np
import pandas as pd

np.random.seed(0)

# creating an array with random values
array = np.random.rand(4, 5)

# converting array to DataFrame with column and row names
pandasDf = pd.DataFrame(array,
                        columns=['col1', 'col2', 'col3', 'col4', 'col5'],
                        index=['row1', 'row2', 'row3', 'row4'])
print(pandasDf)

Output

          col1      col2      col3      col4      col5
row1  0.548814  0.715189  0.602763  0.544883  0.423655
row2  0.645894  0.437587  0.891773  0.963663  0.383442
row3  0.791725  0.528895  0.568045  0.925597  0.071036
row4  0.087129  0.020218  0.832620  0.778157  0.870012

The output shows the DataFrame with the specified column headers and row indexes.
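Once rows and columns are labeled, you can select elements by label with .loc or by integer position with .iloc. A small sketch checking that both refer to the same value as the corresponding element of the original array:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
array = np.random.rand(4, 5)
pandasDf = pd.DataFrame(array,
                        columns=['col1', 'col2', 'col3', 'col4', 'col5'],
                        index=['row1', 'row2', 'row3', 'row4'])

# .loc selects by label, .iloc by integer position
by_label = pandasDf.loc['row2', 'col3']
by_position = pandasDf.iloc[1, 2]

# Both point at array[1, 2] in the original NumPy array
print(by_label == by_position == array[1, 2])  # True
```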

Combining Multiple NumPy Arrays into a Pandas DataFrame

If we have several one-dimensional NumPy arrays that we want to combine as the columns of a single pandas DataFrame, we can use Python’s built-in zip function together with the DataFrame constructor. Let’s try it with some sample data.

import numpy as np
import pandas as pd

# create two arrays
array1 = np.array(['India', 'US', 'India', 'UAE', 'US'])
array2 = np.array([1, 2, 3, 4, 5])

# zip into a single DataFrame
df = pd.DataFrame(list(zip(array1, array2)))

print(df)

Output

       0  1
0  India  1
1     US  2
2  India  3
3    UAE  4
4     US  5
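Column names can be passed to the zip version with the columns argument, as before. A common alternative is to pass a dictionary that maps each column name to its array. The names country and value below are illustrative assumptions, not part of the original example:

```python
import numpy as np
import pandas as pd

array1 = np.array(['India', 'US', 'India', 'UAE', 'US'])
array2 = np.array([1, 2, 3, 4, 5])

# zip pairs the arrays row by row; columns= names the resulting columns
df_zip = pd.DataFrame(list(zip(array1, array2)), columns=['country', 'value'])

# A dict maps each column name directly to its array
df_dict = pd.DataFrame({'country': array1, 'value': array2})

print(df_zip)
print(df_dict)
```

The dictionary form avoids building an intermediate list of tuples and keeps each name next to its data.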

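Here, too, we can pass the columns and index arguments if we want custom labels. One advantage of this approach is that each column gets its own dtype, inferred from the values of its source array, instead of all values being forced into one common dtype (as would happen if the arrays were first stacked into a single 2-D NumPy array). We can check this with df.info():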

df.info()

Output

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   0       5 non-null      object
 1   1       5 non-null      int64
dtypes: int64(1), object(1)
memory usage: 208.0+ bytes

As shown above, the first column (the country names) is stored with the object dtype, while the second column is stored as int64.
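To see why keeping the arrays separate matters, the sketch below contrasts the zip approach with first stacking the arrays into one 2-D array using np.column_stack (a different technique, used here only for comparison). A NumPy array holds a single dtype, so stacking a string array with an integer array converts the integers to strings:

```python
import numpy as np
import pandas as pd

array1 = np.array(['India', 'US', 'India', 'UAE', 'US'])
array2 = np.array([1, 2, 3, 4, 5])

# zip keeps the arrays' values separate, so pandas infers a dtype per column
zipped_df = pd.DataFrame(list(zip(array1, array2)))

# column_stack builds ONE array with ONE dtype: the integers become strings
stacked = np.column_stack((array1, array2))
stacked_df = pd.DataFrame(stacked)

print(stacked.dtype.kind)  # U (Unicode string)
print(zipped_df.dtypes)
print(stacked_df.dtypes)
```

The exact dtype names printed can differ between pandas versions, but in the stacked version the second column no longer holds integers.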

Summary

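Great, you made it! In this article, we discussed several ways to convert NumPy arrays to a pandas DataFrame: passing a single array to the DataFrame constructor (optionally with the columns and index arguments), and combining several arrays column-wise with zip.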
