NumPy arrays are one of the most common ways to store numerical data in Python. In this article, we will discuss how to convert NumPy arrays to a Pandas DataFrame using the pandas.DataFrame class.
Let’s get started on the different scenarios.
Converting NumPy Array to Pandas DataFrame
First, let's create an array using the NumPy library. Below, we create an array of shape (4, 5) filled with random values generated by the np.random.rand() function. The seed is fixed so that the output is reproducible.
import numpy as np

np.random.seed(0)

# creating an array with random values
array = np.random.rand(4, 5)
print(array)
Output
[[0.5488135  0.71518937 0.60276338 0.54488318 0.4236548 ]
 [0.64589411 0.43758721 0.891773   0.96366276 0.38344152]
 [0.79172504 0.52889492 0.56804456 0.92559664 0.07103606]
 [0.0871293  0.0202184  0.83261985 0.77815675 0.87001215]]
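Before converting, it can help to confirm what np.random.rand() actually produced. A minimal sketch that inspects the array's shape and dtype:

```python
import numpy as np

np.random.seed(0)
array = np.random.rand(4, 5)

# shape is (rows, columns); rand() returns float64 values in [0, 1)
print(array.shape)  # (4, 5)
print(array.dtype)  # float64
```

The shape tells you in advance how many rows and columns the DataFrame will have.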
Now, let’s try to convert this NumPy array into a pandas DataFrame using the DataFrame constructor.
import numpy as np
import pandas as pd

np.random.seed(0)

# creating an array with random values
array = np.random.rand(4, 5)

# converting array to DataFrame
pandasDf = pd.DataFrame(array)
print(pandasDf)
Output
          0         1         2         3         4
0  0.548814  0.715189  0.602763  0.544883  0.423655
1  0.645894  0.437587  0.891773  0.963663  0.383442
2  0.791725  0.528895  0.568045  0.925597  0.071036
3  0.087129  0.020218  0.832620  0.778157  0.870012
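Because no labels were passed, pandas falls back to integer labels starting at 0 for both rows and columns. The sketch below checks this and converts the DataFrame back with DataFrame.to_numpy() to confirm that the values are unchanged (the variable name roundTrip is just illustrative):

```python
import numpy as np
import pandas as pd

np.random.seed(0)
array = np.random.rand(4, 5)
pandasDf = pd.DataFrame(array)

# Without explicit labels, rows and columns get default integer labels
print(list(pandasDf.index))    # [0, 1, 2, 3]
print(list(pandasDf.columns))  # [0, 1, 2, 3, 4]

# Round trip: the values come back unchanged as a 2D float array
roundTrip = pandasDf.to_numpy()
print(np.array_equal(roundTrip, array))  # True
```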
That's it: the NumPy array is now a pandas DataFrame. If you want to add column names, pass them to the "columns" argument of the DataFrame constructor.
import numpy as np
import pandas as pd

np.random.seed(0)

# creating an array with random values
array = np.random.rand(4, 5)

# converting array to DataFrame with column names
pandasDf = pd.DataFrame(array, columns=['col1', 'col2', 'col3', 'col4', 'col5'])
print(pandasDf)
Output
       col1      col2      col3      col4      col5
0  0.548814  0.715189  0.602763  0.544883  0.423655
1  0.645894  0.437587  0.891773  0.963663  0.383442
2  0.791725  0.528895  0.568045  0.925597  0.071036
3  0.087129  0.020218  0.832620  0.778157  0.870012
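Typing the names by hand gets tedious for wide arrays. One option, shown here as a sketch, is to build the names from the array's shape. Note that the columns list must contain exactly one name per column; otherwise pandas raises a ValueError rather than silently dropping data. The col1, col2, ... naming scheme is only an illustrative choice.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
array = np.random.rand(4, 5)

# Build one name per column from the array's second dimension
columnNames = [f"col{i}" for i in range(1, array.shape[1] + 1)]
pandasDf = pd.DataFrame(array, columns=columnNames)
print(list(pandasDf.columns))  # ['col1', 'col2', 'col3', 'col4', 'col5']

# Passing too few names for a 2D array is an error, not a truncation
raised = False
try:
    pd.DataFrame(array, columns=["col1", "col2", "col3", "col4"])
except ValueError as err:
    raised = True
    print("ValueError:", err)
```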
Similarly, we can add row labels by using the "index" argument of the constructor.
import numpy as np
import pandas as pd

np.random.seed(0)

# creating an array with random values
array = np.random.rand(4, 5)

# converting array to DataFrame with column and row names
pandasDf = pd.DataFrame(array,
                        columns=['col1', 'col2', 'col3', 'col4', 'col5'],
                        index=['row1', 'row2', 'row3', 'row4'])
print(pandasDf)
Output
          col1      col2      col3      col4      col5
row1  0.548814  0.715189  0.602763  0.544883  0.423655
row2  0.645894  0.437587  0.891773  0.963663  0.383442
row3  0.791725  0.528895  0.568045  0.925597  0.071036
row4  0.087129  0.020218  0.832620  0.778157  0.870012
The output shows the DataFrame with the specified column headers and row indexes.
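Once the rows and columns carry names, you can select values by label with DataFrame.loc instead of by position. A short sketch using the same labels as above:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
array = np.random.rand(4, 5)
pandasDf = pd.DataFrame(
    array,
    columns=["col1", "col2", "col3", "col4", "col5"],
    index=["row1", "row2", "row3", "row4"],
)

# Label-based lookup: row "row2", column "col3"
value = pandasDf.loc["row2", "col3"]
# The same cell by position in the original array (second row, third column)
print(value == array[1, 2])  # True

# A whole row comes back as a Series indexed by the column names
print(pandasDf.loc["row4"])
```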
Concat multiple NumPy Arrays to Pandas DataFrame
If we have multiple one-dimensional NumPy arrays that we want to combine side by side, as columns of a pandas DataFrame, we can use the zip() function together with the DataFrame constructor. Let's quickly experiment with some sample data.
import numpy as np
import pandas as pd

# create two arrays
array1 = np.array(['India', 'US', 'India', 'UAE', 'US'])
array2 = np.array([1, 2, 3, 4, 5])

# zip into a single DataFrame
df = pd.DataFrame(list(zip(array1, array2)))
print(df)
Output
       0  1
0  India  1
1     US  2
2  India  3
3    UAE  4
4     US  5
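It may help to look at what zip() hands to the constructor: it pairs the i-th elements of both arrays into a tuple, and list() collects those tuples. Each tuple becomes one DataFrame row, which is why the arrays end up side by side as columns. One caveat: zip() stops at the shortest input, so arrays of unequal length are silently truncated.

```python
import numpy as np

array1 = np.array(['India', 'US', 'India', 'UAE', 'US'])
array2 = np.array([1, 2, 3, 4, 5])

# Each tuple holds one element from each array, i.e. one future row
rows = list(zip(array1, array2))
print(len(rows))               # 5
print(rows[0][0], rows[0][1])  # India 1

# zip() stops at the shortest input, so unequal lengths silently drop rows
print(len(list(zip(array1, array2[:3]))))  # 3
```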
Here, too, we can specify column names and row indexes if required. One advantage of this approach is that each column keeps a type matching its source array. Stacking the arrays into a single 2D NumPy array first would not allow this, since a NumPy array holds only one dtype for all of its values.
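For example, the zipped rows can be labeled with the same columns and index arguments used earlier. The names 'country' and 'id' and the row labels 'a' to 'e' below are illustrative placeholders, not part of the original data.

```python
import numpy as np
import pandas as pd

array1 = np.array(['India', 'US', 'India', 'UAE', 'US'])
array2 = np.array([1, 2, 3, 4, 5])

# Same zip approach, now with explicit (hypothetical) column names and row labels
labeledDf = pd.DataFrame(
    list(zip(array1, array2)),
    columns=['country', 'id'],
    index=['a', 'b', 'c', 'd', 'e'],
)
print(labeledDf)
```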
df.info()
Output
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   0       5 non-null      object
 1   1       5 non-null      int64
dtypes: int64(1), object(1)
memory usage: 208.0+ bytes
As shown above, the first column (the country names) is stored with dtype object, while the second column keeps the integer dtype int64.
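The dtypes can also be checked programmatically. For contrast, the sketch below also builds a DataFrame from the single 2D float array used earlier: because that array has one dtype, every column of the resulting DataFrame shares it, whereas the zipped DataFrame keeps a different type per column. It uses the pandas.api.types helpers so the check does not depend on the exact dtype name pandas chooses for text columns.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_integer_dtype, is_numeric_dtype

array1 = np.array(['India', 'US', 'India', 'UAE', 'US'])
array2 = np.array([1, 2, 3, 4, 5])
zipped = pd.DataFrame(list(zip(array1, array2)))

np.random.seed(0)
floatDf = pd.DataFrame(np.random.rand(4, 5))

# Zipped: the text column is non-numeric, the number column stays integer
print(is_numeric_dtype(zipped[0]), is_integer_dtype(zipped[1]))  # False True

# Single 2D array: one dtype shared by every column
print(floatDf.dtypes.nunique(), floatDf.dtypes.iloc[0])  # 1 float64
```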
Summary
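Great, you made it! In this article, we discussed how to convert a NumPy array to a pandas DataFrame with the DataFrame constructor, how to add column names and row indexes, and how to combine multiple NumPy arrays into one DataFrame using zip().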