Python Pandas : How to convert lists to a dataframe
In this article we will discuss how to convert a single or multiple lists to a DataFrame.
Python’s pandas library provides a DataFrame constructor that creates a DataFrame from the objects passed to it:
pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)
Here, the data parameter can be a NumPy ndarray, a dict, another DataFrame, or an iterable such as a list. The columns and index parameters supply the column and index labels.
Let’s use this constructor to convert lists to a DataFrame object.
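As a rough illustration of these parameters, the sketch below builds the same small table from a list of rows, a dict of columns, and a NumPy array. The variable names and sample values are made up for this example and are not part of the article's data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, used only for illustration
rows = [[1, 2], [3, 4]]

# data as a list of rows, with explicit column and index labels
df_from_list = pd.DataFrame(rows, columns=['x', 'y'], index=['r1', 'r2'])

# data as a dict mapping each column label to that column's values
df_from_dict = pd.DataFrame({'x': [1, 3], 'y': [2, 4]}, index=['r1', 'r2'])

# data as a 2-D NumPy array
df_from_array = pd.DataFrame(np.array(rows), columns=['x', 'y'], index=['r1', 'r2'])

print(df_from_list)
```

All three frames hold the same values under the same labels; only the shape of the input differs.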
Create DataFrame from list of lists
Suppose we have a list of lists i.e.
# List of lists
students = [['jack', 34, 'Sydeny'],
            ['Riti', 30, 'Delhi'],
            ['Aadi', 16, 'New York']]
Pass this list to the DataFrame constructor to create a DataFrame object (pandas is assumed to be imported as pd):
# Creating a dataframe object from the list of lists
dfObj = pd.DataFrame(students)
The contents of the created DataFrame are as follows:
      0   1         2
0  jack  34    Sydeny
1  Riti  30     Delhi
2  Aadi  16  New York
Create DataFrame from list of tuples
Just like a list of lists, we can pass a list of tuples to the DataFrame constructor to create a dataframe.
Suppose we have a list of tuples i.e.
# List of Tuples
students = [('jack', 34, 'Sydeny'),
            ('Riti', 30, 'Delhi'),
            ('Aadi', 16, 'New York')]
Pass this list of tuples to the DataFrame constructor to create a DataFrame object:
# Creating a dataframe object from the list of tuples
dfObj = pd.DataFrame(students)
The contents of the created dataframe are as follows:
      0   1         2
0  jack  34    Sydeny
1  Riti  30     Delhi
2  Aadi  16  New York
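To check that the two input forms really are interchangeable, a minimal sketch like the following compares the resulting frames and inspects the labels pandas assigns when none are given. The names rows_as_lists, rows_as_tuples, df_lists and df_tuples are introduced here only for illustration.

```python
import pandas as pd

rows_as_lists = [['jack', 34, 'Sydeny'],
                 ['Riti', 30, 'Delhi'],
                 ['Aadi', 16, 'New York']]
# Same rows, converted to tuples
rows_as_tuples = [tuple(row) for row in rows_as_lists]

df_lists = pd.DataFrame(rows_as_lists)
df_tuples = pd.DataFrame(rows_as_tuples)

# True when both frames have the same values, labels and dtypes
print(df_lists.equals(df_tuples))

# With no labels supplied, columns and index are numbered from 0
print(list(df_lists.columns))
print(list(df_lists.index))
```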
Both the column and index labels are defaults here, but we can also provide our own.
Create a Dataframe from list and set column names and indexes
# Convert list of tuples to dataframe and set column names and indexes
dfObj = pd.DataFrame(students,
                     columns=['Name', 'Age', 'City'],
                     index=['a', 'b', 'c'])
The contents of the created dataframe are as follows:
   Name  Age      City
a  jack   34    Sydeny
b  Riti   30     Delhi
c  Aadi   16  New York
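Labels do not have to be supplied at construction time. As an alternative sketch (not the approach used above), the frame can be built with default labels and the columns and index attributes assigned afterwards; the resulting labels should be the same.

```python
import pandas as pd

students = [('jack', 34, 'Sydeny'),
            ('Riti', 30, 'Delhi'),
            ('Aadi', 16, 'New York')]

# Build with default integer labels first
dfObj = pd.DataFrame(students)

# Then assign the labels; each list must match the frame's width or length
dfObj.columns = ['Name', 'Age', 'City']
dfObj.index = ['a', 'b', 'c']

print(dfObj)

# Labels can now be used for lookups
print(dfObj.loc['b', 'City'])
```

Passing the labels to the constructor is shorter; assigning them later is mainly useful when the frame already exists.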
Create dataframe from list of tuples and skip certain columns
In our list of tuples, each tuple has 3 entries. What if we want to use only the 1st and 3rd entries?
Let’s create a dataframe that skips the 2nd entry in each tuple:
# Create dataframe from student list but skip column 'Age', i.e. keep only 2 columns
dfObj = pd.DataFrame.from_records(students,
                                  exclude=['Age'],
                                  columns=['Name', 'Age', 'City'],
                                  index=['a', 'b', 'c'])
The contents of the created dataframe are as follows:
   Name      City
a  jack    Sydeny
b  Riti     Delhi
c  Aadi  New York
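A comparable result can be reached without from_records. The sketch below is an alternative, not the method shown above: it builds the full frame first and then either drops the unwanted column or selects the wanted ones. The names full, without_age and name_city are illustrative.

```python
import pandas as pd

students = [('jack', 34, 'Sydeny'),
            ('Riti', 30, 'Delhi'),
            ('Aadi', 16, 'New York')]

# Build the complete frame with all three columns
full = pd.DataFrame(students,
                    columns=['Name', 'Age', 'City'],
                    index=['a', 'b', 'c'])

# Option 1: drop the column we do not want (returns a new frame)
without_age = full.drop(columns=['Age'])

# Option 2: select only the columns we want to keep
name_city = full[['Name', 'City']]

print(without_age)
```

This costs one intermediate frame, which is usually negligible for small inputs.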
Create dataframe from multiple lists
Suppose we have 3 different lists i.e.
listOfNames = ['jack', 'Riti', 'Aadi']
listOfAge = [34, 30, 16]
listOfCity = ['Sydney', 'Delhi', 'New york']
Now we want to convert them to a dataframe with each list as a column. Let’s see how to do that.
Zip the lists to create a list of tuples:
# Create a zipped list of tuples from above lists
zippedList = list(zip(listOfNames, listOfAge, listOfCity))
The contents of zippedList are:
[('jack', 34, 'Sydney'), ('Riti', 30, 'Delhi'), ('Aadi', 16, 'New york')]
Let’s create a dataframe from this zipped list:
# Create a dataframe from zipped list
dfObj = pd.DataFrame(zippedList,
                     columns=['Name', 'Age', 'City'],
                     index=['a', 'b', 'c'])
The contents of the created dataframe are as follows:
   Name  Age      City
a  jack   34    Sydney
b  Riti   30     Delhi
c  Aadi   16  New york
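Zipping is one way to turn separate lists into columns; another option, sketched here as an alternative, is to pass a dict that maps each column name to its list. The sketch also shows a difference worth knowing: zip() silently stops at the shortest list, while the dict form raises an error when the lists differ in length. The short_ages list and the mismatch_rejected flag are hypothetical and exist only to demonstrate that behaviour.

```python
import pandas as pd

listOfNames = ['jack', 'Riti', 'Aadi']
listOfAge = [34, 30, 16]
listOfCity = ['Sydney', 'Delhi', 'New york']

# Each dict key becomes a column label, each list becomes that column
dfObj = pd.DataFrame({'Name': listOfNames, 'Age': listOfAge, 'City': listOfCity},
                     index=['a', 'b', 'c'])
print(dfObj)

# zip() stops at the shortest input, so a missing value drops a whole row
short_ages = [34, 30]  # hypothetical list with one value missing
print(list(zip(listOfNames, short_ages)))

# The dict form refuses columns of different lengths instead
try:
    pd.DataFrame({'Name': listOfNames, 'Age': short_ages})
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
print(mismatch_rejected)
```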
The complete example is as follows:
import pandas as pd

def main():
    # List of lists
    students = [['jack', 34, 'Sydeny'],
                ['Riti', 30, 'Delhi'],
                ['Aadi', 16, 'New York']]

    print("****Create a Dataframe from list of lists *****")
    # Creating a dataframe object from the list of lists
    dfObj = pd.DataFrame(students)
    print("Dataframe : ", dfObj, sep='\n')

    # List of Tuples
    students = [('jack', 34, 'Sydeny'),
                ('Riti', 30, 'Delhi'),
                ('Aadi', 16, 'New York')]

    print("****Create a Dataframe from list of tuple *****")
    # Creating a dataframe object from the list of tuples
    dfObj = pd.DataFrame(students)
    print("Dataframe : ", dfObj, sep='\n')

    print("****Create a Dataframe from list of tuple, also set column names and indexes *****")
    # Convert list of tuples to dataframe and set column names and indexes
    dfObj = pd.DataFrame(students,
                         columns=['Name', 'Age', 'City'],
                         index=['a', 'b', 'c'])
    print("Dataframe : ", dfObj, sep='\n')

    print("****Create dataframe from list of tuples and skip certain columns*********")
    # Create dataframe from student list but skip column 'Age', i.e. keep only 2 columns
    dfObj = pd.DataFrame.from_records(students,
                                      exclude=['Age'],
                                      columns=['Name', 'Age', 'City'],
                                      index=['a', 'b', 'c'])
    print("Dataframe : ", dfObj, sep='\n')

    print("***Create dataframe from multiple lists***")
    listOfNames = ['jack', 'Riti', 'Aadi']
    listOfAge = [34, 30, 16]
    listOfCity = ['Sydney', 'Delhi', 'New york']

    # Create a zipped list of tuples from above lists
    zippedList = list(zip(listOfNames, listOfAge, listOfCity))
    print("zippedList = ", zippedList)

    # Create a dataframe from zipped list
    dfObj = pd.DataFrame(zippedList,
                         columns=['Name', 'Age', 'City'],
                         index=['a', 'b', 'c'])
    print("Dataframe : ", dfObj, sep='\n')

if __name__ == '__main__':
    main()
Output:
****Create a Dataframe from list of lists *****
Dataframe : 
      0   1         2
0  jack  34    Sydeny
1  Riti  30     Delhi
2  Aadi  16  New York
****Create a Dataframe from list of tuple *****
Dataframe : 
      0   1         2
0  jack  34    Sydeny
1  Riti  30     Delhi
2  Aadi  16  New York
****Create a Dataframe from list of tuple, also set column names and indexes *****
Dataframe : 
   Name  Age      City
a  jack   34    Sydeny
b  Riti   30     Delhi
c  Aadi   16  New York
****Create dataframe from list of tuples and skip certain columns*********
Dataframe : 
   Name      City
a  jack    Sydeny
b  Riti     Delhi
c  Aadi  New York
***Create dataframe from multiple lists***
zippedList =  [('jack', 34, 'Sydney'), ('Riti', 30, 'Delhi'), ('Aadi', 16, 'New york')]
Dataframe : 
   Name  Age      City
a  jack   34    Sydney
b  Riti   30     Delhi
c  Aadi   16  New york