In this article, we will discuss different methods to load multiple csv files into one pandas DataFrame.
Preparing DataSet
To get started, we have created a few sample csv files in the current working directory. Below is a quick snippet of all the csv files.
df1.csv
Name,City,Team
Shubham,Bangalore,Tech
Rudra,Mumbai,Product
df2.csv
Name,City,Team
Adarsh,Bangalore,Tech
Ajay,Delhi,Tech
df3.csv
Name,City,Team
Shreya,Jaipur,Design
Sam,Delhi,Design
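If you want to follow along, the sample files can be created with a few lines of Python. This is only a convenience sketch: the `samples` dictionary is a name introduced here for illustration, and the script writes the three files into the current working directory, overwriting any files with the same names.

```python
from pathlib import Path

# Contents copied from the sample listings above, keyed by file name.
# "samples" is a hypothetical helper name, not part of pandas.
samples = {
    "df1.csv": "Name,City,Team\nShubham,Bangalore,Tech\nRudra,Mumbai,Product\n",
    "df2.csv": "Name,City,Team\nAdarsh,Bangalore,Tech\nAjay,Delhi,Tech\n",
    "df3.csv": "Name,City,Team\nShreya,Jaipur,Design\nSam,Delhi,Design\n",
}

# Write each file into the current working directory.
for name, text in samples.items():
    Path(name).write_text(text, encoding="utf-8")
```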
Let’s look at multiple methods to read all these csv files in a single pandas DataFrame.
Method 1: Using For Loop
A for loop is the most straightforward way to handle a repetitive task. We can read the csv files one at a time and append each one to a single DataFrame, as the code below shows.
import pandas as pd

# list of files to read
files = ["df1.csv", "df2.csv", "df3.csv"]

# create an empty DataFrame to which we will append all the DataFrames
final_df = pd.DataFrame()

for file in files:
    # read the file and append it
    final_df = pd.concat([final_df, pd.read_csv(file)], axis=0)

final_df = final_df.reset_index(drop=True)
print(final_df)
Output
      Name       City     Team
0  Shubham  Bangalore     Tech
1    Rudra     Mumbai  Product
2   Adarsh  Bangalore     Tech
3     Ajay      Delhi     Tech
4   Shreya     Jaipur   Design
5      Sam      Delhi   Design
As observed, the contents of all three files are now combined in a single DataFrame (“final_df”).
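A common variation on this loop is to collect the DataFrames in a list and call pd.concat() once at the end, so the accumulated rows are not copied on every iteration. The sketch below assumes that variation. To stay runnable on its own it recreates two of the sample files in a temporary directory, and `tmp_dir` and `frames` are names introduced here for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Recreate two of the sample files in a temporary directory so the sketch is self-contained.
tmp_dir = Path(tempfile.mkdtemp())
(tmp_dir / "df1.csv").write_text(
    "Name,City,Team\nShubham,Bangalore,Tech\nRudra,Mumbai,Product\n", encoding="utf-8"
)
(tmp_dir / "df2.csv").write_text(
    "Name,City,Team\nAdarsh,Bangalore,Tech\nAjay,Delhi,Tech\n", encoding="utf-8"
)

files = [tmp_dir / "df1.csv", tmp_dir / "df2.csv"]

# Read each file into its own DataFrame and keep them in a list.
frames = []
for file in files:
    frames.append(pd.read_csv(file))

# Concatenate once; ignore_index=True renumbers the rows from 0.
final_df = pd.concat(frames, ignore_index=True)
print(final_df)
```

Passing ignore_index=True has the same effect here as calling reset_index(drop=True) afterwards.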
Method 2: Using the map() function
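The loop above is easy to follow, but calling pd.concat() on every iteration copies the accumulated rows each time, which becomes wasteful as the number of files grows. In this approach, we replace the entire for loop with the map() function, which applies pd.read_csv to each file name so that the results can be concatenated in one step.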
import pandas as pd

# using the map function
final_df = pd.concat(map(pd.read_csv, ['df1.csv', 'df2.csv', 'df3.csv']))

final_df = final_df.reset_index(drop=True)
print(final_df)
Output
      Name       City     Team
0  Shubham  Bangalore     Tech
1    Rudra     Mumbai  Product
2   Adarsh  Bangalore     Tech
3     Ajay      Delhi     Tech
4   Shreya     Jaipur   Design
5      Sam      Delhi   Design
We have replaced the entire loop with a single line using the map function, which makes the code more concise and concatenates all the DataFrames in one step.
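When the file names follow a pattern, the hard-coded list can be replaced by a glob search. The following is a minimal sketch under that assumption. It recreates the sample files in a temporary directory so it runs on its own, `tmp_dir`, `samples`, and `paths` are illustrative names, and the paths are sorted so the files are read in a predictable order.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Recreate the sample files in a temporary directory so the sketch is self-contained.
tmp_dir = Path(tempfile.mkdtemp())
samples = {
    "df1.csv": "Name,City,Team\nShubham,Bangalore,Tech\nRudra,Mumbai,Product\n",
    "df2.csv": "Name,City,Team\nAdarsh,Bangalore,Tech\nAjay,Delhi,Tech\n",
    "df3.csv": "Name,City,Team\nShreya,Jaipur,Design\nSam,Delhi,Design\n",
}
for name, text in samples.items():
    (tmp_dir / name).write_text(text, encoding="utf-8")

# Find every file matching the pattern; sorting makes the read order deterministic.
paths = sorted(tmp_dir.glob("df*.csv"))

# Apply pd.read_csv to each path and concatenate the results once.
final_df = pd.concat(map(pd.read_csv, paths), ignore_index=True)
print(final_df)
```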
Method 3: Using the dask library
Another option is the dask library, which evaluates lazily and can read files in parallel. This can make it much faster than pandas on large datasets, although for a few small files like ours the scheduling overhead may outweigh the gain. Its DataFrame syntax is very similar to that of pandas. Let’s take a look at the code below.
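# import library
import dask.dataframe as dd

# read all csv files starting with "df"
df = dd.read_csv("df*.csv")

# compute() returns a pandas DataFrame; the index restarts at 0
# for each partition (here, each file), so reset it
df = df.compute().reset_index(drop=True)
print(df)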
Output
      Name       City     Team
0  Shubham  Bangalore     Tech
1    Rudra     Mumbai  Product
2   Adarsh  Bangalore     Tech
3     Ajay      Delhi     Tech
4   Shreya     Jaipur   Design
5      Sam      Delhi   Design
As observed, the contents of all the files are now combined into a single DataFrame. Because compute() returns a regular pandas DataFrame, the result is ready for further processing with pandas.
Summary
In this article, we discussed three ways to load multiple csv files into one pandas DataFrame: a for loop, the map() function, and the dask library.