Import multiple csv files into one DataFrame in Pandas

In this article, we will discuss different methods to load multiple csv files into one pandas DataFrame.

Table Of Contents

Preparing DataSet
Method 1: Using For Loop
Method 2: Using the map() function
Method 3: Using the dask library
Summary

Preparing DataSet

To get started, we have created a few sample csv files in the current working directory. The contents of each file are shown below.

df1.csv

Name,City,Team
Shubham,Bangalore,Tech
Rudra,Mumbai,Product

df2.csv

Name,City,Team
Adarsh,Bangalore,Tech
Ajay,Delhi,Tech

df3.csv

Name,City,Team
Shreya,Jaipur,Design
Sam,Delhi,Design
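
If you want to follow along, the three sample files can be generated with a few lines of Python. This is only a convenience sketch: it writes df1.csv, df2.csv and df3.csv into the current working directory and overwrites any existing files with those names.

```python
from pathlib import Path

# Contents of the three sample files shown above
samples = {
    "df1.csv": "Name,City,Team\nShubham,Bangalore,Tech\nRudra,Mumbai,Product\n",
    "df2.csv": "Name,City,Team\nAdarsh,Bangalore,Tech\nAjay,Delhi,Tech\n",
    "df3.csv": "Name,City,Team\nShreya,Jaipur,Design\nSam,Delhi,Design\n",
}

# Write each file into the current working directory
for name, text in samples.items():
    Path(name).write_text(text)
```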

Let’s look at several methods to read all of these csv files into a single pandas DataFrame.

Method 1: Using For Loop

The simplest way to perform a repetitive task is a for loop. We can read the csv files one at a time and append each of them to a single DataFrame. Let’s try to understand this using the code below.

import pandas as pd

# list of files to read
files = ["df1.csv", "df2.csv", "df3.csv"]

# create an empty DataFrame to which all the DataFrames will be appended
final_df = pd.DataFrame()

for file in files:
    # read the file and append its rows to final_df
    final_df = pd.concat([final_df, pd.read_csv(file)], axis=0)

final_df = final_df.reset_index(drop=True)

print(final_df)

Output

      Name       City     Team
0  Shubham  Bangalore     Tech
1    Rudra     Mumbai  Product
2   Adarsh  Bangalore     Tech
3     Ajay      Delhi     Tech
4   Shreya     Jaipur   Design
5      Sam      Delhi   Design

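As shown, the rows from all three files are now combined in a single DataFrame (final_df). Because each file is read with its own index starting at 0, reset_index(drop=True) is called at the end to renumber the rows from 0 to 5.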
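
The pandas documentation notes that concat() makes a full copy of the data, so calling it inside a loop copies every row collected so far on each iteration. A common alternative, sketched below, keeps the for loop but collects the DataFrames in a plain list and concatenates them once at the end. The temporary-folder setup at the top is an assumption added only so the snippet runs on its own; with your own files you would skip it and list your paths directly.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Setup only: recreate the three sample files in a temporary folder so this
# sketch runs on its own. With your own files, skip this part.
folder = Path(tempfile.mkdtemp())
samples = {
    "df1.csv": "Name,City,Team\nShubham,Bangalore,Tech\nRudra,Mumbai,Product\n",
    "df2.csv": "Name,City,Team\nAdarsh,Bangalore,Tech\nAjay,Delhi,Tech\n",
    "df3.csv": "Name,City,Team\nShreya,Jaipur,Design\nSam,Delhi,Design\n",
}
for name, text in samples.items():
    (folder / name).write_text(text)

files = [folder / "df1.csv", folder / "df2.csv", folder / "df3.csv"]

# Read every file into its own DataFrame and collect them in a list
frames = []
for file in files:
    frames.append(pd.read_csv(file))

# Concatenate once at the end; ignore_index=True numbers the rows 0..n-1,
# which makes a separate reset_index(drop=True) call unnecessary
final_df = pd.concat(frames, ignore_index=True)

print(final_df)
```

The result has the same six rows and the same 0 to 5 index as the loop above, but pd.concat() runs only once regardless of how many files there are.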

Method 2: Using the map() function

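Growing a DataFrame with pd.concat() inside a for loop is not efficient, because every call copies all of the rows collected so far. In this approach, we replace the entire for loop with the map() function, so that pd.concat() is called only once on all of the DataFrames.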

import pandas as pd

# read each file with map() and concatenate the results
final_df = pd.concat(map(pd.read_csv, ["df1.csv", "df2.csv", "df3.csv"]))

final_df = final_df.reset_index(drop=True)

print(final_df)

Output

      Name       City     Team
0  Shubham  Bangalore     Tech
1    Rudra     Mumbai  Product
2   Adarsh  Bangalore     Tech
3     Ajay      Delhi     Tech
4   Shreya     Jaipur   Design
5      Sam      Delhi   Design

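We have replaced the entire loop with a single line: map() applies pd.read_csv to each file name, and pd.concat() combines all of the resulting DataFrames in one call. This makes the code shorter and avoids repeatedly copying a growing DataFrame.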
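
Instead of typing every file name, the list can be built with Python’s standard glob module, using the same "df*.csv" pattern as the dask example. The sketch below assumes that approach and also adds a column named source (a name chosen only for this example) that records which file each row came from. As before, the temporary-folder setup exists only so the snippet runs on its own.

```python
import glob
import os
import tempfile
from pathlib import Path

import pandas as pd

# Setup only: recreate the sample files in a temporary folder so this sketch
# runs on its own. With your own files, point `pattern` at your folder instead.
folder = tempfile.mkdtemp()
samples = {
    "df1.csv": "Name,City,Team\nShubham,Bangalore,Tech\nRudra,Mumbai,Product\n",
    "df2.csv": "Name,City,Team\nAdarsh,Bangalore,Tech\nAjay,Delhi,Tech\n",
    "df3.csv": "Name,City,Team\nShreya,Jaipur,Design\nSam,Delhi,Design\n",
}
for name, text in samples.items():
    Path(folder, name).write_text(text)

# glob returns matches in arbitrary order, so sort them to keep df1, df2, df3
pattern = os.path.join(folder, "df*.csv")
files = sorted(glob.glob(pattern))

# Read each file, tag its rows with the file name (hypothetical "source"
# column), then concatenate everything in a single call
final_df = pd.concat(
    (pd.read_csv(f).assign(source=os.path.basename(f)) for f in files),
    ignore_index=True,
)

print(final_df)
```

Note that sorted() compares names as strings, so a file such as df10.csv would sort before df2.csv.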

Method 3: Using the dask library

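Another option is the dask library (installed separately, for example with pip install "dask[dataframe]"). Its dataframe API closely mirrors pandas, and dd.read_csv() accepts a glob pattern such as "df*.csv", so every matching file is read in one call. Dask builds the work lazily and can read files in parallel, which pays off with many or very large files; for a handful of small files like these, its scheduling overhead means plain pandas is usually just as fast. Let’s take a look at the code below.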

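# import library
import dask.dataframe as dd

# read all csv files starting with "df"
df = dd.read_csv("df*.csv")

# compute() runs the lazy computation and returns a pandas DataFrame
df = df.compute()

# each file is read as a separate partition whose index starts at 0,
# so renumber the rows after combining them
df = df.reset_index(drop=True)

print(df)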

Output

      Name       City     Team
0  Shubham  Bangalore     Tech
1    Rudra     Mumbai  Product
2   Adarsh  Bangalore     Tech
3     Ajay      Delhi     Tech
4   Shreya     Jaipur   Design
5      Sam      Delhi   Design

As shown, the rows from all of the files are now combined into a single DataFrame. Because compute() already returns a regular pandas DataFrame, the result can be used directly in any further pandas processing.

Summary

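In this article, we discussed three ways to import multiple csv files into one DataFrame in Pandas: reading the files in a for loop, combining map() with a single pd.concat() call, and loading them all at once with dask’s read_csv(). Thanks.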
