In this article, we will discuss several methods to load data from a text file into a pandas DataFrame. We will also cover a few situations that need extra care when loading such files.
Introduction
To quickly get started, let’s say we have a text file named “sample.txt” with the contents shown below.
Name Designation Team Experience
Shubham Data_Scientist Tech 5
Riti Data_Engineer Tech 7
Shanky Program_Manager Tech 2
Shreya Graphic_Designer Design 2
Aadi Backend_Developer Tech 11
Sim Data_Engineer Design 4
Load txt file to DataFrame using pandas.read_csv() method
The “read_csv” function from pandas is the one most commonly used to read CSV files. However, it can read other delimited text files as well. We just need to change a few parameters of the function. Let’s read our “sample.txt” file below.
import pandas as pd

# read txt file using read_csv function
# a raw string (r"...") keeps Python from treating \s as a string escape
df = pd.read_csv("sample.txt", sep=r"\s+")
print(df)
Output
      Name        Designation    Team  Experience
0  Shubham     Data_Scientist    Tech           5
1     Riti      Data_Engineer    Tech           7
2   Shanky    Program_Manager    Tech           2
3   Shreya   Graphic_Designer  Design           2
4     Aadi  Backend_Developer    Tech          11
5      Sim      Data_Engineer  Design           4
As observed, we used the pandas.read_csv() function with an additional parameter, the separator (“sep”). By default it is a comma, which suits CSV (comma-separated) files. To read other text files, we can change it to a tab (“\t”) or to a regular expression such as “\s+”, which matches any run of whitespace (spaces or tabs).
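The difference between the two separators can be sketched with small in-memory files. This is a minimal illustration, and the names tab_text and space_text are hypothetical stand-ins for files on disk.

```python
import io

import pandas as pd

# Hypothetical in-memory contents standing in for two text files.
tab_text = "Name\tTeam\tExperience\nShubham\tTech\t5\nRiti\tTech\t7\n"
space_text = "Name    Team  Experience\nShubham Tech  5\nRiti    Tech  7\n"

# Tab-separated file: exactly one tab between fields.
df_tab = pd.read_csv(io.StringIO(tab_text), sep="\t")

# Whitespace-separated file: r"\s+" treats any run of spaces or tabs
# as a single separator, so uneven padding is not a problem.
df_space = pd.read_csv(io.StringIO(space_text), sep=r"\s+")

# Both calls should produce the same three-column DataFrame.
print(df_tab.equals(df_space))
```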
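If the text file has no header row, we can pass header=None; otherwise pandas uses the first row as the header by default. Our sample.txt does have a header row, so the example below also passes skiprows=1 to skip it and treat the remaining lines as headerless data.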
import pandas as pd

# read txt file using read_csv function
# note that we have used skiprows to skip the header
df = pd.read_csv("sample.txt", sep=r"\s+", header=None, skiprows=1)
print(df)
Output
         0                  1       2   3
0  Shubham     Data_Scientist    Tech   5
1     Riti      Data_Engineer    Tech   7
2   Shanky    Program_Manager    Tech   2
3   Shreya   Graphic_Designer  Design   2
4     Aadi  Backend_Developer    Tech  11
5      Sim      Data_Engineer  Design   4
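For a file that genuinely has no header row, skiprows is not needed. A minimal sketch, assuming a hypothetical headerless file represented here by the in-memory string no_header_text:

```python
import io

import pandas as pd

# Hypothetical file contents with no header row at all.
no_header_text = (
    "Shubham Data_Scientist Tech 5\n"
    "Riti Data_Engineer Tech 7\n"
)

# header=None tells pandas that the first line is data, not column names,
# so the columns are labelled with integers starting from 0.
df = pd.read_csv(io.StringIO(no_header_text), sep=r"\s+", header=None)

print(df.shape)
print(list(df.columns))
```

Without header=None, the first data line (Shubham's row) would be consumed as column names and lost from the data.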
Now, to define new column headers while reading the file, we can pass the “names” parameter to the function, as shown below.
import pandas as pd

# read txt file using read_csv function
# adding new column names while reading txt file
df = pd.read_csv(
    "sample.txt",
    sep=r"\s+",
    header=None,
    skiprows=1,
    names=["col1", "col2", "col3", "col4"],
)
print(df)
Output
      col1               col2    col3  col4
0  Shubham     Data_Scientist    Tech     5
1     Riti      Data_Engineer    Tech     7
2   Shanky    Program_Manager    Tech     2
3   Shreya   Graphic_Designer  Design     2
4     Aadi  Backend_Developer    Tech    11
5      Sim      Data_Engineer  Design     4
We have the new column names defined while reading the txt file.
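An alternative to combining header=None with skiprows=1 is to pass header=0 together with names, which tells pandas that the first line is a header and should be replaced. A minimal sketch, using a hypothetical in-memory copy of the first lines of the sample file:

```python
import io

import pandas as pd

# Hypothetical in-memory copy of the first lines of sample.txt.
text = (
    "Name Designation Team Experience\n"
    "Shubham Data_Scientist Tech 5\n"
    "Riti Data_Engineer Tech 7\n"
)

# header=0 marks the first line as the existing header;
# names then replaces it with our own column labels.
df = pd.read_csv(
    io.StringIO(text),
    sep=r"\s+",
    header=0,
    names=["col1", "col2", "col3", "col4"],
)
print(df)
```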
Load txt file to DataFrame using pandas.read_table() method
The read_table() function from pandas is similar to the read_csv() function; the main difference is that its default separator is a tab (“\t”) instead of a comma. Let’s look at a quick example that loads the same txt file using the read_table function.
import pandas as pd

# read txt file using read_table function
df = pd.read_table("sample.txt", sep=r"\s+")
print(df)
Output
      Name        Designation    Team  Experience
0  Shubham     Data_Scientist    Tech           5
1     Riti      Data_Engineer    Tech           7
2   Shanky    Program_Manager    Tech           2
3   Shreya   Graphic_Designer  Design           2
4     Aadi  Backend_Developer    Tech          11
5      Sim      Data_Engineer  Design           4
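Because read_table defaults to a tab separator, a tab-delimited file needs no sep argument at all. A minimal sketch comparing it with read_csv, using a hypothetical in-memory tab-separated file:

```python
import io

import pandas as pd

# Hypothetical tab-separated contents standing in for a file on disk.
tab_text = "Name\tTeam\tExperience\nShubham\tTech\t5\nRiti\tTech\t7\n"

# read_table uses "\t" as its default separator ...
df_table = pd.read_table(io.StringIO(tab_text))

# ... so it should match read_csv with sep="\t" passed explicitly.
df_csv = pd.read_csv(io.StringIO(tab_text), sep="\t")

print(df_table.equals(df_csv))
```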
Load txt file to a DataFrame using pandas.read_fwf() method
Reading text files can present several challenges. Two common ones are column values that themselves contain spaces, and files that have no consistent delimiter.
Let’s look at the example below, where the “Designation” column of our sample.txt file contains values with spaces in them.
Name     Designation        Team    Experience
Shubham  Data Scientist     Tech    5
Riti     Data Engineer      Tech    7
Shanky   Program Manager    Tech    2
Shreya   Graphic Designer   Design  2
Aadi     Backend Developer  Tech    11
Sim      Data Engineer      Design  4
In such cases, read_csv with a whitespace separator splits each designation into two fields, so the values no longer line up with the header, as shown below.
import pandas as pd

# read txt file using read_csv function
df = pd.read_csv("sample.txt", sep=r"\s+")
print(df)
Output
            Name  Designation    Team  Experience
Shubham     Data    Scientist    Tech           5
Riti        Data     Engineer    Tech           7
Shanky   Program      Manager    Tech           2
Shreya   Graphic     Designer  Design           2
Aadi     Backend    Developer    Tech          11
Sim         Data     Engineer  Design           4
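The reason no 0 to 5 index appears in that output is that, when every data row has exactly one more field than the header, pandas uses the first field as the row index. A minimal sketch of this behavior with hypothetical in-memory data:

```python
import io

import pandas as pd

# Hypothetical contents: 4 header fields, but 5 whitespace-separated
# fields per data row because the designation contains a space.
text = (
    "Name Designation Team Experience\n"
    "Shubham Data Scientist Tech 5\n"
    "Riti Data Engineer Tech 7\n"
)

df = pd.read_csv(io.StringIO(text), sep=r"\s+")

# The names end up in the index, and the first word of each
# designation lands in the "Name" column.
print(list(df.index))
print(list(df["Name"]))
```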
This is where the read_fwf function from pandas comes in handy: it loads fixed-width formatted text files into a pandas DataFrame. Let’s try it below.
import pandas as pd

# read txt file using read_fwf function
df = pd.read_fwf("sample.txt")
print(df)
Output
      Name        Designation    Team  Experience
0  Shubham     Data Scientist    Tech           5
1     Riti      Data Engineer    Tech           7
2   Shanky    Program Manager    Tech           2
3   Shreya   Graphic Designer  Design           2
4     Aadi  Backend Developer    Tech          11
5      Sim      Data Engineer  Design           4
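Here you go. We don’t even need to provide a separator in this case, because read_fwf infers the column boundaries from the fixed-width layout of the file (by default, from its first 100 rows). This only works when the columns start at the same character positions on every line.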
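When the automatic inference guesses the boundaries wrongly, the column widths can be passed explicitly through the widths parameter. A minimal sketch, assuming hypothetical widths of 9, 19, 8 and 10 characters and building the fixed-width text in memory so that the alignment is guaranteed:

```python
import io

import pandas as pd

# Hypothetical column widths (in characters) for the four columns.
widths = [9, 19, 8, 10]

rows = [
    ("Name", "Designation", "Team", "Experience"),
    ("Shubham", "Data Scientist", "Tech", "5"),
    ("Shreya", "Graphic Designer", "Design", "2"),
]

# Pad every value to its column width to build fixed-width lines.
fixed_text = "\n".join(
    "".join(value.ljust(width) for value, width in zip(row, widths))
    for row in rows
)

# Explicit widths replace the default boundary inference.
df = pd.read_fwf(io.StringIO(fixed_text), widths=widths)
print(df)
```

Explicit widths are mostly useful for files where a column is blank in many rows or where values run right up against each other, since inference relies on whitespace gaps between columns.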
Summary
In this article, we discussed three ways to load a text file into a pandas DataFrame: read_csv() and read_table() for delimited files, and read_fwf() for fixed-width files. Thanks.