How to Load Data from a txt File into a DataFrame with pandas?

This article covers several ways to load data from a text file into a pandas DataFrame, along with a few situations that need extra care while loading such files.

Introduction

To get started quickly, suppose we have a text file named “sample.txt” with the following contents.

Name Designation Team Experience
Shubham Data_Scientist Tech 5
Riti Data_Engineer Tech 7
Shanky Program_Manager Tech 2
Shreya Graphic_Designer Design 2
Aadi Backend_Developer Tech 11
Sim Data_Engineer Design 4

Load txt file to DataFrame using pandas.read_csv() method

The read_csv() function from pandas is most commonly used to read CSV files. However, it can read other delimited text files as well; we only need to change a few of its parameters. Let’s use it to read our “sample.txt” file.

import pandas as pd

# read txt file using read_csv function
# raw string so that the regex \s+ is not treated as a string escape
df = pd.read_csv("sample.txt", sep=r"\s+")

print(df)

Output

      Name        Designation    Team  Experience
0  Shubham     Data_Scientist    Tech           5
1     Riti      Data_Engineer    Tech           7
2   Shanky    Program_Manager    Tech           2
3   Shreya   Graphic_Designer  Design           2
4     Aadi  Backend_Developer    Tech          11
5      Sim      Data_Engineer  Design           4

As observed, we called pandas.read_csv() with an additional parameter, the separator (“sep”). By default it is a comma, which suits CSV (comma-separated) files; we can change it to a tab (“\t”) or to a regular expression matching one or more whitespace characters (“\s+”) to read other text files.
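As a minimal sketch of the tab case, the snippet below reads tab-separated text with sep="\t". The data is inlined through io.StringIO so the example runs without a file on disk; the names tab_text and df_tab are hypothetical and not part of the original example.

```python
import io

import pandas as pd

# Hypothetical tab-separated content standing in for a file on disk
tab_text = (
    "Name\tDesignation\tTeam\tExperience\n"
    "Shubham\tData Scientist\tTech\t5\n"
    "Riti\tData Engineer\tTech\t7\n"
)

# sep="\t" splits only on tabs, so spaces inside a value are preserved
df_tab = pd.read_csv(io.StringIO(tab_text), sep="\t")

print(df_tab)
```

Because only tabs act as delimiters here, a value such as "Data Scientist" stays in a single column.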

If the text file has no header row, we can pass header=None; otherwise pandas uses the first row as the header by default. Our sample file does have a header, so to illustrate we also skip it with skiprows=1, which keeps it from being read as a data row.

import pandas as pd

# read txt file using read_csv function
# note that we have used skiprows to skip the header
df = pd.read_csv("sample.txt", sep=r"\s+", header=None, skiprows=1)

print(df)

Output

         0                  1       2   3
0  Shubham     Data_Scientist    Tech   5
1     Riti      Data_Engineer    Tech   7
2   Shanky    Program_Manager    Tech   2
3   Shreya   Graphic_Designer  Design   2
4     Aadi  Backend_Developer    Tech  11
5      Sim      Data_Engineer  Design   4
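To see why header=None matters for a file that really has no header, here is a small sketch comparing the default behaviour with header=None. The headerless text and the variable names (no_header_text, df_default, df_no_header) are assumptions made for illustration.

```python
import io

import pandas as pd

# Hypothetical headerless version of the sample data
no_header_text = (
    "Shubham Data_Scientist Tech 5\n"
    "Riti Data_Engineer Tech 7\n"
    "Shanky Program_Manager Tech 2\n"
)

# Default behaviour: the first data row is consumed as the header
df_default = pd.read_csv(io.StringIO(no_header_text), sep=r"\s+")

# header=None keeps every row as data and numbers the columns from 0
df_no_header = pd.read_csv(io.StringIO(no_header_text), sep=r"\s+", header=None)

print(len(df_default), len(df_no_header))
```

With the default settings one row is lost to the header, while header=None retains all three rows.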

Now, to define new column headers while reading the file, we can use the “names” parameter, as shown below.

import pandas as pd

# read txt file using read_csv function
# adding new column names while reading txt file
df = pd.read_csv(
    "sample.txt",
    sep=r"\s+",
    header=None,
    skiprows=1,
    names=["col1", "col2", "col3", "col4"],
)

print(df)

Output

      col1               col2    col3  col4
0  Shubham     Data_Scientist    Tech     5
1     Riti      Data_Engineer    Tech     7
2   Shanky    Program_Manager    Tech     2
3   Shreya   Graphic_Designer  Design     2
4     Aadi  Backend_Developer    Tech    11
5      Sim      Data_Engineer  Design     4

The DataFrame now carries the new column names that we defined while reading the txt file.
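An alternative way to replace an existing header, sketched here under the assumption that the first line of the file is the header, is to pass header=0 together with names. The inlined sample_text and the name df_renamed are hypothetical.

```python
import io

import pandas as pd

# First lines of the sample file, inlined so the sketch is self-contained
sample_text = (
    "Name Designation Team Experience\n"
    "Shubham Data_Scientist Tech 5\n"
    "Riti Data_Engineer Tech 7\n"
)

# header=0 marks the first line as the existing header; names replaces it
df_renamed = pd.read_csv(
    io.StringIO(sample_text),
    sep=r"\s+",
    header=0,
    names=["col1", "col2", "col3", "col4"],
)

print(df_renamed)
```

This should give the same result as combining header=None with skiprows=1, while stating the intent (replace the header) more directly.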

Load txt file to DataFrame using pandas.read_table() method

The read_table() function from pandas is similar to read_csv(); the main difference is that its default separator is a tab rather than a comma. Let’s see a quick example that loads the same txt file using read_table().

import pandas as pd

# read txt file using read_table function
df = pd.read_table("sample.txt", sep=r"\s+")

print(df)

Output

      Name        Designation    Team  Experience
0  Shubham     Data_Scientist    Tech           5
1     Riti      Data_Engineer    Tech           7
2   Shanky    Program_Manager    Tech           2
3   Shreya   Graphic_Designer  Design           2
4     Aadi  Backend_Developer    Tech          11
5      Sim      Data_Engineer  Design           4
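Since read_table() uses a tab as its default separator, tab-delimited text can be read without passing sep at all. The following sketch, with hypothetical inline data and variable names, compares it against read_csv() with sep="\t".

```python
import io

import pandas as pd

# Hypothetical tab-delimited content
tab_text = "Name\tTeam\tExperience\nShubham\tTech\t5\nRiti\tTech\t7\n"

# No sep given: read_table falls back to its default, the tab character
df_table = pd.read_table(io.StringIO(tab_text))

# Equivalent call through read_csv with the separator spelled out
df_csv = pd.read_csv(io.StringIO(tab_text), sep="\t")

print(df_table.equals(df_csv))
```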

Load txt file to a DataFrame using pandas.read_fwf() method

Reading text files can pose several challenges. Two common ones are column values that themselves contain spaces, and files that have no common delimiter.

Let’s look at the example below, where the “Designation” column in our sample.txt file contains values with spaces in them.

    Name        Designation    Team  Experience
 Shubham     Data Scientist    Tech           5
    Riti      Data Engineer    Tech           7
  Shanky    Program Manager    Tech           2
  Shreya   Graphic Designer  Design           2
    Aadi  Backend Developer    Tech          11
     Sim      Data Engineer  Design           4

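In such cases, read_csv() with a whitespace separator splits that column into multiple columns. Because each data row then has one more field than the header, pandas also treats the first field as the index, so the values end up under the wrong column names, as shown below.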

import pandas as pd

# read txt file using read_csv function
df = pd.read_csv("sample.txt", sep=r"\s+")

print(df)

Output

            Name Designation    Team  Experience
Shubham     Data   Scientist    Tech           5
Riti        Data    Engineer    Tech           7
Shanky   Program     Manager    Tech           2
Shreya   Graphic    Designer  Design           2
Aadi     Backend   Developer    Tech          11
Sim         Data    Engineer  Design           4

This is where the read_fwf() function from pandas comes in handy: it loads fixed-width formatted text files into a pandas DataFrame. Let’s try it below.

import pandas as pd

# read txt file using read_fwf function
df = pd.read_fwf("sample.txt")

print(df)

Output

      Name        Designation    Team  Experience
0  Shubham     Data Scientist    Tech           5
1     Riti      Data Engineer    Tech           7
2   Shanky    Program Manager    Tech           2
3   Shreya   Graphic Designer  Design           2
4     Aadi  Backend Developer    Tech          11
5      Sim      Data Engineer  Design           4

As you can see, we don’t even need to provide a separator in this case, because read_fwf() splits the columns by their fixed character positions, which it infers from the file by default.
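When the automatic inference might not find the right boundaries, the column widths can be given explicitly through the widths parameter of read_fwf(). The sketch below builds a small fixed-width text in memory; the widths [8, 19, 8, 12] are an assumption derived from the layout of the sample shown above, and the names rows, fixed_text and df_fwf are hypothetical.

```python
import io

import pandas as pd

# Assumed column widths, matching the right-aligned layout of the sample
widths = [8, 19, 8, 12]

rows = [
    ("Name", "Designation", "Team", "Experience"),
    ("Shubham", "Data Scientist", "Tech", "5"),
    ("Aadi", "Backend Developer", "Tech", "11"),
    ("Shreya", "Graphic Designer", "Design", "2"),
]

# Right-align each cell to its width to produce fixed-width lines
fixed_text = "\n".join(
    "".join(cell.rjust(width) for cell, width in zip(row, widths))
    for row in rows
)

# Explicit widths tell read_fwf exactly where each column starts and ends
df_fwf = pd.read_fwf(io.StringIO(fixed_text), widths=widths)

print(df_fwf)
```

Passing widths (or colspecs) trades convenience for control, which can help when some rows contain blanks that would confuse the inferred boundaries.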

Summary

In this article, we discussed multiple ways to load a text file into a pandas DataFrame: read_csv(), read_table(), and read_fwf().
