How to extract a tar file in Python?

In this article, we will learn how to extract a tar file in Python. We will discuss several approaches to extracting a tar file.

Introduction

Suppose we have a tar file named test.tar. We will extract this file throughout the article.

File name: test.tar

The files present in the tar file are: file1.txt, file2.xlsx
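If you do not have such an archive at hand, the following sketch builds one to follow along with. It assumes a throwaway temporary directory and made-up file contents (file2.xlsx here is only a placeholder, not a real spreadsheet).

```python
import os
import tarfile
import tempfile

# Work in a throwaway directory so nothing in your project is touched.
work_dir = tempfile.mkdtemp()
tar_path = os.path.join(work_dir, "test.tar")

# Create two small placeholder files (the contents are made up).
for name in ("file1.txt", "file2.xlsx"):
    with open(os.path.join(work_dir, name), "w") as f:
        f.write("sample data for " + name)

# Pack both files into test.tar; arcname stores only the bare file name.
with tarfile.open(tar_path, "w") as tar:
    for name in ("file1.txt", "file2.xlsx"):
        tar.add(os.path.join(work_dir, name), arcname=name)

# Read the archive back to confirm what it contains.
with tarfile.open(tar_path) as tar:
    archive_names = tar.getnames()
print(archive_names)
```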

Extracting a tar file using extractall() in Python

The tarfile module provides the open() function, which takes the path of the tar file as input and returns a TarFile object. The TarFile object has an extractall() method, which takes as an argument the path of the directory where the contents of the tar file should be extracted.

Syntax of tarfile.open() method

tarfile.open(file_path, mode)
  • Parameters:
    • file_path: The path of the tar file.
    • mode: The mode in which the file is opened: read (r), write (w), or append (a). Defaults to read mode (r).
  • Returns:
    • Returns a TarFile object.
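As an illustration of the mode parameter, the sketch below writes a gzip-compressed archive with "w:gz" and reads it back twice: once with the default mode, which detects the compression on its own, and once with an explicit "r:gz". The file name sample.tar.gz and its contents are assumptions made for the example.

```python
import io
import os
import tarfile
import tempfile

work_dir = tempfile.mkdtemp()
gz_path = os.path.join(work_dir, "sample.tar.gz")  # hypothetical file name

# "w:gz" writes a gzip-compressed archive.
payload = b"hello"
with tarfile.open(gz_path, "w:gz") as tar:
    info = tarfile.TarInfo(name="hello.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

# The default mode "r" detects the compression automatically.
with tarfile.open(gz_path) as tar:
    names_default = tar.getnames()

# "r:gz" states the compression explicitly.
with tarfile.open(gz_path, "r:gz") as tar:
    names_gz = tar.getnames()

print(names_default, names_gz)
```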

Syntax of extractall() method

extractall(path)
  • Parameters:
    • path: The directory where the tar file contents are extracted. Defaults to the current working directory.
  • Returns:
    • None
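Besides path, extractall() also accepts a members argument that limits extraction to a subset of the archive. A minimal sketch, assuming an in-memory archive with made-up contents and a temporary target directory:

```python
import io
import os
import tarfile
import tempfile

# Build a tiny archive in memory; the names and contents are made up.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    for name, payload in (("file1.txt", b"text"), ("file2.xlsx", b"sheet")):
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
buffer.seek(0)

dest_dir = tempfile.mkdtemp()
with tarfile.open(fileobj=buffer) as tar:
    # Keep only the members whose name ends with .txt.
    txt_members = [m for m in tar.getmembers() if m.name.endswith(".txt")]
    tar.extractall(dest_dir, members=txt_members)

extracted = os.listdir(dest_dir)
print(extracted)
```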

Approach:

  1. Create a TarFile object of the tar file using the open() method.
  2. Call the extractall() method on the TarFile object by passing the path as a string.
  3. It extracts all the files present in the tar file in the specified path.

Source Code

import tarfile
import os

# creating the file object.
my_tar = tarfile.open('./test.tar')

# extract the tar file
my_tar.extractall('./extracted_tar')

# closing the file object
my_tar.close()

# listing files in extracted folder
print(os.listdir('./extracted_tar'))

Output:

['file1.txt', 'file2.xlsx']

Because test.tar is in the current working directory, we passed the path ./test.tar. If your tar file is in another directory, pass its relative or absolute path instead.
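The same extraction can also be written with a with statement, so the archive is closed even if an error occurs. The sketch below additionally passes filter="data" when the running Python supports extraction filters, which rejects risky members such as absolute paths. The helper name extract_all and the demo archive are assumptions made for illustration.

```python
import io
import os
import tarfile
import tempfile


def extract_all(tar_path, dest_dir):
    """Extract every member of tar_path into dest_dir; return the sorted entry names."""
    os.makedirs(dest_dir, exist_ok=True)
    # The with statement closes the archive even if extraction raises.
    with tarfile.open(tar_path) as tar:
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters: the "data" filter
            # rejects members such as absolute paths or "../" escapes.
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
    return sorted(os.listdir(dest_dir))


# Demo with a throwaway archive; names and contents are made up.
work_dir = tempfile.mkdtemp()
tar_path = os.path.join(work_dir, "test.tar")
with tarfile.open(tar_path, "w") as tar:
    for name, payload in (("file1.txt", b"text"), ("file2.xlsx", b"sheet")):
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

result = extract_all(tar_path, os.path.join(work_dir, "extracted_tar"))
print(result)
```

Only extract archives from sources you trust, or use a filter as above, because a crafted archive can otherwise write files outside the target directory.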

Extracting a tar file using extract() method

The TarFile object also has an extract() method, which extracts a single file from the tar file. It takes the name of the file to be extracted and the path where the file should be extracted as arguments. To extract all the files in the tar file, extract them one by one. To get the names of the files in the tar file, we can use the getnames() method.

Syntax of getnames() method

getnames()
  • Parameters:
    • None
  • Returns:
    • A list of the names of the members in the tar file.

Syntax of extract() method

extract(file_name, path)
  • Parameters:
    • file_name: The name of the file to be extracted from the tar file.
    • path: The directory where the file is extracted.
  • Returns:
    • None

Approach:

  1. Create a TarFile object of the tar file using the open() method.
  2. Get the names of all the files present in the tar file using the getnames() method.
  3. Call the extract() method on the TarFile object by passing the file name and path.
  4. Repeat step 3 for all the file names present in the list.

Source Code

import tarfile
import os

# creating the file object.
my_tar = tarfile.open('./test.tar')

# extract the tar file
for file_name in my_tar.getnames():
    my_tar.extract(file_name, './extracted_tar')

# listing files in extracted folder
print(os.listdir('./extracted_tar'))

# closing the file object
my_tar.close()

Output:

['file1.txt', 'file2.xlsx']
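When only one file is needed, extract() can be called once with that name instead of looping. The sketch below assumes an in-memory archive with made-up contents. It also shows that asking for a name that is not in the archive raises KeyError.

```python
import io
import os
import tarfile
import tempfile

# Build a tiny archive in memory; the names and contents are made up.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    for name, payload in (("file1.txt", b"text"), ("file2.xlsx", b"sheet")):
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
buffer.seek(0)

dest_dir = tempfile.mkdtemp()
missing_reported = False
with tarfile.open(fileobj=buffer) as tar:
    # Extract a single member by name.
    tar.extract("file2.xlsx", dest_dir)
    try:
        tar.extract("missing.txt", dest_dir)
    except KeyError:
        # The archive has no member with that name.
        missing_reported = True

extracted = os.listdir(dest_dir)
print(extracted, missing_reported)
```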

Extracting a tar file using _extract_member() and getmembers()

The TarFile object also has an _extract_member() method, which extracts a single file from the tar file. The leading underscore marks it as a private helper that is not part of the documented API, so its behaviour may change between Python versions; the public extract() method is usually the safer choice. It takes the TarInfo object of the file to be extracted and the target path of the extracted file as arguments. The TarInfo objects of the files can be obtained with the getmembers() method. To extract all the files in the tar file, extract them one by one.

Syntax of getmembers() method

getmembers()
  • Parameters:
    • None
  • Returns:
    • Return the members of the archive as a list of TarInfo objects. The list has the same order as the members in the archive.
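Each TarInfo object carries metadata about one member, which can be inspected before anything is extracted. A small sketch, assuming an in-memory archive with made-up contents:

```python
import io
import tarfile

# Build a tiny archive in memory; the names and contents are made up.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    for name, payload in (("file1.txt", b"abc"), ("file2.xlsx", b"12345")):
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
buffer.seek(0)

with tarfile.open(fileobj=buffer) as tar:
    # name, size in bytes, and whether the member is a regular file
    summary = [(m.name, m.size, m.isfile()) for m in tar.getmembers()]

print(summary)  # [('file1.txt', 3, True), ('file2.xlsx', 5, True)]
```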

Syntax of _extract_member() method

_extract_member(member, path)
  • Parameters:
    • member: The TarInfo object of the file.
    • path: The target path of the extracted file, including the file name.
  • Returns:
    • None

Approach:

  1. Create a TarFile object of the tar file using the open() method.
  2. Get the TarInfo objects of all the files present in the tar file using the getmembers() method.
  3. Call the _extract_member() method on the TarFile object by passing the TarInfo object and the target path.
  4. Repeat step 3 for all the files present in the tar file.

Source Code

import tarfile
import os

# creating the file object.
my_tar = tarfile.open('./test.tar')

# extract the tar file
for member in my_tar.getmembers():
    my_tar._extract_member(member,'./extracted_tar/'+member.name)

# listing files in extracted folder
print(os.listdir('./extracted_tar'))

# closing the file object
my_tar.close()

Output:

['file1.txt', 'file2.xlsx']
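Since _extract_member() is a private method, a similar loop can be written with the public extract() method, which accepts a TarInfo object as well as a name. This is a sketch of that alternative, not the approach shown above; the in-memory archive and its contents are made up.

```python
import io
import os
import tarfile
import tempfile

# Build a tiny archive in memory; the names and contents are made up.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    for name, payload in (("file1.txt", b"text"), ("file2.xlsx", b"sheet")):
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
buffer.seek(0)

dest_dir = tempfile.mkdtemp()
with tarfile.open(fileobj=buffer) as tar:
    for member in tar.getmembers():
        # extract() takes the target directory, not the full file path.
        tar.extract(member, dest_dir)

extracted = sorted(os.listdir(dest_dir))
print(extracted)
```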

Extracting a tar file using extractfile() method

The TarFile object also has an extractfile() method, which reads a single file from the tar file without writing it to disk. It takes the name of the file as an argument and returns an io.BufferedReader object. To save the file, write the bytes read from this object into a new file. To extract all the files in the tar file, extract them one by one.

Syntax of extractfile() method

extractfile(file_name)
  • Parameters:
    • file_name: name of the file which needs to be extracted from the tar file.
  • Returns:
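    • An io.BufferedReader object for a regular file, or None if the member is not a regular file or link (for example, a directory).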

Approach:

  1. Create a TarFile object of the tar file using the open() method.
  2. Create a new folder for extracting all the files in tar using the mkdir().
  3. Set the new folder path as the current working directory using the chdir().
  4. Get the names of all the files present in the tar file using the getnames() method.
  5. Call the extractfile() method on the TarFile object by passing the file name.
  6. Write the bytes returned by extractfile() into a file using the write() method.
  7. Repeat steps 5 and 6 for all the file names present in the list.

Source Code

import os
import tarfile

# creating the file object.
my_tar = tarfile.open('./test.tar')

# create the target folder and make it the current working directory
os.mkdir('./extracted_tar')
os.chdir('./extracted_tar')

# extract the tar file
for file_name in my_tar.getnames():
    file_io_bytes = my_tar.extractfile(file_name)
    # extractfile() returns None for entries that are not regular files,
    # such as a leading '.' directory entry, so we skip those.
    if file_io_bytes is None:
        continue
    with open(file_name, "wb") as f:
        f.write(file_io_bytes.read())

# closing the file object
my_tar.close()

# listing files in extracted folder
print(os.listdir('.'))

Output:

['file1.txt', 'file2.xlsx']
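Because extractfile() returns a file-like object, the contents can also be used directly in memory without creating any file on disk. A minimal sketch, assuming an in-memory archive with made-up contents and a hypothetical directory entry named docs:

```python
import io
import tarfile

# Build a tiny archive in memory; the names and contents are made up.
buffer = io.BytesIO()
payload = b"hello from file1"
with tarfile.open(fileobj=buffer, mode="w") as tar:
    info = tarfile.TarInfo(name="file1.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
    # A directory entry, to show what extractfile() returns for non-files.
    dir_info = tarfile.TarInfo(name="docs")
    dir_info.type = tarfile.DIRTYPE
    tar.addfile(dir_info)
buffer.seek(0)

with tarfile.open(fileobj=buffer) as tar:
    # Read the member's bytes and decode them, without touching the disk.
    reader = tar.extractfile("file1.txt")
    text = reader.read().decode("utf-8")
    # For a directory there is nothing to read, so None is returned.
    dir_reader = tar.extractfile("docs")

print(text)
print(dir_reader)
```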

Summary

Great, you made it! We have discussed several ways to extract a tar file in Python. Happy learning.
