In this article, we will learn how to extract a tar file in Python. We will discuss multiple approaches to extracting a tar file.
Introduction
Suppose we have a tar file named test.tar. We will extract this tar file in this article.
File name: test.tar
The files present in the tar file are: file1.txt, file2.xlsx
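To follow along without an existing archive, a sample file can be generated first. The sketch below is only an illustration: the helper name make_sample_tar and the file contents are assumptions, and file2.xlsx holds placeholder bytes rather than a real spreadsheet.

```python
import io
import tarfile


def make_sample_tar(tar_path):
    """Write a small archive containing file1.txt and file2.xlsx."""
    # Hypothetical contents, used only so the archive has two members.
    contents = {
        "file1.txt": b"hello from file1\n",
        "file2.xlsx": b"placeholder bytes, not a real spreadsheet\n",
    }
    with tarfile.open(tar_path, "w") as tar:
        for name, data in contents.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)  # the size must be set before adding the member
            tar.addfile(info, io.BytesIO(data))


if __name__ == "__main__":
    make_sample_tar("./test.tar")
```

Running the script writes test.tar into the current directory, which matches the path used in the examples of this article.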
Extracting a tar file using extractall() in Python
The tarfile module provides the open() function, which takes the path of the tar file as input and returns a TarFile object. The TarFile object has an extractall() method, which takes as an argument the path where the tar file contents should be extracted.
Syntax of tarfile.open() method
tarfile.open(file_path, mode)
- Parameters:
- file_path: The path of the tar file.
- mode: The mode in which the file is opened: read (r), write (w), or append (a). Defaults to read mode.
- Returns:
- Returns a TarFile object.
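As a small illustration of tarfile.open(), the sketch below opens an archive in read mode inside a with block, so the file is closed automatically. The function name list_archive is an assumption made for this example; it returns the member names using getnames(), a TarFile method.

```python
import tarfile


def list_archive(tar_path):
    """Open an archive read-only and return the names of its members."""
    # "r" (the default mode) also detects gzip, bz2 and xz compression
    # automatically, so the same call works for .tar and .tar.gz files.
    with tarfile.open(tar_path, "r") as tar:
        return tar.getnames()
```

For example, list_archive('./test.tar') should return the names stored in the sample archive.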
Syntax of extractall() method
extractall(path)
- Parameters:
- path: The path where the tar file contents need to be extracted.
- Returns:
- None
Approach:
- Create a TarFile object of the tar file using the open() method.
- Call the extractall() method on the TarFile object by passing the path as a string.
- It extracts all the files present in the tar file into the specified path.
Source Code
import tarfile
import os

# creating the file object.
my_tar = tarfile.open('./test.tar')
# extract the tar file
my_tar.extractall('./extracted_tar')
# closing the file object
my_tar.close()
# listing files in extracted folder
print(os.listdir('./extracted_tar'))
Output:
['file1.txt', 'file2.xlsx']
As the test.tar file was in the same directory, we gave the path as ./test.tar. If your tar file is in some other directory, give its relative or absolute path instead.
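A variation on the extractall() approach is sketched below, assuming the archive may come from an untrusted source. Recent Python versions accept a filter argument on extractall(); the "data" filter refuses members that would be written outside the destination directory, among other checks. The function name extract_all is an assumption made for this example, and the hasattr() check keeps the sketch usable on older versions that lack the filter argument.

```python
import tarfile


def extract_all(tar_path, dest_dir):
    """Extract every member of tar_path into dest_dir."""
    # The with block closes the archive even if extraction fails.
    with tarfile.open(tar_path) as tar:
        if hasattr(tarfile, "data_filter"):
            # Extraction filters are available: use the stricter "data" filter.
            tar.extractall(dest_dir, filter="data")
        else:
            # No filter support here; only use this on archives you trust.
            tar.extractall(dest_dir)
```

Calling extract_all('./test.tar', './extracted_tar') would then play the same role as the source code above.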
Extracting a tar file using extract() method
The TarFile object has an extract() method, which extracts a single file from the tar file. It takes the name of the file to be extracted and the path where the file should be extracted as input arguments. To extract all the files present in the tar file, extract each file one by one. To get the names of the files present in the tar file, we can use the getnames() method.
Syntax of getnames() method
getnames()
- Parameters:
- None
- Returns:
- Returns the names of the files in the tar file as a list.
Syntax of extract() method
extract(file_name, path)
- Parameters:
- file_name: The name of the file to be extracted from the tar file.
- path: The path where the file should be extracted.
- Returns:
- None
Approach:
- Create a TarFile object of the tar file using the open() method.
- Get the names of all the files present in the tar file using the getnames() method.
- Call the extract() method on the TarFile object by passing the file name and path.
- Repeat step 3 for every file name in the list.
Source Code
import tarfile
import os

# creating the file object.
my_tar = tarfile.open('./test.tar')
# extract the tar file
for file_name in my_tar.getnames():
    my_tar.extract(file_name, './extracted_tar')
# listing files in extracted folder
print(os.listdir('./extracted_tar'))
# closing the file object
my_tar.close()
Output:
['file1.txt', 'file2.xlsx']
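Because extract() works on one file at a time, it is also a natural fit when only some files are wanted. The sketch below is a hypothetical helper (the name extract_matching and the suffix-based selection are assumptions, not part of tarfile) that extracts only the members whose names end with a given suffix.

```python
import tarfile


def extract_matching(tar_path, dest_dir, suffix):
    """Extract only the members whose names end with suffix; return their names."""
    extracted = []
    with tarfile.open(tar_path) as tar:
        for name in tar.getnames():
            if name.endswith(suffix):
                tar.extract(name, dest_dir)
                extracted.append(name)
    return extracted
```

With the sample archive, extract_matching('./test.tar', './extracted_tar', '.txt') would extract file1.txt and leave file2.xlsx in the archive.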
Extracting a tar file using _extract_member() and getmembers()
The TarFile object has an _extract_member() method, which extracts a single file from the tar file. The leading underscore marks it as a private method that is not part of the documented tarfile API, so its behavior may change between Python versions. It takes the TarInfo object of the file to be extracted and the path where the file should be written as input arguments. The TarInfo objects of the files can be obtained using the getmembers() method. To extract all the files present in the tar file, extract each file one by one.
Syntax of getmembers() method
getmembers()
- Parameters:
- None
- Returns:
- Return the members of the archive as a list of TarInfo objects. The list has the same order as the members in the archive.
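To show what the TarInfo objects returned by getmembers() contain, the sketch below collects a few of their attributes. The function name describe_members is an assumption made for this example; name, size and isfile() are attributes of TarInfo.

```python
import tarfile


def describe_members(tar_path):
    """Return (name, size in bytes, is regular file) for each archive member."""
    with tarfile.open(tar_path) as tar:
        return [(m.name, m.size, m.isfile()) for m in tar.getmembers()]
```

The result keeps the same order as the members in the archive.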
Syntax of _extract_member() method
_extract_member(member, path)
- Parameters:
- member: The TarInfo object of the file.
- path: The target path of the extracted file, including the file name.
- Returns:
- None
Approach:
- Create a TarFile object of the tar file using the open() method.
- Get the TarInfo objects of all the files present in the tar file using the getmembers() method.
- Call the _extract_member() method on the TarFile object by passing the TarInfo object and path.
- Repeat step 3 for all the files present in the tar file.
Source Code
import tarfile
import os

# creating the file object.
my_tar = tarfile.open('./test.tar')
# extract the tar file
for member in my_tar.getmembers():
    my_tar._extract_member(member, './extracted_tar/' + member.name)
# listing files in extracted folder
print(os.listdir('./extracted_tar'))
# closing the file object
my_tar.close()
Output:
['file1.txt', 'file2.xlsx']
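Since _extract_member() is a private method, a similar loop can be written with the public extract() method, which accepts a TarInfo object as well as a file name. The sketch below is one possible variant; the function name extract_regular_files and the choice to skip everything except regular files are assumptions made for this example.

```python
import tarfile


def extract_regular_files(tar_path, dest_dir):
    """Extract each regular file via its TarInfo object; return the names."""
    extracted = []
    with tarfile.open(tar_path) as tar:
        for member in tar.getmembers():
            if member.isfile():  # skip directories, links and special files
                tar.extract(member, dest_dir)
                extracted.append(member.name)
    return extracted
```

Unlike _extract_member(), extract() takes the destination directory rather than the full target path of the file.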
Extracting a tar file using extractfile() method
The TarFile object has an extractfile() method, which reads a single file from the tar file. It takes the name of the file to be extracted as an input argument and returns an io.BufferedReader object. Write the bytes read from this BufferedReader into a file. To extract all the files present in the tar file, extract each file one by one.
Syntax of extractfile() method
extractfile(file_name)
- Parameters:
- file_name: name of the file which needs to be extracted from the tar file.
- Returns:
- An io.BufferedReader object if the member is a regular file or a link, otherwise None.
Approach:
- Create a TarFile object of the tar file using the open() method.
- Create a new folder for the extracted files using os.mkdir().
- Set the new folder as the current working directory using os.chdir().
- Get the names of all the files present in the tar file using the getnames() method.
- Call the extractfile() method on the TarFile object by passing the file name.
- Write the bytes returned by extractfile() into a file using the write() method.
- Repeat steps 5 and 6 for all the file names present in the list.
Source Code
import os
import tarfile

# creating the file object.
my_tar = tarfile.open('./test.tar')
os.mkdir('./extracted_tar')
os.chdir('./extracted_tar')
# extract the tar file
# Here getnames() returns something like ['.', 'file1.txt', 'file2.xlsx'],
# so we ignore the first name in the list (the directory entry).
for file_name in my_tar.getnames()[1:]:
    file_io_bytes = my_tar.extractfile(file_name)
    with open(file_name, "wb") as f:
        f.write(file_io_bytes.read())
# closing the file object
my_tar.close()
# listing files in extracted folder (now the current working directory)
print(os.listdir('.'))
Output:
['file1.txt', 'file2.xlsx']
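The approach above depends on the first name in the list being a directory entry. A more defensive sketch is shown below, assuming we simply skip every member for which extractfile() returns None. The function name extract_with_extractfile is an assumption made for this example, and writing each file under its base name is a simplification that flattens any folder structure inside the archive.

```python
import os
import shutil
import tarfile


def extract_with_extractfile(tar_path, dest_dir):
    """Copy each regular file out of the archive; return the written paths."""
    os.makedirs(dest_dir, exist_ok=True)
    written = []
    with tarfile.open(tar_path) as tar:
        for member in tar.getmembers():
            source = tar.extractfile(member)
            if source is None:  # directories and other non-file members
                continue
            # Simplification: keep only the base name, dropping sub-folders.
            target = os.path.join(dest_dir, os.path.basename(member.name))
            with source, open(target, "wb") as out:
                shutil.copyfileobj(source, out)  # copies in chunks
            written.append(target)
    return written
```

This version does not change the current working directory, and shutil.copyfileobj() avoids loading a large file into memory all at once.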
Summary
Great, you made it! We have discussed several methods to extract a tar file in Python. Happy learning.