In this article, we will discuss how to find the largest file in a directory and its sub-directories using Python.

Get the largest file in a directory using Python

In Python, the glob module provides a function glob() to find files and directories in a given directory based on a matching pattern. It follows the Unix shell path expansion rules, so we can use wildcards such as *, ? and [ ] to match some or all of the files in a directory. Note that these are shell-style wildcards, not regular expressions. We will use the glob() function to get a list of all files in a directory, and then look for the largest file in that list. The steps are as follows,
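As a rough illustration of the wildcard patterns mentioned above, the sketch below builds a temporary directory and runs a few patterns against it. The file names (a.txt, b.txt, c.log) and the sub-directory name are made up for the demonstration; glob.escape() is used only so that any special characters in the temporary path are not treated as wildcards.

```python
import glob
import os
import tempfile

# Build a throwaway directory so the patterns have something to match.
# All names below are hypothetical, chosen only for this demonstration.
with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ('a.txt', 'b.txt', 'c.log'):
        with open(os.path.join(tmp_dir, name), 'w') as f:
            f.write('sample')
    os.mkdir(os.path.join(tmp_dir, 'sub'))

    base = glob.escape(tmp_dir)

    def names(pattern):
        """Return the sorted base names matched by a pattern inside tmp_dir."""
        return sorted(os.path.basename(p)
                      for p in glob.glob(os.path.join(base, pattern)))

    everything = names('*')       # '*' matches any name, files and directories alike
    txt_only = names('*.txt')     # only names ending in .txt
    one_char_log = names('?.log') # '?' matches exactly one character

print(everything)
print(txt_only)
print(one_char_log)
```

Notice that '*' returns the sub-directory as well as the files, which is why the steps below filter the result with os.path.isfile().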

  1. Get a list of all files and directories in a given directory using the glob() function.
  2. Filter the list and select files only, using the filter() and os.path.isfile() functions.
  3. Find the file with the maximum size using the max() function.
    • For this, pass lambda x: os.stat(x).st_size as the key argument of the max() function, so that paths are compared by their size in bytes.
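To make the role of the key argument in step 3 concrete, here is a small sketch under assumed conditions: three temporary files with hypothetical names and sizes. It shows that max() compares the values returned by the key function (the sizes) but returns the original item (the path).

```python
import os
import tempfile

# Hypothetical file names and sizes in bytes, used only for illustration.
sizes = {'small.bin': 10, 'large.bin': 300, 'medium.bin': 50}

with tempfile.TemporaryDirectory() as tmp_dir:
    paths = []
    for name, size in sizes.items():
        path = os.path.join(tmp_dir, name)
        with open(path, 'wb') as f:
            f.write(b'x' * size)
        paths.append(path)

    # max() ranks the paths by file size but hands back the path itself
    largest = max(paths, key=lambda x: os.stat(x).st_size)
    largest_name = os.path.basename(largest)
    largest_size = os.stat(largest).st_size

print(largest_name, largest_size)
```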

The complete example to search for the largest file in a directory is as follows,

import glob
import os

dir_name = 'C:/Program Files/Java/jdk1.8.0_191/'

# Get the files (not the directories) in the directory
list_of_files = filter(os.path.isfile,
                       glob.glob(dir_name + '*'))

# Find the file with the maximum size among those files
max_file = max(list_of_files,
               key=lambda x: os.stat(x).st_size)

print('Max File: ', max_file)
print('Max File size in bytes: ', os.stat(max_file).st_size)

Output:

Max File:  C:/Program Files/Java/jdk1.8.0_191\src.zip
Max File size in bytes:  21245025

In this solution, we collected the files in a folder and then selected the one with the maximum size. However, it looked for the largest file in the given directory only. It did not look inside its sub-directories or the directories nested within them. What if we want to find the largest file in the complete directory hierarchy, even if it sits in a deeply nested folder? Let’s see how to do that.

Find largest file in a directory and its sub-directories (recursively)

The previous example looked only at the files directly inside the given directory, not at those in nested directories. To find the largest file in the complete directory hierarchy, check out this example,

import glob
import os

dir_name = 'C:/Program Files/Java/jdk1.8.0_191/'

# Get the files (not the directories) in the directory and its sub-directories
list_of_files = filter(os.path.isfile,
                       glob.glob(dir_name + '/**/*',
                                 recursive=True))

# Find the file with the maximum size among those files
max_file = max(list_of_files,
               key=lambda x: os.stat(x).st_size)

print('Max File: ', max_file)
print('Max File size in bytes: ', os.stat(max_file).st_size)

Output:

Max File:  C:/Program Files/Java/jdk1.8.0_191\jre\lib\rt.jar
Max File size in bytes:  63596151

We used the glob() function with the pattern ‘/**/*’ and the recursive=True argument. With recursive=True, the ‘**’ wildcard matches any number of nested directories (including none), so glob() returned all files and directories in the given directory and in all of its sub-directories. Then, using the filter() and os.path.isfile() functions, we filtered out the directories and kept the file paths only. Finally, by applying the max() function with the key lambda x: os.stat(x).st_size, we found the largest file.
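Both examples raise a ValueError from max() if the directory contains no files at all. One possible way to guard against that, sketched here with a hypothetical helper named find_largest_file() (not part of the glob module), is to pass default=None to max(). The temporary directory layout and file names below are assumptions made only so the sketch can run on its own.

```python
import glob
import os
import tempfile


def find_largest_file(dir_name, recursive=False):
    """Hypothetical helper: return (path, size_in_bytes) of the largest file,
    or None when no file is found."""
    # glob.escape() keeps special characters in dir_name from acting as wildcards
    base = glob.escape(dir_name)
    if recursive:
        pattern = os.path.join(base, '**', '*')
    else:
        pattern = os.path.join(base, '*')

    files = filter(os.path.isfile, glob.glob(pattern, recursive=recursive))

    # default=None avoids the ValueError that max() raises on an empty sequence
    largest = max(files, key=lambda x: os.stat(x).st_size, default=None)
    if largest is None:
        return None
    return largest, os.stat(largest).st_size


# Demonstration on a throwaway directory tree (all names are made up).
with tempfile.TemporaryDirectory() as tmp_dir:
    empty_dir = os.path.join(tmp_dir, 'empty')
    os.mkdir(empty_dir)
    nested_dir = os.path.join(tmp_dir, 'nested', 'deep')
    os.makedirs(nested_dir)

    with open(os.path.join(tmp_dir, 'top.bin'), 'wb') as f:
        f.write(b'x' * 100)
    with open(os.path.join(nested_dir, 'big.bin'), 'wb') as f:
        f.write(b'x' * 500)

    top_only = find_largest_file(tmp_dir)
    whole_tree = find_largest_file(tmp_dir, recursive=True)
    nothing = find_largest_file(empty_dir)

print(top_only)
print(whole_tree)
print(nothing)
```

Keep in mind that, by default, the glob wildcards do not match names starting with a dot, so hidden files and hidden directories are skipped by this approach.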

Summary:

We learned how to search for the largest file in a directory, and in its complete hierarchy of sub-directories, using Python.