In this article, we will discuss how to find the smallest file in a directory and its sub-directories using Python.

Get the smallest file in a directory using Python

In Python, the glob module provides the glob() function, which finds files and directories whose paths match a given pattern. The pattern follows Unix shell-style path expansion rules, so it supports wildcards such as *, ? and [...], but not full regular expressions. We will use glob() to get a list of all entries in a directory and then pick the smallest file from that list. The steps are as follows:

  1. Get a list of all files and directories in the given directory using glob().
  2. Filter the list and keep files only, using the filter() and os.path.isfile() functions.
  3. Find the file with the minimum size using the min() function.
    • For this, pass lambda x: os.stat(x).st_size as the key argument to min().
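To make the wildcard matching used in step 1 more concrete, here is a minimal sketch. The file names and the helper demo_patterns() are hypothetical and exist only for illustration; the files are created in a temporary directory so the example does not depend on any real folder.

```python
import glob
import os
import tempfile


def demo_patterns():
    """Create a few empty files and show which names each pattern matches."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp_dir:
        # Hypothetical file names, used only for this illustration
        for name in ('a.txt', 'b.txt', 'ab.log', 'c1.csv'):
            open(os.path.join(tmp_dir, name), 'w').close()

        for pattern in ('*', '*.txt', '?.txt', '[ab]*'):
            paths = glob.glob(os.path.join(tmp_dir, pattern))
            # Keep only the base names, sorted, so results are easy to read
            results[pattern] = sorted(os.path.basename(p) for p in paths)
    return results


for pattern, names in demo_patterns().items():
    print(pattern, '->', names)
```

Here * matches any run of characters, ? matches exactly one character, and [ab] matches a single character from the set.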

A complete example that searches for the smallest file in a directory is as follows:

import glob
import os

dir_name = 'C:/Program Files/Java/jdk1.8.0_191/'

# Get list of files in a directory
list_of_files = filter( os.path.isfile,
                        glob.glob(  dir_name + '*') )


# Find the smallest file from the list of files
min_file = min( list_of_files,
                key =  lambda x: os.stat(x).st_size)

print('min File: ', min_file)
print('min File size in bytes: ', os.stat(min_file).st_size)

Output:

min File:  C:/Program Files/Java/jdk1.8.0_191\LICENSE
min File size in bytes:  40
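One caveat with the example above: min() raises a ValueError when the directory contains no files. The following is a minimal sketch of a variant that guards against this, assuming a hypothetical helper named smallest_file_in() (not part of the original example). It uses the default argument of min() and os.path.getsize(), which returns the same value as os.stat(x).st_size.

```python
import glob
import os


def smallest_file_in(dir_name):
    """Return (path, size_in_bytes) of the smallest file directly inside
    dir_name, or None if it contains no files."""
    # os.path.join avoids depending on a trailing slash in dir_name
    candidates = filter(os.path.isfile,
                        glob.glob(os.path.join(dir_name, '*')))
    # default=None stops min() from raising ValueError on an empty iterable
    min_file = min(candidates, key=os.path.getsize, default=None)
    if min_file is None:
        return None
    return min_file, os.path.getsize(min_file)
```

A caller can then check for None before printing the result, instead of wrapping the call in a try/except block.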

In this solution, we created a list of the files in a folder and then selected the file with the minimum size. However, it looked for the smallest file in the given directory only. It didn’t look inside its sub-directories or the directories inside them. What if we want to find the smallest file in the complete directory hierarchy, even if it sits in the nth nested folder of the given directory? Let’s see how to do that.

Find smallest file in a directory and its sub-directories (recursively)

The previous example looked only at the files directly inside the given directory, not at those in nested directories. To find the smallest file in the complete directory hierarchy, check out this example:

import glob
import os

dir_name = 'C:/Program Files/Java/jdk1.8.0_191/'

# Get list of files in a directory & sub-directories
list_of_files = filter( os.path.isfile,
                        glob.glob(  dir_name + '/**/*',
                                    recursive=True) )


# Find the smallest file from the list of files
min_file = min( list_of_files,
                key =  lambda x: os.stat(x).st_size)

print('min File: ', min_file)
print('min File size in bytes: ', os.stat(min_file).st_size)

Output:

min File:  C:/Program Files/Java/jdk1.8.0_191\jre\lib\security\trusted.libraries
min File size in bytes:  0

We called the glob() function with the pattern ‘/**/*’ and the argument recursive=True. This returned all files and directories in the given directory and, recursively, in all of its sub-directories. Then, using the filter() and os.path.isfile() functions, we filtered out the directories and kept the file paths only. Finally, by applying the min() function to those paths with the key lambda x: os.stat(x).st_size, we found the smallest file.
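As an alternative to the recursive glob pattern, the same search can be sketched with os.walk(). One practical difference is that glob() wildcards do not match names that begin with a dot by default, while os.walk() reports those entries too. The helper name smallest_file_recursive() below is hypothetical, and this is only a sketch under that assumption rather than a replacement for the example above.

```python
import os


def smallest_file_recursive(root_dir):
    """Walk root_dir and return (path, size_in_bytes) of the smallest
    file found anywhere below it, or None if there are no files."""
    smallest = None
    for current_dir, _sub_dirs, file_names in os.walk(root_dir):
        for file_name in file_names:
            path = os.path.join(current_dir, file_name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # Skip entries that cannot be read (e.g. broken symlinks)
                continue
            # Strict comparison keeps the first file seen when sizes tie
            if smallest is None or size < smallest[1]:
                smallest = (path, size)
    return smallest
```

Because it tracks a running minimum, this version never holds the full list of paths in memory, which may help on very large directory trees.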

Summary:

We learned how to search for the smallest file in a directory, and in its sub-directories, using Python.