Python: Find the largest file in a directory

In this article, we will discuss how to find the largest file in a directory and its sub-directories using Python.


Get the largest file in a directory using Python

In Python, the glob module provides a function glob() that finds the files and directories in a given directory whose names match a pattern. The pattern follows Unix shell path-expansion rules, so it can use wildcards such as * and ? (but not regular expressions) to match some or all of the files in a directory. We will use glob() to get a list of all entries in a directory and then pick the largest file from that list. The steps are as follows:

  1. Get a list of all files and directories in the given directory using glob().
  2. Filter the list to keep only files, using the filter() and os.path.isfile() functions.
  3. Find the file with the maximum size using the max() function.
    • For this, pass lambda x: os.stat(x).st_size as the key argument to max().
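The three steps above can be wrapped in a small reusable function. This is a sketch under a few assumptions: the name find_largest_file is hypothetical, glob.escape() is added so that a directory name containing *, ? or [ is treated literally, and max()'s default=None argument is used so that an empty directory returns None instead of raising ValueError. The demo builds a throwaway directory with tempfile so it can run anywhere.

```python
import glob
import os
import tempfile


def find_largest_file(dir_name):
    """Return the path of the largest file directly inside dir_name, or None."""
    # Step 1: all files and directories matched by '*'
    # (glob.escape() keeps *, ? or [ in dir_name from acting as wildcards)
    entries = glob.glob(os.path.join(glob.escape(dir_name), '*'))
    # Step 2: keep regular files only
    files = filter(os.path.isfile, entries)
    # Step 3: pick the file with the most bytes; default=None avoids a
    # ValueError from max() when the directory contains no matching files
    return max(files, key=lambda x: os.stat(x).st_size, default=None)


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        # Three files of known sizes plus one sub-directory
        for name, size in [('a.txt', 10), ('b.txt', 300), ('c.txt', 42)]:
            with open(os.path.join(tmp, name), 'wb') as f:
                f.write(b'x' * size)
        os.mkdir(os.path.join(tmp, 'subdir'))

        largest = find_largest_file(tmp)
        print(os.path.basename(largest), os.stat(largest).st_size)  # b.txt 300
```

The sub-directory is matched by '*' in step 1 but dropped by os.path.isfile() in step 2, so it never reaches max().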

Here is a complete example that searches for the largest file in a directory:

import glob
import os

dir_name = 'C:/Program Files/Java/jdk1.8.0_191/'

# Get the files and directories directly inside dir_name,
# then keep only the files
list_of_files = filter(os.path.isfile,
                       glob.glob(os.path.join(dir_name, '*')))

# Find the file with max size from the list of files
max_file = max(list_of_files,
               key=lambda x: os.stat(x).st_size)

print('Max File: ', max_file)
print('Max File size in bytes: ', os.stat(max_file).st_size)

Output:

Max File:  C:/Program Files/Java/jdk1.8.0_191\src.zip
Max File size in bytes:  21245025
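If you prefer pathlib, the same non-recursive search can be sketched with Path.iterdir(). This is an alternative technique rather than the glob() approach used above, and the function name find_largest_file_pathlib is hypothetical. One behavioral difference is worth knowing: glob's * wildcard skips names that begin with a dot, while iterdir() returns them, so hidden files are included here.

```python
import tempfile
from pathlib import Path


def find_largest_file_pathlib(dir_name):
    """Return the Path of the largest file directly inside dir_name, or None."""
    # iterdir() yields every entry in the directory, including dot-files
    files = (p for p in Path(dir_name).iterdir() if p.is_file())
    # Same max()/key idea as the glob version, using Path.stat() for the size
    return max(files, key=lambda p: p.stat().st_size, default=None)


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        (base / 'visible.txt').write_bytes(b'x' * 20)
        (base / '.hidden.log').write_bytes(b'x' * 80)  # skipped by glob's '*'
        (base / 'subdir').mkdir()

        largest = find_largest_file_pathlib(tmp)
        print(largest.name, largest.stat().st_size)  # .hidden.log 80
```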

In this solution, we built a list of the files in a folder and then selected the largest one. However, it searched only the given directory; it did not look inside its sub-directories or the directories nested within them. What if we want to find the largest file in the complete directory hierarchy, even if it sits in the nth nested folder? Let’s see how to do that.

Find largest file in a directory and its sub-directories (recursively)

The previous example searched only the files directly inside the given directory. To find the largest file in the complete directory hierarchy, use a recursive glob pattern, as in this example:

import glob
import os

dir_name = 'C:/Program Files/Java/jdk1.8.0_191/'

# Get list of files in a directory & sub-directories
# ('**' with recursive=True matches any depth of nested directories)
list_of_files = filter(os.path.isfile,
                       glob.glob(os.path.join(dir_name, '**', '*'),
                                 recursive=True))

# Find the file with max size from the list of files
max_file = max(list_of_files,
               key=lambda x: os.stat(x).st_size)

print('Max File: ', max_file)
print('Max File size in bytes: ', os.stat(max_file).st_size)

Output:

Max File:  C:/Program Files/Java/jdk1.8.0_191\jre\lib\rt.jar
Max File size in bytes:  63596151

We called glob() with a pattern containing the ** wildcard and the recursive=True argument. With recursive=True, ** matches zero or more levels of nested directories, so glob() returned every file and directory in the given directory and in all of its sub-directories. The filter() and os.path.isfile() functions then dropped the directories, leaving only file paths. Finally, max() with the key lambda x: os.stat(x).st_size picked the largest file.
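Another option for the recursive case is os.walk(), which visits the directory tree one directory at a time instead of building one big list of paths first. The sketch below is an alternative to the recursive glob() call, not the article's method, and the function name find_largest_file_walk is hypothetical. Unlike glob()'s * and ** wildcards, os.walk() also reports names beginning with a dot, and by default it does not descend into symlinked directories.

```python
import os
import tempfile


def find_largest_file_walk(dir_name):
    """Return (path, size) of the largest file anywhere under dir_name.

    Returns None when no regular file is found.
    """
    largest = None  # (path, size) of the biggest file seen so far
    # os.walk() yields (root, dirnames, filenames) for dir_name and for
    # each nested sub-directory below it
    for root, _dirnames, filenames in os.walk(dir_name):
        for name in filenames:
            path = os.path.join(root, name)
            # Skip entries that are not regular files, e.g. broken symlinks
            if not os.path.isfile(path):
                continue
            size = os.path.getsize(path)
            if largest is None or size > largest[1]:
                largest = (path, size)
    return largest


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        nested = os.path.join(tmp, 'a', 'b')
        os.makedirs(nested)
        for path, size in [(os.path.join(tmp, 'top.txt'), 100),
                           (os.path.join(nested, 'deep.bin'), 500)]:
            with open(path, 'wb') as f:
                f.write(b'x' * size)

        path, size = find_largest_file_walk(tmp)
        print(os.path.relpath(path, tmp), size)
```

By default os.walk() silently skips directories it cannot list (for example, because of permissions); pass an onerror callback if you need to know about them.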

Summary:

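We learned how to search for the largest file in a directory in Python, either in the given directory only or in its complete hierarchy of sub-directories, using glob(), filter(), os.path.isfile() and max().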
