In this article, we will discuss different ways to get a list of all files in a directory or folder, along with their sizes, in Python.
Table of contents
- Get list of files (file paths) in directory with size.
- Get list of file names in directory with size.
- Get list of files (file paths) in directory and sub-directories with size.
Get list of all files in directory with size using glob()
In Python, the glob module provides a function glob() to find files or directories in a given directory based on a matching pattern. Similar to Unix path expansion rules, we can use shell-style wildcards such as *, ? and [...] to match some or all files in a directory using the glob() function. Note that these patterns are wildcards, not regular expressions. We will use this function to get a list of all files in a directory along with their sizes. The steps are as follows,
- Get a list of all files and directories in a given directory using glob() function.
- Using the filter() function and os.path.isfile(), select files only from the list.
- For each file in the list, calculate its size and create a list of tuples i.e. list of file paths and size.
Complete example to get list of files in directory with size is as follows,
import glob
import os

dir_name = 'C:/Program Files/Java/jdk-15.0.1/include/'

# Get a list of files (file paths) in the given directory
list_of_files = filter(os.path.isfile,
                       glob.glob(dir_name + '*'))

# Get a list of files with size
files_with_size = [(file_path, os.stat(file_path).st_size)
                   for file_path in list_of_files]

# Iterate over list of tuples i.e. file_paths with size
# and print them one by one
for file_path, file_size in files_with_size:
    print(file_size, ' -->', file_path)
Output:
21158 --> C:/Program Files/Java/jdk-15.0.1/include\classfile_constants.h
11461 --> C:/Program Files/Java/jdk-15.0.1/include\jawt.h
7154 --> C:/Program Files/Java/jdk-15.0.1/include\jdwpTransport.h
74681 --> C:/Program Files/Java/jdk-15.0.1/include\jni.h
83360 --> C:/Program Files/Java/jdk-15.0.1/include\jvmti.h
3774 --> C:/Program Files/Java/jdk-15.0.1/include\jvmticmlr.h
The os.stat(file_path) function returns an object that contains the file statistics. We can fetch the st_size attribute of the stat object i.e. the size of file in bytes.
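To see the st_size attribute in isolation, here is a minimal sketch that writes a small temporary file and reads its size back. The file name sample.txt and its content are made up for illustration. The os.path.getsize() function is shown alongside as a shorter route to the same number.

```python
import os
import tempfile

# Create a throwaway file with known content so the size is predictable.
with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, 'sample.txt')  # hypothetical file name
    with open(file_path, 'wb') as f:
        f.write(b'hello world')  # 11 bytes

    # os.stat() returns a stat result object; st_size is the size in bytes
    stat_info = os.stat(file_path)
    size_from_stat = stat_info.st_size

    # os.path.getsize() reports the same size in bytes
    size_from_getsize = os.path.getsize(file_path)

    print(size_from_stat, size_from_getsize)
```

The stat object also carries other fields, such as st_mtime (last modification time), so os.stat() is handy when more than the size is needed.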
In the above solution, we created a list of files in a folder, fetched the size of each file in bytes using the os.stat() function, and built a list of tuples, i.e. file path and file size. Note that each entry in this list holds the full file path rather than just the file name.
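The wildcard matching described at the start of this section can also narrow the list to certain files. The sketch below wraps the same steps in a helper, files_with_size(), which is a name introduced here for illustration, and runs it on a temporary directory with made-up file names. Keep in mind that the * wildcard does not match file names that start with a dot.

```python
import glob
import os
import tempfile

def files_with_size(dir_name, pattern='*'):
    """Return (file_path, size_in_bytes) tuples for files matching pattern."""
    matches = glob.glob(os.path.join(dir_name, pattern))
    return [(path, os.stat(path).st_size)
            for path in matches
            if os.path.isfile(path)]

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical sample files: two headers and one text file
    samples = [('a.h', b'12345'), ('b.h', b'12'), ('notes.txt', b'123')]
    for name, content in samples:
        with open(os.path.join(tmp_dir, name), 'wb') as f:
            f.write(content)

    # Only the files ending in .h; sorted because glob order is not guaranteed
    header_files = sorted((os.path.basename(path), size)
                          for path, size in files_with_size(tmp_dir, '*.h'))
    # Default pattern '*' picks up every non-hidden file
    all_files = files_with_size(tmp_dir)

print(header_files)
```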
Get list of files names in directory with size using os.listdir()
In Python, the os module provides a function listdir(dir_path), which returns a list of file and directory names in the given directory path. Using the filter() function and os.path.isfile(), we select files only from the list. Because listdir() returns bare names, each name must be joined with the directory path before it is passed to os.path.isfile() or os.stat(). Then we can iterate over this list of file names, fetch the size of each file, and create a list of tuples, i.e. file name and size.
Complete example to get list of file names in directory with size is as follows,
import os

dir_name = 'C:/Program Files/Java/jdk-15.0.1/include/'

# Get list of all files only in the given directory
list_of_files = filter(lambda x: os.path.isfile(os.path.join(dir_name, x)),
                       os.listdir(dir_name))

# Create a list of files in directory along with the size
files_with_size = [(file_name, os.stat(os.path.join(dir_name, file_name)).st_size)
                   for file_name in list_of_files]

# Iterate over list of files along with size
# and print them one by one.
for file_name, size in files_with_size:
    print(size, ' -->', file_name)
Output:
21158 --> classfile_constants.h
11461 --> jawt.h
7154 --> jdwpTransport.h
74681 --> jni.h
83360 --> jvmti.h
3774 --> jvmticmlr.h
In this solution we created a list of file names in a folder along with the size in bytes.
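The example above depends on a JDK folder that may not exist on your machine. As a portable sketch of the same idea, the snippet below builds a temporary directory with made-up contents and lists its file names with sizes. The helper name file_names_with_size() is an assumption for illustration. It also shows why the names returned by os.listdir() need to be joined with the directory path: a bare name would otherwise be resolved against the current working directory.

```python
import os
import tempfile

def file_names_with_size(dir_name):
    """Return (file_name, size_in_bytes) tuples for files directly in dir_name."""
    result = []
    for name in os.listdir(dir_name):
        # listdir() gives bare names, so build the full path first
        full_path = os.path.join(dir_name, name)
        if os.path.isfile(full_path):
            result.append((name, os.stat(full_path).st_size))
    return result

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical contents: two files and one sub-directory
    os.mkdir(os.path.join(tmp_dir, 'subdir'))
    with open(os.path.join(tmp_dir, 'one.txt'), 'wb') as f:
        f.write(b'abc')
    with open(os.path.join(tmp_dir, 'two.txt'), 'wb') as f:
        f.write(b'abcde')

    # Sorted because listdir() returns names in arbitrary order
    names_with_size = sorted(file_names_with_size(tmp_dir))

print(names_with_size)
```

The sub-directory is skipped because os.path.isfile() returns False for it.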
Python: Get list of files in directory and sub-directories with size
In both the previous examples, we created a list of files in a directory with size. But they covered the files in the given directory only, not those in nested directories. So, if you want to get a list of files in a directory and its sub-directories along with the size, then check out this example,
import glob
import os

dir_name = 'C:/Program Files/Java/jdk-15.0.1/include'

# Get a list of files (file paths) in the given directory
# and all of its sub-directories
list_of_files = filter(os.path.isfile,
                       glob.glob(dir_name + '/**/*', recursive=True))

# Get a list of files with size
files_with_size = [(file_path, os.stat(file_path).st_size)
                   for file_path in list_of_files]

# Iterate over list of tuples i.e. file_paths with size
# and print them one by one
for file_path, file_size in files_with_size:
    print(file_size, ' -->', file_path)
Output:
21158 --> C:/Program Files/Java/jdk-15.0.1/include\classfile_constants.h
11461 --> C:/Program Files/Java/jdk-15.0.1/include\jawt.h
7154 --> C:/Program Files/Java/jdk-15.0.1/include\jdwpTransport.h
74681 --> C:/Program Files/Java/jdk-15.0.1/include\jni.h
83360 --> C:/Program Files/Java/jdk-15.0.1/include\jvmti.h
3774 --> C:/Program Files/Java/jdk-15.0.1/include\jvmticmlr.h
898 --> C:/Program Files/Java/jdk-15.0.1/include\win32\jawt_md.h
583 --> C:/Program Files/Java/jdk-15.0.1/include\win32\jni_md.h
4521 --> C:/Program Files/Java/jdk-15.0.1/include\win32\bridge\AccessBridgeCallbacks.h
35096 --> C:/Program Files/Java/jdk-15.0.1/include\win32\bridge\AccessBridgeCalls.h
76585 --> C:/Program Files/Java/jdk-15.0.1/include\win32\bridge\AccessBridgePackages.h
We used the glob() function with the pattern '/**/*' and the recursive argument set to True. It returned all files and directories in the given directory and in all of its sub-directories recursively, and os.path.isfile() kept only the files. Then, using the st_size attribute of the object returned by os.stat(file_path), we got the size of each file and created a list of files along with the size.
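To make the effect of recursive=True visible, the sketch below compares a plain '*' pattern with the '**' pattern on a temporary directory tree. The directory and file names are made up for illustration, loosely mirroring the layout in the output above.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical tree: one file at the top, one per nested level
    nested_dir = os.path.join(tmp_dir, 'win32', 'bridge')
    os.makedirs(nested_dir)
    sample_paths = [
        os.path.join(tmp_dir, 'jni.h'),
        os.path.join(tmp_dir, 'win32', 'jni_md.h'),
        os.path.join(nested_dir, 'calls.h'),
    ]
    for path in sample_paths:
        with open(path, 'wb') as f:
            f.write(b'x')

    # '*' only looks at the top-level directory
    top_only = [p for p in glob.glob(os.path.join(tmp_dir, '*'))
                if os.path.isfile(p)]

    # '**' with recursive=True matches zero or more directory levels
    recursive = [p for p in glob.glob(os.path.join(tmp_dir, '**', '*'),
                                      recursive=True)
                 if os.path.isfile(p)]

    # Convert to sorted relative paths with '/' separators for display
    top_names = sorted(os.path.relpath(p, tmp_dir) for p in top_only)
    recursive_names = sorted(os.path.relpath(p, tmp_dir).replace(os.sep, '/')
                             for p in recursive)

print(top_names)
print(recursive_names)
```

Without recursive=True, the '**' pattern behaves like a single '*', so files below the first nested level would be missed.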
Summary:
We learned about different ways to get a list of files in a folder with the size.