Python: How to remove files by matching pattern | wildcards | certain extensions only?

In this article we will discuss how to delete files from a directory based on a matching pattern or wildcard.

Suppose we have a directory that contains some log files and some text files, and we want to delete all .txt files from that directory, i.e. files whose names end with “.txt”.
Let’s discuss how to do that using different techniques.

Remove files by pattern using glob.glob() & os.remove()

To remove files by matching pattern, we first get a list of all file paths that match the specified pattern using glob.glob() and then delete them one by one using os.remove(), i.e.

import os
import glob

# Get a list of all the file paths in the specified directory that end with .txt
fileList = glob.glob('/home/varung/Documents/python/logs/*.txt')

# Iterate over the list of file paths & remove each file.
for filePath in fileList:
    try:
        os.remove(filePath)
    except OSError:
        print("Error while deleting file : ", filePath)

It will remove all ‘.txt’ files in directory /home/varung/Documents/python/logs/, but it will not remove files in its subdirectories.
Let’s understand how it works.

Get list of files using glob.glob()

glob.glob(pathname, *, recursive=False)

glob.glob() accepts a path name containing a shell-style pattern and returns a list of all the paths (files or directories) that match it. Because the recursive parameter is False by default, it only finds matching paths in the given directory, not in its subdirectories.
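A minimal sketch of that non-recursive behaviour, assuming a throwaway temporary directory (created with tempfile) in place of the article’s logs folder; the file names are hypothetical. glob.escape() is used so that any wildcard characters in the temporary path itself are treated literally.

```python
import glob
import os
import tempfile

# Build a small throwaway tree: two .txt files at the top level,
# one .txt file inside a sub directory, and one .log file.
with tempfile.TemporaryDirectory() as baseDir:
    os.makedirs(os.path.join(baseDir, 'archive'))
    for relPath in ['a.txt', 'b.txt', 'server.log', os.path.join('archive', 'old.txt')]:
        with open(os.path.join(baseDir, relPath), 'w') as f:
            f.write('sample')

    # recursive defaults to False, so only the top level is searched
    matches = glob.glob(os.path.join(glob.escape(baseDir), '*.txt'))
    names = sorted(os.path.basename(p) for p in matches)

print(names)  # ['a.txt', 'b.txt']
```

archive/old.txt is not returned because it sits one level below the searched directory.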

Then we iterate over the list of file paths and delete each file using os.remove(), catching any OSError that may occur, for example due to insufficient file permissions.
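One more reason to catch OSError instead of letting the loop crash: glob.glob() matches any path, including directories, and os.remove() refuses to delete a directory. The sketch below assumes a hypothetical directory that happens to be named old.txt. The exact OSError subclass raised varies by platform (for example IsADirectoryError on Linux), so only the base class is caught.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as baseDir:
    # A regular file and a directory whose name also ends with .txt
    with open(os.path.join(baseDir, 'notes.txt'), 'w') as f:
        f.write('sample')
    os.mkdir(os.path.join(baseDir, 'old.txt'))

    failed = []
    for filePath in glob.glob(os.path.join(glob.escape(baseDir), '*.txt')):
        try:
            os.remove(filePath)
        except OSError as err:
            # os.remove() cannot delete directories; record it and carry on
            failed.append(os.path.basename(filePath))
            print("Error while deleting file : ", filePath, err)

    remaining = sorted(os.listdir(baseDir))

print(failed)     # ['old.txt']
print(remaining)  # ['old.txt']
```

If matching directories should be skipped silently rather than reported, filtering the list with os.path.isfile() before the loop is one option.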

As we have seen, this approach cannot delete files from subdirectories. For that we need another solution.

Recursively Remove files by matching pattern or wildcard

To make glob.glob() find matching files recursively, we need to pass the recursive parameter as True and also use “**” in the matching pattern, i.e.

fileList = glob.glob('/home/varung/Documents/python/logs/**/*.txt', recursive=True)

It will recursively search for all the ‘.txt’ files, including files in subdirectories. Then we can iterate over the list and delete each file one by one using os.remove(), i.e.

import os
import glob

# Get a recursive list of file paths that match the pattern, including subdirectories
fileList = glob.glob('/home/varung/Documents/python/logs/**/*.txt', recursive=True)

# Iterate over the list of file paths & remove each file.
for filePath in fileList:
    try:
        os.remove(filePath)
    except OSError:
        print("Error while deleting file : ", filePath)

It will delete all the .txt files from /home/varung/Documents/python/logs/ and its subdirectories.
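To see what the recursive pattern actually picks up, here is a small sketch on a hypothetical temporary tree (again standing in for the article’s logs folder). With recursive=True, “**” matches zero or more directory levels, so top-level files are found as well as nested ones.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as baseDir:
    # Hypothetical tree: top-level and nested .txt files plus a .log file
    os.makedirs(os.path.join(baseDir, 'archive', '2023'))
    for relPath in ['a.txt',
                    os.path.join('archive', 'b.txt'),
                    os.path.join('archive', '2023', 'c.txt'),
                    os.path.join('archive', 'server.log')]:
        with open(os.path.join(baseDir, relPath), 'w') as f:
            f.write('sample')

    pattern = os.path.join(glob.escape(baseDir), '**', '*.txt')
    # Report paths relative to baseDir, with '/' separators on every platform
    found = sorted(os.path.relpath(p, baseDir).replace(os.sep, '/')
                   for p in glob.glob(pattern, recursive=True))

print(found)  # ['a.txt', 'archive/2023/c.txt', 'archive/b.txt']
```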

Recursively Remove files by matching pattern or wildcard using os.walk()

os.walk() generates the file names in a given directory tree by walking over it either top-down or bottom-up, i.e.

os.walk(top, topdown=True, onerror=None, followlinks=False)

For each directory in the tree (including the top directory itself), it yields a tuple (rootDir, subdirs, filenames), i.e.

  • rootDir: path of the directory currently being visited
  • subdirs: list of the names of the subdirectories inside rootDir
  • filenames: list of the names of the files inside rootDir

It visits every subdirectory of the specified directory, and in each iteration the directory being visited becomes rootDir.
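A minimal sketch of the tuples os.walk() yields, assuming a hypothetical two-level temporary tree. Paths are shown relative to the top directory, so the top directory itself appears as '.'.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as baseDir:
    # Hypothetical tree: a.txt at the top, archive/b.txt one level down
    os.makedirs(os.path.join(baseDir, 'archive'))
    for relPath in ['a.txt', os.path.join('archive', 'b.txt')]:
        with open(os.path.join(baseDir, relPath), 'w') as f:
            f.write('sample')

    visited = []
    # topdown=True (the default) yields a directory before its subdirectories
    for rootDir, subdirs, filenames in os.walk(baseDir):
        relRoot = os.path.relpath(rootDir, baseDir)
        visited.append((relRoot, sorted(subdirs), sorted(filenames)))
        print(relRoot, subdirs, filenames)

print(visited)  # [('.', ['archive'], ['a.txt']), ('archive', [], ['b.txt'])]
```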

Let’s use os.walk() to find all the files in a given directory tree that match a pattern, and then delete those files, i.e.

import os
import fnmatch

# Walk over the directory tree
for rootDir, subdirs, filenames in os.walk('/home/varung/Documents/python/logs/'):
    # Find the file names in this directory that match the given pattern
    for filename in fnmatch.filter(filenames, '*.txt'):
        filePath = os.path.join(rootDir, filename)
        try:
            os.remove(filePath)
        except OSError:
            print("Error while deleting file : ", filePath)

It will delete all the ‘*.txt’ files from directory /home/varung/Documents/python/logs and also from its subdirectories.
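The filtering step in that loop is fnmatch.filter(), which returns the subset of a list of names that match a shell-style pattern. A short illustration with hypothetical names. Note that fnmatch.filter() follows the operating system’s case rules (case-insensitive on Windows), whereas fnmatch.fnmatchcase() always compares case-sensitively.

```python
import fnmatch

# Hypothetical directory listing, like the filenames list os.walk() yields
filenames = ['notes.txt', 'server.log', 'notes.txt.bak', 'readme.md', 'todo.txt']

matched = fnmatch.filter(filenames, '*.txt')
print(matched)  # ['notes.txt', 'todo.txt']

# fnmatchcase() never normalises case, so 'REPORT.TXT' is rejected on every OS
print(fnmatch.fnmatchcase('REPORT.TXT', '*.txt'))  # False
```

'notes.txt.bak' is not matched because the pattern must match the whole name, not just a part of it.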

Let’s create a generic function that deletes all the files in a given directory tree that match a pattern, and returns the paths of the files that could not be deleted due to some error.

import os
import fnmatch


def removeFilesByMatchingPattern(dirPath, pattern):
    '''
    Generic function to delete all the files from a given directory based on matching pattern.
    Returns the list of file paths that could not be deleted.
    '''
    listOfFilesWithError = []
    for parentDir, dirnames, filenames in os.walk(dirPath):
        for filename in fnmatch.filter(filenames, pattern):
            filePath = os.path.join(parentDir, filename)
            try:
                os.remove(filePath)
            except OSError:
                print("Error while deleting file : ", filePath)
                listOfFilesWithError.append(filePath)

    return listOfFilesWithError

Let’s call this function to delete files based on matching pattern i.e.

listOfErrors = removeFilesByMatchingPattern('/home/varung/Documents/python/logs/', '*.txt')

print('Files that can not be deleted : ')
for filePath in listOfErrors:
    print(filePath)
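To delete files with certain extensions only, for example several at once, the same function can be adapted to accept a list of patterns. This is a hypothetical variant, not part of the original code: removeFilesByMatchingPatterns() is an illustrative name, and the temporary tree in the usage part is made up.

```python
import fnmatch
import os
import tempfile


def removeFilesByMatchingPatterns(dirPath, patterns):
    '''Hypothetical variant: delete files matching ANY of the given patterns.

    Returns the list of file paths that could not be deleted.
    '''
    listOfFilesWithError = []
    for parentDir, dirnames, filenames in os.walk(dirPath):
        for filename in filenames:
            # Skip names that match none of the patterns
            if not any(fnmatch.fnmatch(filename, p) for p in patterns):
                continue
            filePath = os.path.join(parentDir, filename)
            try:
                os.remove(filePath)
            except OSError:
                print("Error while deleting file : ", filePath)
                listOfFilesWithError.append(filePath)
    return listOfFilesWithError


# Usage on a throwaway tree with hypothetical file names
with tempfile.TemporaryDirectory() as baseDir:
    os.makedirs(os.path.join(baseDir, 'archive'))
    for relPath in ['a.txt', 'b.log', 'keep.csv', os.path.join('archive', 'c.tmp')]:
        with open(os.path.join(baseDir, relPath), 'w') as f:
            f.write('sample')

    errors = removeFilesByMatchingPatterns(baseDir, ['*.txt', '*.log', '*.tmp'])
    # Collect every file still present, relative to baseDir
    remaining = sorted(
        os.path.relpath(os.path.join(root, name), baseDir).replace(os.sep, '/')
        for root, _, names in os.walk(baseDir) for name in names)

print(errors)     # []
print(remaining)  # ['keep.csv']
```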

The complete example is as follows:

import os
import glob
import fnmatch


def removeFilesByMatchingPattern(dirPath, pattern):
    '''
    Generic function to delete all the files from a given directory based on matching pattern.
    Returns the list of file paths that could not be deleted.
    '''
    listOfFilesWithError = []
    for parentDir, dirnames, filenames in os.walk(dirPath):
        for filename in fnmatch.filter(filenames, pattern):
            filePath = os.path.join(parentDir, filename)
            try:
                os.remove(filePath)
            except OSError:
                print("Error while deleting file : ", filePath)
                listOfFilesWithError.append(filePath)

    return listOfFilesWithError


def main():

    print('***** Remove files by pattern using glob.glob() & os.remove() *****')

    # Get a list of all the file paths in the specified directory that end with .txt
    fileList = glob.glob('/home/varung/Documents/python/logs/*.txt')

    # Iterate over the list of file paths & remove each file.
    for filePath in fileList:
        try:
            os.remove(filePath)
        except OSError:
            print("Error while deleting file : ", filePath)

    print("Recursively Remove files by matching pattern or wildcard using glob.glob() & os.remove()")

    # get a recursive list of file paths that matches pattern including sub directories
    fileList = glob.glob('/home/varung/Documents/python/logs/**/*.txt', recursive=True)

    # Iterate over the list of filepaths & remove each file.
    for filePath in fileList:
        try:
            os.remove(filePath)
        except OSError:
            print("Error while deleting file : ", filePath)

    print("Recursively Remove files by matching pattern or wildcard using os.walk()")

    # Walk over the directory tree
    for rootDir, subdirs, filenames in os.walk('/home/varung/Documents/python/logs/'):
        # Find the file names in this directory that match the given pattern
        for filename in fnmatch.filter(filenames, '*.txt'):
            filePath = os.path.join(rootDir, filename)
            try:
                os.remove(filePath)
            except OSError:
                print("Error while deleting file : ", filePath)


    print('remove files based on matching pattern and get a list of errors')

    listOfErrors = removeFilesByMatchingPattern('/home/varung/Documents/python/logs/', '*.txt')

    print('Files that can not be deleted : ')
    for filePath in listOfErrors:
        print(filePath)

if __name__ == '__main__':
    main()
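For comparison, the same recursive clean-up can also be written with the standard library’s pathlib module. This is a different API from the glob/os.walk approach used above, shown only as a sketch: Path.rglob('*.txt') searches a directory and all its subdirectories, and Path.unlink() deletes a file. The temporary directory layout below is hypothetical.

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    baseDir = pathlib.Path(tmp)
    # Hypothetical tree: a.txt, archive/b.txt and a .log file to keep
    (baseDir / 'archive').mkdir()
    for relPath in ['a.txt', 'archive/b.txt', 'server.log']:
        (baseDir / relPath).write_text('sample')

    listOfErrors = []
    # list() first, so the tree is not modified while rglob() is still walking it
    for filePath in list(baseDir.rglob('*.txt')):
        if not filePath.is_file():
            continue  # rglob() can also yield directories with matching names
        try:
            filePath.unlink()
        except OSError:
            listOfErrors.append(str(filePath))

    remaining = sorted(p.relative_to(baseDir).as_posix() for p in baseDir.rglob('*'))

print(listOfErrors)  # []
print(remaining)     # ['archive', 'server.log']
```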
