Python: How to delete specific lines in a file in a memory-efficient way?

In this article, we will discuss different ways to delete specific lines from a file: by line number, by matching content, or based on custom logic.

Python has no direct API to delete lines or text from the middle of a file. Therefore, in this article, we will use the following approach to delete lines at specific places in a file:

“Copy the contents of the given file to a temporary file line by line, skipping the specific lines while copying. In the end, remove the original file and rename the temporary file as the original file.”

The result is the same as if we had deleted those lines from the file in place. The file can be a simple text file or a CSV file. Because the file is read and written one line at a time, only the current line (plus the I/O buffers) is kept in memory, so this approach also works for huge files.
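To make the memory argument concrete, here is a minimal sketch of the core copy-and-skip loop written as a generator. The name `skip_lines` and its `should_skip` parameter are hypothetical, not part of this article's functions. A file object yields one line at a time, so the generator never holds more than the current line; `io.StringIO` stands in for real files so the sketch runs without touching the disk.

```python
import io
from typing import Callable, Iterable, Iterator


def skip_lines(lines: Iterable[str], should_skip: Callable[[str], bool]) -> Iterator[str]:
    """Hypothetical helper: yield every line for which should_skip() is falsy."""
    for line in lines:
        # Only the current line is held here; nothing is accumulated
        if not should_skip(line):
            yield line


# io.StringIO behaves like an open text file, so no file on disk is needed
source = io.StringIO("keep me\ndrop me\nkeep me too\n")
target = io.StringIO()
target.writelines(skip_lines(source, lambda line: line.startswith("drop")))
result = target.getvalue()
print(result, end="")
```

With real files, `source` and `target` would be the original file and the temporary file opened with `open()`, followed by the remove-and-rename step described above.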

Let’s use this logic and implement our functions around it,

Delete a line from a file by specific line number in Python

Here, we will implement a function that accepts a file name and a line number as arguments and deletes the line at that line number.

The algorithm of the function is:

  • Accept the original file name and the line number as arguments
  • Open original file in read mode
  • Create a temporary file and open that in write mode
  • Read the contents of the original file line by line and for each line,
    • Keep track of the line number
    • If the line number matches the line number given as argument, then skip that line, else write the line to the temporary file
  • If any line is skipped while copying then,
    • Delete the original file
    • Rename the temporary file as the original file
  • Else
    • Delete the temporary file

Implementation of the function is as follows,

import os


def delete_line(original_file, line_number):
    """Delete the line at the given 0-based line number from a file."""
    is_skipped = False
    current_index = 0
    dummy_file = original_file + '.bak'
    # Open original file in read only mode and dummy file in write mode
    with open(original_file, 'r') as read_obj, open(dummy_file, 'w') as write_obj:
        # Line by line copy data from original file to dummy file
        for line in read_obj:
            # If current line number matches the given line number then skip copying
            if current_index != line_number:
                write_obj.write(line)
            else:
                is_skipped = True
            current_index += 1

    # If any line is skipped then rename dummy file as original file
    if is_skipped:
        os.remove(original_file)
        os.rename(dummy_file, original_file)
    else:
        os.remove(dummy_file)

This function assumes that line numbers start from 0. So, to delete the line at the nth position, we need to pass n-1 as the line number. Now let’s use this function,
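If you prefer to pass human-style line numbers (first line = 1), a sketch of the same idea with `enumerate(start=1)` removes both the manual counter and the n-1 conversion. The name `delete_line_1based` is hypothetical; apart from the numbering, it follows the same copy, remove and rename steps as delete_line().

```python
import os
import tempfile


def delete_line_1based(original_file, line_number):
    """Hypothetical variant: delete the line at line_number, counting the first line as 1."""
    dummy_file = original_file + '.bak'
    is_skipped = False
    with open(original_file, 'r') as read_obj, open(dummy_file, 'w') as write_obj:
        # enumerate() supplies the line number, starting at 1
        for current, line in enumerate(read_obj, start=1):
            if current == line_number:
                is_skipped = True
            else:
                write_obj.write(line)
    if is_skipped:
        os.remove(original_file)
        os.rename(dummy_file, original_file)
    else:
        os.remove(dummy_file)


# Usage on a throwaway file inside a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.txt')
    with open(path, 'w') as f:
        f.write('first\nsecond\nthird\n')
    delete_line_1based(path, 2)  # removes "second"
    with open(path) as f:
        content = f.read()

print(content, end='')
```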

Suppose we have a file ‘sample_1.txt’ with the following contents,

Hello this is a sample file
It contains sample text
Dummy Line A
Dummy Line B
Dummy Line C
This is the end of file

Now let’s delete the line at line number 2 (index 1) using the function created above,

delete_line('sample_1.txt', 1)

Now the content of the file ‘sample_1.txt’ is as follows,

Hello this is a sample file
Dummy Line A
Dummy Line B
Dummy Line C
This is the end of file

The second line of the file has now been deleted.

Delete multiple lines from a file by line numbers

To delete multiple lines from a file by line number, we will use a similar algorithm:

  • Accept the original file name and a list of line numbers as arguments
  • Open original file in read mode
  • Create a dummy / temporary file and open that in write mode
  • Read the contents of the original file line by line and for each line,
    • Keep track of the line number
    • If the current line number is in the given list of line numbers, then skip that line, else write the line to the temporary / dummy file.
  • If any line is skipped while copying then,
    • Delete the original file
    • Rename the temporary file as the original file
  • Else
    • Delete the temporary file

Implementation of the function is as follows,

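import os


def delete_multiple_lines(original_file, line_numbers):
    """Delete the lines at the given 0-based line numbers from a file."""
    # A set makes each "in" check constant-time, even for long lists
    line_numbers = set(line_numbers)
    is_skipped = False
    counter = 0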
    # Create name of dummy / temporary file
    dummy_file = original_file + '.bak'
    # Open original file in read only mode and dummy file in write mode
    with open(original_file, 'r') as read_obj, open(dummy_file, 'w') as write_obj:
        # Line by line copy data from original file to dummy file
        for line in read_obj:
            # If current line number exists in the list then skip copying that line
            if counter not in line_numbers:
                write_obj.write(line)
            else:
                is_skipped = True
            counter += 1

    # If any line is skipped then rename dummy file as original file
    if is_skipped:
        os.remove(original_file)
        os.rename(dummy_file, original_file)
    else:
        os.remove(dummy_file)

Suppose we have a file ‘sample_2.txt’ with the following contents,

Hello this is a sample file
It contains sample text
Dummy Line A
Dummy Line B
Dummy Line C
This is the end of file

Let’s use the above function to delete the lines at line numbers 1, 2, and 3 from the text file,

delete_multiple_lines('sample_2.txt', [0, 1, 2])

Now, the contents of the file ‘sample_2.txt’ are as follows,

Dummy Line B
Dummy Line C
This is the end of file

It removed multiple lines from the file. Because this function expects line numbers to start from 0, we passed 0, 1, and 2 in the list to delete the lines at line numbers 1, 2, and 3.
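The membership test `counter not in line_numbers` does not need a list. In the hypothetical variant below, `delete_lines_in`, any container that supports `in` can be passed: a `set` for fast lookups of scattered line numbers, or a `range` object to describe a contiguous block of lines without building a list at all. It is a sketch for illustration, not the article's function.

```python
import os
import tempfile


def delete_lines_in(original_file, line_numbers):
    """Hypothetical variant: delete lines whose 0-based index is `in` line_numbers."""
    dummy_file = original_file + '.bak'
    is_skipped = False
    with open(original_file, 'r') as read_obj, open(dummy_file, 'w') as write_obj:
        for index, line in enumerate(read_obj):
            # Works with a set, a list, or a range: all support the `in` operator
            if index in line_numbers:
                is_skipped = True
            else:
                write_obj.write(line)
    if is_skipped:
        os.remove(original_file)
        os.rename(dummy_file, original_file)
    else:
        os.remove(dummy_file)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.txt')
    with open(path, 'w') as f:
        f.write(''.join(f'line {i}\n' for i in range(10)))
    # range(2, 8) covers indices 2..7, i.e. lines 3 to 8, without building a list
    delete_lines_in(path, range(2, 8))
    with open(path) as f:
        content = f.read()

print(content, end='')
```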

Delete a specific line from the file by matching content

Suppose that, instead of a line number, we want to delete a line from a text / CSV file whose content exactly matches the given text. To do that, we are going to use the same logic, i.e.,

“Copy the contents of the given file to a temporary file line by line and, while copying, check whether each line matches the given text. If it matches, skip that line. In the end, remove the original file and rename the temporary file as the original file.”

The function to remove every line of a file that matches the given text is as follows,

import os


def delete_line_by_full_match(original_file, line_to_delete):
    """Delete every line of a file that exactly matches line_to_delete."""
    is_skipped = False
    dummy_file = original_file + '.bak'
    # Open original file in read only mode and dummy file in write mode
    with open(original_file, 'r') as read_obj, open(dummy_file, 'w') as write_obj:
        # Line by line copy data from original file to dummy file
        for line in read_obj:
            # Compare the text without its trailing newline character
            line_to_match = line.rstrip('\n')
            # if current line matches with the given line then skip that line
            if line_to_match != line_to_delete:
                write_obj.write(line)
            else:
                is_skipped = True

    # If any line is skipped then rename dummy file as original file
    if is_skipped:
        os.remove(original_file)
        os.rename(dummy_file, original_file)
    else:
        os.remove(dummy_file)

Suppose we have a file ‘sample_3.txt’ with the following contents,

Hello this is a sample file
It contains sample text
Dummy Line A
Dummy Line B
Dummy Line C
This is the end of file

Let’s use the above function to delete the line with the content “Dummy Line B”,

delete_line_by_full_match('sample_3.txt', 'Dummy Line B')

Now, the contents of the file ‘sample_3.txt’ are as follows,

Hello this is a sample file
It contains sample text
Dummy Line A
Dummy Line C
This is the end of file
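For a CSV file, comparing raw line text is fragile: the same row can be written with different quoting or spacing, and a quoted field may even span several physical lines. As a sketch under those assumptions, the standard `csv` module can stream the rows and drop those whose column has a given value. The function name `delete_csv_rows` and its parameters are hypothetical; the first row is assumed to be a header.

```python
import csv
import os
import tempfile


def delete_csv_rows(original_file, column, value):
    """Hypothetical helper: delete data rows whose `column` field equals `value`.

    The first row is treated as the header and always kept.
    """
    dummy_file = original_file + '.bak'
    is_skipped = False
    # The csv module documentation recommends opening csv files with newline=''
    with open(original_file, 'r', newline='') as read_obj, \
            open(dummy_file, 'w', newline='') as write_obj:
        reader = csv.reader(read_obj)
        writer = csv.writer(write_obj)
        header = next(reader, None)
        if header is not None:
            writer.writerow(header)
            col = header.index(column)  # ValueError if the column does not exist
            for row in reader:
                # Rows are parsed one at a time, so quoted commas are handled
                if len(row) > col and row[col] == value:
                    is_skipped = True
                else:
                    writer.writerow(row)
    if is_skipped:
        os.remove(original_file)
        os.rename(dummy_file, original_file)
    else:
        os.remove(dummy_file)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'people.csv')
    with open(path, 'w', newline='') as f:
        f.write('name,city\nAlice,Paris\n"Bob, Jr.",London\nCarol,Paris\n')
    delete_csv_rows(path, 'city', 'Paris')
    with open(path, newline='') as f:
        rows = list(csv.reader(f))

print(rows)
```

Two caveats of this sketch: `csv.writer` uses `'\r\n'` line endings by default (its `lineterminator` setting), so the rewritten file may differ byte-for-byte from the original, and a missing column raises `ValueError` and leaves the `.bak` file behind.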

Delete specific lines from a file that match the given conditions

In all the above examples, we followed the same logic to delete lines from a file. The only difference was the logic used to identify the lines to skip. We can move that logic out into a separate function and make the deleting function generic.

The algorithm of the generic function is:

  • Accept the original file name and a call-back function, i.e. condition(), as arguments
  • Open original file in read mode
  • Create a temporary file and open that in write mode
  • Read the contents of the original file line by line and for each line,
    • Pass the line to the call-back function condition(); if it returns True, skip that line while copying, else copy the line to the temporary file
  • If any line is skipped while copying then,
    • Delete the original file
    • Rename the temporary file as the original file
  • Else
    • Delete the temporary file

Implementation of the function is as follows,

import os


def delete_line_by_condition(original_file, condition):
    """Delete every line of a file for which condition(line) returns True."""

    dummy_file = original_file + '.bak'
    is_skipped = False
    # Open original file in read only mode and dummy file in write mode
    with open(original_file, 'r') as read_obj, open(dummy_file, 'w') as write_obj:
        # Line by line copy data from original file to dummy file
        for line in read_obj:
            # Copy the line unless the condition returns a truthy value for it
            if not condition(line):
                write_obj.write(line)
            else:
                is_skipped = True

    # If any line is skipped then rename dummy file as original file
    if is_skipped:
        os.remove(original_file)
        os.rename(dummy_file, original_file)
    else:
        os.remove(dummy_file)

We can use this function to delete specific lines from a file. The logic that identifies which lines to delete can now be written in a separate function and passed as an argument to delete_line_by_condition().
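As an example of such a separate function, the sketch below deletes blank lines and `#` comment lines. It carries a compact copy of delete_line_by_condition() (using `not condition(line)`) so that it runs on its own; `is_blank_or_comment` is a hypothetical name chosen for illustration.

```python
import os
import tempfile


def delete_line_by_condition(original_file, condition):
    """Delete every line of a file for which condition(line) returns True."""
    dummy_file = original_file + '.bak'
    is_skipped = False
    with open(original_file, 'r') as read_obj, open(dummy_file, 'w') as write_obj:
        for line in read_obj:
            if not condition(line):
                write_obj.write(line)
            else:
                is_skipped = True
    if is_skipped:
        os.remove(original_file)
        os.rename(dummy_file, original_file)
    else:
        os.remove(dummy_file)


def is_blank_or_comment(line):
    """Hypothetical condition: True for whitespace-only lines and '#' comment lines."""
    stripped = line.strip()
    return stripped == '' or stripped.startswith('#')


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'config.txt')
    with open(path, 'w') as f:
        f.write('# settings\nhost=localhost\n\n   \nport=8080\n')
    delete_line_by_condition(path, is_blank_or_comment)
    with open(path) as f:
        content = f.read()

print(content, end='')
```

A named function like this is easier to test and reuse than a lambda when the condition grows beyond a single expression.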

Let’s see some examples of deleting lines with custom logic using the function created above,

Delete lines from a file that contain a word / sub-string

Contents of file ‘sample_4.txt’ are as follows,

Hello this is a sample file
It contains sample text
Dummy Line A
Dummy Line B
Dummy Line C
This is the end of file

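Let’s use the above function to create a function that deletes every line containing a given word / sub-string,

def delete_line_with_word(file_name, word):
    """Delete lines from a file that contain a given word / sub-string"""
    delete_line_by_condition(file_name, lambda x: word in x)

Now let’s use it to delete the lines that contain the string “Dummy”,

delete_line_with_word('sample_4.txt', 'Dummy')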

Now, the contents of the file ‘sample_4.txt’ are as follows,

Hello this is a sample file
It contains sample text
This is the end of file

delete_line_with_word() passes the condition to delete_line_by_condition() as a lambda function. This lambda function was invoked for each line in the file, and the lines for which it returned True were deleted.
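The condition can be anything that returns a truth value for a line. As another sketch, `re.search()` can flag lines that match a regular expression; the pattern below (lines ending in "Line" plus one capital letter) is only an illustration. The sketch checks the condition against the sample lines in memory instead of a file, so it needs no helper function.

```python
import re

# Illustrative pattern: "Line" followed by a single capital letter at the end
pattern = re.compile(r'Line [A-Z]$')


def matches_pattern(line):
    # '$' also matches just before a trailing newline, so '\n' need not be stripped
    return pattern.search(line) is not None


sample = [
    'Hello this is a sample file\n',
    'It contains sample text\n',
    'Dummy Line A\n',
    'Dummy Line B\n',
    'Dummy Line C\n',
    'This is the end of file\n',
]
flagged = [line for line in sample if matches_pattern(line)]
print(flagged)
```

On a file with the sample contents shown above, passing `matches_pattern` as the condition to delete_line_by_condition() would delete exactly these three lines.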

Delete shorter lines from a file i.e. lines with length less than the minimum length

Contents of file ‘sample_5.txt’ are as follows,

Hello this is a sample file
It contains sample text
Dummy Line A
Dummy Line B
Dummy Line C
This is the end of file

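Let’s use the above function to create a function that deletes every line shorter than a given minimum length,

def delete_shorter_lines(file_name, min_length):
    """Delete lines from a file that are shorter than min_length"""
    delete_line_by_condition(file_name, lambda x: len(x) < min_length)

Now let’s use it to delete the lines whose length is less than 15,

delete_shorter_lines('sample_5.txt', 15)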

Now, the contents of the file ‘sample_5.txt’ are as follows,

Hello this is a sample file
It contains sample text
This is the end of file

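delete_shorter_lines() also passes its condition to delete_line_by_condition() as a lambda function, and the lines for which it returned True were deleted. Note that each line passed to the lambda still includes its trailing newline character, so len() counts it: ‘Dummy Line A’ plus its newline is 13 characters long, which is less than 15.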
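The generic function only sees the text of a line, so it cannot express the line-number cases from earlier. A hypothetical extension, `delete_lines_where`, passes both the 0-based index and the line to the callback, so a single function can cover deletion by position, by content, or both at once. This is a sketch, not part of the article's code.

```python
import os
import tempfile


def delete_lines_where(original_file, condition):
    """Hypothetical: delete lines for which condition(index, line) is truthy (index from 0)."""
    dummy_file = original_file + '.bak'
    is_skipped = False
    with open(original_file, 'r') as read_obj, open(dummy_file, 'w') as write_obj:
        for index, line in enumerate(read_obj):
            if condition(index, line):
                is_skipped = True
            else:
                write_obj.write(line)
    if is_skipped:
        os.remove(original_file)
        os.rename(dummy_file, original_file)
    else:
        os.remove(dummy_file)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sample.txt')
    with open(path, 'w') as f:
        f.write('Hello this is a sample file\nIt contains sample text\n'
                'Dummy Line A\nDummy Line B\nDummy Line C\nThis is the end of file\n')
    # Position rule and content rule combined: drop index 0 and every "Dummy" line
    delete_lines_where(path, lambda i, line: i == 0 or 'Dummy' in line)
    with open(path) as f:
        content = f.read()

print(content, end='')
```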

The complete example is,

import os


def delete_line(original_file, line_number):
    """Delete the line at the given 0-based line number from a file."""
    is_skipped = False
    current_index = 0
    dummy_file = original_file + '.bak'
    # Open original file in read only mode and dummy file in write mode
    with open(original_file, 'r') as read_obj, open(dummy_file, 'w') as write_obj:
        # Line by line copy data from original file to dummy file
        for line in read_obj:
            # If current line number matches the given line number then skip copying
            if current_index != line_number:
                write_obj.write(line)
            else:
                is_skipped = True
            current_index += 1

    # If any line is skipped then rename dummy file as original file
    if is_skipped:
        os.remove(original_file)
        os.rename(dummy_file, original_file)
    else:
        os.remove(dummy_file)


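def delete_multiple_lines(original_file, line_numbers):
    """Delete the lines at the given 0-based line numbers from a file."""
    # A set makes each "in" check constant-time, even for long lists
    line_numbers = set(line_numbers)
    is_skipped = False
    counter = 0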
    # Create name of dummy / temporary file
    dummy_file = original_file + '.bak'
    # Open original file in read only mode and dummy file in write mode
    with open(original_file, 'r') as read_obj, open(dummy_file, 'w') as write_obj:
        # Line by line copy data from original file to dummy file
        for line in read_obj:
            # If current line number exists in the list then skip copying that line
            if counter not in line_numbers:
                write_obj.write(line)
            else:
                is_skipped = True
            counter += 1

    # If any line is skipped then rename dummy file as original file
    if is_skipped:
        os.remove(original_file)
        os.rename(dummy_file, original_file)
    else:
        os.remove(dummy_file)


def delete_line_by_full_match(original_file, line_to_delete):
    """Delete every line of a file that exactly matches line_to_delete."""
    is_skipped = False
    dummy_file = original_file + '.bak'
    # Open original file in read only mode and dummy file in write mode
    with open(original_file, 'r') as read_obj, open(dummy_file, 'w') as write_obj:
        # Line by line copy data from original file to dummy file
        for line in read_obj:
            # Compare the text without its trailing newline character
            line_to_match = line.rstrip('\n')
            # if current line matches with the given line then skip that line
            if line_to_match != line_to_delete:
                write_obj.write(line)
            else:
                is_skipped = True

    # If any line is skipped then rename dummy file as original file
    if is_skipped:
        os.remove(original_file)
        os.rename(dummy_file, original_file)
    else:
        os.remove(dummy_file)


def delete_line_by_condition(original_file, condition):
    """Delete every line of a file for which condition(line) returns True."""

    dummy_file = original_file + '.bak'
    is_skipped = False
    # Open original file in read only mode and dummy file in write mode
    with open(original_file, 'r') as read_obj, open(dummy_file, 'w') as write_obj:
        # Line by line copy data from original file to dummy file
        for line in read_obj:
            # Copy the line unless the condition returns a truthy value for it
            if not condition(line):
                write_obj.write(line)
            else:
                is_skipped = True

    # If any line is skipped then rename dummy file as original file
    if is_skipped:
        os.remove(original_file)
        os.rename(dummy_file, original_file)
    else:
        os.remove(dummy_file)


def delete_line_with_word(file_name, word):
    """Delete lines from a file that contain a given word / sub-string"""
    delete_line_by_condition(file_name, lambda x: word in x)


def delete_shorter_lines(file_name, min_length):
    """Delete lines from a file that are shorter than min_length"""
    delete_line_by_condition(file_name, lambda x: len(x) < min_length)


def main():
    delete_line('sample_1.txt', 1)

    delete_multiple_lines('sample_2.txt', [0, 1, 2])

    delete_line_by_full_match('sample_3.txt', 'Dummy Line B')

    delete_line_with_word('sample_4.txt', 'Dummy')

    delete_shorter_lines('sample_5.txt', 15)


if __name__ == '__main__':
    main()
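The fixed ‘.bak’ name used by all these functions has two practical drawbacks: it silently overwrites any existing file with that name, and an exception in the middle of copying leaves the dummy file behind. A sketch that swaps in `tempfile.NamedTemporaryFile` (created in the same directory, so the final rename stays on one filesystem) and `os.replace()` (which, unlike `os.rename()`, also overwrites an existing file on Windows, and is atomic on POSIX when it succeeds) could look like this. `safe_delete_lines` is a hypothetical name; this is an alternative, not the approach used above.

```python
import os
import shutil
import tempfile


def safe_delete_lines(original_file, condition):
    """Hypothetical variant: delete lines for which condition(line) is truthy,
    writing to a uniquely named temp file and swapping it in with os.replace()."""
    directory = os.path.dirname(os.path.abspath(original_file))
    is_skipped = False
    # delete=False because we move or remove the temp file ourselves
    write_obj = tempfile.NamedTemporaryFile('w', dir=directory, delete=False)
    try:
        with open(original_file, 'r') as read_obj, write_obj:
            for line in read_obj:
                if condition(line):
                    is_skipped = True
                else:
                    write_obj.write(line)
        if is_skipped:
            # The temp file is owner-only on POSIX; copy the original's permission bits
            shutil.copymode(original_file, write_obj.name)
            os.replace(write_obj.name, original_file)
    finally:
        write_obj.close()  # harmless if already closed
        # Anything still at the temp path was not swapped in, so remove it
        if os.path.exists(write_obj.name):
            os.remove(write_obj.name)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sample.txt')
    with open(path, 'w') as f:
        f.write('keep\ndrop\nkeep\n')
    safe_delete_lines(path, lambda line: line.strip() == 'drop')
    with open(path) as f:
        content = f.read()
    leftovers = os.listdir(tmp)

print(repr(content), leftovers)
```

The `finally` block is what guarantees that no temporary file is left over, whether a line was skipped, nothing matched, or the condition raised an exception.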
