Pandas: Drop first N rows of dataframe

In this article, we will discuss different ways to delete first N rows of a dataframe in python.

Use iloc to drop first N rows of pandas dataframe

In Pandas, the Dataframe provides an attribute iloc to select a portion of the dataframe using position based indexing. This selected portion can be a few columns or rows . We can use this attribute to select all the rows except first N rows of a dataframe and then assign back that to the original variable. It will give an effect that we have deleted the first N rows from the dataframe. For example,

# Drop first 3 rows
# by selecting all rows from 4th row onwards
N = 3
df = df.iloc[N: , :]

We selected a portion of dataframe, that included all columns, but it selected only last (size – N) rows. Then assigned this back to the same variable. So, basically it removed the first N rows of dataframe.

How did it work?

The syntax of dataframe.iloc[] is like,

df.iloc[row_start:row_end , col_start, col_end]
  • row_start: The row index/position from where it should start selection. Default is 0.
  • row_end: The row index/position from where it should end the selection i.e. select till row_end-1. Default is till the last row of the dataframe.
  • col_start: The column index/position from where it should start selection. Default is 0.
  • col_end: The column index/position from where it should end the selection i.e. select till col_end-1. Default is till the last column of the dataframe.

It returns a portion of dataframe that includes rows from row_start to row_end-1 and columns from col_start to col_end-1.

To delete the first N rows of the dataframe, just select the rows from row number N till the end and select all columns. As indexing starts from 0, so to select all rows after the N, use –> (N:) i.e. from Nth row till the end. To select all the columns use default values i.e. (:) i.e.

df = df.iloc[N: , :]

Checkout complete example to delete the first 3 rows of dataframe,

import pandas as pd

# List of Tuples
empoyees = [('Jack',    34, 'Sydney',   5),
            ('Riti',    31, 'Delhi' ,   7),
            ('Aadi',    16, 'London',   11),
            ('Mark',    41, 'Delhi' ,   12),
            ('Sam',     56, 'London',   33)]

# Create a DataFrame object
df = pd.DataFrame(  empoyees, 
                    columns=['Name', 'Age', 'City', 'Experience'],
                    index = ['A', 'B', 'C', 'D', 'E'])

print("Contents of the Dataframe : ")
print(df)

# Drop first 3 rows
# by selecting all rows from 4th row onwards
N = 3
df = df.iloc[N: , :]

print("Modified Dataframe : ")
print(df)

Output:

Contents of the Dataframe :
   Name  Age    City  Experience
A  Jack   34  Sydney           5
B  Riti   31   Delhi           7
C  Aadi   16  London          11
D  Mark   41   Delhi          12
E   Sam   56  London          33
Modified Dataframe :
   Name  Age    City  Experience
D  Mark   41   Delhi          12
E   Sam   56  London          33

Use drop() to remove first N rows of pandas dataframe

In pandas, the dataframe’s drop() function accepts a sequence of row names that it needs to delete from the dataframe. To make sure that it removes the rows only, use argument axis=0 and to make changes in place i.e. in calling dataframe object, pass argument inplace=True.

Checkout complete example to delete the first 3 rows of dataframe,

import pandas as pd

# List of Tuples
empoyees = [('Jack',    34, 'Sydney',   5),
            ('Riti',    31, 'Delhi' ,   7),
            ('Aadi',    16, 'London',   11),
            ('Mark',    41, 'Delhi' ,   12),
            ('Sam',     56, 'London',   33)]


# Create a DataFrame object
df = pd.DataFrame(  empoyees, 
                    columns=['Name', 'Age', 'City', 'Experience'],
                    index = ['A', 'B', 'C', 'D', 'E'])

print("Contents of the Dataframe : ")
print(df)

# Drop first 3 rows of dataframe
N = 3
df.drop(index=df.index[:N], 
        axis=0, 
        inplace=True)

print("Modified Dataframe : ")
print(df)

Output:

Contents of the Dataframe :
   Name  Age    City  Experience
A  Jack   34  Sydney           5
B  Riti   31   Delhi           7
C  Aadi   16  London          11
D  Mark   41   Delhi          12
E   Sam   56  London          33
Modified Dataframe :
   Name  Age    City  Experience
D  Mark   41   Delhi          12
E   Sam   56  London          33

We fetched the row names of dataframe as a sequence and passed the first N row names ( df.index[:N] ) as the index argument in drop() function, therefore it deleted the first N rows (3 rows) of dataframe.

Use tail() to remove first N rows of pandas dataframe

In Pandas, dataframe provides a function tail(N) to select last N rows of dataframe. To delete first N rows of dataframe, we can select last (Size-N) rows of dataframe using tail function. For example,

import pandas as pd

# List of Tuples
empoyees = [('Jack',    34, 'Sydney',   5),
            ('Riti',    31, 'Delhi' ,   7),
            ('Aadi',    16, 'London',   11),
            ('Mark',    41, 'Delhi' ,   12),
            ('Sam',     56, 'London',   33)]

# Create a DataFrame object
df = pd.DataFrame(  empoyees, 
                    columns=['Name', 'Age', 'City', 'Experience'],
                    index = ['A', 'B', 'C', 'D', 'E'])

print("Contents of the Dataframe : ")
print(df)

# Drop first 3 rows of dataframe
N = 3
df = df.tail(df.shape[0] -N)

print("Modified Dataframe : ")
print(df)

Output:

Contents of the Dataframe :
   Name  Age    City  Experience
A  Jack   34  Sydney           5
B  Riti   31   Delhi           7
C  Aadi   16  London          11
D  Mark   41   Delhi          12
E   Sam   56  London          33
Modified Dataframe :
   Name  Age    City  Experience
D  Mark   41   Delhi          12
E   Sam   56  London          33

It removed the first 3 rows of dataframe in place.

Summary:

We learned about four different ways to delete first N rows of a dataframe.

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Scroll to Top