In this article, we will discuss different ways to delete the first N rows of a pandas DataFrame in Python.
Use iloc to drop the first N rows of a pandas DataFrame
In pandas, a DataFrame provides the iloc attribute to select a portion of the DataFrame by position. The selected portion can be a subset of rows, columns, or both. We can use iloc to select every row except the first N and then assign the result back to the original variable, which has the effect of deleting the first N rows from the DataFrame. For example,
```python
# Drop the first 3 rows
# by selecting all rows from the 4th row onwards
N = 3
df = df.iloc[N:, :]
```
We selected a portion of the DataFrame that includes all columns but only the last (size - N) rows, and assigned it back to the same variable. In effect, this removes the first N rows of the DataFrame.
How did it work?
The general form of a slice with DataFrame.iloc[] is:

```python
df.iloc[row_start:row_end, col_start:col_end]
```
- row_start: The position of the first row to select. Default is 0.
- row_end: The position at which row selection stops. This position is excluded, so rows up to row_end-1 are selected. By default, selection runs to the last row of the DataFrame.
- col_start: The position of the first column to select. Default is 0.
- col_end: The position at which column selection stops. This position is excluded, so columns up to col_end-1 are selected. By default, selection runs to the last column of the DataFrame.
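It returns the portion of the DataFrame containing rows row_start to row_end-1 and columns col_start to col_end-1. Because iloc works purely on integer positions, the DataFrame's index labels play no role in the selection.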
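A small sketch of these slicing rules, using a hypothetical DataFrame with letter index labels (the data and column names are made up for illustration). It shows that both end bounds are excluded and that omitted bounds fall back to their defaults.

```python
import pandas as pd

# Hypothetical DataFrame with non-numeric index labels,
# to show that iloc ignores labels and uses positions only.
df = pd.DataFrame(
    {"a": [1, 2, 3, 4], "b": [5, 6, 7, 8], "c": [9, 10, 11, 12]},
    index=["w", "x", "y", "z"],
)

# Rows at positions 1 and 2 (row_end=3 is excluded),
# columns at positions 0 and 1 (col_end=2 is excluded).
part = df.iloc[1:3, 0:2]
print(part)

# Omitted bounds use the defaults:
# from row position 2 to the last row, and every column.
rest = df.iloc[2:, :]
print(rest)
```

Here part holds rows x and y with columns a and b, while rest holds rows y and z with all three columns.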
To delete the first N rows of the DataFrame, select the rows from position N to the end, along with all columns. Since positions start at 0, the first N rows occupy positions 0 to N-1, so the slice N: starts right after them and runs to the last row. To select all columns, use the default bounds, i.e. a bare :. Putting the two together:
```python
df = df.iloc[N:, :]
```
Check out the complete example that deletes the first 3 rows of a DataFrame:
```python
import pandas as pd

# List of tuples
employees = [('Jack', 34, 'Sydney', 5),
             ('Riti', 31, 'Delhi', 7),
             ('Aadi', 16, 'London', 11),
             ('Mark', 41, 'Delhi', 12),
             ('Sam', 56, 'London', 33)]

# Create a DataFrame object
df = pd.DataFrame(employees,
                  columns=['Name', 'Age', 'City', 'Experience'],
                  index=['A', 'B', 'C', 'D', 'E'])

print("Contents of the Dataframe : ")
print(df)

# Drop the first 3 rows
# by selecting all rows from the 4th row onwards
N = 3
df = df.iloc[N:, :]

print("Modified Dataframe : ")
print(df)
```
Output:
```
Contents of the Dataframe :
   Name  Age    City  Experience
A  Jack   34  Sydney           5
B  Riti   31   Delhi           7
C  Aadi   16  London          11
D  Mark   41   Delhi          12
E   Sam   56  London          33
Modified Dataframe :
   Name  Age    City  Experience
D  Mark   41   Delhi          12
E   Sam   56  London          33
```
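One way to package this approach is a small helper; the name drop_first_n below is hypothetical and not part of pandas. It relies on iloc slices clamping out-of-range bounds, so N = 0 keeps every row and an N larger than the number of rows yields an empty DataFrame. A negative N is rejected explicitly, because a slice such as df.iloc[-2:] would select the last 2 rows instead of dropping any.

```python
import pandas as pd


def drop_first_n(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Return the rows of df after the first n (hypothetical helper).

    iloc slicing is positional and clamps out-of-range bounds, so n=0
    keeps every row and n >= len(df) gives an empty DataFrame that
    still has the original columns.
    """
    if n < 0:
        # df.iloc[-k:] would select the last k rows, not drop any
        raise ValueError("n must be non-negative")
    return df.iloc[n:]


df = pd.DataFrame({"Name": ["Jack", "Riti", "Aadi"], "Age": [34, 31, 16]})

print(drop_first_n(df, 0))   # all 3 rows
print(drop_first_n(df, 2))   # only the row for 'Aadi'
print(drop_first_n(df, 10))  # empty, but still has Name and Age columns
```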
Use drop() to remove the first N rows of a pandas DataFrame
In pandas, the DataFrame's drop() function accepts the labels of the rows to delete. Passing axis=0 makes it remove rows rather than columns (0 is also the default, and rows are implied anyway when the labels are passed through the index parameter). To modify the calling DataFrame object itself instead of returning a new one, pass inplace=True.
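To see the difference inplace makes, here is a minimal sketch on a hypothetical four-row DataFrame: without inplace, drop() returns a new DataFrame and leaves the original unchanged; with inplace=True, it modifies the original and returns None.

```python
import pandas as pd

# Hypothetical four-row DataFrame
df = pd.DataFrame({"x": [1, 2, 3, 4]}, index=["a", "b", "c", "d"])
N = 2

# Without inplace, drop() returns a new DataFrame; df keeps all 4 rows.
trimmed = df.drop(index=df.index[:N])
print(len(df), len(trimmed))  # 4 2

# With inplace=True, df itself loses its first N rows and drop() returns None.
result = df.drop(index=df.index[:N], inplace=True)
print(len(df), result)  # 2 None
```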
Check out the complete example that deletes the first 3 rows of a DataFrame:
```python
import pandas as pd

# List of tuples
employees = [('Jack', 34, 'Sydney', 5),
             ('Riti', 31, 'Delhi', 7),
             ('Aadi', 16, 'London', 11),
             ('Mark', 41, 'Delhi', 12),
             ('Sam', 56, 'London', 33)]

# Create a DataFrame object
df = pd.DataFrame(employees,
                  columns=['Name', 'Age', 'City', 'Experience'],
                  index=['A', 'B', 'C', 'D', 'E'])

print("Contents of the Dataframe : ")
print(df)

# Drop the first 3 rows of the DataFrame
N = 3
df.drop(index=df.index[:N], axis=0, inplace=True)

print("Modified Dataframe : ")
print(df)
```
Output:
```
Contents of the Dataframe :
   Name  Age    City  Experience
A  Jack   34  Sydney           5
B  Riti   31   Delhi           7
C  Aadi   16  London          11
D  Mark   41   Delhi          12
E   Sam   56  London          33
Modified Dataframe :
   Name  Age    City  Experience
D  Mark   41   Delhi          12
E   Sam   56  London          33
```
We fetched the DataFrame's row labels with df.index and passed the first N of them (df.index[:N]) to the index parameter of drop(), so it deleted the first N rows (here, 3 rows) of the DataFrame.
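Because drop() works with labels rather than positions, this approach assumes the index labels are unique. The sketch below uses a hypothetical DataFrame in which the label 'A' occurs twice: dropping the first label removes both rows labelled 'A', while iloc removes exactly one row.

```python
import pandas as pd

# Hypothetical frame whose index labels repeat: 'A' appears at
# positions 0 and 3.
df = pd.DataFrame({"val": [10, 20, 30, 40, 50]},
                  index=["A", "B", "C", "A", "D"])

N = 1
# drop() is label-based: removing the label 'A' removes every row
# labelled 'A', so more than N rows disappear.
by_label = df.drop(index=df.index[:N])

# iloc is position-based and removes exactly the first N rows.
by_position = df.iloc[N:]

print(len(by_label))     # 3
print(len(by_position))  # 4
```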
Use tail() to remove the first N rows of a pandas DataFrame
In pandas, a DataFrame provides the function tail(n), which selects the last n rows. To delete the first N rows, we can select the last (size - N) rows with tail() and assign the result back. For example,
```python
import pandas as pd

# List of tuples
employees = [('Jack', 34, 'Sydney', 5),
             ('Riti', 31, 'Delhi', 7),
             ('Aadi', 16, 'London', 11),
             ('Mark', 41, 'Delhi', 12),
             ('Sam', 56, 'London', 33)]

# Create a DataFrame object
df = pd.DataFrame(employees,
                  columns=['Name', 'Age', 'City', 'Experience'],
                  index=['A', 'B', 'C', 'D', 'E'])

print("Contents of the Dataframe : ")
print(df)

# Drop the first 3 rows of the DataFrame
# by keeping only the last (size - N) rows
N = 3
df = df.tail(df.shape[0] - N)

print("Modified Dataframe : ")
print(df)
```
Output:
```
Contents of the Dataframe :
   Name  Age    City  Experience
A  Jack   34  Sydney           5
B  Riti   31   Delhi           7
C  Aadi   16  London          11
D  Mark   41   Delhi          12
E   Sam   56  London          33
Modified Dataframe :
   Name  Age    City  Experience
D  Mark   41   Delhi          12
E   Sam   56  London          33
```
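It removed the first 3 rows of the DataFrame. Note that tail() does not work in place: it returns a new DataFrame, which is why the result is assigned back to df.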
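The expression df.shape[0] - N turns negative when N exceeds the number of rows, and for a negative n, tail() returns all rows except the first |n|. The sketch below (hypothetical five-row data, N = 7) shows the unclamped call keeping rows it should have dropped, and one possible fix that clamps the count at 0.

```python
import pandas as pd

# Hypothetical five-row DataFrame
df = pd.DataFrame({"Name": ["Jack", "Riti", "Aadi", "Mark", "Sam"]})
N = 7  # more rows than the DataFrame has

# Unclamped count: 5 - 7 = -2. For a negative n, tail() returns all rows
# except the first |n|, so this keeps 3 rows instead of none.
unclamped = df.tail(df.shape[0] - N)

# Clamping the count at 0 gives the expected empty result.
clamped = df.tail(max(df.shape[0] - N, 0))

print(len(unclamped))  # 3
print(len(clamped))    # 0
```

The pandas documentation describes tail(-n) as equivalent to df[n:], so df.tail(-N) drops the first N rows directly. With N = 0, however, that call becomes tail(0), which returns an empty DataFrame rather than every row.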
Summary:
We learned three different ways to delete the first N rows of a DataFrame: slicing with iloc, dropping row labels with drop(), and keeping the remaining rows with tail().