In this article we will discuss how to skip rows from the top, the bottom or at specific indices while reading a CSV file and loading its contents into a DataFrame.

Python's pandas library provides a function, pandas.read_csv(), that reads a CSV file and loads its data directly into a DataFrame. It can also skip specified lines from the file.

It accepts a large number of arguments, but here we will discuss only a few important ones.
Arguments:

  • filepath_or_buffer : path of a CSV file, or a file-like object.
  • skiprows : line numbers (0-indexed) to skip while reading the CSV.
    • If it's an int, skip that many lines from the top.
    • If it's a list of ints, skip the lines at those index positions.
    • If it's a callable, call it with each line index and skip the line if it returns True.

read_csv() reads the given CSV file, skips the specified lines and loads the remaining lines into a DataFrame.

To use it, import the pandas module, conventionally under the alias pd.
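A minimal import using the common pd alias (the alias is a convention, not a requirement):

```python
# Import pandas under its conventional alias
import pandas as pd

# read_csv() is the function used to load CSV data into a DataFrame
print(pd.read_csv)
```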

Let's understand this with some examples.

Suppose we have a simple CSV file, users.csv, with a header row containing the column names followed by several data rows.
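For illustration only, assume users.csv holds the hypothetical data below (the names and values are made up): a header row with the columns Name, Age and City, followed by six data rows, so the lines sit at indices 0 to 6. The text is wrapped in io.StringIO, which read_csv() accepts as a file-like object, so the snippet runs without a file on disk; with a real file you would pass the path "users.csv" instead.

```python
import io

import pandas as pd

# Hypothetical contents of users.csv; line 0 is the header row
CSV_TEXT = """Name,Age,City
Jack,34,Sydney
Riti,31,Delhi
Aadi,16,New York
Suse,32,Lucknow
Mark,33,Las Vegas
Suri,35,Patna
"""

# io.StringIO stands in for the file on disk
df = pd.read_csv(io.StringIO(CSV_TEXT))
print(df)
```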

Let's load this CSV file into a DataFrame using read_csv() and skip rows in different ways.

Skipping N rows from top while reading a csv file to Dataframe

While calling pandas.read_csv(), if we pass the skiprows argument as an int, it skips that many rows from the top while reading the CSV file and initializing the DataFrame.
For example, suppose we want to skip 2 lines from the top while reading users.csv into a DataFrame.
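A minimal sketch, assuming users.csv holds a header row (Name, Age, City) and six made-up data rows; the CSV_TEXT string and the io.StringIO wrapper are hypothetical stand-ins for the real file.

```python
import io

import pandas as pd

# Hypothetical contents of users.csv (line indices 0 to 6)
CSV_TEXT = """Name,Age,City
Jack,34,Sydney
Riti,31,Delhi
Aadi,16,New York
Suse,32,Lucknow
Mark,33,Las Vegas
Suri,35,Patna
"""

# Skip the first 2 lines (indices 0 and 1); line 2 becomes the header row
df = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=2)
print(df)
```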


It skipped the top 2 lines of the CSV, used the 3rd line (at index 2) as the header row and loaded the remaining rows as data rows in the DataFrame.

Now, what if we want to skip only some specific rows while reading the CSV?

Skipping rows at specific index positions while reading a csv file to Dataframe

While calling pandas.read_csv(), if we pass the skiprows argument as a list of ints, it skips the rows at the indices specified in the list. For example, suppose we want to skip the lines at index 0, 2 and 5 while reading users.csv into a DataFrame.
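A minimal sketch, again assuming hypothetical users.csv data held in a CSV_TEXT string and read through io.StringIO. Because index 0 (the original header) is in the list, the first remaining line becomes the header.

```python
import io

import pandas as pd

# Hypothetical contents of users.csv (line indices 0 to 6)
CSV_TEXT = """Name,Age,City
Jack,34,Sydney
Riti,31,Delhi
Aadi,16,New York
Suse,32,Lucknow
Mark,33,Las Vegas
Suri,35,Patna
"""

# Skip the lines at index 0, 2 and 5; line 1 becomes the header row
df = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=[0, 2, 5])
print(df)
```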


It skipped the lines at index positions 0, 2 and 5 and loaded the remaining rows into the DataFrame. Because line 0 (the original header) was skipped, the first remaining line, at index 1, became the header row.

Skipping N rows from top except header while reading a csv file to Dataframe

As we saw in the first example, skipping 2 lines from the top while reading users.csv makes the 3rd line the header row. But that's not the row that contains the column names.
So, if our CSV file has a header row and we want to skip the first 2 data rows, we need to pass skiprows a list of indices that starts at 1, so that the header at index 0 is kept.
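A minimal sketch with hypothetical users.csv data (CSV_TEXT via io.StringIO). The variable N is an illustrative name for the number of data rows to drop; list(range(1, N + 1)) produces the indices 1 to N, leaving the header at index 0 untouched.

```python
import io

import pandas as pd

# Hypothetical contents of users.csv (line indices 0 to 6)
CSV_TEXT = """Name,Age,City
Jack,34,Sydney
Riti,31,Delhi
Aadi,16,New York
Suse,32,Lucknow
Mark,33,Las Vegas
Suri,35,Patna
"""

N = 2  # number of data rows to skip right after the header

# Skip lines 1..N; the header at index 0 is kept
df = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=list(range(1, N + 1)))
print(df)
```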


This reads the CSV file into a DataFrame, skipping the 2 lines right after the header row.

Skip rows based on a condition while reading a csv file to Dataframe

We can also pass a callable, such as a regular function or a lambda, to decide which rows to skip. When skiprows is a callable, pandas.read_csv() calls it with the index position of each row before reading that row; if it returns True, the row is skipped.
Let's skip the rows whose index position is a multiple of 3, i.e. every 3rd line, while reading the CSV file into a DataFrame.
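A minimal sketch with hypothetical users.csv data (CSV_TEXT via io.StringIO). Note that index 0 is also a multiple of 3, so the header line is skipped as well and the next remaining line becomes the header. The second call, which adds an i > 0 guard to keep the header, is an illustrative variant rather than part of the original example.

```python
import io

import pandas as pd

# Hypothetical contents of users.csv (line indices 0 to 6)
CSV_TEXT = """Name,Age,City
Jack,34,Sydney
Riti,31,Delhi
Aadi,16,New York
Suse,32,Lucknow
Mark,33,Las Vegas
Suri,35,Patna
"""

# Skip lines 0, 3 and 6; line 1 becomes the header row
df = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=lambda i: i % 3 == 0)
print(df)

# Variant: same rule, but never skip the header at index 0
df_keep_header = pd.read_csv(
    io.StringIO(CSV_TEXT), skiprows=lambda i: i > 0 and i % 3 == 0
)
print(df_keep_header)
```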


Skip N rows from bottom / footer while reading a csv file to Dataframe

To skip N rows from the bottom while reading a CSV file into a DataFrame, pass the skipfooter and engine arguments to pandas.read_csv().
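A minimal sketch with hypothetical users.csv data (CSV_TEXT via io.StringIO), dropping the last 2 lines with skipfooter and selecting the Python parsing engine explicitly:

```python
import io

import pandas as pd

# Hypothetical contents of users.csv (line indices 0 to 6)
CSV_TEXT = """Name,Age,City
Jack,34,Sydney
Riti,31,Delhi
Aadi,16,New York
Suse,32,Lucknow
Mark,33,Las Vegas
Suri,35,Patna
"""

# Skip the last 2 lines; skipfooter needs the Python engine
df = pd.read_csv(io.StringIO(CSV_TEXT), skipfooter=2, engine="python")
print(df)
```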


By default, read_csv() uses the C engine for parsing, but the C engine doesn't support skipping rows from the bottom. To use skipfooter, pass engine='python' along with it; otherwise pandas emits a ParserWarning and falls back to the Python engine on its own.
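A small sketch that omits engine= and records warnings, to show the fallback in action. It assumes the same kind of hypothetical CSV_TEXT data; the exact warning message may vary between pandas versions, so the check looks only at the warning category, pandas.errors.ParserWarning.

```python
import io
import warnings

import pandas as pd

# Hypothetical contents of users.csv (line indices 0 to 6)
CSV_TEXT = """Name,Age,City
Jack,34,Sydney
Riti,31,Delhi
Aadi,16,New York
Suse,32,Lucknow
Mark,33,Las Vegas
Suri,35,Patna
"""

# Call read_csv() with skipfooter but without engine=, recording any warnings
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    df = pd.read_csv(io.StringIO(CSV_TEXT), skipfooter=2)

# True if pandas reported the fallback to the Python engine
fell_back = any(issubclass(w.category, pd.errors.ParserWarning) for w in caught)
print(fell_back)
print(df)
```

Passing engine="python" explicitly gives the same DataFrame without the warning.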

The complete example is as follows.
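A self-contained sketch that runs each skiprows and skipfooter variant discussed in this article against hypothetical users.csv data. The load() helper and the CSV_TEXT string are illustrative names, not part of pandas; with a real file you would call pd.read_csv("users.csv", ...) directly.

```python
import io

import pandas as pd

# Hypothetical contents of users.csv (line indices 0 to 6)
CSV_TEXT = """Name,Age,City
Jack,34,Sydney
Riti,31,Delhi
Aadi,16,New York
Suse,32,Lucknow
Mark,33,Las Vegas
Suri,35,Patna
"""


def load(**kwargs):
    """Read the sample CSV text with the given read_csv() keyword arguments."""
    return pd.read_csv(io.StringIO(CSV_TEXT), **kwargs)


def main():
    print("Skip 2 lines from the top:")
    print(load(skiprows=2))

    print("\nSkip the lines at index 0, 2 and 5:")
    print(load(skiprows=[0, 2, 5]))

    print("\nSkip the first 2 data rows, keep the header:")
    print(load(skiprows=[1, 2]))

    print("\nSkip every line whose index is a multiple of 3:")
    print(load(skiprows=lambda i: i % 3 == 0))

    print("\nSkip 2 lines from the bottom:")
    print(load(skipfooter=2, engine="python"))


if __name__ == "__main__":
    main()
```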

