Pandas : skip rows while reading csv file to a Dataframe using read_csv() in Python

In this article we will discuss how to skip rows from the top, from the bottom, or at specific indices while reading a csv file and loading its contents to a Dataframe.

Python’s pandas library provides a function that reads a csv file and loads the data to a dataframe directly, and it can also skip specified lines of the csv file i.e.

pandas.read_csv(filepath_or_buffer, skiprows=N, ....)

It accepts a large number of arguments, but here we will discuss only the ones relevant to skipping rows i.e.
Arguments:

  • filepath_or_buffer : path of a csv file or a file-like object.
  • skiprows : Line numbers to skip while reading csv.
    • If it’s an int N then skip the first N lines from the top of the file
    • If it’s a list of ints then skip the lines at those index positions (starting from 0)
    • If it’s a callable then each line index is passed to it, and the line is skipped when it returns True.

It will read the given csv file by skipping the specified lines and load remaining lines to a dataframe.
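
As a minimal sketch of the filepath_or_buffer argument, the snippet below reads the same data once from a path and once from a file-like object. The file name users_sample.csv, the temporary directory and the shortened sample data are assumptions made only to keep the sketch self-contained.

```python
import io
import os
import tempfile

import pandas as pd

# Shortened sample data (an assumption for this sketch, not the full users.csv)
CSV_TEXT = "Name,Age,City\njack,34,Sydeny\nRiti,31,Delhi\nAadi,16,New York\n"

# 1. Pass a path: write the sample to a temporary file and read it back
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "users_sample.csv")
    with open(csv_path, "w", newline="") as f:
        f.write(CSV_TEXT)
    df_from_path = pd.read_csv(csv_path, skiprows=[1])

# 2. Pass a file-like object: wrap the same text in an in-memory buffer
df_from_buffer = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=[1])

# Both calls skip the line at index 1 and should give the same dataframe
print(df_from_path.equals(df_from_buffer))
```

An in-memory buffer like this is handy for trying out skiprows values without touching a file on disk.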

To use it, import the pandas module like this,

import pandas as pd

Let’s understand by examples,

Suppose we have a simple CSV file users.csv and its contents are,

>>cat users.csv
Name,Age,City
jack,34,Sydeny
Riti,31,Delhi
Aadi,16,New York
Suse,32,Lucknow
Mark,33,Las vegas
Suri,35,Patna

Let’s load this csv file to a dataframe using read_csv() and skip rows in different ways,

Skipping N rows from top while reading a csv file to Dataframe

When we pass an int N as the skiprows argument to pandas.read_csv(), it skips the first N lines from the top of the csv file while initializing the dataframe.
For example, to skip 2 lines from the top while reading users.csv and initializing a dataframe,

# Skip 2 rows from top in csv and initialize a dataframe
usersDf = pd.read_csv('users.csv', skiprows=2)

print('Contents of the Dataframe created by skipping top 2 lines from csv file ')
print(usersDf)

Output:

Contents of the Dataframe created by skipping top 2 lines from csv file 
   Riti  31      Delhi
0  Aadi  16   New York
1  Suse  32    Lucknow
2  Mark  33  Las vegas
3  Suri  35      Patna

It skipped the top 2 lines from csv and used 3rd line (at index 2) as header row and loaded the remaining rows from csv as data rows in the dataframe.
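
If the first remaining line should be treated as data rather than as the header, one possible approach is to supply the column names explicitly. The sketch below assumes the column names are known in advance and uses an in-memory copy of the sample data instead of users.csv.

```python
import io

import pandas as pd

# In-memory copy of the sample data, so the sketch does not need users.csv on disk
CSV_TEXT = """Name,Age,City
jack,34,Sydeny
Riti,31,Delhi
Aadi,16,New York
Suse,32,Lucknow
Mark,33,Las vegas
Suri,35,Patna
"""

# Skip the first 2 lines (the original header and 'jack'), then tell pandas
# that the remaining lines contain no header and name the columns ourselves.
users_df = pd.read_csv(
    io.StringIO(CSV_TEXT),
    skiprows=2,
    header=None,
    names=["Name", "Age", "City"],
)

print(users_df)
```

Here the line Riti,31,Delhi stays in the dataframe as the first data row instead of being consumed as the header.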

Now, what if we want to skip only some specific rows while reading the csv?

Skipping rows at specific index positions while reading a csv file to Dataframe

When we pass a list of ints as the skiprows argument to pandas.read_csv(), it skips the lines of the csv at the index positions given in the list. Index positions start at 0, so index 0 is the header line. For example, to skip the lines at index 0, 2 and 5 while reading users.csv and initializing a dataframe,

# Skip  rows at specific index
usersDf = pd.read_csv('users.csv', skiprows=[0,2,5])

print('Contents of the Dataframe created by skipping specifying lines from csv file ')
print(usersDf)

Output:

Contents of the Dataframe created by skipping specifying lines from csv file 
   jack  34    Sydeny
0  Aadi  16  New York
1  Suse  32   Lucknow
2  Suri  35     Patna

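It skipped the lines at index positions 0, 2 & 5 from csv and loaded the remaining rows from csv to the dataframe. As the header line (index 0) was skipped too, the first remaining line, jack,34,Sydeny, was used as the header row.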
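
To make the mapping between index positions and lines visible, the following sketch prints every line of the sample data with its index before reading it. The in-memory CSV_TEXT and the rows_to_skip name are assumptions used for illustration.

```python
import io

import pandas as pd

# In-memory copy of the sample data, so the sketch does not need users.csv on disk
CSV_TEXT = """Name,Age,City
jack,34,Sydeny
Riti,31,Delhi
Aadi,16,New York
Suse,32,Lucknow
Mark,33,Las vegas
Suri,35,Patna
"""

rows_to_skip = [0, 2, 5]

# Show which physical line each index refers to (index 0 is the header line)
for index, line in enumerate(CSV_TEXT.splitlines()):
    marker = "skip" if index in rows_to_skip else "keep"
    print(index, marker, line)

# Read the data while skipping exactly those lines
users_df = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=rows_to_skip)
print(users_df)
```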

Skipping N rows from top except header while reading a csv file to Dataframe

As we saw in the first example, skipping 2 lines from the top while reading users.csv makes the 3rd line the header row. But that is not the line that contains the column names.
So, if our csv file has a header row and we want to skip the first 2 data rows, then we need to pass a list of index positions to skiprows i.e.

# Skip 2 rows from top except header (lines at index 1 and 2)
usersDf = pd.read_csv('users.csv', skiprows=list(range(1, 3)))

print('Contents of the Dataframe created by skipping 2 rows after header row from csv file ')
print(usersDf)

Output:

Contents of the Dataframe created by skipping 2 rows after header row from csv file 
   Name  Age       City
0  Aadi   16   New York
1  Suse   32    Lucknow
2  Mark   33  Las vegas
3  Suri   35      Patna

It will read the csv file to dataframe by skipping 2 lines after the header row in csv file.
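
The same idea can be wrapped in a small function for any number of rows. This is only a sketch: read_skipping_first_data_rows is a hypothetical helper name, not part of pandas, and the data is an in-memory copy of the sample file.

```python
import io

import pandas as pd

# In-memory copy of the sample data, so the sketch does not need users.csv on disk
CSV_TEXT = """Name,Age,City
jack,34,Sydeny
Riti,31,Delhi
Aadi,16,New York
Suse,32,Lucknow
Mark,33,Las vegas
Suri,35,Patna
"""


def read_skipping_first_data_rows(source, n):
    """Hypothetical helper: keep the header (index 0) and skip the next n lines."""
    # Indices 1..n are the first n data rows after the header
    return pd.read_csv(source, skiprows=list(range(1, n + 1)))


users_df = read_skipping_first_data_rows(io.StringIO(CSV_TEXT), 2)
print(users_df)
```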

Skipping rows based on a condition while reading a csv file to Dataframe

We can also pass a callable (a function or a lambda) to decide which rows to skip. When a callable is passed as the skiprows argument, pandas.read_csv() calls it with the index position of each line and skips the line if it returns True.
Let’s skip the lines in the csv file whose index position is a multiple of 3, i.e. the lines at index 0, 3 and 6, while reading the csv file and loading a dataframe out of it,

def logic(index):
    # Return True for the line indices that should be skipped (0, 3, 6, ...)
    return index % 3 == 0


# Skip rows based on a condition, here every line whose index is a multiple of 3
usersDf = pd.read_csv('users.csv', skiprows=logic)

print('Contents of the Dataframe created by skipping every 3rd row from csv file ')
print(usersDf)

Output:

Contents of the Dataframe created by skipping every 3rd row from csv file 
   jack  34     Sydeny
0  Riti  31      Delhi
1  Suse  32    Lucknow
2  Mark  33  Las vegas
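
In the output above the original header line was skipped as well, because 0 is also a multiple of 3. If the header should be kept, the condition can exclude index 0. The predicate name below is hypothetical, and the data is an in-memory copy of the sample file.

```python
import io

import pandas as pd

# In-memory copy of the sample data, so the sketch does not need users.csv on disk
CSV_TEXT = """Name,Age,City
jack,34,Sydeny
Riti,31,Delhi
Aadi,16,New York
Suse,32,Lucknow
Mark,33,Las vegas
Suri,35,Patna
"""


def skip_every_third_line_except_header(index):
    # Never skip the header (index 0); skip the lines at index 3, 6, 9, ...
    return index > 0 and index % 3 == 0


users_df = pd.read_csv(
    io.StringIO(CSV_TEXT),
    skiprows=skip_every_third_line_except_header,
)
print(users_df)
```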

Skip N rows from bottom / footer while reading a csv file to Dataframe

To skip N rows from the bottom while reading a csv file to a dataframe, pass the skipfooter and engine arguments to pandas.read_csv() i.e.

# Skip 2 rows from bottom
usersDf = pd.read_csv('users.csv', skipfooter=2, engine='python')

print('Contents of the Dataframe created by skipping bottom 2 rows from csv file ')
print(usersDf)

Output:

Contents of the Dataframe created by skipping bottom 2 rows from csv file 
   Name  Age      City
0  jack   34    Sydeny
1  Riti   31     Delhi
2  Aadi   16  New York
3  Suse   32   Lucknow

By default read_csv() uses the C engine for parsing, but it doesn’t support skipping lines from the bottom. To use skipfooter we must also pass engine='python', otherwise we will get a warning like this,

ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support skipfooter; you can avoid this warning by specifying engine='python'.
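
If staying on the default C engine matters, one possible alternative is to read the whole file and drop the last rows afterwards. The sketch below assumes the trailing lines are well-formed rows with the same columns; if the footer is free text, parsing it first may fail or change the column types, and skipfooter remains the safer choice.

```python
import io

import pandas as pd

# In-memory copy of the sample data, so the sketch does not need users.csv on disk
CSV_TEXT = """Name,Age,City
jack,34,Sydeny
Riti,31,Delhi
Aadi,16,New York
Suse,32,Lucknow
Mark,33,Las vegas
Suri,35,Patna
"""

# Option 1: python engine with skipfooter, as in the example above
df_footer = pd.read_csv(io.StringIO(CSV_TEXT), skipfooter=2, engine="python")

# Option 2: default C engine, then drop the last 2 rows of the dataframe
df_sliced = pd.read_csv(io.StringIO(CSV_TEXT)).iloc[:-2]

# Compare the rows kept by both approaches
print(df_footer["Name"].tolist() == df_sliced["Name"].tolist())
```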

Complete example is as follows,

import pandas as pd

def logic(index):
    # Return True for the line indices that should be skipped (0, 3, 6, ...)
    return index % 3 == 0

def main():
    print('**** Skip n rows from top while reading csv file to a Dataframe ****')

    # Skip 2 rows from top in csv and initialize a dataframe
    usersDf = pd.read_csv('users.csv', skiprows=2)

    print('Contents of the Dataframe created by skipping top 2 lines from csv file ')
    print(usersDf)

    print('**** Skip rows at specific index from top while reading csv file to a Dataframe ****')

    # Skip  rows at specific index
    usersDf = pd.read_csv('users.csv', skiprows=[0,2,5])

    print('Contents of the Dataframe created by skipping specifying lines from csv file ')
    print(usersDf)

    print('**** Skip N rows top except header row while reading csv file to a Dataframe ****')

    # Skip 2 rows from top except header (lines at index 1 and 2)
    usersDf = pd.read_csv('users.csv', skiprows=list(range(1, 3)))

    print('Contents of the Dataframe created by skipping 2 rows after header row from csv file ')
    print(usersDf)

    print('**** Skip rows based on condition row while reading csv file to a Dataframe ****')

    # Skip rows based on a condition, here every line whose index is a multiple of 3
    usersDf = pd.read_csv('users.csv', skiprows=logic)

    print('Contents of the Dataframe created by skipping every 3rd row from csv file ')
    print(usersDf)

    print('**** Skip N rows from bottom while reading csv file to a Dataframe ****')
    # Skip 2 rows from bottom
    usersDf = pd.read_csv('users.csv', skipfooter=2, engine='python')

    print('Contents of the Dataframe created by skipping bottom 2 rows from csv file ')
    print(usersDf)



if __name__ == '__main__':
    main()

Output:

**** Skip n rows from top while reading csv file to a Dataframe ****
Contents of the Dataframe created by skipping top 2 lines from csv file 
   Riti  31      Delhi
0  Aadi  16   New York
1  Suse  32    Lucknow
2  Mark  33  Las vegas
3  Suri  35      Patna
**** Skip rows at specific index from top while reading csv file to a Dataframe ****
Contents of the Dataframe created by skipping specifying lines from csv file 
   jack  34    Sydeny
0  Aadi  16  New York
1  Suse  32   Lucknow
2  Suri  35     Patna
**** Skip N rows top except header row while reading csv file to a Dataframe ****
Contents of the Dataframe created by skipping 2 rows after header row from csv file 
   Name  Age       City
0  Aadi   16   New York
1  Suse   32    Lucknow
2  Mark   33  Las vegas
3  Suri   35      Patna
**** Skip rows based on condition row while reading csv file to a Dataframe ****
Contents of the Dataframe created by skipping every 3rd row from csv file 
   jack  34     Sydeny
0  Riti  31      Delhi
1  Suse  32    Lucknow
2  Mark  33  Las vegas
**** Skip N rows from bottom while reading csv file to a Dataframe ****
Contents of the Dataframe created by skipping bottom 2 rows from csv file 
   Name  Age      City
0  jack   34    Sydeny
1  Riti   31     Delhi
2  Aadi   16  New York
3  Suse   32   Lucknow

 
