In this article we will discuss how to skip rows from the top, from the bottom, or at specific indices while reading a CSV file and loading its contents into a DataFrame.
The pandas library provides a function that reads a CSV file and loads its data directly into a DataFrame, optionally skipping specified lines, i.e.
pandas.read_csv(filepath_or_buffer, skiprows=N, ....)
It accepts a large number of arguments, but here we will discuss only a few important ones, i.e.
Arguments:
- filepath_or_buffer : path of a CSV file or a file-like object.
- skiprows : line numbers to skip while reading the CSV.
  - If it’s an int, skip that many lines from the top.
  - If it’s a list of ints, skip the lines at those index positions (0-based).
  - If it’s a callable, pass each line index to it; the line is skipped when the callable returns True.
It reads the given CSV file, skips the specified lines, and loads the remaining lines into a DataFrame.
To use it, import the pandas module like this,
import pandas as pd
Let’s understand by examples,
Suppose we have a simple CSV file users.csv and its contents are,
>>cat users.csv
Name,Age,City
jack,34,Sydeny
Riti,31,Delhi
Aadi,16,New York
Suse,32,Lucknow
Mark,33,Las vegas
Suri,35,Patna
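To follow along without creating the file by hand, the sample data can be written to disk from Python. This is only a minimal sketch: the names USERS_CSV and write_users_csv are illustrative helpers, not part of pandas.

```python
from pathlib import Path

# Sample data copied from the article (the spelling "Sydeny" is kept as-is).
USERS_CSV = """Name,Age,City
jack,34,Sydeny
Riti,31,Delhi
Aadi,16,New York
Suse,32,Lucknow
Mark,33,Las vegas
Suri,35,Patna
"""


def write_users_csv(path="users.csv"):
    """Write the sample data to `path` and return it as a Path object."""
    target = Path(path)
    target.write_text(USERS_CSV, encoding="utf-8")
    return target


if __name__ == "__main__":
    print(write_users_csv())
```

Running it once in the working directory should leave a users.csv file that the examples below can read.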
Let’s load this csv file to a dataframe using read_csv() and skip rows in different ways,
Skipping N rows from top while reading a csv file to Dataframe
While calling pandas.read_csv(), if we pass the skiprows argument as an int, it skips that many lines from the top while reading the CSV file and initializing the DataFrame.
For example, to skip 2 lines from the top while reading users.csv and initializing a DataFrame,
# Skip 2 rows from top in csv and initialize a dataframe
usersDf = pd.read_csv('users.csv', skiprows=2)
print('Contents of the Dataframe created by skipping top 2 lines from csv file ')
print(usersDf)
Output:
Contents of the Dataframe created by skipping top 2 lines from csv file
   Riti  31      Delhi
0  Aadi  16   New York
1  Suse  32    Lucknow
2  Mark  33  Las vegas
3  Suri  35      Patna
It skipped the top 2 lines of the CSV, used the 3rd line (at index 2) as the header row, and loaded the remaining lines as data rows in the DataFrame.
Now what if we want to skip only some specific rows while reading the CSV?
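The header behaviour described above can be checked with a short sketch. It reads the same sample data from an in-memory string (an assumption made so the snippet runs without a file) and also shows read_csv's header=None option, which treats every remaining line as data instead of promoting one to the header.

```python
import io

import pandas as pd

# Same contents as users.csv, held in memory for this sketch.
CSV_TEXT = """Name,Age,City
jack,34,Sydeny
Riti,31,Delhi
Aadi,16,New York
Suse,32,Lucknow
Mark,33,Las vegas
Suri,35,Patna
"""

# Default: the first line left after skipping becomes the header.
with_header = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=2)
print(list(with_header.columns))  # ['Riti', '31', 'Delhi']
print(len(with_header))           # 4

# header=None: no line is used as a header, so all 5 remaining lines are data.
no_header = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=2, header=None)
print(no_header.shape)            # (5, 3)
```

With header=None the columns are simply numbered 0, 1, 2.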
Skipping rows at specific index positions while reading a csv file to Dataframe
While calling pandas.read_csv(), if we pass the skiprows argument as a list of ints, it skips the lines of the CSV at the indices given in the list. For example, to skip the lines at index 0, 2 and 5 while reading users.csv and initializing a DataFrame,
# Skip rows at specific index
usersDf = pd.read_csv('users.csv', skiprows=[0,2,5])
print('Contents of the Dataframe created by skipping specifying lines from csv file ')
print(usersDf)
Output:
Contents of the Dataframe created by skipping specifying lines from csv file
   jack  34    Sydeny
0  Aadi  16  New York
1  Suse  32   Lucknow
2  Suri  35     Patna
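It skipped the lines at index positions 0, 2 and 5 of the CSV and loaded the remaining lines into the DataFrame. Because index 0 is the original header line, the first remaining line (jack,34,Sydeny) was used as the header row.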
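To make the index positions concrete, the sketch below filters the raw lines by hand and compares the result with what read_csv returns. The in-memory CSV_TEXT and the variable names are assumptions for illustration, and the hand-rolled filter only mirrors pandas for a simple file like this one with no blank or quoted multi-line rows.

```python
import io

import pandas as pd

# Same contents as users.csv, held in memory for this sketch.
CSV_TEXT = """Name,Age,City
jack,34,Sydeny
Riti,31,Delhi
Aadi,16,New York
Suse,32,Lucknow
Mark,33,Las vegas
Suri,35,Patna
"""

skip = [0, 2, 5]

# Indices are 0-based and count the header line too: 0 is "Name,Age,City".
kept_lines = [
    line for index, line in enumerate(CSV_TEXT.splitlines()) if index not in skip
]
print(kept_lines[0])  # jack,34,Sydeny

users_df = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=skip)

# The first kept line is the header; the rest are data rows.
print(list(users_df.columns) == kept_lines[0].split(","))  # True
print(len(users_df) == len(kept_lines) - 1)                # True
```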
Skipping N rows from top except header while reading a csv file to Dataframe
As we saw in the first example, skipping 2 lines from the top while reading users.csv makes the 3rd line the header row. But that’s not the row that contains the column names.
So, if our CSV file has a header row and we want to skip the first 2 data rows, we need to pass a list of their line indices (1 and 2) to skiprows, i.e.
# Skip 2 rows from top except header (line 0 is the header, so skip lines 1 and 2)
usersDf = pd.read_csv('users.csv', skiprows=list(range(1, 3)))
print('Contents of the Dataframe created by skipping 2 rows after header row from csv file ')
print(usersDf)
Output:
Contents of the Dataframe created by skipping 2 rows after header row from csv file
   Name  Age       City
0  Aadi   16   New York
1  Suse   32    Lucknow
2  Mark   33  Las vegas
3  Suri   35      Patna
It reads the CSV file into a DataFrame, keeping the header row and skipping the 2 lines that follow it.
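The same idea generalizes to any number of rows. The helper below is a hypothetical convenience wrapper (read_skipping_first_rows is not a pandas function), sketched under the assumption that the header sits on line 0.

```python
import io

import pandas as pd

# Same contents as users.csv, held in memory for this sketch.
CSV_TEXT = """Name,Age,City
jack,34,Sydeny
Riti,31,Delhi
Aadi,16,New York
Suse,32,Lucknow
Mark,33,Las vegas
Suri,35,Patna
"""


def read_skipping_first_rows(source, n):
    """Read a CSV, keep line 0 as the header, and skip the next n lines."""
    # Line 0 is the header, so the first n data rows are lines 1..n.
    return pd.read_csv(source, skiprows=list(range(1, n + 1)))


users_df = read_skipping_first_rows(io.StringIO(CSV_TEXT), 2)
print(users_df)
```

Passing a file path such as 'users.csv' as source should work the same way, since it is handed straight to read_csv.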
Skip rows based on condition while reading a csv file to Dataframe
We can also pass a callable or lambda function to decide which rows to skip. When a callable is passed as skiprows to pandas.read_csv(), it is called with the index position of each line, and the line is skipped if the callable returns True.
Let’s skip the lines in the CSV file whose index position is a multiple of 3, i.e. skip every 3rd line while reading the CSV file and loading a DataFrame out of it. Note that index 0 is also a multiple of 3, so the original header line is skipped as well and the next line is used as the header,
def logic(index):
    # Return True for the line indices that should be skipped
    return index % 3 == 0

# Skip rows based on condition like skip every 3rd line
usersDf = pd.read_csv('users.csv', skiprows=logic)
print('Contents of the Dataframe created by skipping every 3rd row from csv file ')
print(usersDf)
Output:
Contents of the Dataframe created by skipping every 3rd row from csv file
   jack  34     Sydeny
0  Riti  31      Delhi
1  Suse  32    Lucknow
2  Mark  33  Las vegas
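In the output above the original header is gone, because index 0 satisfies the condition too. If the header line should survive, the condition can exclude index 0. The following is a small sketch of that variation using a lambda and an in-memory copy of the data (both assumptions for illustration).

```python
import io

import pandas as pd

# Same contents as users.csv, held in memory for this sketch.
CSV_TEXT = """Name,Age,City
jack,34,Sydeny
Riti,31,Delhi
Aadi,16,New York
Suse,32,Lucknow
Mark,33,Las vegas
Suri,35,Patna
"""

# Skip every line whose index is a multiple of 3, except the header at index 0.
users_df = pd.read_csv(
    io.StringIO(CSV_TEXT),
    skiprows=lambda index: index > 0 and index % 3 == 0,
)
print(users_df)
```

Here only the lines at index 3 (Aadi) and 6 (Suri) are dropped, and Name, Age, City remain the column names.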
Skip N rows from bottom / footer while reading a csv file to Dataframe
To skip N rows from the bottom while reading a CSV file into a DataFrame, pass the skipfooter and engine arguments to pandas.read_csv(), i.e.
# Skip 2 rows from bottom
usersDf = pd.read_csv('users.csv', skipfooter=2, engine='python')
print('Contents of the Dataframe created by skipping bottom 2 rows from csv file ')
print(usersDf)
Output:
Contents of the Dataframe created by skipping bottom 2 rows from csv file
   Name  Age      City
0  jack   34    Sydeny
1  Riti   31     Delhi
2  Aadi   16  New York
3  Suse   32   Lucknow
By default read_csv() uses the C engine for parsing, but that engine doesn’t support skipping lines from the bottom. To use this functionality we must pass engine='python' along with skipfooter; otherwise we get a warning like this,
ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support skipfooter; you can avoid this warning by specifying engine='python'.
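If staying on the default C engine matters, one possible alternative (not the approach used in this article) is to read the whole file and then drop the last rows with DataFrame.iloc. The sketch below compares both approaches on an in-memory copy of the data. It assumes the footer lines are well-formed rows; junk footer lines could still change column types or fail to parse before they are dropped.

```python
import io

import pandas as pd

# Same contents as users.csv, held in memory for this sketch.
CSV_TEXT = """Name,Age,City
jack,34,Sydeny
Riti,31,Delhi
Aadi,16,New York
Suse,32,Lucknow
Mark,33,Las vegas
Suri,35,Patna
"""

# Approach from the article: let the python engine drop the last 2 lines.
footer_skipped = pd.read_csv(io.StringIO(CSV_TEXT), skipfooter=2, engine="python")

# Alternative: parse everything with the default engine, then slice off 2 rows.
trimmed = pd.read_csv(io.StringIO(CSV_TEXT)).iloc[:-2]

print(footer_skipped["Name"].tolist())  # ['jack', 'Riti', 'Aadi', 'Suse']
print(trimmed["Name"].tolist())         # ['jack', 'Riti', 'Aadi', 'Suse']
```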
Complete example is as follows,
import pandas as pd

def logic(index):
    # Return True for the line indices that should be skipped
    return index % 3 == 0

def main():
    print('**** Skip n rows from top while reading csv file to a Dataframe ****')
    # Skip 2 rows from top in csv and initialize a dataframe
    usersDf = pd.read_csv('users.csv', skiprows=2)
    print('Contents of the Dataframe created by skipping top 2 lines from csv file ')
    print(usersDf)

    print('**** Skip rows at specific index from top while reading csv file to a Dataframe ****')
    # Skip rows at specific index
    usersDf = pd.read_csv('users.csv', skiprows=[0,2,5])
    print('Contents of the Dataframe created by skipping specifying lines from csv file ')
    print(usersDf)

    print('**** Skip N rows top except header row while reading csv file to a Dataframe ****')
    # Skip 2 rows from top except header (line 0 is the header, so skip lines 1 and 2)
    usersDf = pd.read_csv('users.csv', skiprows=list(range(1, 3)))
    print('Contents of the Dataframe created by skipping 2 rows after header row from csv file ')
    print(usersDf)

    print('**** Skip rows based on condition row while reading csv file to a Dataframe ****')
    # Skip rows based on condition like skip every 3rd line
    usersDf = pd.read_csv('users.csv', skiprows=logic)
    print('Contents of the Dataframe created by skipping every 3rd row from csv file ')
    print(usersDf)

    print('**** Skip N rows from bottom while reading csv file to a Dataframe ****')
    # Skip 2 rows from bottom
    usersDf = pd.read_csv('users.csv', skipfooter=2, engine='python')
    print('Contents of the Dataframe created by skipping bottom 2 rows from csv file ')
    print(usersDf)

if __name__ == '__main__':
    main()
Output:
**** Skip n rows from top while reading csv file to a Dataframe ****
Contents of the Dataframe created by skipping top 2 lines from csv file
   Riti  31      Delhi
0  Aadi  16   New York
1  Suse  32    Lucknow
2  Mark  33  Las vegas
3  Suri  35      Patna
**** Skip rows at specific index from top while reading csv file to a Dataframe ****
Contents of the Dataframe created by skipping specifying lines from csv file
   jack  34    Sydeny
0  Aadi  16  New York
1  Suse  32   Lucknow
2  Suri  35     Patna
**** Skip N rows top except header row while reading csv file to a Dataframe ****
Contents of the Dataframe created by skipping 2 rows after header row from csv file
   Name  Age       City
0  Aadi   16   New York
1  Suse   32    Lucknow
2  Mark   33  Las vegas
3  Suri   35      Patna
**** Skip rows based on condition row while reading csv file to a Dataframe ****
Contents of the Dataframe created by skipping every 3rd row from csv file
   jack  34     Sydeny
0  Riti  31      Delhi
1  Suse  32    Lucknow
2  Mark  33  Las vegas
**** Skip N rows from bottom while reading csv file to a Dataframe ****
Contents of the Dataframe created by skipping bottom 2 rows from csv file
   Name  Age      City
0  jack   34    Sydeny
1  Riti   31     Delhi
2  Aadi   16  New York
3  Suse   32   Lucknow
I am not able to do this; could someone please help me resolve my issue?
I want to create a comment column based on the column values, see below:
A B C D comment column
100 0 10 0 A deduction is 100 and C deduction is 10
0 10 5 0 B deduction is 10 and C deduction is 5
0 7 2 4 B deduction is 7 and C deduction is 2 and D deduction is 4
How can we write Python code to produce the respective comments?