A DataFrame is a data structure that stores the data in rows and columns. In this article we will discuss how to import a csv file into a Pandas DataFrame in Python.
Table of Contents
- Import CSV to Pandas Dataframe using read_csv() function
- Import Specific columns from CSV file to Pandas DataFrame
Let’s create a csv file with the given data
ID,NAME,AGE,SUBJECTS
1,thanmai,21,php
2,sravan,22,java
3,deepika,21,html
4,jyothika,23,dbms
5,durga,21,"linux,c#"
We can save this file as csv_data.csv in the current directory.
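If you prefer to create the file from Python instead of typing it by hand, the following is a minimal sketch using the standard csv module. The variable name rows is only illustrative; the file name csv_data.csv matches the one used in this article.

```python
import csv

# Rows of the sample file, header first. The last SUBJECTS value contains
# a comma, so the csv module wraps that field in double quotes.
rows = [
    ["ID", "NAME", "AGE", "SUBJECTS"],
    [1, "thanmai", 21, "php"],
    [2, "sravan", 22, "java"],
    [3, "deepika", 21, "html"],
    [4, "jyothika", 23, "dbms"],
    [5, "durga", 21, "linux,c#"],
]

# newline="" lets the csv module control line endings itself
with open("csv_data.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)
```

The quoting around "linux,c#" is what keeps that value in a single column when the file is read back.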
Import CSV to Pandas Dataframe using read_csv() function
Here, we will use the read_csv() function to import a csv file into a Pandas DataFrame. Let’s look at the syntax of this function,
pandas.read_csv(filename/path, names, skiprows, nrows,index_col, header,.......)
where,
- filename is the name of the csv file
- path is the file location
- All remaining parameters are optional. We will discuss each of them with examples.
Read CSV file into Pandas Dataframe with first row as header
The header parameter of the read_csv() function specifies which row of the file holds the column names. Its default value is ‘infer’, which means the column names are inferred from the first line of the csv file.
import pandas as pd

# Read the csv file; the first line is used as the header
df = pd.read_csv("csv_data.csv")

# Display the Dataframe
print(df)
Output:
   ID      NAME  AGE  SUBJECTS
0   1   thanmai   21       php
1   2    sravan   22      java
2   3   deepika   21      html
3   4  jyothika   23      dbms
4   5     durga   21  linux,c#
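To see what the header parameter changes, here is a small sketch that compares header=0 with header=None. It reads an in-memory copy of the sample data through io.StringIO so that it runs without the file on disk; the names CSV_TEXT, df_header and df_no_header are only illustrative.

```python
import io
import pandas as pd

# In-memory copy of the sample data (stands in for csv_data.csv)
CSV_TEXT = '''ID,NAME,AGE,SUBJECTS
1,thanmai,21,php
2,sravan,22,java
3,deepika,21,html
4,jyothika,23,dbms
5,durga,21,"linux,c#"
'''

# header=0 explicitly marks the first line as the header row.
# For this file it gives the same result as the default 'infer'.
df_header = pd.read_csv(io.StringIO(CSV_TEXT), header=0)

# header=None treats every line as data, including the first one,
# and labels the columns with integers 0, 1, 2, ...
df_no_header = pd.read_csv(io.StringIO(CSV_TEXT), header=None)

print(df_header.columns.tolist())
print(df_no_header.columns.tolist())
```

With header=None the Dataframe has six rows instead of five, because the line with the column names is loaded as ordinary data.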
Read CSV file into Pandas Dataframe with Custom Index
The index_col parameter of the read_csv() function sets the index of the dataframe. We can pass the name of a column from the csv file to use that column as the index. By default it is None, in which case pandas generates an integer index starting at 0.
Example: Here we are going to import the csv file into a dataframe with the AGE column as the index.
import pandas as pd

# Create dataframe from csv file by
# setting the AGE column as index column
df = pd.read_csv("csv_data.csv", index_col='AGE')

# Display the Dataframe
print(df)
Output:
     ID      NAME  SUBJECTS
AGE
21    1   thanmai       php
22    2    sravan      java
21    3   deepika      html
23    4  jyothika      dbms
21    5     durga  linux,c#
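The AGE column contains repeated values, so it does not identify a single row. As a variation, the sketch below uses the ID column as the index instead, selected by position (index_col also accepts a column number). It reads an in-memory copy of the sample data, and the name CSV_TEXT is only illustrative.

```python
import io
import pandas as pd

# In-memory copy of the sample data (stands in for csv_data.csv)
CSV_TEXT = '''ID,NAME,AGE,SUBJECTS
1,thanmai,21,php
2,sravan,22,java
3,deepika,21,html
4,jyothika,23,dbms
5,durga,21,"linux,c#"
'''

# index_col=0 uses the first column of the file (ID) as the index
df = pd.read_csv(io.StringIO(CSV_TEXT), index_col=0)

# With a unique index, a single row can be looked up by its label
print(df.loc[3, "NAME"])
```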
Read CSV file into Pandas Dataframe with new Column Names
The names parameter of the read_csv() function sets the column names of the dataframe. We can pass the column names as a list. By default it is None.
Example: Here we are going to assign column names to the dataframe
import pandas as pd

# Set the column names while loading CSV to Dataframe
df = pd.read_csv("csv_data.csv",
                 names=['student_id', 'name', 'age', 'subjects'])

# Display the Dataframe
print(df)
Output:
  student_id      name  age  subjects
0         ID      NAME  AGE  SUBJECTS
1          1   thanmai   21       php
2          2    sravan   22      java
3          3   deepika   21      html
4          4  jyothika   23      dbms
5          5     durga   21  linux,c#
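Here we assigned new column names while loading the Dataframe from csv. The new column names are ‘student_id’, ‘name’, ‘age’ and ‘subjects’. Note that when names is passed, read_csv() no longer treats the first line of the file as a header, so the original header line (ID, NAME, AGE, SUBJECTS) appears as the first data row.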
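In the output above, the original header line of the file shows up as row 0. If the goal is to replace the existing header rather than keep it as data, names can be combined with header=0. The sketch below reads an in-memory copy of the sample data; the name CSV_TEXT is only illustrative.

```python
import io
import pandas as pd

# In-memory copy of the sample data (stands in for csv_data.csv)
CSV_TEXT = '''ID,NAME,AGE,SUBJECTS
1,thanmai,21,php
2,sravan,22,java
3,deepika,21,html
4,jyothika,23,dbms
5,durga,21,"linux,c#"
'''

# header=0 tells pandas that the first line is a header row;
# names then replaces that header instead of adding to it.
df = pd.read_csv(
    io.StringIO(CSV_TEXT),
    names=['student_id', 'name', 'age', 'subjects'],
    header=0,
)

print(df)
```

Because the header line is no longer mixed in with the data, the student_id and age columns are also parsed as integers rather than strings.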
Read CSV file into Pandas Dataframe and Skip Rows
The skiprows parameter of the read_csv() function is used to skip lines at the top of the csv file while loading it. We can specify the number of lines to be skipped. By default it is None.
Syntax is as follows,
pandas.read_csv(filename/path, skiprows=n)
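Where, n is the number of lines to skip from the top of the file.
Example: Here we are going to skip the first three lines of the file. The header line counts as one of the skipped lines, so the first line that remains (the row with ID 3) is used as the header.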
import pandas as pd

# Skip first 3 rows while importing csv to Dataframe
df = pd.read_csv("csv_data.csv", skiprows=3)

# Display the Dataframe
print(df)
Output:
   3   deepika  21      html
0  4  jyothika  23      dbms
1  5     durga  21  linux,c#
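In the output above, a data row ended up as the header because the real header line was among the skipped lines. To skip data rows while keeping the header, skiprows can also be given a list-like of 0-based line numbers. The sketch below skips lines 1 to 3 of an in-memory copy of the sample data; the name CSV_TEXT is only illustrative.

```python
import io
import pandas as pd

# In-memory copy of the sample data (stands in for csv_data.csv)
CSV_TEXT = '''ID,NAME,AGE,SUBJECTS
1,thanmai,21,php
2,sravan,22,java
3,deepika,21,html
4,jyothika,23,dbms
5,durga,21,"linux,c#"
'''

# Line 0 is the header. range(1, 4) skips lines 1, 2 and 3,
# which are the first three data rows, and keeps the header.
df = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=range(1, 4))

print(df)
```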
Read first N rows of CSV file to Pandas Dataframe
The nrows parameter of the read_csv() function is used to read only the first N data rows of the CSV file into the Dataframe. The header line is not counted. By default it is None, which means all rows are read.
Syntax is as follows:
pandas.read_csv(filename/path, nrows=n)
Where, n is the number of rows to read from the top of the file.
Example: Here we are going to read only the first three rows of the file
import pandas as pd

# Read first three rows from CSV file to Pandas Dataframe
df = pd.read_csv("csv_data.csv", nrows=3)

print(df)
Output:
   ID     NAME  AGE  SUBJECTS
0   1  thanmai   21       php
1   2   sravan   22      java
2   3  deepika   21      html
It is useful when you are dealing with large files. You can read a small chunk of a large csv file into a Dataframe instead of loading the whole file.
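When the whole of a large file has to be processed, but not all at once, read_csv() also has a chunksize parameter that returns the file piece by piece. This goes slightly beyond nrows, so treat it as a sketch: it reads an in-memory copy of the sample data two rows at a time, and the names CSV_TEXT, chunk_sizes and total_rows are only illustrative.

```python
import io
import pandas as pd

# In-memory copy of the sample data (stands in for a large csv file)
CSV_TEXT = '''ID,NAME,AGE,SUBJECTS
1,thanmai,21,php
2,sravan,22,java
3,deepika,21,html
4,jyothika,23,dbms
5,durga,21,"linux,c#"
'''

chunk_sizes = []
total_rows = 0

# With chunksize set, read_csv() returns an iterator of Dataframes
# that each hold at most 2 rows, instead of one big Dataframe.
for chunk in pd.read_csv(io.StringIO(CSV_TEXT), chunksize=2):
    chunk_sizes.append(len(chunk))
    total_rows += len(chunk)

print(chunk_sizes)
print(total_rows)
```

Each chunk is a regular Dataframe with the same column names, so the usual operations can be applied to it inside the loop.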
Import Specific columns from CSV file to Pandas DataFrame
After importing the csv into the dataframe, we will pass it to the DataFrame constructor with the columns parameter to keep only the specified columns.
Example: Here we are going to import only AGE column from CSV to the dataframe
import pandas as pd

df = pd.read_csv("csv_data.csv")

# Import AGE Column
df = pd.DataFrame(df, columns=['AGE'])

# Display the Dataframe
print(df)
Output:
   AGE
0   21
1   22
2   21
3   23
4   21
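The approach above loads every column first and then discards the ones that are not needed. As an alternative that is not part of the example above, read_csv() has a usecols parameter that selects the columns while the file is being parsed. The sketch below reads an in-memory copy of the sample data; the names CSV_TEXT, df_age and df_two are only illustrative.

```python
import io
import pandas as pd

# In-memory copy of the sample data (stands in for csv_data.csv)
CSV_TEXT = '''ID,NAME,AGE,SUBJECTS
1,thanmai,21,php
2,sravan,22,java
3,deepika,21,html
4,jyothika,23,dbms
5,durga,21,"linux,c#"
'''

# Load only the AGE column; the other columns are never stored
df_age = pd.read_csv(io.StringIO(CSV_TEXT), usecols=['AGE'])

# Several columns can be selected by listing their names
df_two = pd.read_csv(io.StringIO(CSV_TEXT), usecols=['NAME', 'AGE'])

print(df_age)
print(df_two.columns.tolist())
```

For large files this can reduce memory use, since the unwanted columns are dropped during parsing rather than afterwards.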
Summary
We learned to import a csv file into a Pandas DataFrame by using the read_csv() function and also discussed several parameters of read_csv().