Import a CSV file into Pandas DataFrame

A DataFrame is a data structure that stores data in rows and columns. In this article, we will discuss how to import a CSV file into a Pandas DataFrame in Python.

Let’s create a CSV file with the following data:

ID,NAME,AGE,SUBJECTS
1,thanmai,21,php
2,sravan,22,java
3,deepika,21,html
4,jyothika,23,dbms
5,durga,21,"linux,c#"

We can save this file as csv_data.csv in the current directory.
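If you would rather create the file from Python than from a text editor, the following is a minimal sketch. The variable name csv_text is only illustrative; the content is the same sample data shown above.

```python
from pathlib import Path

# Same content as the sample data above. The quotes around the last
# field keep "linux,c#" together as one value despite the comma.
csv_text = '''ID,NAME,AGE,SUBJECTS
1,thanmai,21,php
2,sravan,22,java
3,deepika,21,html
4,jyothika,23,dbms
5,durga,21,"linux,c#"
'''

# Write the file into the current working directory.
Path("csv_data.csv").write_text(csv_text, encoding="utf-8")
```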

Import CSV to Pandas Dataframe using read_csv() function

Here, we will use the read_csv() function to import a CSV file into a pandas DataFrame. Let’s look at the syntax of this function:

pandas.read_csv(filename/path, names, skiprows, nrows, index_col, header, ...)

where,

  • filename is the name of the CSV file
  • path is the file location
  • All the remaining parameters are optional. We will discuss each of them with examples.

Read CSV file into Pandas Dataframe with first row as header

The header parameter of the read_csv() function specifies which row holds the column names. Its default value is 'infer', which means the column names are taken from the first line of the CSV file, unless the names parameter is also given.

import pandas as pd

#read with headers
df=pd.read_csv("csv_data.csv")

#display
print(df)

Output:

   ID      NAME  AGE  SUBJECTS
0   1   thanmai   21       php
1   2    sravan   22      java
2   3   deepika   21      html
3   4  jyothika   23      dbms
4   5     durga   21  linux,c#
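To make the effect of the header parameter more concrete, here is a small sketch comparing header=0 with header=None. The sample data is inlined through io.StringIO (an assumption made so the snippet runs on its own); with the real file you would pass "csv_data.csv" instead.

```python
import io
import pandas as pd

# Inline copy of csv_data.csv so the example is self-contained.
CSV_TEXT = '''ID,NAME,AGE,SUBJECTS
1,thanmai,21,php
2,sravan,22,java
3,deepika,21,html
4,jyothika,23,dbms
5,durga,21,"linux,c#"
'''

# header=0 explicitly names the first line as the header row, which is
# what the default 'infer' does for this file.
df_header = pd.read_csv(io.StringIO(CSV_TEXT), header=0)

# header=None treats every line as data and labels the columns 0, 1, 2, 3.
df_no_header = pd.read_csv(io.StringIO(CSV_TEXT), header=None)

print(df_header.shape)     # 5 data rows, 4 columns
print(df_no_header.shape)  # 6 rows, because the header line is now data
```

Note that the quoted field "linux,c#" is still read as a single value in both cases.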

Read CSV file into Pandas Dataframe with Custom Index

The index_col parameter of the read_csv() function sets the index of the DataFrame. We can pass the name (or position) of a column from the CSV file to use as the index. By default it is None, in which case pandas generates a default integer index (0, 1, 2, ...).

Example: Here we are going to import the CSV file into a DataFrame with the AGE column as the index.

import pandas as pd

# Create dataframe from csv file by 
# setting the AGE column as index column
df=pd.read_csv( "csv_data.csv",
                index_col='AGE')

#display dataframe
print(df)

Output:

     ID      NAME  SUBJECTS
AGE
21    1   thanmai       php
22    2    sravan      java
21    3   deepika      html
23    4  jyothika      dbms
21    5     durga  linux,c#
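The AGE column contains repeated values (21 appears three times), so it does not identify rows uniquely. As a sketch of a more typical choice, the snippet below uses the ID column instead, once by name and once by position. The data is inlined with io.StringIO as an assumption for self-containment; replace it with "csv_data.csv" for the real file.

```python
import io
import pandas as pd

# Inline copy of csv_data.csv so the example is self-contained.
CSV_TEXT = '''ID,NAME,AGE,SUBJECTS
1,thanmai,21,php
2,sravan,22,java
3,deepika,21,html
4,jyothika,23,dbms
5,durga,21,"linux,c#"
'''

# Select the index column by its name ...
df_by_name = pd.read_csv(io.StringIO(CSV_TEXT), index_col='ID')

# ... or by its zero-based position in the file.
df_by_pos = pd.read_csv(io.StringIO(CSV_TEXT), index_col=0)

# With a unique index, a label lookup returns a single value.
print(df_by_name.loc[3, 'NAME'])  # deepika
```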

Read CSV file into Pandas Dataframe with new Column Names

The names parameter of the read_csv() function sets the column names of the DataFrame. We can pass the column names as a list. By default it is None.

Example: Here we are going to assign column names to the DataFrame.

import pandas as pd

# Set the column names while loading CSV to Dataframe
df=pd.read_csv( "csv_data.csv",
                names=['student_id','name','age','subjects'])

# Display the Dataframe
print(df)

Output:

  student_id      name  age  subjects
0         ID      NAME  AGE  SUBJECTS
1          1   thanmai   21       php
2          2    sravan   22      java
3          3   deepika   21      html
4          4  jyothika   23      dbms
5          5     durga   21  linux,c#

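Here we assigned new column names while loading the DataFrame from the CSV file. The new column names are 'student_id', 'name', 'age' and 'subjects'. Because the file already has a header line and the header parameter was not specified, pandas treated that line as data, which is why the original names (ID, NAME, AGE, SUBJECTS) appear as row 0.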
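If the goal is to replace the existing header rather than keep it as a data row, one option is to pass header=0 together with names. This is a minimal sketch with the data inlined through io.StringIO (an assumption for self-containment; use "csv_data.csv" with the real file).

```python
import io
import pandas as pd

# Inline copy of csv_data.csv so the example is self-contained.
CSV_TEXT = '''ID,NAME,AGE,SUBJECTS
1,thanmai,21,php
2,sravan,22,java
3,deepika,21,html
4,jyothika,23,dbms
5,durga,21,"linux,c#"
'''

# header=0 tells pandas that line 0 is a header to be discarded,
# and names supplies the replacement column labels.
df = pd.read_csv(
    io.StringIO(CSV_TEXT),
    header=0,
    names=['student_id', 'name', 'age', 'subjects'],
)

print(df)
```

With the old header row out of the data, the numeric columns are also parsed as integers instead of strings.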

Read CSV file into Pandas Dataframe and Skip Rows

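The skiprows parameter of the read_csv() function skips lines at the top of the CSV file while loading it. We can pass the number of lines to skip. By default it is None, which means no lines are skipped.

The syntax is as follows:

pandas.read_csv(filename/path, skiprows=n)

where n is the number of lines to skip from the top of the file.

Example: Here we are going to skip the first three lines of the file. The header line counts as one of them, so the header and the first two data rows are skipped, and the third data row ends up being used as the header.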

import pandas as pd

# Skip first 3 rows while importing csv to Dataframe
df=pd.read_csv("csv_data.csv",  skiprows=3)

# Display the Dataframe
print(df)

Output:

   3   deepika  21      html
0  4  jyothika  23      dbms
1  5     durga  21  linux,c#
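To skip data rows while keeping the original header, skiprows also accepts a list of zero-based line numbers. The sketch below skips lines 1 to 3 (the first three data rows) and leaves line 0, the header, in place. The data is inlined with io.StringIO as an assumption so the snippet runs on its own.

```python
import io
import pandas as pd

# Inline copy of csv_data.csv so the example is self-contained.
CSV_TEXT = '''ID,NAME,AGE,SUBJECTS
1,thanmai,21,php
2,sravan,22,java
3,deepika,21,html
4,jyothika,23,dbms
5,durga,21,"linux,c#"
'''

# Line 0 is the header; lines 1, 2 and 3 are the first three data rows.
df = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=[1, 2, 3])

print(df)
```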

Read first N rows of CSV file to Pandas Dataframe

The nrows parameter of the read_csv() function loads only the first N data rows of the CSV file into the DataFrame. We can pass the number of rows to read. By default it is None, which means all rows are read.

The syntax is as follows:

pandas.read_csv(filename/path, nrows=n)

where n is the number of data rows to read from the top of the file. The header line is not counted.

Example: Here we are going to read the first three rows of the CSV file.

import pandas as pd

# Read first three rows from CSV file to Pandas Dataframe
df=pd.read_csv( "csv_data.csv", nrows=3)

print(df)

Output:

   ID     NAME  AGE SUBJECTS
0   1  thanmai   21      php
1   2   sravan   22     java
2   3  deepika   21     html

This is useful when you are dealing with large files: you can load just a small part of a large CSV file into a DataFrame.
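The skiprows and nrows parameters can also be combined to read a window from the middle of a file. The following sketch keeps the header, skips the first two data rows and then reads the next two. The data is inlined with io.StringIO as an assumption for self-containment.

```python
import io
import pandas as pd

# Inline copy of csv_data.csv so the example is self-contained.
CSV_TEXT = '''ID,NAME,AGE,SUBJECTS
1,thanmai,21,php
2,sravan,22,java
3,deepika,21,html
4,jyothika,23,dbms
5,durga,21,"linux,c#"
'''

# skiprows=[1, 2] drops the first two data rows (line 0 is the header);
# nrows=2 then limits the read to the next two data rows.
df = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=[1, 2], nrows=2)

print(df)
```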

Import Specific columns from CSV file to Pandas DataFrame

After importing the CSV file into a DataFrame, we can pass that DataFrame to the pandas.DataFrame constructor with the columns parameter to keep only the specified columns.

Example: Here we are going to import only AGE column from CSV to the dataframe

import pandas as pd

df=pd.read_csv("csv_data.csv")

# Keep only the AGE column
df = pd.DataFrame(df, columns=['AGE'])

# display the Dataframe
print(df)

Output:

   AGE
0   21
1   22
2   21
3   23
4   21
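The approach above reads every column first and discards the rest afterwards. As an alternative sketch, read_csv() has a usecols parameter that selects columns while the file is being parsed, so the unwanted columns are never loaded. The data is inlined with io.StringIO as an assumption so the snippet runs on its own.

```python
import io
import pandas as pd

# Inline copy of csv_data.csv so the example is self-contained.
CSV_TEXT = '''ID,NAME,AGE,SUBJECTS
1,thanmai,21,php
2,sravan,22,java
3,deepika,21,html
4,jyothika,23,dbms
5,durga,21,"linux,c#"
'''

# Load only the AGE column.
df_age = pd.read_csv(io.StringIO(CSV_TEXT), usecols=['AGE'])

# Several columns can be selected the same way.
df_id_name = pd.read_csv(io.StringIO(CSV_TEXT), usecols=['ID', 'NAME'])

print(df_age)
print(df_id_name)
```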

Summary

We learned to import a csv file into a Pandas DataFrame by using the read_csv() function and also discussed several parameters of read_csv().
