Pandas : Read csv file to Dataframe with custom delimiter in Python

In this article we will discuss how to read a CSV file with different type of delimiters to a Dataframe.

Python’s Pandas library provides a function to load a csv file to a Dataframe i.e.

pandas.read_csv(filepath_or_buffer, sep=', ', delimiter=None, header='infer', names=None, index_col=None, ....)

It reads the content of a csv file at given path, then loads the content to a Dataframe and returns that. It uses comma (,) as default delimiter or separator while parsing a file. But we can also specify our custom separator or a regular expression to be used as custom separator.

To use pandas.read_csv() import pandas module i.e.

import pandas as pd

Using read_csv() with custom delimiter

Suppose we have a file ‘users.csv‘ in which columns are separated by string ‘__’ like this.
Contents of file users.csv are as follows,

Name__Age__City
jack__34__Sydeny
Riti__31__Delhi
Aadi__16__New York
Suse__32__Lucknow
Mark__33__Las vegas
Suri__35__Patna

Now to load this kind of file to a dataframe object using pandas.read_csv() we have to pass the sep & engine arguments to pandas.read_csv() i.e.

# Read a csv file to a dataframe with custom delimiter
usersDf =  pd.read_csv('users.csv', sep='__'  , engine='python')

print('Contents of Dataframe : ')
print(usersDf)

Output:

Contents of Dataframe : 
   Name  Age       City
0  jack   34     Sydeny
1  Riti   31      Delhi
2  Aadi   16   New York
3  Suse   32    Lucknow
4  Mark   33  Las vegas
5  Suri   35      Patna

Here, sep argument will be used as separator or delimiter. If sep argument is not specified then default engine for parsing ( C Engine) will be used which uses ‘,’ as delimiter. So, while specifying the custom sep argument we must specify the engine argument as ‘python’, otherwise we will get warning like this,

ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex);

You can avoid this warning by specifying engine=’python’.

Using read_csv() with white space or tab as delimiter

As we have seen in above example, that we can pass custom delimiters. Now suppose we have a file in which columns are separated by either white space or tab i.e.
Contents of file users_4.csv are,

Name   Age City
jack    34  Sydeny
Riti   31  Delhi

Now, to load this kind of file to dataframe with pandas.read_csv() pass ‘\s+’ as separator. Here \s+ means any one or more white space character.

# Read a csv file to a dataframe with delimiter as space or tab
usersDf =  pd.read_csv('users_4.csv',  sep='\s+', engine='python')

print('Contents of Dataframe : ')
print(usersDf)

Contents of the dataframe returned are,

 *** Using pandas.read_csv() with space or tab as delimiters ***
Contents of Dataframe : 
   Name  Age    City
0  jack   34  Sydeny
1  Riti   31   Delhi

Using read_csv() with regular expression for delimiters

Suppose we have a file where multiple char delimiters are used instead of a single one. Like,

Contents of file users_5.csv are,

Name,Age|City
jack,34_Sydeny
Riti:31,Delhi
Aadi,16:New York
Suse,32:Lucknow
Mark,33,Las vegas
Suri,35:Patna

Now, to load this kind of file to dataframe with read_csv() pass a regular expression i.e. ‘[:,|_]’ in sep argument. This regular expression means use any of these characters ( , : | ) asa delimiter or separator i.e.

# Read a csv file to a dataframe with multiple delimiters in regular expression
usersDf =  pd.read_csv('users_5.csv',  sep='[:,|_]', engine='python')

print('Contents of Dataframe : ')
print(usersDf)

Output:

Contents of Dataframe : 
   Name  Age       City
0  jack   34     Sydeny
1  Riti   31      Delhi
2  Aadi   16   New York
3  Suse   32    Lucknow
4  Mark   33  Las vegas
5  Suri   35      Patna

Complete example is as follows:

import pandas as pd


def main():

   print(' *** Using pandas.read_csv() with Custom delimiter ***')

   # Read a csv file to a dataframe with custom delimiter
   usersDf =  pd.read_csv('users_3.csv', sep='__'  , engine='python')

   print('Contents of Dataframe : ')
   print(usersDf)

   print('********')

   print(' *** Using pandas.read_csv() with space or tab as delimiters ***')

   # Read a csv file to a dataframe with delimiter as space or tab
   usersDf =  pd.read_csv('users_4.csv',  sep='\s+', engine='python')

   print('Contents of Dataframe : ')
   print(usersDf)


   print(' *** Using pandas.read_csv() with multiple char delimiters ***')

   # Read a csv file to a dataframe with multiple delimiters in regular expression
   usersDf =  pd.read_csv('users_5.csv',  sep='[:,|_]', engine='python')

   print('Contents of Dataframe : ')
   print(usersDf)

if __name__ == '__main__':
   main()

Output:

 *** Using pandas.read_csv() with Custom delimiter ***
Contents of Dataframe : 
   Name  Age       City
0  jack   34     Sydeny
1  Riti   31      Delhi
2  Aadi   16   New York
3  Suse   32    Lucknow
4  Mark   33  Las vegas
5  Suri   35      Patna
********
 *** Using pandas.read_csv() with space or tab as delimiters ***
Contents of Dataframe : 
   Name  Age    City
0  jack   34  Sydeny
1  Riti   31   Delhi
 *** Using pandas.read_csv() with multiple char delimiters ***
Contents of Dataframe : 
   Name  Age       City
0  jack   34     Sydeny
1  Riti   31      Delhi
2  Aadi   16   New York
3  Suse   32    Lucknow
4  Mark   33  Las vegas
5  Suri   35      Patna

 

 

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Scroll to Top