How to convert Dataframe column type from string to date time

In this article we will discuss how to convert data type of a dataframe column from string to datetime. The data can be in custom string formats or embedded in big text. Also, how to handle error while converting the data type.

Python’s Pandas module provides a function to convert a given argument to datetime i.e.

pandas.to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, utc=None, box=True, format=None, exact=True, unit=None, infer_datetime_format=False, origin='unix', cache=True)

Important parameters:

  • arg : Element to be converted to datetime type like int, float, string, datetime, list, tuple, 1-d array or Series.
  • errors : Way to handle error. It can be : {‘ignore’, ‘raise’, ‘coerce’}, default value is ‘raise’
    • ‘raise’: In case of invalid parsing raise an exception
    • ‘coerce’: In case of invalid parsing set as NaT
    • ‘ignore’: In case of invalid parsing return the input
  • format : string, default None
    • Like we pass in strftime to parse date & time string in format eg “%d/%m/%Y” etc.

Returns:

It Converts the given value to date time format and return value depends on the input, for example,

  • If a series of string is passed then it will return a series of datetime64 type.
  • If a scalar entity is passed then it returns a datetime64 object.

As this function can covert the data type of a series from string to datetime. Let’s see how to use this to convert data type of a column from string to datetime.

Convert the Data type of a column from string to datetime64

Suppose we have a dataframe in which column ‘DOB’ contains the dates in string format ‘DD/MM/YYYY’ i.e.

# List of Tuples
empoyees = [('jack', '29/10/1988', 'Sydney', 155) ,
            ('Riti', '23/11/1981', 'Delhi' , 177) ,
            ('Aadi', '10/04/1982', 'Mumbai', 81) ,
            ('Mohit', '21/05/1983','Delhi' , 167) ,
            ('Veena', '16/08/1984', 'Delhi' , 144) ,
            ('Shaunak', '07/09/1985', 'Mumbai', 135 ),
            ('Shaun', '11/10/1998', 'Colombo', 111)
           ]

# Create a DataFrame object
empDfObj = pd.DataFrame(empoyees, columns=['Name', 'DOB', 'City', 'Marks'])

print(empDfObj)

Contents of the dataframe empDfObj is as follows,

      Name         DOB     City  Marks
0     jack  29/10/1988   Sydney    155
1     Riti  23/11/1981    Delhi    177
2     Aadi  10/04/1982   Mumbai     81
3    Mohit  21/05/1983    Delhi    167
4    Veena  16/08/1984    Delhi    144
5  Shaunak  07/09/1985   Mumbai    135
6    Shaun  11/10/1998  Colombo    111

To check the data types of columns use attribute Dataframe.dtypes i.e.

print(empDfObj.dtypes)

Output:

Name     object
DOB      object
City     object
Marks     int64
dtype: object

Data type of column ‘DOB’ is string, basically it contains the date of births as string but in DD/MM/YYYY format. Now to convert the data type of column ‘DOB’ to datetime64 we will use pandas.to_datetime() i.e.

# Convert the data type of column 'DOB' from string (DD/MM/YYYY) to datetime64
empDfObj['DOB'] = pd.to_datetime(empDfObj['DOB'])

Contents of the updated dataframe are,

      Name        DOB     City  Marks
0     jack 1988-10-29   Sydney    155
1     Riti 1981-11-23    Delhi    177
2     Aadi 1982-10-04   Mumbai     81
3    Mohit 1983-05-21    Delhi    167
4    Veena 1984-08-16    Delhi    144
5  Shaunak 1985-07-09   Mumbai    135
6    Shaun 1998-11-10  Colombo    111

Lets check the data types of columns in updated dataframe,

print(empDfObj.dtypes)

Output:

Name             object
DOB      datetime64[ns]
City             object
Marks             int64
dtype: object

Now the data type of column ‘DOB’ is datetime64.

pd.to_datetime() converts the date time strings in ISO8601 format to datetime64 type. Strings type that it can automatically handles are,

‘DD-MM-YYYY HH:MM AM/PM’
‘YYYY-MM-DDTHH:MM:SS’
‘YYYY-MM-DDT HH:MM:SS.ssssss’
etc.
Let’s see an example where we have dates as string type but in different ISO8601 format i.e.

# List of Tuples
empoyees = [('jack', '29-10-1988 11:00 PM', 'Sydney', 155) ,
          ('Riti', '1981-11-29T13:00:00Z', 'Delhi' , 177) ,
          ('Mohit', '21/05/1983 21:00:19.678908','Delhi' , 167) ,
           ]

# Create a DataFrame object with column DOB containing date time strings in different formats
empDfObj = pd.DataFrame(empoyees, columns=['Name', 'DOB', 'City', 'Marks'])

print(empDfObj)

Contents of the dataframe are,

    Name                         DOB    City  Marks
0   jack         29-10-1988 11:00 PM  Sydney    155
1   Riti        1981-11-29T13:00:00Z   Delhi    177
2  Mohit  21/05/1983 21:00:19.678908   Delhi    167

Let’s check the data types of columns i.e.

print(empDfObj.dtypes)

Output:

Name     object
DOB      object
City     object
Marks     int64
dtype: object

Datatype of column ‘DOB’ is string but it is in different string format. Let’s convert the data type of column ‘DOB’ to datetime64 i.e.

# Convert the data type of column 'DOB' from string with different ISO8601 formats to datetime64
empDfObj['DOB'] = pd.to_datetime(empDfObj['DOB'])

Lets check the data types of columns in updated dataframe,

print(empDfObj.dtypes)

Output:

Name             object
DOB      datetime64[ns]
City             object
Marks             int64
dtype: object

Now the data type of column ‘DOB’ is datetime64.

Convert the Data type of a column from custom format string to datetime64

There might be cases when our dataframe have columns which contains date & time column in custom formats like,
DDMMYYYY
DD–MM–YY

To convert data type of column from these custom strings formats to datetime, we need to pass the format argument in pd.to_datetime(). Let’s see an example where we have column in dataframe which contains dates in custom string format i.e. DDMMYYYY,

# List of Tuples
empoyees = [('Shaunak', '07091985', 'Mumbai', 135),
            ('Riti', '11101998', 'Colombo', 111)
            ]

# Create a DataFrame object
empDfObj = pd.DataFrame(empoyees, columns=['Name', 'DOB', 'City', 'Marks'])

print(empDfObj)

Contents of the dataframe are,

      Name       DOB     City  Marks
0  Shaunak  07091985   Mumbai    135
1     Riti  11101998  Colombo    111

Let’s check the data types of columns i.e.

print(empDfObj.dtypes)

Output:

Name             object
DOB              object
City             object
Marks             int64
dtype: object

Datatype of column ‘DOB‘ is string. Let’s convert the data type of column ‘DOB’ to datetime64 i.e.

# Convert the data type of column 'DOB' from string of custom formats to datetime64
empDfObj['DOB'] = pd.to_datetime(empDfObj['DOB'], format='%m%d%Y')

Lets check the data types of columns in updated dataframe,

print(empDfObj.dtypes)

Output:

Name             object
DOB      datetime64[ns]
City             object
Marks             int64
dtype: object

Now the data type of column ‘DOB’ is datetime64. Here we passed the date time string format in format argument of pd.to_datetime().

Convert the Data type of a column from string to datetime by extracting date & time strings from big string

There might be scenarios when our column in dataframe contains some text and we need to fetch date & time from those texts like,

  • date of birth is 07091985
  • 11101998 is DOB

To fetch datetime from this big text we need to pass exact argument in pd.to_dataframe(), if passed as False it will try to match the format anywhere in string. Let’s understand by an example,
Suppose we have a dataframe with a column DOB, that contains text in which date time is at different place like,

# List of Tuples
empoyees = [('Shaunak', 'date of birth is 07091985', 'Mumbai', 135),
            ('Riti', '11101998 is DOB', 'Colombo', 111)
            ]

# Create a DataFrame object
empDfObj = pd.DataFrame(empoyees, columns=['Name', 'DOB', 'City', 'Marks'])

print("Contents of the Dataframe : ")
print(empDfObj)

Contents of the dataframe are,

Contents of the Dataframe : 
      Name                        DOB     City  Marks
0  Shaunak  date of birth is 07091985   Mumbai    135
1     Riti            11101998 is DOB  Colombo    111

Let’s check the data types of columns i.e.

print(empDfObj.dtypes)

Output:

Name             object
DOB              object
City             object
Marks             int64

Datatype of column ‘DOB’ is string. Let’s convert the contents of column DOB by removing extra text and making the data type of column ‘DOB’ to datetime64 i.e.

# extract the date & time from string in DOB column and convert type of column to datetime64
empDfObj['DOB'] = pd.to_datetime(empDfObj['DOB'], format='%m%d%Y', exact=False)

Lets check the data types of columns in updated dataframe,

print(empDfObj)

Output:

      Name        DOB     City  Marks
0  Shaunak 1985-07-09   Mumbai    135
1     Riti 1998-11-10  Colombo    111

Now the data type of column ‘DOB’ is datetime64 and extra text is also removed.

Another Example : Extract date & time from big string in a column and add new columns of datetime64 format

Suppose we have a column in dataframe that contains big text and inside the text it contains date and time at different places i.e.

# List of Tuples
empoyees = [('Shaunak', '11:00 PM on the date 07091985', 'Mumbai', 135),
            ('Riti', '11101998 and morining 8:00 AM', 'Colombo', 111)
            ]

# Create a DataFrame object
empDfObj = pd.DataFrame(empoyees, columns=['Name', 'DOB', 'City', 'Marks'])

print("Contents of the Dataframe : ")
print(empDfObj)

Contents of the dataframe are,

      Name                            DOB     City  Marks
0  Shaunak  11:00 PM on the date 07091985   Mumbai    135
1     Riti  11101998 and morining 8:00 AM  Colombo    111

Datatype of column ‘DOB’ is string / object. Let’s add new columns in dataframe that contains date and time from this big text i.e.

empDfObj['DOB_time'] = pd.to_datetime(empDfObj['DOB'], format='%H:%M %p', exact=False)
empDfObj['DOB_date'] = pd.to_datetime(empDfObj['DOB'], format='%m%d%Y', exact=False)

print('Modified dataframe :')
print(empDfObj)

Output

Modified dataframe :
      Name                            DOB     City  Marks            DOB_time   DOB_date
0  Shaunak  11:00 PM on the date 07091985  Mumbai   135   1900-01-01 11:00:00 1985-07-09
1  Riti     11101998 and morining 8:00 AM  Colombo  111   1900-01-01 08:00:00 1998-11-10

For DOB_time column we provided time only, therefore it picked the default date i.e. 1900-01-01. Where as DOB_date contains the date only. But the data type of both DOB_time & DOB_date is datetime64.

Handle error while Converting the Data type of a column from string to datetime

To handle errors while converting data type of a column we can pass the errors arguments to customize the behavior i.e.

  • ‘raise’: In case of invalid parsing raise an exception
  • ‘coerce’: In case of invalid parsing set as NaT
  • ‘ignore’: In case of invalid parsing return the input

Let’s see an example of ignoring errors while converting the type of column from string to datetime

# List of Tuples
empoyees = [('Shaunak', '07091985', 'Mumbai', 135),
            ('Riti', '11101998', 'Colombo', 111)
            ]

# Create a DataFrame object
empDfObj = pd.DataFrame(empoyees, columns=['Name', 'DOB', 'City', 'Marks'])

print("Contents of the Dataframe : ")
print(empDfObj)

print('Data types of columns in original dataframe')
print(empDfObj.dtypes)

# Ignore errors while converting the type of column from string to datetime
empDfObj['DOB'] = pd.to_datetime(empDfObj['DOB'], errors='ignore')

print("Contents of the Dataframe : ")
print(empDfObj)

print('Data types of columns in modified dataframe')
print(empDfObj.dtypes)

Output

Contents of the Dataframe : 
      Name       DOB     City  Marks
0  Shaunak  07091985  Mumbai   135  
1  Riti     11101998  Colombo  111  
Data types of columns in original dataframe
Name     object
DOB      object
City     object
Marks    int64 
dtype: object
Contents of the Dataframe : 
      Name       DOB     City  Marks
0  Shaunak  07091985  Mumbai   135  
1  Riti     11101998  Colombo  111  
Data types of columns in modified dataframe
Name     object
DOB      object
City     object
Marks    int64 
dtype: object

Complete example is as follows,

import pandas as pd

def main():

    # List of Tuples
    empoyees = [('jack', '29/10/1988', 'Sydney', 155) ,
                ('Riti', '23/11/1981', 'Delhi' , 177) ,
                ('Aadi', '10/04/1982', 'Mumbai', 81) ,
                ('Mohit', '21/05/1983','Delhi' , 167) ,
                ('Veena', '16/08/1984', 'Delhi' , 144) ,
                ('Shaunak', '07/09/1985', 'Mumbai', 135 ),
                ('Shaun', '11/10/1998', 'Colombo', 111)
               ]

    # Create a DataFrame object
    empDfObj = pd.DataFrame(empoyees, columns=['Name', 'DOB', 'City', 'Marks'])

    print("Contents of the Original Dataframe : ")
    print(empDfObj)

    print('Data types of columns in original dataframe')
    print(empDfObj.dtypes)

    print('Convert the Data type of a column from string to datetime')

    print('Convert the Data type of a column from string in format DD/MM/YYYY to datetime')

    # Convert the data type of column 'DOB' from string (DD/MM/YYYY) to datetime64
    empDfObj['DOB'] = pd.to_datetime(empDfObj['DOB'])

    print('Modified dataframe :')
    print(empDfObj)

    print('Data types of columns in modified dataframe')
    print(empDfObj.dtypes)

    print('Convert the Data type of a column from string (different ISO8601 formats) to datetime64')

    # List of Tuples
    empoyees = [('jack', '29-10-1988 11:00 PM', 'Sydney', 155) ,
              ('Riti', '1981-11-29T13:00:00Z', 'Delhi' , 177) ,
              ('Mohit', '21/05/1983 21:00:19.678908','Delhi' , 167) ,
               ]

    # Create a DataFrame object with column DOB containing date time strings in different formats
    empDfObj = pd.DataFrame(empoyees, columns=['Name', 'DOB', 'City', 'Marks'])

    print("Contents of the Original Dataframe : ")
    print(empDfObj)

    print('Data types of columns in original dataframe')
    print(empDfObj.dtypes)


    # Convert the data type of column 'DOB' from string with different ISO8601 formats to datetime64
    empDfObj['DOB'] = pd.to_datetime(empDfObj['DOB'])

    print('Modified dataframe :')
    print(empDfObj)

    print('Data types of columns in modified dataframe')
    print(empDfObj.dtypes)

    print('--Convert the Data type of a column from custom format string to datetime64')

    # List of Tuples
    empoyees = [('Shaunak', '07091985', 'Mumbai', 135),
                ('Riti', '11101998', 'Colombo', 111)
                ]

    # Create a DataFrame object
    empDfObj = pd.DataFrame(empoyees, columns=['Name', 'DOB', 'City', 'Marks'])

    print("Contents of the Dataframe : ")
    print(empDfObj)

    # Convert the data type of column 'DOB' from string of custom formats to datetime64
    empDfObj['DOB'] = pd.to_datetime(empDfObj['DOB'], format='%m%d%Y')

    print('Modified dataframe :')
    print(empDfObj)

    print('Data types of columns in modified dataframe')
    print(empDfObj.dtypes)

    print('--Convert the Data type of a column from string to datetime by extracting date & time strings from big string')

    print('Example 1 : extract dates from string and convert the column type to datetime64')

    # List of Tuples
    empoyees = [('Shaunak', 'date of birth is 07091985', 'Mumbai', 135),
                ('Riti', '11101998 is DOB', 'Colombo', 111)
                ]

    # Create a DataFrame object
    empDfObj = pd.DataFrame(empoyees, columns=['Name', 'DOB', 'City', 'Marks'])

    print("Contents of the Dataframe : ")
    print(empDfObj)

    # extract the date & time from string in DOB column and convert type of column to datetime64
    empDfObj['DOB'] = pd.to_datetime(empDfObj['DOB'], format='%m%d%Y', exact=False)

    print('Modified dataframe :')
    print(empDfObj)

    print('Data types of columns in modified dataframe')
    print(empDfObj.dtypes)

    print('Example : extract date & time from string and add new columns of datetime64 format')

    # List of Tuples
    empoyees = [('Shaunak', '11:00 PM on the date 07091985', 'Mumbai', 135),
                ('Riti', '11101998 and morining 8:00 AM', 'Colombo', 111)
                ]

    # Create a DataFrame object
    empDfObj = pd.DataFrame(empoyees, columns=['Name', 'DOB', 'City', 'Marks'])

    print("Contents of the Dataframe : ")
    print(empDfObj)

    empDfObj['DOB_time'] = pd.to_datetime(empDfObj['DOB'], format='%H:%M %p', exact=False)
    empDfObj['DOB_date'] = pd.to_datetime(empDfObj['DOB'], format='%m%d%Y', exact=False)

    print('Modified dataframe :')
    print(empDfObj)

    print('Data types of columns in modified dataframe')
    print(empDfObj.dtypes)

    print('--Handle error while Converting the Data type of a column from string to datetime')

    # List of Tuples
    empoyees = [('Shaunak', '07091985', 'Mumbai', 135),
                ('Riti', '11101998', 'Colombo', 111)
                ]

    # Create a DataFrame object
    empDfObj = pd.DataFrame(empoyees, columns=['Name', 'DOB', 'City', 'Marks'])

    print("Contents of the Dataframe : ")
    print(empDfObj)

    print('Data types of columns in original dataframe')
    print(empDfObj.dtypes)

    # Ignore errors while converting the type of column from string to datetime
    empDfObj['DOB'] = pd.to_datetime(empDfObj['DOB'], errors='ignore')

    print("Contents of the Dataframe : ")
    print(empDfObj)

    print('Data types of columns in modified dataframe')
    print(empDfObj.dtypes)


if __name__ == '__main__':
 main()

Output:

Contents of the Original Dataframe : 
      Name         DOB     City  Marks
0     jack  29/10/1988   Sydney    155
1     Riti  23/11/1981    Delhi    177
2     Aadi  10/04/1982   Mumbai     81
3    Mohit  21/05/1983    Delhi    167
4    Veena  16/08/1984    Delhi    144
5  Shaunak  07/09/1985   Mumbai    135
6    Shaun  11/10/1998  Colombo    111
Data types of columns in original dataframe
Name     object
DOB      object
City     object
Marks     int64
dtype: object
Convert the Data type of a column from string to datetime
Convert the Data type of a column from string in format DD/MM/YYYY to datetime
Modified dataframe :
      Name        DOB     City  Marks
0     jack 1988-10-29   Sydney    155
1     Riti 1981-11-23    Delhi    177
2     Aadi 1982-10-04   Mumbai     81
3    Mohit 1983-05-21    Delhi    167
4    Veena 1984-08-16    Delhi    144
5  Shaunak 1985-07-09   Mumbai    135
6    Shaun 1998-11-10  Colombo    111
Data types of columns in modified dataframe
Name             object
DOB      datetime64[ns]
City             object
Marks             int64
dtype: object
Convert the Data type of a column from string (different ISO8601 formats) to datetime64
Contents of the Original Dataframe : 
    Name                         DOB    City  Marks
0   jack         29-10-1988 11:00 PM  Sydney    155
1   Riti        1981-11-29T13:00:00Z   Delhi    177
2  Mohit  21/05/1983 21:00:19.678908   Delhi    167
Data types of columns in original dataframe
Name     object
DOB      object
City     object
Marks     int64
dtype: object
Modified dataframe :
    Name                        DOB    City  Marks
0   jack 1988-10-29 23:00:00.000000  Sydney    155
1   Riti 1981-11-29 13:00:00.000000   Delhi    177
2  Mohit 1983-05-21 21:00:19.678908   Delhi    167
Data types of columns in modified dataframe
Name             object
DOB      datetime64[ns]
City             object
Marks             int64
dtype: object
--Convert the Data type of a column from custom format string to datetime64
Contents of the Dataframe : 
      Name       DOB     City  Marks
0  Shaunak  07091985   Mumbai    135
1     Riti  11101998  Colombo    111
Modified dataframe :
      Name        DOB     City  Marks
0  Shaunak 1985-07-09   Mumbai    135
1     Riti 1998-11-10  Colombo    111
Data types of columns in modified dataframe
Name             object
DOB      datetime64[ns]
City             object
Marks             int64
dtype: object
--Convert the Data type of a column from string to datetime by extracting date & time strings from big string
Example 1 : extract dates from string and convert the column type to datetime64
Contents of the Dataframe : 
      Name                        DOB     City  Marks
0  Shaunak  date of birth is 07091985   Mumbai    135
1     Riti            11101998 is DOB  Colombo    111
Modified dataframe :
      Name        DOB     City  Marks
0  Shaunak 1985-07-09   Mumbai    135
1     Riti 1998-11-10  Colombo    111
Data types of columns in modified dataframe
Name             object
DOB      datetime64[ns]
City             object
Marks             int64
dtype: object
Example : extract date & time from string and add new columns of datetime64 format
Contents of the Dataframe : 
      Name                            DOB     City  Marks
0  Shaunak  11:00 PM on the date 07091985   Mumbai    135
1     Riti  11101998 and morining 8:00 AM  Colombo    111
Modified dataframe :
      Name                            DOB     City  Marks            DOB_time   DOB_date
0  Shaunak  11:00 PM on the date 07091985  Mumbai   135   1900-01-01 11:00:00 1985-07-09
1  Riti     11101998 and morining 8:00 AM  Colombo  111   1900-01-01 08:00:00 1998-11-10
Data types of columns in modified dataframe
Name        object        
DOB         object        
City        object        
Marks       int64         
DOB_time    datetime64[ns]
DOB_date    datetime64[ns]
dtype: object
--Handle error while Converting the Data type of a column from string to datetime
Contents of the Dataframe : 
      Name       DOB     City  Marks
0  Shaunak  07091985  Mumbai   135  
1  Riti     11101998  Colombo  111  
Data types of columns in original dataframe
Name     object
DOB      object
City     object
Marks    int64 
dtype: object
Contents of the Dataframe : 
      Name       DOB     City  Marks
0  Shaunak  07091985  Mumbai   135  
1  Riti     11101998  Colombo  111  
Data types of columns in modified dataframe
Name     object
DOB      object
City     object
Marks    int64 
dtype: object

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Scroll to Top