In this article we will discuss how to convert data type of a dataframe column from string to datetime. The data can be in custom string formats or embedded in big text. Also, how to handle error while converting the data type.
Python’s Pandas module provides a function to convert a given argument to datetime i.e.
pandas.to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, utc=None, box=True, format=None, exact=True, unit=None, infer_datetime_format=False, origin='unix', cache=True)
Important parameters:
- arg : Element to be converted to datetime type like int, float, string, datetime, list, tuple, 1-d array or Series.
- errors : Way to handle error. It can be : {‘ignore’, ‘raise’, ‘coerce’}, default value is ‘raise’
- ‘raise’: In case of invalid parsing raise an exception
- ‘coerce’: In case of invalid parsing set as NaT
- ‘ignore’: In case of invalid parsing return the input
- format : string, default None
- Like we pass in strftime to parse date & time string in format eg “%d/%m/%Y” etc.
Returns:
Frequently Asked:
It Converts the given value to date time format and return value depends on the input, for example,
- If a series of string is passed then it will return a series of datetime64 type.
- If a scalar entity is passed then it returns a datetime64 object.
As this function can covert the data type of a series from string to datetime. Let’s see how to use this to convert data type of a column from string to datetime.
Convert the Data type of a column from string to datetime64
Suppose we have a dataframe in which column ‘DOB’ contains the dates in string format ‘DD/MM/YYYY’ i.e.
# List of Tuples empoyees = [('jack', '29/10/1988', 'Sydney', 155) , ('Riti', '23/11/1981', 'Delhi' , 177) , ('Aadi', '10/04/1982', 'Mumbai', 81) , ('Mohit', '21/05/1983','Delhi' , 167) , ('Veena', '16/08/1984', 'Delhi' , 144) , ('Shaunak', '07/09/1985', 'Mumbai', 135 ), ('Shaun', '11/10/1998', 'Colombo', 111) ] # Create a DataFrame object empDfObj = pd.DataFrame(empoyees, columns=['Name', 'DOB', 'City', 'Marks']) print(empDfObj)
Contents of the dataframe empDfObj is as follows,
Latest Python - Video Tutorial
Name DOB City Marks 0 jack 29/10/1988 Sydney 155 1 Riti 23/11/1981 Delhi 177 2 Aadi 10/04/1982 Mumbai 81 3 Mohit 21/05/1983 Delhi 167 4 Veena 16/08/1984 Delhi 144 5 Shaunak 07/09/1985 Mumbai 135 6 Shaun 11/10/1998 Colombo 111
To check the data types of columns use attribute Dataframe.dtypes i.e.
print(empDfObj.dtypes)
Output:
Name object DOB object City object Marks int64 dtype: object
Data type of column ‘DOB’ is string, basically it contains the date of births as string but in DD/MM/YYYY format. Now to convert the data type of column ‘DOB’ to datetime64 we will use pandas.to_datetime() i.e.
# Convert the data type of column 'DOB' from string (DD/MM/YYYY) to datetime64 empDfObj['DOB'] = pd.to_datetime(empDfObj['DOB'])
Contents of the updated dataframe are,
Name DOB City Marks 0 jack 1988-10-29 Sydney 155 1 Riti 1981-11-23 Delhi 177 2 Aadi 1982-10-04 Mumbai 81 3 Mohit 1983-05-21 Delhi 167 4 Veena 1984-08-16 Delhi 144 5 Shaunak 1985-07-09 Mumbai 135 6 Shaun 1998-11-10 Colombo 111
Lets check the data types of columns in updated dataframe,
print(empDfObj.dtypes)
Output:
Name object DOB datetime64[ns] City object Marks int64 dtype: object
Now the data type of column ‘DOB’ is datetime64.
pd.to_datetime() converts the date time strings in ISO8601 format to datetime64 type. Strings type that it can automatically handles are,
‘DD-MM-YYYY HH:MM AM/PM’
‘YYYY-MM-DDTHH:MM:SS’
‘YYYY-MM-DDT HH:MM:SS.ssssss’
etc.
Let’s see an example where we have dates as string type but in different ISO8601 format i.e.
# List of Tuples empoyees = [('jack', '29-10-1988 11:00 PM', 'Sydney', 155) , ('Riti', '1981-11-29T13:00:00Z', 'Delhi' , 177) , ('Mohit', '21/05/1983 21:00:19.678908','Delhi' , 167) , ] # Create a DataFrame object with column DOB containing date time strings in different formats empDfObj = pd.DataFrame(empoyees, columns=['Name', 'DOB', 'City', 'Marks']) print(empDfObj)
Contents of the dataframe are,
Name DOB City Marks 0 jack 29-10-1988 11:00 PM Sydney 155 1 Riti 1981-11-29T13:00:00Z Delhi 177 2 Mohit 21/05/1983 21:00:19.678908 Delhi 167
Let’s check the data types of columns i.e.
print(empDfObj.dtypes)
Output:
Name object DOB object City object Marks int64 dtype: object
Datatype of column ‘DOB’ is string but it is in different string format. Let’s convert the data type of column ‘DOB’ to datetime64 i.e.
# Convert the data type of column 'DOB' from string with different ISO8601 formats to datetime64 empDfObj['DOB'] = pd.to_datetime(empDfObj['DOB'])
Lets check the data types of columns in updated dataframe,
print(empDfObj.dtypes)
Output:
Name object DOB datetime64[ns] City object Marks int64 dtype: object
Now the data type of column ‘DOB’ is datetime64.
Convert the Data type of a column from custom format string to datetime64
There might be cases when our dataframe have columns which contains date & time column in custom formats like,
DDMMYYYY
DD–MM–YY
To convert data type of column from these custom strings formats to datetime, we need to pass the format argument in pd.to_datetime(). Let’s see an example where we have column in dataframe which contains dates in custom string format i.e. DDMMYYYY,
# List of Tuples empoyees = [('Shaunak', '07091985', 'Mumbai', 135), ('Riti', '11101998', 'Colombo', 111) ] # Create a DataFrame object empDfObj = pd.DataFrame(empoyees, columns=['Name', 'DOB', 'City', 'Marks']) print(empDfObj)
Contents of the dataframe are,
Name DOB City Marks 0 Shaunak 07091985 Mumbai 135 1 Riti 11101998 Colombo 111
Let’s check the data types of columns i.e.
print(empDfObj.dtypes)
Output:
Name object DOB object City object Marks int64 dtype: object
Datatype of column ‘DOB‘ is string. Let’s convert the data type of column ‘DOB’ to datetime64 i.e.
# Convert the data type of column 'DOB' from string of custom formats to datetime64 empDfObj['DOB'] = pd.to_datetime(empDfObj['DOB'], format='%m%d%Y')
Lets check the data types of columns in updated dataframe,
print(empDfObj.dtypes)
Output:
Name object DOB datetime64[ns] City object Marks int64 dtype: object
Now the data type of column ‘DOB’ is datetime64. Here we passed the date time string format in format argument of pd.to_datetime().
Convert the Data type of a column from string to datetime by extracting date & time strings from big string
There might be scenarios when our column in dataframe contains some text and we need to fetch date & time from those texts like,
- date of birth is 07091985
- 11101998 is DOB
To fetch datetime from this big text we need to pass exact argument in pd.to_dataframe(), if passed as False it will try to match the format anywhere in string. Let’s understand by an example,
Suppose we have a dataframe with a column DOB, that contains text in which date time is at different place like,
# List of Tuples empoyees = [('Shaunak', 'date of birth is 07091985', 'Mumbai', 135), ('Riti', '11101998 is DOB', 'Colombo', 111) ] # Create a DataFrame object empDfObj = pd.DataFrame(empoyees, columns=['Name', 'DOB', 'City', 'Marks']) print("Contents of the Dataframe : ") print(empDfObj)
Contents of the dataframe are,
Contents of the Dataframe : Name DOB City Marks 0 Shaunak date of birth is 07091985 Mumbai 135 1 Riti 11101998 is DOB Colombo 111
Let’s check the data types of columns i.e.
print(empDfObj.dtypes)
Output:
Name object DOB object City object Marks int64
Datatype of column ‘DOB’ is string. Let’s convert the contents of column DOB by removing extra text and making the data type of column ‘DOB’ to datetime64 i.e.
# extract the date & time from string in DOB column and convert type of column to datetime64 empDfObj['DOB'] = pd.to_datetime(empDfObj['DOB'], format='%m%d%Y', exact=False)
Lets check the data types of columns in updated dataframe,
print(empDfObj)
Output:
Name DOB City Marks 0 Shaunak 1985-07-09 Mumbai 135 1 Riti 1998-11-10 Colombo 111
Now the data type of column ‘DOB’ is datetime64 and extra text is also removed.
Another Example : Extract date & time from big string in a column and add new columns of datetime64 format
Suppose we have a column in dataframe that contains big text and inside the text it contains date and time at different places i.e.
# List of Tuples empoyees = [('Shaunak', '11:00 PM on the date 07091985', 'Mumbai', 135), ('Riti', '11101998 and morining 8:00 AM', 'Colombo', 111) ] # Create a DataFrame object empDfObj = pd.DataFrame(empoyees, columns=['Name', 'DOB', 'City', 'Marks']) print("Contents of the Dataframe : ") print(empDfObj)
Contents of the dataframe are,
Name DOB City Marks 0 Shaunak 11:00 PM on the date 07091985 Mumbai 135 1 Riti 11101998 and morining 8:00 AM Colombo 111
Datatype of column ‘DOB’ is string / object. Let’s add new columns in dataframe that contains date and time from this big text i.e.
empDfObj['DOB_time'] = pd.to_datetime(empDfObj['DOB'], format='%H:%M %p', exact=False) empDfObj['DOB_date'] = pd.to_datetime(empDfObj['DOB'], format='%m%d%Y', exact=False) print('Modified dataframe :') print(empDfObj)
Output
Modified dataframe : Name DOB City Marks DOB_time DOB_date 0 Shaunak 11:00 PM on the date 07091985 Mumbai 135 1900-01-01 11:00:00 1985-07-09 1 Riti 11101998 and morining 8:00 AM Colombo 111 1900-01-01 08:00:00 1998-11-10
For DOB_time column we provided time only, therefore it picked the default date i.e. 1900-01-01. Where as DOB_date contains the date only. But the data type of both DOB_time & DOB_date is datetime64.
Handle error while Converting the Data type of a column from string to datetime
To handle errors while converting data type of a column we can pass the errors arguments to customize the behavior i.e.
- ‘raise’: In case of invalid parsing raise an exception
- ‘coerce’: In case of invalid parsing set as NaT
- ‘ignore’: In case of invalid parsing return the input
Let’s see an example of ignoring errors while converting the type of column from string to datetime
# List of Tuples empoyees = [('Shaunak', '07091985', 'Mumbai', 135), ('Riti', '11101998', 'Colombo', 111) ] # Create a DataFrame object empDfObj = pd.DataFrame(empoyees, columns=['Name', 'DOB', 'City', 'Marks']) print("Contents of the Dataframe : ") print(empDfObj) print('Data types of columns in original dataframe') print(empDfObj.dtypes) # Ignore errors while converting the type of column from string to datetime empDfObj['DOB'] = pd.to_datetime(empDfObj['DOB'], errors='ignore') print("Contents of the Dataframe : ") print(empDfObj) print('Data types of columns in modified dataframe') print(empDfObj.dtypes)
Output
Contents of the Dataframe : Name DOB City Marks 0 Shaunak 07091985 Mumbai 135 1 Riti 11101998 Colombo 111 Data types of columns in original dataframe Name object DOB object City object Marks int64 dtype: object Contents of the Dataframe : Name DOB City Marks 0 Shaunak 07091985 Mumbai 135 1 Riti 11101998 Colombo 111 Data types of columns in modified dataframe Name object DOB object City object Marks int64 dtype: object
Complete example is as follows,
import pandas as pd def main(): # List of Tuples empoyees = [('jack', '29/10/1988', 'Sydney', 155) , ('Riti', '23/11/1981', 'Delhi' , 177) , ('Aadi', '10/04/1982', 'Mumbai', 81) , ('Mohit', '21/05/1983','Delhi' , 167) , ('Veena', '16/08/1984', 'Delhi' , 144) , ('Shaunak', '07/09/1985', 'Mumbai', 135 ), ('Shaun', '11/10/1998', 'Colombo', 111) ] # Create a DataFrame object empDfObj = pd.DataFrame(empoyees, columns=['Name', 'DOB', 'City', 'Marks']) print("Contents of the Original Dataframe : ") print(empDfObj) print('Data types of columns in original dataframe') print(empDfObj.dtypes) print('Convert the Data type of a column from string to datetime') print('Convert the Data type of a column from string in format DD/MM/YYYY to datetime') # Convert the data type of column 'DOB' from string (DD/MM/YYYY) to datetime64 empDfObj['DOB'] = pd.to_datetime(empDfObj['DOB']) print('Modified dataframe :') print(empDfObj) print('Data types of columns in modified dataframe') print(empDfObj.dtypes) print('Convert the Data type of a column from string (different ISO8601 formats) to datetime64') # List of Tuples empoyees = [('jack', '29-10-1988 11:00 PM', 'Sydney', 155) , ('Riti', '1981-11-29T13:00:00Z', 'Delhi' , 177) , ('Mohit', '21/05/1983 21:00:19.678908','Delhi' , 167) , ] # Create a DataFrame object with column DOB containing date time strings in different formats empDfObj = pd.DataFrame(empoyees, columns=['Name', 'DOB', 'City', 'Marks']) print("Contents of the Original Dataframe : ") print(empDfObj) print('Data types of columns in original dataframe') print(empDfObj.dtypes) # Convert the data type of column 'DOB' from string with different ISO8601 formats to datetime64 empDfObj['DOB'] = pd.to_datetime(empDfObj['DOB']) print('Modified dataframe :') print(empDfObj) print('Data types of columns in modified dataframe') print(empDfObj.dtypes) print('--Convert the Data type of a column from custom format string to datetime64') # List of Tuples empoyees = [('Shaunak', '07091985', 'Mumbai', 135), ('Riti', '11101998', 'Colombo', 111) ] # Create a DataFrame object empDfObj = pd.DataFrame(empoyees, columns=['Name', 'DOB', 'City', 'Marks']) print("Contents of the Dataframe : ") print(empDfObj) # Convert the data type of column 'DOB' from string of custom formats to datetime64 empDfObj['DOB'] = pd.to_datetime(empDfObj['DOB'], format='%m%d%Y') print('Modified dataframe :') print(empDfObj) print('Data types of columns in modified dataframe') print(empDfObj.dtypes) print('--Convert the Data type of a column from string to datetime by extracting date & time strings from big string') print('Example 1 : extract dates from string and convert the column type to datetime64') # List of Tuples empoyees = [('Shaunak', 'date of birth is 07091985', 'Mumbai', 135), ('Riti', '11101998 is DOB', 'Colombo', 111) ] # Create a DataFrame object empDfObj = pd.DataFrame(empoyees, columns=['Name', 'DOB', 'City', 'Marks']) print("Contents of the Dataframe : ") print(empDfObj) # extract the date & time from string in DOB column and convert type of column to datetime64 empDfObj['DOB'] = pd.to_datetime(empDfObj['DOB'], format='%m%d%Y', exact=False) print('Modified dataframe :') print(empDfObj) print('Data types of columns in modified dataframe') print(empDfObj.dtypes) print('Example : extract date & time from string and add new columns of datetime64 format') # List of Tuples empoyees = [('Shaunak', '11:00 PM on the date 07091985', 'Mumbai', 135), ('Riti', '11101998 and morining 8:00 AM', 'Colombo', 111) ] # Create a DataFrame object empDfObj = pd.DataFrame(empoyees, columns=['Name', 'DOB', 'City', 'Marks']) print("Contents of the Dataframe : ") print(empDfObj) empDfObj['DOB_time'] = pd.to_datetime(empDfObj['DOB'], format='%H:%M %p', exact=False) empDfObj['DOB_date'] = pd.to_datetime(empDfObj['DOB'], format='%m%d%Y', exact=False) print('Modified dataframe :') print(empDfObj) print('Data types of columns in modified dataframe') print(empDfObj.dtypes) print('--Handle error while Converting the Data type of a column from string to datetime') # List of Tuples empoyees = [('Shaunak', '07091985', 'Mumbai', 135), ('Riti', '11101998', 'Colombo', 111) ] # Create a DataFrame object empDfObj = pd.DataFrame(empoyees, columns=['Name', 'DOB', 'City', 'Marks']) print("Contents of the Dataframe : ") print(empDfObj) print('Data types of columns in original dataframe') print(empDfObj.dtypes) # Ignore errors while converting the type of column from string to datetime empDfObj['DOB'] = pd.to_datetime(empDfObj['DOB'], errors='ignore') print("Contents of the Dataframe : ") print(empDfObj) print('Data types of columns in modified dataframe') print(empDfObj.dtypes) if __name__ == '__main__': main()
Output:
Contents of the Original Dataframe : Name DOB City Marks 0 jack 29/10/1988 Sydney 155 1 Riti 23/11/1981 Delhi 177 2 Aadi 10/04/1982 Mumbai 81 3 Mohit 21/05/1983 Delhi 167 4 Veena 16/08/1984 Delhi 144 5 Shaunak 07/09/1985 Mumbai 135 6 Shaun 11/10/1998 Colombo 111 Data types of columns in original dataframe Name object DOB object City object Marks int64 dtype: object Convert the Data type of a column from string to datetime Convert the Data type of a column from string in format DD/MM/YYYY to datetime Modified dataframe : Name DOB City Marks 0 jack 1988-10-29 Sydney 155 1 Riti 1981-11-23 Delhi 177 2 Aadi 1982-10-04 Mumbai 81 3 Mohit 1983-05-21 Delhi 167 4 Veena 1984-08-16 Delhi 144 5 Shaunak 1985-07-09 Mumbai 135 6 Shaun 1998-11-10 Colombo 111 Data types of columns in modified dataframe Name object DOB datetime64[ns] City object Marks int64 dtype: object Convert the Data type of a column from string (different ISO8601 formats) to datetime64 Contents of the Original Dataframe : Name DOB City Marks 0 jack 29-10-1988 11:00 PM Sydney 155 1 Riti 1981-11-29T13:00:00Z Delhi 177 2 Mohit 21/05/1983 21:00:19.678908 Delhi 167 Data types of columns in original dataframe Name object DOB object City object Marks int64 dtype: object Modified dataframe : Name DOB City Marks 0 jack 1988-10-29 23:00:00.000000 Sydney 155 1 Riti 1981-11-29 13:00:00.000000 Delhi 177 2 Mohit 1983-05-21 21:00:19.678908 Delhi 167 Data types of columns in modified dataframe Name object DOB datetime64[ns] City object Marks int64 dtype: object --Convert the Data type of a column from custom format string to datetime64 Contents of the Dataframe : Name DOB City Marks 0 Shaunak 07091985 Mumbai 135 1 Riti 11101998 Colombo 111 Modified dataframe : Name DOB City Marks 0 Shaunak 1985-07-09 Mumbai 135 1 Riti 1998-11-10 Colombo 111 Data types of columns in modified dataframe Name object DOB datetime64[ns] City object Marks int64 dtype: object --Convert the Data type of a column from string to datetime by extracting date & time strings from big string Example 1 : extract dates from string and convert the column type to datetime64 Contents of the Dataframe : Name DOB City Marks 0 Shaunak date of birth is 07091985 Mumbai 135 1 Riti 11101998 is DOB Colombo 111 Modified dataframe : Name DOB City Marks 0 Shaunak 1985-07-09 Mumbai 135 1 Riti 1998-11-10 Colombo 111 Data types of columns in modified dataframe Name object DOB datetime64[ns] City object Marks int64 dtype: object Example : extract date & time from string and add new columns of datetime64 format Contents of the Dataframe : Name DOB City Marks 0 Shaunak 11:00 PM on the date 07091985 Mumbai 135 1 Riti 11101998 and morining 8:00 AM Colombo 111 Modified dataframe : Name DOB City Marks DOB_time DOB_date 0 Shaunak 11:00 PM on the date 07091985 Mumbai 135 1900-01-01 11:00:00 1985-07-09 1 Riti 11101998 and morining 8:00 AM Colombo 111 1900-01-01 08:00:00 1998-11-10 Data types of columns in modified dataframe Name object DOB object City object Marks int64 DOB_time datetime64[ns] DOB_date datetime64[ns] dtype: object --Handle error while Converting the Data type of a column from string to datetime Contents of the Dataframe : Name DOB City Marks 0 Shaunak 07091985 Mumbai 135 1 Riti 11101998 Colombo 111 Data types of columns in original dataframe Name object DOB object City object Marks int64 dtype: object Contents of the Dataframe : Name DOB City Marks 0 Shaunak 07091985 Mumbai 135 1 Riti 11101998 Colombo 111 Data types of columns in modified dataframe Name object DOB object City object Marks int64 dtype: object
Latest Video Tutorials