Change Data Type of a Single Column
Series.astype()
Series.astype(self, dtype, copy=True, errors='raise', **kwargs)
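Arguments:
- dtype : The data type to which the whole Series will be converted, for example a Python type, a NumPy dtype, or a dtype name such as 'float64'.
- errors : How to handle a failed conversion. It can be one of {ignore, raise}; the default value is raise.
  - raise : Raise an exception if the conversion fails.
  - ignore : Suppress the exception and return the original object unchanged.
- copy : bool. Default value is True.
  - If True : Return a new copy, even when the data already has the requested type.
  - If False : A copy may be avoided when the data already has the requested type. The calling object is never converted in place.
Returns:
- A new Series object with the requested data type. The original Series is left unchanged, so assign the result back if you want to keep it.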
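As a quick check of the behaviour described above, here is a minimal sketch. The Series name `marks` and its values are made up for illustration. It converts a small Series and shows that the original keeps its data type while the returned object carries the new one.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
marks = pd.Series([155, 177, 81], dtype='int64')

# astype() returns a new Series; 'marks' itself is not modified
marks_float = marks.astype('float64')

print(marks.dtype)        # data type of the original Series
print(marks_float.dtype)  # data type of the converted Series
```

This is why the examples that follow assign the result of astype() back to the column.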
Now let’s see how to use this function to change the data type of a column in our dataframe.
First, import the pandas module as pd:
import pandas as pd
# List of Tuples
employees = [('jack', 34, 'Sydney', 155),
             ('Riti', 31, 'Delhi', 177),
             ('Aadi', 16, 'Mumbai', 81),
             ('Mohit', 31, 'Delhi', 167),
             ('Veena', 12, 'Delhi', 144),
             ('Shaunak', 35, 'Mumbai', 135),
             ('Shaun', 35, 'Colombo', 111)]

# Create a DataFrame object
empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Marks'])

print(empDfObj)
Contents of the dataframe are,
      Name  Age     City  Marks
0     jack   34   Sydney    155
1     Riti   31    Delhi    177
2     Aadi   16   Mumbai     81
3    Mohit   31    Delhi    167
4    Veena   12    Delhi    144
5  Shaunak   35   Mumbai    135
6    Shaun   35  Colombo    111
Now let’s check the datatype of columns in the above created dataframe,
print(empDfObj.dtypes)
Output:
Name     object
Age       int64
City     object
Marks     int64
dtype: object
Change data type of a column from int64 to float64
# Change data type of column 'Marks' from int64 to float64
empDfObj['Marks'] = empDfObj['Marks'].astype('float64')
astype() returns a new Series with the changed data type, so we assigned this new Series back to empDfObj[‘Marks’].
print(empDfObj.dtypes)
Output:
Name      object
Age        int64
City      object
Marks    float64
dtype: object
Now the data type of column ‘Marks’ is float64. This is also reflected in the contents of the dataframe:
print(empDfObj)
Output:
      Name  Age     City  Marks
0     jack   34   Sydney  155.0
1     Riti   31    Delhi  177.0
2     Aadi   16   Mumbai   81.0
3    Mohit   31    Delhi  167.0
4    Veena   12    Delhi  144.0
5  Shaunak   35   Mumbai  135.0
6    Shaun   35  Colombo  111.0
The values in the ‘Marks’ column are now floats.
Change data type of a column from int64 to object
# Change data type of column 'Age' from int64 to object type
empDfObj['Age'] = empDfObj['Age'].astype('object')
astype() returns a new Series with the changed data type, so we assigned this new Series back to empDfObj[‘Age’]. Note that astype('object') stores the values as generic Python objects; they remain integers rather than becoming strings. To get actual strings, use astype(str) instead.
print(empDfObj.dtypes)
Output:
Name      object
Age       object
City      object
Marks    float64
dtype: object
Now data type of column ‘Age’ is object.
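The object data type is a generic container, so it is worth checking what the individual values actually are after a conversion. The following minimal sketch, with a made-up Series named `ages`, compares astype('object') with astype(str) by looking at the Python type of the first element.

```python
import pandas as pd

# Hypothetical sample values, used only for illustration
ages = pd.Series([34, 31, 16])

as_object = ages.astype('object')  # generic Python objects, still integers
as_text = ages.astype(str)         # each value converted to a Python str

print(type(as_object.iloc[0]))
print(type(as_text.iloc[0]))
```

If the goal is text that can be concatenated or searched, astype(str) is the conversion to reach for.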
Change Data Type of Multiple Columns in Dataframe
DataFrame.astype()
DataFrame.astype(self, dtype, copy=True, errors='raise', **kwargs)
Arguments:
- dtype : The data type to which the whole dataframe will be converted, or a dictionary of column names and data types. With a dictionary, only the given columns are converted to the corresponding types.
- errors : How to handle a failed conversion. It can be one of {ignore, raise}; the default value is raise.
  - raise : Raise an exception if the conversion fails.
  - ignore : Suppress the exception and return the original object unchanged.
- copy : bool. Default value is True.
  - If True : Return a new copy, even when the data already has the requested type.
  - If False : A copy may be avoided when the data already has the requested type. The calling object is never converted in place.
Returns:
- A new Dataframe object with the updated types of the given columns. The original Dataframe is left unchanged.
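The two forms of the dtype argument can be compared side by side. This is a minimal sketch using a small, made-up dataframe named `scores`: a single type is applied to every column, while a dictionary touches only the columns it names.

```python
import pandas as pd

# Hypothetical numeric data, used only for illustration
scores = pd.DataFrame({'Age': [34, 31], 'Marks': [155, 177]})

# A single data type is applied to every column
all_float = scores.astype('float64')

# A dictionary converts only the listed columns
only_marks = scores.astype({'Marks': 'float64'})

print(all_float.dtypes)
print(only_marks.dtypes)
```

In both cases `scores` keeps its original int64 columns, because astype() returns a new object.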
# List of Tuples
employees = [('jack', 34, 'Sydney', 155),
             ('Riti', 31, 'Delhi', 177),
             ('Aadi', 16, 'Mumbai', 81),
             ('Mohit', 31, 'Delhi', 167),
             ('Veena', 12, 'Delhi', 144),
             ('Shaunak', 35, 'Mumbai', 135),
             ('Shaun', 35, 'Colombo', 111)]

# Create a DataFrame object
empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Marks'])

print(empDfObj)
Contents of the dataframe are,
      Name  Age     City  Marks
0     jack   34   Sydney    155
1     Riti   31    Delhi    177
2     Aadi   16   Mumbai     81
3    Mohit   31    Delhi    167
4    Veena   12    Delhi    144
5  Shaunak   35   Mumbai    135
6    Shaun   35  Colombo    111
Now let’s check the datatype of columns in the above created dataframe,
print(empDfObj.dtypes)
Output:
Name     object
Age       int64
City     object
Marks     int64
dtype: object
Now to convert the data types of two columns, i.e. ‘Age’ from int64 to float64 and ‘Marks’ from int64 to object, we can pass a dictionary to Dataframe.astype(). This dictionary contains the column names as keys and their new data types as values:
# Convert the data type of column Age to float64 & data type of column Marks to object
empDfObj = empDfObj.astype({'Age': 'float64', 'Marks': 'object'})
Dataframe.astype() returns a new Dataframe with the changed data types for the given columns. We assigned this new Dataframe back to empDfObj.
Now check the data type of dataframe’s columns again i.e.
print(empDfObj.dtypes)
Output:
Name      object
Age      float64
City      object
Marks     object
dtype: object
Now the data type of column ‘Age’ is float64 and that of ‘Marks’ is object.
print(empDfObj)
Output:
      Name   Age     City  Marks
0     jack  34.0   Sydney    155
1     Riti  31.0    Delhi    177
2     Aadi  16.0   Mumbai     81
3    Mohit  31.0    Delhi    167
4    Veena  12.0    Delhi    144
5  Shaunak  35.0   Mumbai    135
6    Shaun  35.0  Colombo    111
Handle errors while converting Data Types of Columns
try:
    empDfObj['Age'] = empDfObj['Age'].astype('abc')
except TypeError as e:
    print(e)
Output:
data type "abc" not understood
There is no data type named ‘abc’, so trying to convert a column to it raises a TypeError. If the exception is not handled, the program will terminate. To handle this kind of error, use try / except.
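An unknown type name is not the only way a conversion can fail; a value that cannot be parsed into the target type also raises an exception. The sketch below uses a made-up Series named `raw` to show this case. It also shows pd.to_numeric() with errors='coerce', which is a separate pandas function rather than part of astype(), as one possible way to turn unparseable values into NaN instead of stopping.

```python
import pandas as pd

# Hypothetical column containing one value that is not a number
raw = pd.Series(['34', '31', 'unknown'])

try:
    raw.astype('int64')
    failed = False
except (ValueError, TypeError) as e:
    # The conversion fails because 'unknown' cannot be parsed as an integer
    failed = True
    print(e)

# One alternative: replace values that cannot be parsed with NaN
cleaned = pd.to_numeric(raw, errors='coerce')
print(cleaned)
```

Whether to stop on bad values or replace them depends on the data; the try / except approach keeps the failure visible.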
Complete example is as follows,
import pandas as pd

def main():
    # List of Tuples
    employees = [('jack', 34, 'Sydney', 155),
                 ('Riti', 31, 'Delhi', 177),
                 ('Aadi', 16, 'Mumbai', 81),
                 ('Mohit', 31, 'Delhi', 167),
                 ('Veena', 12, 'Delhi', 144),
                 ('Shaunak', 35, 'Mumbai', 135),
                 ('Shaun', 35, 'Colombo', 111)]

    # Create a DataFrame object
    empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Marks'])

    print("Contents of the Dataframe : ")
    print(empDfObj)

    print('Data type of each column :')
    print(empDfObj.dtypes)

    print('*** Change Data Type of a Column ***')

    print('Change data type of a column from int64 to float64')

    # Change data type of column 'Marks' from int64 to float64
    empDfObj['Marks'] = empDfObj['Marks'].astype('float64')

    print("Updated Contents of the Dataframe : ")
    print(empDfObj)

    print('Data types of columns :')
    print(empDfObj.dtypes)

    print('Change data type of a column from int64 to string')

    # Change data type of column 'Age' from int64 to object type
    empDfObj['Age'] = empDfObj['Age'].astype('object')

    print("Updated Contents of the Dataframe : ")
    print(empDfObj)

    print('Data types of columns :')
    print(empDfObj.dtypes)

    print('*** Change Data Type of multiple Column ***')

    # Create a DataFrame object
    empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Marks'])

    print("Contents of Original Dataframe : ")
    print(empDfObj)

    print('Data type of each column in Original dataframe :')
    print(empDfObj.dtypes)

    # Convert the data type of column Age to float64 & data type of column Marks to object
    empDfObj = empDfObj.astype({'Age': 'float64', 'Marks': 'object'})

    print("Updated Contents of the Dataframe : ")
    print(empDfObj)

    print('Data types of columns :')
    print(empDfObj.dtypes)

    print('*** Handle errors while converting Data Type Column ***')

    try:
        empDfObj['Age'] = empDfObj['Age'].astype('abc')
    except TypeError as e:
        print(e)

if __name__ == '__main__':
    main()
Output:
Contents of the Dataframe :
      Name  Age     City  Marks
0     jack   34   Sydney    155
1     Riti   31    Delhi    177
2     Aadi   16   Mumbai     81
3    Mohit   31    Delhi    167
4    Veena   12    Delhi    144
5  Shaunak   35   Mumbai    135
6    Shaun   35  Colombo    111
Data type of each column :
Name     object
Age       int64
City     object
Marks     int64
dtype: object
*** Change Data Type of a Column ***
Change data type of a column from int64 to float64
Updated Contents of the Dataframe :
      Name  Age     City  Marks
0     jack   34   Sydney  155.0
1     Riti   31    Delhi  177.0
2     Aadi   16   Mumbai   81.0
3    Mohit   31    Delhi  167.0
4    Veena   12    Delhi  144.0
5  Shaunak   35   Mumbai  135.0
6    Shaun   35  Colombo  111.0
Data types of columns :
Name      object
Age        int64
City      object
Marks    float64
dtype: object
Change data type of a column from int64 to string
Updated Contents of the Dataframe :
      Name Age     City  Marks
0     jack  34   Sydney  155.0
1     Riti  31    Delhi  177.0
2     Aadi  16   Mumbai   81.0
3    Mohit  31    Delhi  167.0
4    Veena  12    Delhi  144.0
5  Shaunak  35   Mumbai  135.0
6    Shaun  35  Colombo  111.0
Data types of columns :
Name      object
Age       object
City      object
Marks    float64
dtype: object
*** Change Data Type of multiple Column ***
Contents of Original Dataframe :
      Name  Age     City  Marks
0     jack   34   Sydney    155
1     Riti   31    Delhi    177
2     Aadi   16   Mumbai     81
3    Mohit   31    Delhi    167
4    Veena   12    Delhi    144
5  Shaunak   35   Mumbai    135
6    Shaun   35  Colombo    111
Data type of each column in Original dataframe :
Name     object
Age       int64
City     object
Marks     int64
dtype: object
Updated Contents of the Dataframe :
      Name   Age     City Marks
0     jack  34.0   Sydney   155
1     Riti  31.0    Delhi   177
2     Aadi  16.0   Mumbai    81
3    Mohit  31.0    Delhi   167
4    Veena  12.0    Delhi   144
5  Shaunak  35.0   Mumbai   135
6    Shaun  35.0  Colombo   111
Data types of columns :
Name      object
Age      float64
City      object
Marks     object
dtype: object
*** Handle errors while converting Data Type Column ***
data type "abc" not understood