Pandas: Change the data type of single or multiple columns of a Dataframe in Python

In this article we will discuss how to change the data type of a single column or multiple columns of a Dataframe in Python.

Change Data Type of a Single Column

To change the data type of a single column in a dataframe, we are going to use the function Series.astype(). Let’s first discuss this function.

Series.astype()

In Python’s Pandas module, the Series class provides a member function to change the type of a Series object:
Series.astype(self, dtype, copy=True, errors='raise', **kwargs)

Arguments:

  • dtype : A Python type or NumPy dtype (or its string name, e.g. 'float64') to which the whole Series will be converted.
  • errors : How to handle errors. It can be : {ignore, raise}, default value is raise
    • raise: If the conversion fails, raise an exception
    • ignore: If the conversion fails, suppress the exception and return the original object
  • copy : bool. Default value is True.
    • If True : Always return a copy
    • If False : A copy is made only when needed; the calling object is never converted in place

Returns:

  • A new Series object with the requested data type. The original Series is left unchanged.
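
To make the return behaviour described above concrete, here is a minimal sketch. The Series `marks` and its values are hypothetical sample data, not part of the dataframe used in the rest of this article.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration
marks = pd.Series([155, 177, 81], name='Marks')

# astype() returns a new Series; 'marks' itself is not modified
marks_float = marks.astype('float64')

print(marks.dtype)        # int64
print(marks_float.dtype)  # float64
```

Because the original Series keeps its type, the result has to be stored somewhere, which is why the examples in this article assign it back to the column.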

Now let’s see how to use this function to change the data type of a column in our dataframe.

Import pandas module as pd i.e.

import pandas as pd

First of all, we will create a Dataframe whose columns have different data types:
# List of Tuples
empoyees = [('jack', 34, 'Sydney', 155) ,
        ('Riti', 31, 'Delhi' , 177) ,
        ('Aadi', 16, 'Mumbai', 81) ,
        ('Mohit', 31,'Delhi' , 167) ,
        ('Veena', 12, 'Delhi' , 144) ,
        ('Shaunak', 35, 'Mumbai', 135 ),
        ('Shaun', 35, 'Colombo', 111)
        ]

# Create a DataFrame object
empDfObj = pd.DataFrame(empoyees, columns=['Name', 'Age', 'City', 'Marks'])
print(empDfObj)

Contents of the dataframe are,

      Name  Age     City  Marks
0     jack   34   Sydney    155
1     Riti   31    Delhi    177
2     Aadi   16   Mumbai     81
3    Mohit   31    Delhi    167
4    Veena   12    Delhi    144
5  Shaunak   35   Mumbai    135
6    Shaun   35  Colombo    111

Now let’s check the datatype of columns in the above created dataframe,

print(empDfObj.dtypes)

Output:

Name     object
Age       int64
City     object
Marks     int64
dtype: object

Change data type of a column from int64 to float64

As we can see, the data type of column ‘Marks’ is int64. Let’s change the data type of column ‘Marks’ to float64:
# Change data type of column 'Marks' from int64 to float64
empDfObj['Marks'] = empDfObj['Marks'].astype('float64')

astype() returns a new Series with the changed data type and leaves the original untouched, so we assign the new Series back to empDfObj[‘Marks’].

Now check the data type of dataframe’s columns again i.e.
print(empDfObj.dtypes)

Output:

Name      object
Age        int64
City      object
Marks    float64
dtype: object

Now the data type of column ‘Marks’ is float64. This is also reflected in the contents of the dataframe:

print(empDfObj)

Output:

      Name  Age     City  Marks
0     jack   34   Sydney  155.0
1     Riti   31    Delhi  177.0
2     Aadi   16   Mumbai   81.0
3    Mohit   31    Delhi  167.0
4    Veena   12    Delhi  144.0
5  Shaunak   35   Mumbai  135.0
6    Shaun   35  Colombo  111.0

The values in the ‘Marks’ column are floats now.

Let’s see another example.

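Change data type of a column from int64 to object

The data type of column ‘Age’ is int64. Let’s change it to object, the generic dtype that pandas also uses for columns of strings. Note that astype('object') only changes the dtype: the stored values are still integers. To turn the values themselves into strings, pass str instead, i.e. astype(str).
# Change data type of column 'Age' from int64 to object
empDfObj['Age'] = empDfObj['Age'].astype('object')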

Here too, astype() returns a new Series with the changed data type, which we assign back to empDfObj[‘Age’].

Now check the data type of dataframe’s columns again i.e.
print(empDfObj.dtypes)

Output:

Name      object
Age       object
City      object
Marks    float64
dtype: object

Now the data type of column ‘Age’ is object.
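
An object column is not necessarily a column of strings. The sketch below, using a hypothetical Series `ages` with made-up values, contrasts astype('object') with astype(str) by looking at the type of the first stored value in each result.

```python
import pandas as pd

# Hypothetical sample values, used only for this illustration
ages = pd.Series([34, 31, 16], name='Age')

as_object = ages.astype('object')  # dtype becomes object, values stay integers
as_text = ages.astype(str)         # values themselves become Python strings

print(type(as_object.iloc[0]))
print(type(as_text.iloc[0]))
print(as_text.tolist())
```

If the goal is to run string operations on the column, astype(str) is probably the conversion you want.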

This is how we can change the data type of a single column in dataframe. Now let’s see how to change types of multiple columns in a single line.

Change Data Type of Multiple Columns in Dataframe

To change the data type of multiple columns in the dataframe we are going to use DataFrame.astype().

DataFrame.astype()

It can either cast the whole dataframe to a new data type or selected columns to given data types.
DataFrame.astype(self, dtype, copy=True, errors='raise', **kwargs)

Arguments:

  • dtype : Either
      • a single Python type or NumPy dtype, to which the whole dataframe will be converted, or
      • a dictionary of column names and data types. Only the given columns will be converted to the corresponding types.
  • errors : How to handle errors. It can be : {ignore, raise}, default value is raise
      • raise: If the conversion fails, raise an exception
      • ignore: If the conversion fails, suppress the exception and return the original object
  • copy : bool. Default value is True.
      • If True : Always return a copy
      • If False : A copy is made only when needed; the calling object is never converted in place

Returns

  • A new Dataframe object with the updated types. The original Dataframe is left unchanged.

Let’s understand this with some examples.

First of all, we will create a Dataframe:
# List of Tuples
empoyees = [('jack', 34, 'Sydney', 155) ,
        ('Riti', 31, 'Delhi' , 177) ,
        ('Aadi', 16, 'Mumbai', 81) ,
        ('Mohit', 31,'Delhi' , 167) ,
        ('Veena', 12, 'Delhi' , 144) ,
        ('Shaunak', 35, 'Mumbai', 135 ),
        ('Shaun', 35, 'Colombo', 111)
        ]

# Create a DataFrame object
empDfObj = pd.DataFrame(empoyees, columns=['Name', 'Age', 'City', 'Marks'])

print(empDfObj)

Contents of the dataframe are,

      Name  Age     City  Marks
0     jack   34   Sydney    155
1     Riti   31    Delhi    177
2     Aadi   16   Mumbai     81
3    Mohit   31    Delhi    167
4    Veena   12    Delhi    144
5  Shaunak   35   Mumbai    135
6    Shaun   35  Colombo    111

Now let’s check the datatype of columns in the above created dataframe,

print(empDfObj.dtypes)

Output:

Name     object
Age       int64
City     object
Marks     int64
dtype: object

Now to convert the data types of 2 columns, i.e. ‘Age’ from int64 to float64 and ‘Marks’ from int64 to object, we can pass a dictionary to Dataframe.astype(). This dictionary contains the column names as keys and their new data types as values:

# Convert the data type of column Age to float64 & data type of column Marks to object
empDfObj = empDfObj.astype({'Age': 'float64', 'Marks': 'object'})

Dataframe.astype() returns a new Dataframe with the changed data types of the given columns. We assigned this new Dataframe back to empDfObj.

Now check the data type of dataframe’s columns again i.e.

print(empDfObj.dtypes)

Output:

Name      object
Age      float64
City      object
Marks     object
dtype: object

Now the data type of column ‘Age’ is float64 and that of ‘Marks’ is object.

It will be reflected in the contents of the dataframe too i.e.
print(empDfObj)

Output:

      Name   Age     City Marks
0     jack  34.0   Sydney   155
1     Riti  31.0    Delhi   177
2     Aadi  16.0   Mumbai    81
3    Mohit  31.0    Delhi   167
4    Veena  12.0    Delhi   144
5  Shaunak  35.0   Mumbai   135
6    Shaun  35.0  Colombo   111
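
DataFrame.astype() can also cast the whole dataframe to one type by passing a single dtype instead of a dictionary. Here is a minimal sketch; the all-numeric dataframe `scores` is hypothetical sample data, chosen because casting a frame that contains names or cities to float64 would fail.

```python
import pandas as pd

# Hypothetical all-numeric dataframe, used only for this illustration
scores = pd.DataFrame({'Age': [34, 31, 16], 'Marks': [155, 177, 81]})

# A single dtype is applied to every column
scores_float = scores.astype('float64')

print(scores_float.dtypes)
```

The dictionary form shown earlier is usually the safer choice when the columns hold mixed kinds of data.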

Handle errors while converting Data Types of Columns

If we pass Series.astype() or Dataframe.astype() a type to which the contents cannot be cast, or a type name that pandas does not recognize, it raises an error. An unrecognized type name raises a TypeError.
For example, let’s try to convert the type of column ‘Age’ to ‘abc’. It will raise the error:
try:
        empDfObj['Age'] = empDfObj['Age'].astype('abc')
except TypeError as e:
        print(e)

Output:

data type "abc" not understood

As there is no data type ‘abc’, astype() throws a TypeError; if it is not caught, the program will crash. To handle this kind of error, use try / except.
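
A related case is a valid type name with values that cannot be converted. In that situation pandas typically raises a ValueError rather than a TypeError, so the sketch below catches both. The Series `names` and the flag `converted` are hypothetical, introduced only for this illustration.

```python
import pandas as pd

# Hypothetical sample data: text values that cannot be read as integers
names = pd.Series(['jack', 'Riti', 'Aadi'], name='Name')

converted = False
try:
    names = names.astype('int64')
    converted = True
except (ValueError, TypeError) as e:
    # The assignment never happened, so 'names' still holds the original values
    print('Conversion failed:', e)

print(converted)
```

Catching both exception types keeps the handler working whether the problem is the type name or the data itself.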

The complete example is as follows:

import pandas as pd
 
def main():
 
        # List of Tuples
        empoyees = [('jack', 34, 'Sydney', 155) ,
                ('Riti', 31, 'Delhi' , 177) ,
                ('Aadi', 16, 'Mumbai', 81) ,
                ('Mohit', 31,'Delhi' , 167) ,
                ('Veena', 12, 'Delhi' , 144) ,
                ('Shaunak', 35, 'Mumbai', 135 ),
                ('Shaun', 35, 'Colombo', 111)
                ]

        # Create a DataFrame object
        empDfObj = pd.DataFrame(empoyees, columns=['Name', 'Age', 'City', 'Marks'])

        print("Contents of the Dataframe : ")
        print(empDfObj)

        print('Data type of each column :')
        print(empDfObj.dtypes)

        print('*** Change Data Type of a Column ***')

        print('Change data type of a column from int64 to float64')

        # Change data type of column 'Marks' from int64 to float64
        empDfObj['Marks'] = empDfObj['Marks'].astype('float64')

        print("Updated Contents of the Dataframe : ")
        print(empDfObj)
        print('Data types of columns :')
        print(empDfObj.dtypes)

        print('Change data type of a column from int64 to string')

        # Change data type of column 'Age' from int64 to string i.e. object type
        empDfObj['Age'] = empDfObj['Age'].astype('object')

        print("Updated Contents of the Dataframe : ")
        print(empDfObj)
        print('Data types of columns :')
        print(empDfObj.dtypes)

        print('*** Change Data Type of multiple Column ***')

        # Create a DataFrame object
        empDfObj = pd.DataFrame(empoyees, columns=['Name', 'Age', 'City', 'Marks'])

        print("Contents of Original Dataframe : ")
        print(empDfObj)

        print('Data type of each column in Original dataframe :')
        print(empDfObj.dtypes)

        # Convert the data type of column Age to float64 & data type of column Marks to string
        empDfObj = empDfObj.astype({'Age': 'float64', 'Marks': 'object'})

        print("Updated Contents of the Dataframe : ")
        print(empDfObj)
        print('Data types of columns :')
        print(empDfObj.dtypes)

        print('*** Handle errors while converting Data Type Column ***')

        try:
                empDfObj['Age'] = empDfObj['Age'].astype('abc')
        except TypeError as e:
                print(e)
 
if __name__ == '__main__':
        main()

Output:

Contents of the Dataframe :
      Name  Age     City  Marks
0     jack   34   Sydney    155
1     Riti   31    Delhi    177
2     Aadi   16   Mumbai     81
3    Mohit   31    Delhi    167
4    Veena   12    Delhi    144
5  Shaunak   35   Mumbai    135
6    Shaun   35  Colombo    111
Data type of each column :
Name     object
Age       int64
City     object
Marks     int64
dtype: object
*** Change Data Type of a Column ***
Change data type of a column from int64 to float64
Updated Contents of the Dataframe :
      Name  Age     City  Marks
0     jack   34   Sydney  155.0
1     Riti   31    Delhi  177.0
2     Aadi   16   Mumbai   81.0
3    Mohit   31    Delhi  167.0
4    Veena   12    Delhi  144.0
5  Shaunak   35   Mumbai  135.0
6    Shaun   35  Colombo  111.0
Data types of columns :
Name      object
Age        int64
City      object
Marks    float64
dtype: object
Change data type of a column from int64 to string
Updated Contents of the Dataframe :
      Name Age     City  Marks
0     jack  34   Sydney  155.0
1     Riti  31    Delhi  177.0
2     Aadi  16   Mumbai   81.0
3    Mohit  31    Delhi  167.0
4    Veena  12    Delhi  144.0
5  Shaunak  35   Mumbai  135.0
6    Shaun  35  Colombo  111.0
Data types of columns :
Name      object
Age       object
City      object
Marks    float64
dtype: object
*** Change Data Type of multiple Column ***
Contents of Original Dataframe :
      Name  Age     City  Marks
0     jack   34   Sydney    155
1     Riti   31    Delhi    177
2     Aadi   16   Mumbai     81
3    Mohit   31    Delhi    167
4    Veena   12    Delhi    144
5  Shaunak   35   Mumbai    135
6    Shaun   35  Colombo    111
Data type of each column in Original dataframe :
Name     object
Age       int64
City     object
Marks     int64
dtype: object
Updated Contents of the Dataframe :
      Name   Age     City Marks
0     jack  34.0   Sydney   155
1     Riti  31.0    Delhi   177
2     Aadi  16.0   Mumbai    81
3    Mohit  31.0    Delhi   167
4    Veena  12.0    Delhi   144
5  Shaunak  35.0   Mumbai   135
6    Shaun  35.0  Colombo   111
Data types of columns :
Name      object
Age      float64
City      object
Marks     object
dtype: object
*** Handle errors while converting Data Type Column ***
data type "abc" not understood
