How to convert dtype ‘object’ to int in Pandas?

In this article, we will discuss multiple ways to convert any column with ‘object’ dtype to an integer in pandas.

Preparing dataset

To get started quickly, let’s create a sample DataFrame to experiment with. We’ll use the pandas library with some made-up employee data, where the “Experience” values are deliberately stored as strings.

import pandas as pd

# List of Tuples
employees = [('Shubham', 'India', 'Tech', "5", 4),
            ('Riti', 'India', 'Design' , "7", 7),
            ('Shanky', 'India', 'PMO' , "2", 2),
            ('Shreya', 'India', 'Design' , "2", 0),
            ('Aadi', 'US', 'PMO', "11", 5),
            ('Sim', 'US', 'Tech', "4", 4)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Location', 'Team', 'Experience', 'RelevantExperience'],
                  index = ['A', 'B', 'C', 'D', 'E', 'F'])
print(df)

The contents of the created DataFrame are:

      Name Location    Team  Experience  RelevantExperience
A  Shubham    India    Tech           5                   4
B     Riti    India  Design           7                   7
C   Shanky    India     PMO           2                   2
D   Shreya    India  Design           2                   0
E     Aadi       US     PMO          11                   5
F      Sim       US    Tech           4                   4

Also, let’s check the dtypes of the columns:

df.info()

Output

<class 'pandas.core.frame.DataFrame'>
Index: 6 entries, A to F
Data columns (total 5 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   Name                6 non-null      object
 1   Location            6 non-null      object
 2   Team                6 non-null      object
 3   Experience          6 non-null      object
 4   RelevantExperience  6 non-null      int64 
dtypes: int64(1), object(4)
memory usage: 288.0+ bytes

As observed, the column “Experience” is stored as “object” dtype. So, we will convert it to the int dtype using the methods below.

Approach 1: Using astype() function

This is the simplest approach: every pandas Series has an astype() method that casts its values to a given dtype. Let’s use it to convert the “Experience” column to an integer.

# convert dtype of column to "int"
df['Experience'] = df['Experience'].astype(str).astype(int)

print(df['Experience'])

Output

A     5
B     7
C     2
D     2
E    11
F     4
Name: Experience, dtype: int64

As observed, the dtype of the “Experience” column has changed from “object” to “int64”. Because we assigned the result back to df['Experience'], the DataFrame now holds the converted column for further use.
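
One caveat worth illustrating: astype(int) is strict and raises an error as soon as a single value cannot be parsed. The snippet below is a minimal sketch of that behavior; the Series contents are hypothetical sample values, not part of the DataFrame above.

```python
import pandas as pd

# Hypothetical sample data: the last value is not a valid integer string
dirty = pd.Series(["5", "7", "abc"], dtype="object")

try:
    dirty.astype(int)
except ValueError as exc:
    # astype(int) refuses to convert the whole column if any value is invalid
    failed = True
    print("Conversion failed:", exc)
else:
    failed = False

# A column that contains only digit strings converts without trouble
clean = pd.Series(["5", "7", "11"], dtype="object").astype(int)
print(clean.tolist())
```

If a column may contain such values, it is usually safer to clean them first or to use a more tolerant conversion.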

Approach 2: Using convert_dtypes() method

The convert_dtypes() method infers the best dtype for each column from the values stored in it and converts the column to the matching pandas extension dtype, which supports missing values (pd.NA). Let’s call it on the whole DataFrame and inspect the result.

# convert dtype of columns
df.convert_dtypes().info()

Output

<class 'pandas.core.frame.DataFrame'>
Index: 6 entries, A to F
Data columns (total 5 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   Name                6 non-null      string
 1   Location            6 non-null      string
 2   Team                6 non-null      string
 3   Experience          6 non-null      Int64 
 4   RelevantExperience  6 non-null      Int64 
dtypes: Int64(2), string(3)
memory usage: 300.0+ bytes

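As observed, we called the method on the entire DataFrame: the first three columns became “string”, and the two integer columns became the nullable “Int64” dtype. Note that convert_dtypes() does not parse numeric strings. “Experience” shows up as Int64 here only because Approach 1 had already converted it to integers; on a column that still holds strings such as "5", the result would be “string”.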
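
To see how convert_dtypes() treats a column that still contains numeric strings, here is a small sketch on a fresh, hypothetical two-row frame (the name raw and its values are illustrative assumptions). It parses the strings with pd.to_numeric() first and only then calls convert_dtypes().

```python
import pandas as pd

# Hypothetical fresh frame in which "Experience" still holds strings
raw = pd.DataFrame({"Name": ["Riti", "Aadi"], "Experience": ["7", "11"]})

# On its own, convert_dtypes() keeps the digit strings as text
as_is = raw.convert_dtypes()
print(as_is.dtypes)

# Parse the strings first, then let convert_dtypes() pick the nullable dtype
parsed = raw.assign(Experience=pd.to_numeric(raw["Experience"])).convert_dtypes()
print(parsed.dtypes)
```

In this sketch the second frame ends up with a nullable integer “Experience” column, while the first one keeps it as text.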

Approach 3: Using pandas.to_numeric() function

Another way is to use pandas.to_numeric function to convert any column into numeric dtype. Let’s experiment with the “Experience” column again.

# convert dtype of column to numeric
df['Experience'] = pd.to_numeric(df['Experience'], errors='coerce')


print(df['Experience'])

Output

A     5
B     7
C     2
D     2
E    11
F     4
Name: Experience, dtype: int64

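The output matches the first approach. Here, the errors='coerce' argument means that any value that cannot be parsed as a number (for example, the string "abc") is replaced with NaN instead of raising an error. Keep in mind that a column containing NaN is returned as float64 rather than int64, because NaN cannot be stored in a plain integer column.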
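
The following sketch shows the coercion on a hypothetical Series with one unparseable value, and one possible way to get back to an integer dtype afterwards by casting to the nullable “Int64” dtype. The sample values are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data: "seven" cannot be parsed as a number
s = pd.Series(["5", "seven", "11"], dtype="object")

# The invalid entry becomes NaN, which forces the result to float64
coerced = pd.to_numeric(s, errors="coerce")
print(coerced.dtype)

# The nullable Int64 dtype can hold integers alongside a missing value
nullable = coerced.astype("Int64")
print(nullable)
```

Whether a missing value is acceptable, or should be filled or dropped before converting, depends on the data at hand.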

Summary

In this article, we have discussed how to convert dtype ‘object’ to int in Pandas. Thanks.
