In this article, we will discuss multiple ways to convert a column with ‘object’ dtype to an integer dtype in pandas.
Preparing dataset
To quickly get started, let’s create a sample DataFrame to experiment with. We’ll use the pandas library with some sample employee data, where the “Experience” values are deliberately stored as strings.
import pandas as pd

# List of Tuples
employees = [('Shubham', 'India', 'Tech',   "5",  4),
             ('Riti',    'India', 'Design', "7",  7),
             ('Shanky',  'India', 'PMO',    "2",  2),
             ('Shreya',  'India', 'Design', "2",  0),
             ('Aadi',    'US',    'PMO',    "11", 5),
             ('Sim',     'US',    'Tech',   "4",  4)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Location', 'Team', 'Experience', 'RelevantExperience'],
                  index=['A', 'B', 'C', 'D', 'E', 'F'])
print(df)
The contents of the created DataFrame are:
      Name Location    Team Experience  RelevantExperience
A  Shubham    India    Tech          5                   4
B     Riti    India  Design          7                   7
C   Shanky    India     PMO          2                   2
D   Shreya    India  Design          2                   0
E     Aadi       US     PMO         11                   5
F      Sim       US    Tech          4                   4
Also, let’s check the dtypes of the columns:
df.info()
Output
<class 'pandas.core.frame.DataFrame'>
Index: 6 entries, A to F
Data columns (total 5 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   Name                6 non-null      object
 1   Location            6 non-null      object
 2   Team                6 non-null      object
 3   Experience          6 non-null      object
 4   RelevantExperience  6 non-null      int64
dtypes: int64(1), object(4)
memory usage: 288.0+ bytes
As observed, the column “Experience” is stored as “object” dtype. So, we will convert it to the int dtype using the methods below.
Approach 1: Using astype() function
This is the simplest approach: every pandas Series has an “astype()” method that casts its values to the requested dtype. Let’s see how it works by converting the column “Experience” to an integer.
# convert dtype of column to "int"
df['Experience'] = df['Experience'].astype(str).astype(int)
print(df['Experience'])
Output
A     5
B     7
C     2
D     2
E    11
F     4
Name: Experience, dtype: int64
As observed, we have converted the dtype from “object” to “int64” for the “Experience” column. Because the result is assigned back to df['Experience'], the converted column is available for further use.
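One thing worth checking with astype() is what happens when a value cannot be read as an integer. The following is a minimal sketch, not part of the original dataset: the Series named dirty and clean and their values are made up for illustration. It suggests that astype(int) raises an error on such input rather than skipping the bad value.

```python
import pandas as pd

# Hypothetical column (not from the article's DataFrame):
# the last entry is not a valid integer literal.
dirty = pd.Series(["5", "7", "n/a"], index=["A", "B", "C"], dtype=object)

try:
    dirty.astype(int)
    conversion_failed = False
except (ValueError, TypeError) as exc:
    # astype(int) refuses the whole column if any value is invalid
    conversion_failed = True
    print(f"astype(int) failed: {exc}")

# A column that holds only digit strings converts cleanly.
clean = pd.Series(["5", "7", "11"], index=["A", "B", "C"], dtype=object)
clean = clean.astype("int64")
print(clean.dtype)
```

So astype() is a good fit when the column is known to be clean; for columns that may contain stray values, a more forgiving conversion is usually needed.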
Approach 2: Using convert_dtypes() method
The convert_dtypes() method inspects the values stored in each column and converts the column to the most suitable dtype that supports missing values (pd.NA). Let’s run it on the whole DataFrame and check the dtype of the “Experience” column.
# convert dtype of columns
df.convert_dtypes().info()
Output
<class 'pandas.core.frame.DataFrame'>
Index: 6 entries, A to F
Data columns (total 5 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   Name                6 non-null      string
 1   Location            6 non-null      string
 2   Team                6 non-null      string
 3   Experience          6 non-null      Int64
 4   RelevantExperience  6 non-null      Int64
dtypes: Int64(2), string(3)
memory usage: 300.0+ bytes
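As observed, we passed the entire DataFrame, and it converted the first three columns to “string” and both the “Experience” and “RelevantExperience” columns to the nullable “Int64” dtype. Note that “Experience” already held integers after Approach 1. convert_dtypes() does not parse numeric strings, so an object column that still contains strings such as “5” would be converted to “string”, not “Int64”.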
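To see how convert_dtypes() treats a column that still holds numeric strings, here is a minimal sketch. The small DataFrame named raw is a stand-in created for this example, not the article's df. It compares calling convert_dtypes() directly with casting to integer first.

```python
import pandas as pd

# Stand-in frame for illustration: "Experience" holds digit strings (object dtype).
raw = pd.DataFrame({"Experience": pd.Series(["5", "7", "2"], dtype=object)})

# Called directly, convert_dtypes() infers a string dtype for text values.
inferred = raw.convert_dtypes()
print(inferred["Experience"].dtype)

# Casting to integer first lets convert_dtypes() pick the nullable Int64 dtype.
converted = raw.astype({"Experience": "int64"}).convert_dtypes()
print(converted["Experience"].dtype)
```

In other words, convert_dtypes() is most useful for moving already-numeric columns to nullable dtypes, and it is typically combined with astype() or pd.to_numeric() when the data arrives as text.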
Approach 3: Using pandas.to_numeric() function
Another way is to use the pandas.to_numeric() function, which parses the values of a column into a numeric dtype. Let’s experiment with the “Experience” column again.
# convert dtype of column to numeric
df['Experience'] = pd.to_numeric(df['Experience'], errors='coerce')
print(df['Experience'])
Output
A     5
B     7
C     2
D     2
E    11
F     4
Name: Experience, dtype: int64
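We get the same output as with the first method. Here, the errors=”coerce” argument means that any value that cannot be parsed as a number (for example, “abc”) is replaced with NaN instead of raising an error. Keep in mind that if the column ends up containing NaN, or holds a string such as “4.0”, the result is float64 rather than int64.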
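The effect of errors='coerce' is easier to see on a column that contains a bad value. The sketch below uses a made-up Series named mixed (not the article's data). It shows the coerced result and then, as one possible follow-up, casts it to the nullable “Int64” dtype so that the column is integer-typed even with a missing value.

```python
import pandas as pd

# Hypothetical column: "abc" is not a number, "4.0" is a float-like string.
mixed = pd.Series(["5", "7", "abc", "4.0"], index=["A", "B", "C", "D"], dtype=object)

# Unparseable values become NaN, which forces a float64 result.
numeric = pd.to_numeric(mixed, errors="coerce")
print(numeric)

# Casting to the nullable Int64 dtype keeps integers and marks the gap as <NA>.
# This works here because every remaining value is a whole number.
as_nullable_int = numeric.astype("Int64")
print(as_nullable_int)
```

Whether to keep the missing entries, fill them, or drop them before casting depends on the dataset; the nullable dtype is just one option that avoids inventing a value.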
Summary
In this article, we have discussed how to convert dtype ‘object’ to int in Pandas. Thanks.