In this article, we will look at different ways to convert floats to ints in Pandas. We will also cover a few special cases, such as missing values, that need care during conversion.
To quickly get started, let’s create a sample dataframe for experimentation. We’ll use the pandas library with some random data.
import pandas as pd
import numpy as np

np.random.seed(101)

# Create a DataFrame from a 5x5 array of random floats in [0, 10)
df = pd.DataFrame(np.random.rand(5, 5) * 10)
print(df)
Contents of the created dataframe are,
          0         1         2         3         4
0  5.163986  5.706676  0.284742  1.715217  6.852770
1  8.338969  3.069662  8.936131  7.215439  1.899390
2  5.542276  3.521320  1.818924  7.856018  9.654832
3  2.323537  0.835614  6.035484  7.289928  2.762388
4  6.853063  5.178675  0.484845  1.378692  1.869674
Now, let’s look at different ways in which we could convert the floats into ints using pandas.
Convert floats to integers in Pandas using the astype() function
We will use the astype function to convert the entire DataFrame values to int type. Let’s quickly check the existing dtype for the DataFrame.
# checking dtypes (info() prints its report itself and returns None,
# so wrapping it in print() would only add a stray "None" line)
df.info()

Output

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   0       5 non-null      float64
 1   1       5 non-null      float64
 2   2       5 non-null      float64
 3   3       5 non-null      float64
 4   4       5 non-null      float64
dtypes: float64(5)
memory usage: 328.0 bytes
Let’s convert all the columns to int using the astype function as shown below.
# converting all values in the DataFrame to int
# (stored in a new variable so that df keeps the original floats)
modDf = df.astype(int)
print(modDf)
print(modDf.dtypes)

Output

   0  1  2  3  4
0  5  5  0  1  6
1  8  3  8  7  1
2  5  3  1  7  9
3  2  0  6  7  2
4  6  5  0  1  1
0    int64
1    int64
2    int64
3    int64
4    int64
dtype: object

As observed, all the columns are converted from float to int64 dtype. Note that astype does not round: it simply drops the fractional part, so 5.706676 becomes 5. If we want to round to the nearest integer instead, we can call the round function before astype.

# rounding off first, then converting to int
modDf = df.round().astype(int)
print(modDf)

Output

   0  1  2  3   4
0  5  6  0  2   7
1  8  3  9  7   2
2  6  4  2  8  10
3  2  1  6  7   3
4  7  5  0  1   2
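The difference between truncating and rounding is easiest to see on negative numbers and on values that end in .5. The snippet below is a minimal sketch on a small hand-made Series (the values are illustrative and are not taken from the DataFrame above). It also shows np.floor and np.ceil as further options, in case always rounding down or up is what is needed.

```python
import numpy as np
import pandas as pd

# Illustrative values only (not from the article's DataFrame)
s = pd.Series([2.7, -2.7, 0.5, 1.5, 2.5])

truncated = s.astype(int)            # drops the fractional part (toward zero)
rounded = s.round().astype(int)      # rounds first; exact .5 goes to the even neighbour
floored = np.floor(s).astype(int)    # always rounds down
ceiled = np.ceil(s).astype(int)      # always rounds up

print(truncated.tolist())  # [2, -2, 0, 1, 2]
print(rounded.tolist())    # [3, -3, 0, 2, 2]
print(floored.tolist())    # [2, -3, 0, 1, 2]
print(ceiled.tolist())     # [3, -2, 1, 2, 3]
```

Note that round() sends exact halves to the nearest even integer, so 0.5 becomes 0 and 2.5 becomes 2, which may differ from the "round half up" rule many people expect.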
Note that astype() can also be applied to specific columns only, by selecting those columns first and then calling astype() on the selection.
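As a rough sketch of the column-wise conversion, the example below uses a small DataFrame with hypothetical column names "a", "b" and "c". It shows two variants: converting a selected subset, and passing a dictionary to astype() so that the untouched columns stay in the result.

```python
import pandas as pd

# Hypothetical demo data; column names are made up for illustration
df_demo = pd.DataFrame({"a": [1.9, 2.1], "b": [3.5, 4.5], "c": [5.0, 6.0]})

# Variant 1: select the columns first, then convert only that subset
subset = df_demo[["a", "b"]].astype(int)

# Variant 2: map column names to dtypes; column "c" is left as float
converted = df_demo.astype({"a": int, "b": int})

print(subset)
print(converted.dtypes)
```

The dictionary form returns the whole DataFrame with only the listed columns changed, which is usually more convenient than assigning a converted subset back.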
The complete example is as follows,
import pandas as pd
import numpy as np

np.random.seed(101)

# Create a DataFrame from a 5x5 array of random floats in [0, 10)
df = pd.DataFrame(np.random.rand(5, 5) * 10)
print(df)

# checking dtypes (info() prints its report itself)
df.info()

# converting all values in the DataFrame to int
modDf = df.astype(int)
print(modDf)
print(modDf.dtypes)

# rounding off first, then converting to int
modDf = df.round().astype(int)
print(modDf)

Output:

          0         1         2         3         4
0  5.163986  5.706676  0.284742  1.715217  6.852770
1  8.338969  3.069662  8.936131  7.215439  1.899390
2  5.542276  3.521320  1.818924  7.856018  9.654832
3  2.323537  0.835614  6.035484  7.289928  2.762388
4  6.853063  5.178675  0.484845  1.378692  1.869674
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   0       5 non-null      float64
 1   1       5 non-null      float64
 2   2       5 non-null      float64
 3   3       5 non-null      float64
 4   4       5 non-null      float64
dtypes: float64(5)
memory usage: 328.0 bytes
   0  1  2  3  4
0  5  5  0  1  6
1  8  3  8  7  1
2  5  3  1  7  9
3  2  0  6  7  2
4  6  5  0  1  1
0    int64
1    int64
2    int64
3    int64
4    int64
dtype: object
   0  1  2  3   4
0  5  6  0  2   7
1  8  3  9  7   2
2  6  4  2  8  10
3  2  1  6  7   3
4  7  5  0  1   2
Convert floats to integers in Pandas using apply() function
Another simple way to convert floats to ints is using the apply method. Let’s quickly implement it on the above raw DataFrame.
# converting to int using apply print (df.apply(np.int64))
Output
   0  1  2  3  4
0  5  5  0  1  6
1  8  3  8  7  1
2  5  3  1  7  9
3  2  0  6  7  2
4  6  5  0  1  1
We can use the round function here as well to round off before converting to int.
Converting floats to ints in case of missing values in Pandas
We need to be slightly careful in case of missing values wherever we are playing with the column dtypes. Let’s understand by introducing a few missing values in our current DataFrame.
# changing a few values to np.nan
# (the np.NaN alias was removed in NumPy 2.0; np.nan works in all versions)
df.iloc[0, 0] = np.nan
df.iloc[0, 3] = np.nan
df.iloc[2, 3] = np.nan
print(df)
Output
          0         1         2         3         4
0       NaN  5.706676  0.284742       NaN  6.852770
1  8.338969  3.069662  8.936131  7.215439  1.899390
2  5.542276  3.521320  1.818924       NaN  9.654832
3  2.323537  0.835614  6.035484  7.289928  2.762388
4  6.853063  5.178675  0.484845  1.378692  1.869674
Let’s try to use the astype function directly on this DataFrame as below.
print (df.astype(int))
Output
ValueError: Cannot convert non-finite values (NA or inf) to integer
It directly results in a ValueError that it can’t convert non-finite values into integers. Let’s also try the apply method that we discussed above.
print (df.apply(np.int64))
Output
                     0  1  2                    3  4
0 -9223372036854775808  5  0 -9223372036854775808  6
1                    8  3  8                    7  1
2                    5  3  1 -9223372036854775808  9
3                    2  0  6                    7  2
4                    6  5  0                    1  1
The apply() method has converted the NaN values into the constant integer “-9223372036854775808” (the smallest value an int64 can hold). This is even more dangerous than an error because, in a large DataFrame, it can easily go unnoticed. Therefore, it is recommended to check for missing values first, fill them with suitable values, and only then convert to ints.
print (df.fillna(0).astype(int))
Output
   0  1  2  3  4
0  0  5  0  0  6
1  8  3  8  7  1
2  5  3  1  0  9
3  2  0  6  7  2
4  6  5  0  1  1
As observed, the missing values are filled with 0 values and then converted to int.
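Filling with 0 is only one possible choice, and it makes filled-in values indistinguishable from real zeros. The following is a minimal sketch, on a small hypothetical DataFrame, of checking for missing values first and then either filling them or, as an alternative not covered above, using pandas' nullable "Int64" dtype, which keeps missing entries as <NA>. Values are rounded before the "Int64" cast because, as far as I know, pandas refuses to cast floats with a fractional part to that dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical demo data with missing values
df_nan = pd.DataFrame({"x": [1.2, np.nan, 3.8], "y": [4.5, 5.5, np.nan]})

# Step 1: count missing values per column before converting
missing_per_column = df_nan.isna().sum()
print(missing_per_column)

# Option 1: fill with a placeholder, then truncate to int
filled = df_nan.fillna(0).astype(int)
print(filled)

# Option 2: round, then cast to the nullable integer dtype (capital "I");
# missing entries stay missing instead of becoming a number
nullable = df_nan.round().astype("Int64")
print(nullable)
print(nullable.dtypes)
```

Which option fits depends on the data: a placeholder is fine when 0 cannot be confused with a real value, while "Int64" preserves the information that a value was missing.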
The complete example is as follows,
import pandas as pd
import numpy as np

np.random.seed(101)

# Create a DataFrame from a 5x5 array of random floats in [0, 10)
df = pd.DataFrame(np.random.rand(5, 5) * 10)
print(df)

# changing a few values to np.nan
df.iloc[0, 0] = np.nan
df.iloc[0, 3] = np.nan
df.iloc[2, 3] = np.nan
print(df)

# filling missing values with 0, then converting to int
print(df.fillna(0).astype(int))
Output:
          0         1         2         3         4
0  5.163986  5.706676  0.284742  1.715217  6.852770
1  8.338969  3.069662  8.936131  7.215439  1.899390
2  5.542276  3.521320  1.818924  7.856018  9.654832
3  2.323537  0.835614  6.035484  7.289928  2.762388
4  6.853063  5.178675  0.484845  1.378692  1.869674
          0         1         2         3         4
0       NaN  5.706676  0.284742       NaN  6.852770
1  8.338969  3.069662  8.936131  7.215439  1.899390
2  5.542276  3.521320  1.818924       NaN  9.654832
3  2.323537  0.835614  6.035484  7.289928  2.762388
4  6.853063  5.178675  0.484845  1.378692  1.869674
   0  1  2  3  4
0  0  5  0  0  6
1  8  3  8  7  1
2  5  3  1  0  9
3  2  0  6  7  2
4  6  5  0  1  1
Summary
Great, you made it! In this article, we have discussed multiple ways to convert the floats to ints in pandas.