In this article, we will discuss different ways to replace NaN values with empty strings, either in a specific column of a DataFrame or in the complete DataFrame, in Python.
Table Of Contents
- Replace NaN values with empty string using fillna()
- Replace NaN values with empty string using replace()
A DataFrame is a data structure that stores data in a tabular format, i.e. in rows and columns. We can create a DataFrame using the pandas.DataFrame() constructor. In Python, we can create NaN values using the numpy module (np.nan). Let’s use this to create a DataFrame of four rows and five columns with a few NaN values.
import pandas as pd
import numpy as np

# Create dataframe with 4 rows and 5 columns
df = pd.DataFrame({'First':  [0, 0, 0, 0],
                   'Second': [np.nan, np.nan, 1, 1],
                   'Third':  [0, 0, 0, 0],
                   'Fourth': [0, 1, 89, np.nan],
                   'Fifth':  [34, np.nan, 45, 34]})

# Display the Dataframe
print(df)
Output:
   First  Second  Third  Fourth  Fifth
0      0     NaN      0     0.0   34.0
1      0     NaN      0     1.0    NaN
2      0     1.0      0    89.0   45.0
3      0     1.0      0     NaN   34.0
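Before replacing anything, it can help to confirm where the NaN values are. The following is a minimal sketch on the same DataFrame, using the pandas methods isna() and sum(); the variable name nan_counts is only illustrative.

```python
import numpy as np
import pandas as pd

# Same DataFrame as in the example above
df = pd.DataFrame({'First':  [0, 0, 0, 0],
                   'Second': [np.nan, np.nan, 1, 1],
                   'Third':  [0, 0, 0, 0],
                   'Fourth': [0, 1, 89, np.nan],
                   'Fifth':  [34, np.nan, 45, 34]})

# isna() marks each NaN cell as True; sum() counts the True values per column
nan_counts = df.isna().sum()
print(nan_counts)
```

Columns with a count of zero contain no NaN values and are not affected by the replacements shown in the rest of the article.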
Replace NaN values with empty string using fillna()
In Pandas, both DataFrame and Series provide a member function fillna() to replace NaN values with a specified value. Their syntax is as follows:
Series.fillna(value)
It returns a copy of the calling Series object in which all the NaN values are replaced with the specified value.
DataFrame.fillna(value)
It returns a copy of the calling DataFrame object in which all the NaN values are replaced with the specified value.
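As a quick sketch of these two calls, the snippet below applies fillna('') to a small Series and a small DataFrame. The data and the names s, filled, frame and filled_frame are made up for illustration. With the default arguments, fillna() returns a new object, so the result has to be assigned to a variable to be kept.

```python
import numpy as np
import pandas as pd

# A small Series with one missing value (illustrative data)
s = pd.Series([1.0, np.nan, 3.0])

# fillna() returns a new Series; the original Series is left unchanged
filled = s.fillna('')
print(filled.tolist())
print(s.isna().sum())   # the original still contains its NaN

# The same call works on a DataFrame
frame = pd.DataFrame({'A': [np.nan, 2.0], 'B': [5.0, np.nan]})
filled_frame = frame.fillna('')
print(filled_frame)
```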
Replace NaN values with empty string in a column using fillna()
We can select a single column of the DataFrame as a Series object and then call fillna('') on that column to replace all NaN values in it with empty strings. For example,
import pandas as pd
import numpy as np

# Create dataframe with 4 rows and 5 columns
df = pd.DataFrame({'First':  [0, 0, 0, 0],
                   'Second': [np.nan, np.nan, 1, 1],
                   'Third':  [0, 0, 0, 0],
                   'Fourth': [0, 1, 89, np.nan],
                   'Fifth':  [34, np.nan, 45, 34]})

# Display the Dataframe
print(df)

# Replace NaN with empty strings in column 'Second'
df['Second'] = df['Second'].fillna('')

# Display the Dataframe
print(df)
Output:
   First  Second  Third  Fourth  Fifth
0      0     NaN      0     0.0   34.0
1      0     NaN      0     1.0    NaN
2      0     1.0      0    89.0   45.0
3      0     1.0      0     NaN   34.0

   First Second  Third  Fourth  Fifth
0      0             0     0.0   34.0
1      0             0     1.0    NaN
2      0      1      0    89.0   45.0
3      0      1      0     NaN   34.0
Here, we selected the column ‘Second’ as a Series object and then called the fillna() function on it with an empty string as the argument. As a result, all the NaN values in column ‘Second’ were replaced with empty strings.
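One side effect worth knowing about: a column that holds both numbers and empty strings can no longer be stored as a numeric column. The sketch below checks the dtype of a one-column DataFrame before and after the replacement. The variable names before and after are illustrative.

```python
import numpy as np
import pandas as pd

# Only the 'Second' column from the example, to keep the sketch short
df = pd.DataFrame({'Second': [np.nan, np.nan, 1, 1]})

# NaN is a float value, so the column starts out as float64
before = df['Second'].dtype
print(before)

# After filling with '', the column mixes strings and numbers
df['Second'] = df['Second'].fillna('')
after = df['Second'].dtype
print(after)
```

If the column still needs numeric operations later on, it may be safer to keep the NaN values and convert to empty strings only when displaying or exporting the data.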
Replace NaN Values with empty strings in entire dataframe using fillna()
Call the fillna() function of the DataFrame object with an empty string as argument. It will replace NaN values in the entire DataFrame with empty strings. For example,
import pandas as pd
import numpy as np

# Create dataframe with 4 rows and 5 columns
df = pd.DataFrame({'First':  [0, 0, 0, 0],
                   'Second': [np.nan, np.nan, 1, 1],
                   'Third':  [0, 0, 0, 0],
                   'Fourth': [0, 1, 89, np.nan],
                   'Fifth':  [34, np.nan, 45, 34]})

# Display the Dataframe
print(df)

# Replace NaN with empty strings in entire DataFrame
df = df.fillna('')

# Display the Dataframe
print(df)
Output:
   First  Second  Third  Fourth  Fifth
0      0     NaN      0     0.0   34.0
1      0     NaN      0     1.0    NaN
2      0     1.0      0    89.0   45.0
3      0     1.0      0     NaN   34.0

   First Second  Third Fourth Fifth
0      0             0      0    34
1      0             0      1
2      0      1      0     89    45
3      0      1      0            34
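Between a single column and the entire DataFrame there is a middle ground: DataFrame.fillna() also accepts a dictionary that maps column names to fill values. The following sketch, assuming we only want to clean the columns ‘Second’ and ‘Fifth’, leaves the NaN in ‘Fourth’ untouched. The name filled is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'First':  [0, 0, 0, 0],
                   'Second': [np.nan, np.nan, 1, 1],
                   'Third':  [0, 0, 0, 0],
                   'Fourth': [0, 1, 89, np.nan],
                   'Fifth':  [34, np.nan, 45, 34]})

# Keys are column names, values are the replacements for NaN in that column;
# columns not listed in the dictionary are left as they are
filled = df.fillna({'Second': '', 'Fifth': ''})
print(filled)
```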
Replace NaN values with empty string using replace()
In Pandas, both the DataFrame and Series classes provide a replace() function to change their contents. We are going to use these functions:
DataFrame.replace()
To replace all the occurrences of a value in the entire DataFrame, pass the value to be replaced and the replacement value as arguments to the replace() function.
DataFrame.replace(to_replace, value)
Series.replace()
To replace all the occurrences of a value in a Series, pass the same two arguments to the replace() function of the Series.
Series.replace(to_replace, value)
Let’s use this to replace NaN values with empty strings.
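As a minimal sketch of the call pattern, the snippet below passes np.nan as to_replace and '' as value, first on a small Series and then on a small DataFrame. The data and the names s, replaced, frame and replaced_frame are made up for illustration. Like fillna(), replace() returns a new object by default.

```python
import numpy as np
import pandas as pd

# A small Series with one missing value (illustrative data)
s = pd.Series([1.0, np.nan, 3.0])

# First argument: the value to look for; second argument: its replacement
replaced = s.replace(np.nan, '')
print(replaced.tolist())

# The same two arguments work on a DataFrame
frame = pd.DataFrame({'A': [np.nan, 2.0], 'B': [5.0, np.nan]})
replaced_frame = frame.replace(np.nan, '')
print(replaced_frame)
```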
Replace NaN Values with empty strings in a column using replace()
Select the column ‘Second’ as a Series object from the DataFrame and then call the replace() function to replace all NaN values in that column with empty strings. For example,
import pandas as pd
import numpy as np

# Create dataframe with 4 rows and 5 columns
df = pd.DataFrame({'First':  [0, 0, 0, 0],
                   'Second': [np.nan, np.nan, 1, 1],
                   'Third':  [0, 0, 0, 0],
                   'Fourth': [0, 1, 89, np.nan],
                   'Fifth':  [34, np.nan, 45, 34]})

# Display the Dataframe
print(df)

# Replace NaN with empty string in column 'Second'
# (np.nan is used because the np.NaN alias was removed in NumPy 2.0)
df['Second'] = df['Second'].replace(np.nan, '')

# Display the Dataframe
print(df)
Output:
   First  Second  Third  Fourth  Fifth
0      0     NaN      0     0.0   34.0
1      0     NaN      0     1.0    NaN
2      0     1.0      0    89.0   45.0
3      0     1.0      0     NaN   34.0

   First Second  Third  Fourth  Fifth
0      0             0     0.0   34.0
1      0             0     1.0    NaN
2      0      1      0    89.0   45.0
3      0      1      0     NaN   34.0
Replace NaN Values with empty strings in entire dataframe using replace()
Call the replace() function on the DataFrame object with the arguments np.nan and ''. It will replace all occurrences of NaN with empty strings in the entire DataFrame. For example,
import pandas as pd
import numpy as np

# Create dataframe with 4 rows and 5 columns
df = pd.DataFrame({'First':  [0, 0, 0, 0],
                   'Second': [np.nan, np.nan, 1, 1],
                   'Third':  [0, 0, 0, 0],
                   'Fourth': [0, 1, 89, np.nan],
                   'Fifth':  [34, np.nan, 45, 34]})

# Display the Dataframe
print(df)

# Replace NaN with empty strings in entire DataFrame
# (np.nan is used because the np.NaN alias was removed in NumPy 2.0)
df = df.replace(np.nan, '')

# Display the Dataframe
print(df)
Output:
   First  Second  Third  Fourth  Fifth
0      0     NaN      0     0.0   34.0
1      0     NaN      0     1.0    NaN
2      0     1.0      0    89.0   45.0
3      0     1.0      0     NaN   34.0

   First Second  Third Fourth Fifth
0      0             0      0    34
1      0             0      1
2      0      1      0     89    45
3      0      1      0            34
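For this particular task, fillna('') and replace(np.nan, '') should produce the same cell values. The following sketch checks that on the example DataFrame by comparing the two results cell by cell. The names via_fillna, via_replace and same are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'First':  [0, 0, 0, 0],
                   'Second': [np.nan, np.nan, 1, 1],
                   'Third':  [0, 0, 0, 0],
                   'Fourth': [0, 1, 89, np.nan],
                   'Fifth':  [34, np.nan, 45, 34]})

# Apply both approaches to the same input
via_fillna = df.fillna('')
via_replace = df.replace(np.nan, '')

# Compare the results as nested Python lists, one inner list per row
same = via_fillna.values.tolist() == via_replace.values.tolist()
print(same)

# Neither result should contain any NaN values
print(via_fillna.isna().any().any(), via_replace.isna().any().any())
```

The practical difference is one of intent: fillna() targets missing values only, while replace() can target any value passed as to_replace.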
Summary
In this article, we learned about two different ways to replace NaN values with empty strings, either in a single column or in the entire DataFrame.