In this article, we will discuss different ways to replace blank values / whitespaces with NaN values in a Pandas DataFrame.
A DataFrame is a pandas data structure that stores data in a tabular format, i.e. in rows and columns. We can create a DataFrame using the pandas.DataFrame() constructor. Let’s create a DataFrame with four rows and two columns, in which some of the values are blank strings:
import pandas as pd

# Create a DataFrame with two columns and four rows
df = pd.DataFrame({
    "Name": [" ", "sravan", "ramya", " "],
    "Subjects": [" ", "python", " ", " "]})

# Display the DataFrame
print(df)
Output:
     Name Subjects
0
1  sravan   python
2   ramya
3
In the above DataFrame, several cells are blank: they contain only whitespace, which may be a single space or several spaces. Let’s see how to replace all of these blank strings with NaN.
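Blank cells are hard to spot in printed output, so it can help to locate them first. The following is a minimal sketch, not part of the original example: the exact widths of the blank strings are assumptions chosen for illustration, and `blank_mask` is a hypothetical name.

```python
import pandas as pd

# Illustrative data: the blank cells have assumed widths (0, 1, 2 or 3 spaces)
df = pd.DataFrame({
    "Name": [" ", "sravan", "ramya", "   "],
    "Subjects": ["", "python", " ", "  "],
})

# True where a cell is empty or contains only whitespace
blank_mask = df.apply(lambda col: col.str.fullmatch(r"\s*"))
print(blank_mask)

# Number of blank cells in each column
print(blank_mask.sum())
```

The mask only reports where the blank cells are; it does not modify the DataFrame.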
Replace empty strings in Dataframe using replace() and regex
In pandas, both the DataFrame and Series classes provide a replace() function to change their contents. Let’s look at the syntax of each,
DataFrame.replace()
DataFrame.replace(to_replace, value, regex=True)
The call above uses three of the function's arguments:
- to_replace: A literal value or a regex pattern. If it is a regex pattern, it decides which values need to be replaced.
- value: The replacement value.
- regex: If True, the first argument, to_replace, is interpreted as a regex pattern. The default is False.
Across the entire DataFrame, it looks for values that match the regex pattern and replaces them with the given replacement value. The result is returned as a new DataFrame.
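To illustrate the effect of the regex flag, here is a small sketch comparing a literal match with a regex match. The column names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data: one single-space cell per column, plus a two-space cell in "A"
df = pd.DataFrame({"A": [" ", "x", "  "], "B": ["y", " ", "z"]})

# Literal match (regex is False by default): only cells equal to " " change
literal = df.replace(" ", np.nan)

# Regex match: every empty or whitespace-only cell changes
by_regex = df.replace(r"^\s*$", np.nan, regex=True)

print(literal)
print(by_regex)
```

With the literal match, the two-space cell in column A is left untouched, which is why a regex pattern is the more convenient choice when blank cells vary in width.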
Series.replace()
Series.replace(to_replace, value, regex=True)
The call above uses the same three arguments:
- to_replace: A literal value or a regex pattern. If it is a regex pattern, it decides which values need to be replaced.
- value: The replacement value.
- regex: If True, the first argument, to_replace, is interpreted as a regex pattern. The default is False.
Across the entire Series, it looks for values that match the regex pattern and replaces them with the given replacement value. The result is returned as a new Series.
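As a quick sketch of the Series version, the snippet below shows that replace() returns a new Series and leaves the original unchanged unless the result is assigned back. The sample values and the name `cleaned` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative Series with one single-space value and one empty string
s = pd.Series(["a", " ", "", "b c"])

# replace() returns a new Series; s itself is not modified
cleaned = s.replace(r"^\s*$", np.nan, regex=True)

print(s.tolist())
print(cleaned.isna().tolist())
```

The value "b c" contains a space but is not blank, so it is kept as it is.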
Let’s use these functions to replace empty strings with NaN, either in entire Dataframe or in a column only.
Replace empty strings with NaN in a DataFrame Column
Select a DataFrame column as a Series object and call the replace() function on it with the following arguments,
- As the first argument, pass a regex pattern that matches an empty string or a string containing only whitespace, i.e. “^\s*$”.
- As the second argument, pass the replacement value, i.e. np.nan. (NumPy 2.0 removed the older np.NaN alias.)
- As the third argument, pass regex=True.
It will replace all the blank strings in the column with NaN values. For example,
import pandas as pd
import numpy as np

# Create a DataFrame with two columns and four rows
df = pd.DataFrame({
    "Name": [" ", "sravan", "ramya", " "],
    "Subjects": [" ", "python", " ", " "]})

# Display the DataFrame
print(df)

# Replace blank strings with NaN in column 'Name'
# (raw string, so that \s is passed to the regex engine unchanged)
df['Name'] = df['Name'].replace([r"^\s*$"], np.nan, regex=True)

# Display the DataFrame
print(df)
Output:
     Name Subjects
0
1  sravan   python
2   ramya
3

     Name Subjects
0     NaN
1  sravan   python
2   ramya
3     NaN
It replaced all the empty strings in column ‘Name’ with NaN values.
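To see which strings the pattern “^\s*$” actually matches, it can be checked with Python's built-in re module. This is only a sketch; the sample strings are chosen for illustration.

```python
import re

# ^ and $ anchor the pattern to the whole string,
# \s* allows zero or more whitespace characters in between
pattern = re.compile(r"^\s*$")

samples = ["", " ", "   ", "\t", "ramya", " ramya ", "data science"]

# Map each sample to whether the pattern matches it
matches = {text: bool(pattern.search(text)) for text in samples}

for text, is_blank in matches.items():
    print(repr(text), is_blank)
```

Only the empty string and the whitespace-only strings match. Strings that merely contain spaces, such as " ramya " or "data science", do not match, so they would not be replaced.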
Replace empty strings with NaN values in entire dataframe
Call the replace() function on the DataFrame object with the following arguments,
- As the first argument, pass a regex pattern that matches an empty string or a string containing only whitespace, i.e. “^\s*$”.
- As the second argument, pass the replacement value, i.e. np.nan. (NumPy 2.0 removed the older np.NaN alias.)
- As the third argument, pass regex=True.
It will replace all the blank strings with NaN values in the entire DataFrame. For example,
import pandas as pd
import numpy as np

# Create a DataFrame with two columns and four rows
df = pd.DataFrame({
    "Name": [" ", "sravan", "ramya", " "],
    "Subjects": [" ", "python", " ", " "]})

# Display the DataFrame
print(df)

# Replace blank strings with NaN in the entire DataFrame
# (raw string, so that \s is passed to the regex engine unchanged)
df = df.replace([r"^\s*$"], np.nan, regex=True)

# Display the DataFrame
print(df)
Output:
     Name Subjects
0
1  sravan   python
2   ramya
3

     Name Subjects
0     NaN      NaN
1  sravan   python
2   ramya      NaN
3     NaN      NaN
It replaced all the empty strings with NaN values in the entire DataFrame.
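One way to confirm the result is to count the missing values in each column with isna(). This is a small sketch that rebuilds the same DataFrame; `missing_per_column` is a hypothetical name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": [" ", "sravan", "ramya", " "],
    "Subjects": [" ", "python", " ", " "],
})

# Replace blank strings with NaN in the entire DataFrame
df = df.replace(r"^\s*$", np.nan, regex=True)

# Count the NaN values in each column
missing_per_column = df.isna().sum()
print(missing_per_column)
```

Once the blank strings are NaN, pandas treats them as missing values, so functions such as isna(), dropna() and fillna() can work with them.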
Summary
In this article, we learned how to replace blank strings with NaN values in a DataFrame using the replace() function with a regex pattern.