This tutorial discusses how to select DataFrame rows with non-empty strings in a column in Pandas.
Preparing DataSet
Let’s create a DataFrame with some dummy data.
    import pandas as pd

    data = {'Col_A': ["11", "12", "13", "14", "15", "16", "17"],
            'Col_B': ["21", "22", " ", "24", " ", "26", " "],
            'Col_C': ["33", "32", "33", "35", "35", "36", "35"]}

    index = ['X1', 'X2', 'X3', 'X4', 'X5', 'X6', 'X7']

    # Create a DataFrame from a dictionary
    df = pd.DataFrame.from_dict(data)

    # Set the index list as Index of DataFrame
    df.set_index(pd.Index(index), inplace=True)

    print(df)
Output
       Col_A Col_B Col_C
    X1    11    21    33
    X2    12    22    32
    X3    13          33
    X4    14    24    35
    X5    15          35
    X6    16    26    36
    X7    17          35
Now we will operate on this DataFrame.
Select DataFrame Rows with non empty values in a Column
We can use the loc[] attribute of a DataFrame to select only those rows that contain a non-empty string in a specified column.
To do that, we first select the column and strip every string value in it, i.e. remove the leading and trailing whitespace from each value. We then compare the stripped values with the empty string, which returns a boolean Series.
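The two intermediate steps described above can be sketched on their own, which makes it easier to inspect the boolean Series before using it. This is only an illustrative sketch: the DataFrame is rebuilt so the snippet runs by itself, and the names stripped and mask are assumptions chosen for this example.

```python
import pandas as pd

# Same data as the DataFrame above, rebuilt so this snippet is self-contained
df = pd.DataFrame(
    {'Col_A': ["11", "12", "13", "14", "15", "16", "17"],
     'Col_B': ["21", "22", " ", "24", " ", "26", " "],
     'Col_C': ["33", "32", "33", "35", "35", "36", "35"]},
    index=['X1', 'X2', 'X3', 'X4', 'X5', 'X6', 'X7'])

# Step 1: select the column and strip leading/trailing whitespace
# ("stripped" is an illustrative name, not part of the original example)
stripped = df['Col_B'].str.strip()

# Step 2: compare with the empty string to get a boolean Series
# ("mask" is an illustrative name as well)
mask = stripped != ""

print(mask)
```

For this data, mask is True for X1, X2, X4 and X6, and False for X3, X5 and X7.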
A True value in this boolean Series means that the corresponding value in the column is not empty. We then pass this boolean Series to the loc[] attribute of the DataFrame, which returns a DataFrame containing only those rows of the original DataFrame that have a non-empty value in the specified column. Let’s see the complete example, in which we select rows from the DataFrame where column Col_B has non-empty values.
    # Select rows from the DataFrame where the values
    # in column "Col_B" are not empty
    subDf = df.loc[df['Col_B'].str.strip() != ""]

    print(subDf)
Output
       Col_A Col_B Col_C
    X1    11    21    33
    X2    12    22    32
    X4    14    24    35
    X6    16    26    36
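The same condition can probably be spelled in a couple of equivalent ways. The sketch below is not part of the original example: it assumes every value in Col_B is a string, rebuilds the DataFrame so it runs on its own, and uses the illustrative names by_ne and by_len.

```python
import pandas as pd

# Same data as the DataFrame above, rebuilt so this snippet is self-contained
df = pd.DataFrame(
    {'Col_A': ["11", "12", "13", "14", "15", "16", "17"],
     'Col_B': ["21", "22", " ", "24", " ", "26", " "],
     'Col_C': ["33", "32", "33", "35", "35", "36", "35"]},
    index=['X1', 'X2', 'X3', 'X4', 'X5', 'X6', 'X7'])

# Variant 1: Series.ne("") is the method form of the != "" comparison
by_ne = df.loc[df['Col_B'].str.strip().ne("")]

# Variant 2: keep rows whose stripped value has a length greater than zero
by_len = df.loc[df['Col_B'].str.strip().str.len() > 0]

print(by_ne)
print(by_len)
```

Both variants should select the same rows as the example above (X1, X2, X4 and X6); which one reads best is largely a matter of taste.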
Summary
We learned how to select DataFrame rows with non-empty strings in a specified column in Pandas.