This tutorial will discuss about different ways to select DataFrame rows by values in pandas.
Table Of Contents
Preparing DataSet
Let’s create a DataFrame with some dummy data.
import pandas as pd data = {'Col_A': [21, 43, 13, 14, 15, 16, 17], 'Col_B': [37, 22, 43, 24, 37, 43, 27], 'Col_C': [31, 32, 33, 43, 35, 36, 37]} index=["D1", "D2", "D3", "D4", "D5", "D6", "D7"] # Create DataFrame from dictionary df = pd.DataFrame.from_dict(data) # Set list index as Index of DataFrame df .set_index(pd.Index(index), inplace=True) print (df)
Output
Col_A Col_B Col_C D1 21 37 31 D2 43 22 32 D3 13 43 33 D4 14 24 43 D5 15 37 35 D6 16 43 36 D7 17 27 37
Now we will learn a way to select only those rows from a DataFrame which contains a given value.
Select DataFrame Rows which contains the value
Suppose we have a DataFrame and we want to select only those rows from this DataFrame which contains particular value in any column. Basically there is no constraint on the column. If row contains the given value at any position, then we want to select that row.
For example this is a DataFrame
Frequently Asked:
- Pandas: Get sum of column values in a Dataframe
- Pandas: Select Rows where column values ends with a string
- Pandas: Select first column of dataframe in python
- Pandas – Select Column by Name
Col_A Col_B Col_C D1 21 37 31 D2 43 22 32 D3 13 43 33 D4 14 24 43 D5 15 37 35 D6 16 43 36 D7 17 27 37
How we want to select all the rows from this DataFrame which contains the value 43 at any position i.e. in any column. Like this,
Col_A Col_B Col_C D2 43 22 32 D3 13 43 33 D4 14 24 43 D6 16 43 36
We have created a separate function for this.
import functools # Returns the DataFrame rows which contains the given value def GetRows(df, value): # Convert DataFrame to boolean DataFrame, True value indicates that given # value exists at that place in DataFrame boolDf = df.apply( lambda row: row == value, axis=1) # Merge all the columns to single boolean column/ series, # A True value in this series represents that corres ponding row had the given value result = boolDf.apply( lambda row: functools.reduce(lambda a,b : a|b, row), axis=1) return df.loc[result]
It accepts a DataFrame and a value as arguments and returns the DataFrame rows which contains the given value.
Let’s use this function, to check if the DataFrame contains the value 43 or not. For that we will just pass the DataFrame and value 43 as arguments in the GetRows()
function and it will return it data file containing only those rows which contains the value 43.
import functools # Returns the DataFrame rows which contains the given value def GetRows(df, value): # Convert DataFrame to boolean DataFrame, True value indicates that given # value exists at that place in DataFrame boolDf = df.apply( lambda row: row == value, axis=1) # Merge all the columns to single boolean column/ series, # A True value in this series represents that corres ponding row had the given value result = boolDf.apply( lambda row: functools.reduce(lambda a,b : a|b, row), axis=1) return df.loc[result] # Select DataFrame rows which contains the value 43 subDf = GetRows(df, 43) print( subDf )
Output
Col_A Col_B Col_C D2 43 22 32 D3 13 43 33 D4 14 24 43 D6 16 43 36
No inside this GetRows(df, value)
function, we will apply a Lambda function on each row of the DataFrame. Inside the Lambda function, we will check if the row contains the given value or not. It will give us a boolean series. Each True
value in series represents that the given value exists at that particular location in row.
Then we will merge all the values in the single boolean series, to get it single boolean value for a row. A True value means, that row contains the given value. If it is False
then it means that particular row does not contains the given value. Then we will pass the boolean series to loc[]
attribute of DataFrame, to select only those who DataFrame rows which contains the given value.
Summary
We learned how to select only those rows from a DataFrame which contains a given value. Thanks.