Select Rows by value in Pandas

This tutorial will discuss about different ways to select DataFrame rows by values in pandas.

Table Of Contents

Preparing DatSet
Select DataFrame Rows which contains the value
Summary

Preparing DataSet

Let’s create a DataFrame with some dummy data.

import pandas as pd

data = {'Col_A': [21, 43, 13, 14, 15, 16, 17],
        'Col_B': [37, 22, 43, 24, 37, 43, 27],
        'Col_C': [31, 32, 33, 43, 35, 36, 37]}

index=["D1", "D2", "D3", "D4", "D5", "D6", "D7"]

# Create DataFrame from dictionary
df = pd.DataFrame.from_dict(data)

# Set list index as Index of DataFrame
df .set_index(pd.Index(index), inplace=True)

print (df)

Output

    Col_A  Col_B  Col_C
D1     21     37     31
D2     43     22     32
D3     13     43     33
D4     14     24     43
D5     15     37     35
D6     16     43     36
D7     17     27     37

Now we will learn a way to select only those rows from a DataFrame which contains a given value.

Select DataFrame Rows which contains the value

Suppose we have a DataFrame and we want to select only those rows from this DataFrame which contains particular value in any column. Basically there is no constraint on the column. If row contains the given value at any position, then we want to select that row.

For example this is a DataFrame

Frequently Asked:

    Col_A  Col_B  Col_C
D1     21     37     31
D2     43     22     32
D3     13     43     33
D4     14     24     43
D5     15     37     35
D6     16     43     36
D7     17     27     37

How we want to select all the rows from this DataFrame which contains the value 43 at any position i.e. in any column. Like this,

    Col_A  Col_B  Col_C
D2     43     22     32
D3     13     43     33
D4     14     24     43
D6     16     43     36

We have created a separate function for this.

import functools

# Returns the DataFrame rows which contains the given value
def GetRows(df, value):
    # Convert DataFrame to boolean DataFrame, True value indicates that given
    # value exists at that place in DataFrame
    boolDf = df.apply( lambda row: row == value, axis=1)
    # Merge all the columns to single boolean column/ series,
    # A True value in this series represents that corres ponding row had the given value
    result = boolDf.apply( lambda row: functools.reduce(lambda a,b : a|b, row), axis=1)
    return df.loc[result]

It accepts a DataFrame and a value as arguments and returns the DataFrame rows which contains the given value.

Let’s use this function, to check if the DataFrame contains the value 43 or not. For that we will just pass the DataFrame and value 43 as arguments in the GetRows() function and it will return it data file containing only those rows which contains the value 43.

import functools

# Returns the DataFrame rows which contains the given value
def GetRows(df, value):
    # Convert DataFrame to boolean DataFrame, True value indicates that given
    # value exists at that place in DataFrame
    boolDf = df.apply( lambda row: row == value, axis=1)
    # Merge all the columns to single boolean column/ series,
    # A True value in this series represents that corres ponding row had the given value
    result = boolDf.apply( lambda row: functools.reduce(lambda a,b : a|b, row), axis=1)
    return df.loc[result]

# Select DataFrame rows which contains the value 43
subDf = GetRows(df, 43)

print( subDf )

Output

    Col_A  Col_B  Col_C
D2     43     22     32
D3     13     43     33
D4     14     24     43
D6     16     43     36

No inside this GetRows(df, value) function, we will apply a Lambda function on each row of the DataFrame. Inside the Lambda function, we will check if the row contains the given value or not. It will give us a boolean series. Each True value in series represents that the given value exists at that particular location in row.

Then we will merge all the values in the single boolean series, to get it single boolean value for a row. A True value means, that row contains the given value. If it is False then it means that particular row does not contains the given value. Then we will pass the boolean series to loc[] attribute of DataFrame, to select only those who DataFrame rows which contains the given value.

Summary

We learned how to select only those rows from a DataFrame which contains a given value. Thanks.

Preparing DataSet

Select DataFrame Rows which contains the value

Frequently Asked:

Summary

Related posts:

Share your love

Leave a Comment Cancel Reply