This tutorial will discuss about different ways to select rows by column value in pandas.
Table Of Contents
Preparing DataSet
Let’s create a DataFrame with some hadcoded values.
import pandas as pd data = {'A': [11, 12, 13, 14, 15, 16, 17], 'B': [24, 22, 23, 24, 25, 26, 27], 'C': [31, 32, 33, 34, 35, 33, 33], 'D': ['Test1', 'Rest1', 'Nest', 'Crest', 'Temp1', 'Temp2', 'Temp3']} index=['D1', 'D2', 'D3', 'D4', 'D5', 'D6', 'D7'] # Create DataFrame from dictionary df = pd.DataFrame.from_dict(data) # Set list index as Index of Dataframe df.set_index(pd.Index(index), inplace=True) print (df)
Output
A B C D D1 11 24 31 Test1 D2 12 22 32 Rest1 D3 13 23 33 Nest D4 14 24 34 Crest D5 15 25 35 Temp1 D6 16 26 33 Temp2 D7 17 27 33 Temp3
Now we will select rows from this DataFrame by column values.
Select DataFrame rows where column value is equal to the given value
Suppose we have a DataFrame and we want to select only those rows from this DataFrame, in which a specific column contains a given value.
For that we are going to use the loc[]
attribute of the DataFrame and we will pass a boolean series in this loc[]
attribute. A True
value in these boolean series represents that it need to select that particular row from the DataFrame.
Frequently Asked:
To get this boolean series, we will select the specified column and apply a condition on it. It will return as a boolean series, where each True
value represent the fact, that the corresponding value in that column satisfies the condition. Then we will pass that boolean series to the loc[]
attribute and it will return a DataFrame containing those rows where specified column has the given value.
Like in the below example, we are going to select only those rows from the DataFrame where column C
has value 33.
# Select rows where column 'C' has value 33 subDf = df.loc[df['C'] == 33] print (subDf)
Output
A B C D D3 13 23 33 Nest D6 16 26 33 Temp2 D7 17 27 33 Temp3
Select DataFrame rows where column value is in a given list
We can use the isin() method of pandas Series to select only those DataFrame rows, where a specified column contains any value from a given list.
In the below example, we will select only those rows from the DataFrame where column a contains any value from the given list. For that, we will select the specified column from the DataFrame, and call the isin() function on that column, and pass the given list as argument in it. It will return as a boolean array, where each True
value represents that the corresponding column value contains ant value from the given list.
Then we will pass this boolean array to the loc[]
attribute and it will return us the rows where column B
Contains a value from the given list.
# Select rows where column 'A' has any value from the given list subDf = df.loc[df['A'].isin([12, 17, 15])] print (subDf)
Output
A B C D D2 12 22 32 Rest1 D5 15 25 35 Temp1 D7 17 27 33 Temp3
Select rows where column values are in a given range
We can also select specific rows from DataFrame where a column contains a value in a given range. For that we need to apply multiple conditions on the column and get a boolean array and then we need to pass this boolean array to the loc[]
attribute of DataFrame.
Like in the below example, we are going to select only those rows from the data from where column C has a value greater than 33 but less than 35.
# Select rows where column 'C' has value # greater than 33 and less than 35 subDf = df.loc[(df['C'] > 32) & (df['C'] < 35)] print(subDf)
Output
A B C D D3 13 23 33 Nest D4 14 24 34 Crest D6 16 26 33 Temp2 D7 17 27 33 Temp3
Select rows where column values satisfies a condition
We can also use the loc[]
attribute to select only those rows from the DataFrame where values in a column satisfies a given condition.
For that we need to select that particular column and apply the given condition on that column. It will return as a boolean array, that we will pass in the loc[]
attribute and it will return us only those rows for which the values in column satisfies the condition.
In the below example, we are going to select only those rows from the DataFrame where column B
contains the even values.
# Select rows where values in column 'B' are even subDf = df.loc[df['B'] % 2 == 0] print (subDf)
Output
A B C D D1 11 24 31 Test1 D2 12 22 32 Rest1 D4 14 24 34 Crest D6 16 26 33 Temp2
Summary
We learned about different ways to select DataFrame rows by column value in Pandas. Thanks.