This tutorial will discuss about different ways to select DataFrame rows within a date range in Pandas.
Table Of Contents
Preparing DataSet
First we will create a DataFrame with some hard coded values.
import pandas as pd data = {'A': [11, 12, 13, 14, 15, 16, 17], 'B': [21, 22, 23, 24, 25, 26, 27], 'Last_Visited_Date': [ '2023-01-14', '2023-01-15', '2023-01-17', '2023-01-21', '2023-01-25', '2023-01-31', '2023-02-03']} index=['D1', 'D2', 'D3', 'D4', 'D5', 'D6', 'D7'] # Create DataFrame from dictionary df = pd.DataFrame.from_dict(data) # Set list index as Index of Dataframe df.set_index(pd.Index(index), inplace=True) print (df)
Output
A B Last_Visited_Date D1 11 21 2023-01-14 D2 12 22 2023-01-15 D3 13 23 2023-01-17 D4 14 24 2023-01-21 D5 15 25 2023-01-25 D6 16 26 2023-01-31 D7 17 27 2023-02-03
Now we will learn how to select DataFrame rows where values in a column is in a given Date Range
Select Rows within a Date Range using loc[] Attribute
Suppose we have a DataFrame where one column contains some Date values. We want to select some specific rows from this DataFrame based on the dates in any given range.
For example or sample DataFrame is like this,
Frequently Asked:
- Replace NaN with an empty string in DataFrame in Pandas
- Pandas Dataframe.sum() method – Tutorial & Examples
- Drop List of Rows from a Pandas DataFrame
- Drop last N rows of pandas dataframe
A B Last_Visited_Date D1 11 21 2023-01-14 D2 12 22 2023-01-15 D3 13 23 2023-01-17 D4 14 24 2023-01-21 D5 15 25 2023-01-25 D6 16 26 2023-01-31 D7 17 27 2023-02-03
No we want to select only those rows from this DataFrame, where the values in column Last_Visited_Date
is between a given start and end date i.e. between ‘2023-01-14’ and ‘2023-01-28’, like this,
D2 12 22 2023-01-15 D3 13 23 2023-01-17 D4 14 24 2023-01-21 D5 15 25 2023-01-25
We can select the date column and then we will apply multiple conditions on it,
- First condition is that the date value should be greater than the start date
- Second condition is that the date value should be less than the end date.
It will give us a boolean series where each true value represent that the particular value in column satisfies the given condition. Then we will pass this boolean series into the loc[]
attribute and it will give us the rows for which the given column has values in the given date range.
In the below example, we are going to select only those rows from the DataFrame where the Last_Visited_Date
column contains the values in the date range i.e. from ‘2023-01-14’ to ‘2023-01-28’.
# Start of Data Range startDate = '2023-01-14' # End of Data Range endDate = '2023-01-28' # Select rows where column 'Last_Visited_Date' has values in the given date range i.e. # from '2023-01-14' till '2023-01-28' subDf = df.loc[(df['Last_Visited_Date'] > startDate) & (df['Last_Visited_Date'] <= endDate)] print (subDf)
Output
A B Last_Visited_Date D2 12 22 2023-01-15 D3 13 23 2023-01-17 D4 14 24 2023-01-21 D5 15 25 2023-01-25
Select Rows by Date Range using query() method
In the previous example we used the loc[]
attribute to select the rows by date range. We can do that same thing using the query()
method of DataFrame. We need to pass a query inside this query()
method of the DataFrame.
Like in the below example, we are going to select rows where column Last_Visited_Date
has values in the given date range that is from ‘2023-01-14’ till ‘2023-01-28’.
# Start of Data Range startDate = '2023-01-14' # End of Data Range endDate = '2023-01-28' # Select rows where column 'Last_Visited_Date' has values in the given date range i.e. # from '2023-01-14' till '2023-01-28' subDf = df.query('Last_Visited_Date >= @startDate and Last_Visited_Date <= @endDate') print (subDf)
Output
A B Last_Visited_Date D1 11 21 2023-01-14 D2 12 22 2023-01-15 D3 13 23 2023-01-17 D4 14 24 2023-01-21 D5 15 25 2023-01-25
Select DataFrame Rows by Date Range using Series.between() method
We can use the Series.between() method to select rows in a DataFrame that contains the values in a column in a Date Range.
We will pass the start date and end date as arguments in between() method and it will return a boolean array where true represents that that particular value in the selected column is in the given date range.
Then we will pass that boolean series to the loc[]
attribute and it will select only those rows from the DataFrame for which the given column has a value in the given date range.
Like in the below example, we are going to select rows where column Last_Visited_Date
has values in the given date range that is from ‘2023-01-14’ till ‘2023-01-28’.
# Start of Data Range startDate = '2023-01-14' # End of Data Range endDate = '2023-01-28' # Select DataFrame Rows where column 'Last Visited Date' has values in the given date range i.e. # from '2023-01-14' till '2023-01-28' subDf = df.loc[df['Last_Visited_Date'].between(startDate, endDate)] print (subDf)
Output
A B Last_Visited_Date D1 11 21 2023-01-14 D2 12 22 2023-01-15 D3 13 23 2023-01-17 D4 14 24 2023-01-21 D5 15 25 2023-01-25
Summary
We learned how to select DataFrame Rows within a Date Range in Pandas. Thanks.