This tutorial will discuss about different ways to select DataFrame rows by timestamp in Pandas.
Table Of Contents
Preparing DataSet
First we will create a DataFrame with some hard coded values.
import pandas as pd data = {'A': [11, 12, 13, 14, 15, 16, 17], 'B': [21, 22, 23, 24, 25, 26, 27], 'Last_Visited_Timestamp': ['2023-01-14 02:00:00', '2023-01-14 03:00:00', '2023-01-14 03:30:00', '2023-01-14 04:15:00', '2023-01-14 06:30:11', '2023-01-14 09:11:21', '2023-01-14 11:35:31']} index=['D1', 'D2', 'D3', 'D4', 'D5', 'D6', 'D7'] # Create DataFrame from dictionary df = pd.DataFrame.from_dict(data) # Set list index as Index of Dataframe df.set_index(pd.Index(index), inplace=True) print (df)
Output
A B Last_Visited_Timestamp D1 11 21 2023-01-14 02:00:00 D2 12 22 2023-01-14 03:00:00 D3 13 23 2023-01-14 03:30:00 D4 14 24 2023-01-14 04:15:00 D5 15 25 2023-01-14 06:30:11 D6 16 26 2023-01-14 09:11:21 D7 17 27 2023-01-14 11:35:31
Suppose we have a DataFrame where one column contains some Timestamp values. We want to select some specific rows from this DataFrame based on the Timestamps in any given range.
For example or sample DataFrame is like this,
A B Last_Visited_Timestamp D1 11 21 2023-01-14 02:00:00 D2 12 22 2023-01-14 03:00:00 D3 13 23 2023-01-14 03:30:00 D4 14 24 2023-01-14 04:15:00 D5 15 25 2023-01-14 06:30:11 D6 16 26 2023-01-14 09:11:21 D7 17 27 2023-01-14 11:35:31
No we want to select only those rows from this DataFrame, where the values in column Last_Visited_Timestamp
is between a given start and end timestamps i.e. between ‘2023-01-14 03:15:00’ and ‘2023-01-14 06:43:00’, like this,
Frequently Asked:
D3 13 23 2023-01-14 03:30:00 D4 14 24 2023-01-14 04:15:00 D5 15 25 2023-01-14 06:30:11
We can select the timestamp column and then we will apply multiple conditions on it,
- First condition is that the timestamp value should be greater than the start timestamp
- Second condition is that the timestamp value should be less than the end timestamp.
It will give us a boolean series where each true value represent that the particular value in column satisfies the given condition. Then we will pass this boolean series into the loc[]
attribute and it will give us the rows for which the given column has values in the given timestamp range.
In the below example, we are going to select only those rows from the DataFrame where the Last_Visited_Timestamp
column contains the values in the timestamp range i.e. from ‘2023-01-14 03:15:00’ to ‘2023-01-14 06:43:00’.
# Starting Timestamp startingTimestamp = '2023-01-14 03:15:00' # Ending Timestamp endingTimestamp = '2023-01-14 06:43:00' # Select rows where column 'Last_Visited_Timestamp’ has values in the i.e. # from '2023-01-14 3:15:00' till '2023-01-14 06:43:00' subDf = df.loc[ (df[ 'Last_Visited_Timestamp'] > startingTimestamp) & (df['Last_Visited_Timestamp'] <= endingTimestamp) ] print(subDf)
Output
A B Last_Visited_Timestamp D3 13 23 2023-01-14 03:30:00 D4 14 24 2023-01-14 04:15:00 D5 15 25 2023-01-14 06:30:11
Select Rows by Timestamp Range using query() method
In the previous example we used the loc[]
attribute to select the rows by timestamp range. We can do that same thing using the query()
method of DataFrame. We need to pass a query inside this query()
method of the DataFrame.
Like in the below example, we are going to select rows where column Last_Visited_Timestamp
has values in the given timestamp range that is from ‘2023-01-14 03:15:00’ to ‘2023-01-14 06:43:00’.
# Starting Timestamp startingTimestamp = '2023-01-14 03:15:00' # Ending Timestamp endingTimestamp = '2023-01-14 06:43:00' # Select rows where column 'Last_Visited_Timestamp’ has values in the i.e. # from '2023-01-14 03:15:00' till '2023-01-14 06:43:00' subDf = df.query('Last_Visited_Timestamp >= @startingTimestamp and Last_Visited_Timestamp <= @endingTimestamp' ) print (subDf)
Output
A B Last_Visited_Timestamp D3 13 23 2023-01-14 03:30:00 D4 14 24 2023-01-14 04:15:00 D5 15 25 2023-01-14 06:30:11
Select Rows by Timestamp Range using Series.between() method
We can use the Series.between() method to select rows in a DataFrame that contains the values in a column in a timestamp Range.
We will pass the start timestamp and end timestamp as arguments in between() method and it will return a boolean array where true represents that that particular value in the selected column is in the given timestamp range.
Then we will pass that boolean series to the loc[]
attribute and it will select only those rows from the DataFrame for which the given column has a value in the given timestamp range.
Like in the below example, we are going to select rows where column Last_Visited_Timestamp
has values in the given timestamp range that is from ‘2023-01-14 03:15:00’ to ‘2023-01-14 06:43:00’.
# Starting Timestamp startingTimestamp = '2023-01-14 03:15:00' # Ending Timestamp endingTimestamp = '2023-01-14 06:43:00' # Select rows where column 'Last_Visited_Timestamp’ has values in the given Timestamp Range # from '2023-01-14 03:15:00' till '2023-01-14 06:43:00' subDf = df.loc[df['Last_Visited_Timestamp'].between(startingTimestamp, endingTimestamp) ] print(subDf)
Output
A B Last_Visited_Timestamp D3 13 23 2023-01-14 03:30:00 D4 14 24 2023-01-14 04:15:00 D5 15 25 2023-01-14 06:30:11
Summary
We learned how to select DataFrame Rows within a Timestamp Range in Pandas. Thanks.