How to Efficiently Filter DataFrame Rows by Date Range?

Barbara Streisand
2024-12-12 16:30:11

Query DataFrame Rows Within a Specified Date Range

A common task is extracting the rows of a DataFrame whose date column falls within a particular range. There are two approaches: filtering with a boolean mask, or indexing the DataFrame by date and slicing.

Method 1: Utilizing a Boolean Mask

For this method, make sure that df['date'] is a Series with dtype datetime64[ns]. If the column holds strings, convert it first with pd.to_datetime. Then follow these steps:

  1. Create a Boolean Mask: Choose start_date and end_date values, which can be datetime.datetime objects, np.datetime64 values, pd.Timestamp objects, or datetime strings. Compare the date column against both bounds and combine the comparisons with & to get a mask that is True for rows inside the range.
  2. Select Sub-DataFrame: Use df.loc[mask] to extract the rows that pass the mask condition. Alternatively, to overwrite the existing DataFrame, apply the mask as df = df.loc[mask].
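
The two steps above can be sketched as follows. This is a minimal illustration that assumes a small, made-up DataFrame whose 'date' column starts out as strings; the sample values and the names subset, start_date, and end_date are chosen only for the example.

```python
import datetime

import numpy as np
import pandas as pd

# Hypothetical sample data: dates stored as strings, as often happens after reading a CSV.
df = pd.DataFrame({
    "date": ["2023-03-01", "2023-03-02", "2023-03-03", "2023-03-04", "2023-03-05"],
    "value": [10, 20, 30, 40, 50],
})

# Convert the column to a datetime dtype before comparing.
df["date"] = pd.to_datetime(df["date"])

# The bounds may be strings, datetime.datetime, np.datetime64, or pd.Timestamp.
start_date = datetime.datetime(2023, 3, 2)
end_date = np.datetime64("2023-03-04")

# Step 1: build the boolean mask (both bounds inclusive in this sketch).
mask = (df["date"] >= start_date) & (df["date"] <= end_date)

# Step 2: select the matching rows.
subset = df.loc[mask]
print(subset)
```

Whether each bound uses a strict or non-strict comparison is a choice you make explicitly in the mask, which is the main flexibility of this method.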

Method 2: Assigning a DatetimeIndex

This approach suits code that selects by date frequently. It works by making the date column the index:

  1. Set DatetimeIndex: Move the date column into the index with df = df.set_index('date'). Because the column holds datetimes, the resulting index is a DatetimeIndex. If the rows are not in chronological order, follow with df = df.sort_index() so that slicing behaves predictably.
  2. Select Rows by Date: Use df.loc[start_date:end_date] to filter rows by date range. Unlike ordinary Python slices, this selection includes both start_date and end_date.
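
As a rough sketch of these two steps, the snippet below assumes a small, made-up DataFrame whose rows are deliberately out of order; the names indexed, subset, and march are illustrative only.

```python
import pandas as pd

# Hypothetical sample data, deliberately out of chronological order.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2023-03-05", "2023-03-01", "2023-03-03", "2023-03-02", "2023-03-04"]
    ),
    "value": [50, 10, 30, 20, 40],
})

# Step 1: move the date column into the index, then sort it chronologically.
indexed = df.set_index("date").sort_index()

# Step 2: slice by label; both endpoints are included.
subset = indexed.loc["2023-03-02":"2023-03-04"]
print(subset)

# With a DatetimeIndex, a partial date string selects a whole period,
# here every row in March 2023.
march = indexed.loc["2023-03"]
print(len(march))
```

Keeping the indexed frame in its own variable leaves the original df, with its 'date' column, available for mask-based filtering as well.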

Example:

The following code illustrates both approaches:

import numpy as np
import pandas as pd

df = pd.DataFrame({'date': pd.date_range('2023-03-01', periods=10)})
df['value'] = np.random.randn(10)

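# Date bounds used by both approaches
start_date = '2023-03-03'
end_date = '2023-03-08'

# Boolean Mask Approach (>= keeps start_date, matching the inclusive slice below)
mask = (df['date'] >= start_date) & (df['date'] <= end_date)
df_subset = df.loc[mask]

# DatetimeIndex Approach (both endpoints are inclusive)
df = df.set_index('date')
df_subset = df.loc[start_date:end_date]

Both approaches select the six rows from 2023-03-03 through 2023-03-08. The second result carries the dates in its index rather than in a 'date' column.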
