Home >Backend Development >Python Tutorial >How to Efficiently Merge Pandas DataFrames with Date Range Conditions?
Merging Pandas Dataframes with Date Range Conditions
To merge two pandas dataframes where one value falls between two others, a common approach involves merging the dataframes unconditionally on an identifier and subsequently filtering based on the date condition. However, this method can lead to memory inefficiencies.
An alternative solution that leverages the power of SQL is to perform the merge and filtering within an SQL query, as demonstrated below:
<code class="python">import pandas as pd import sqlite3 # Create dataframes presidents = pd.DataFrame({"name": ["Bush", "Obama", "Trump"], "president_id":[43, 44, 45]}) terms = pd.DataFrame({'start_date': pd.date_range('2001-01-20', periods=5, freq='48M'), 'end_date': pd.date_range('2005-01-21', periods=5, freq='48M'), 'president_id': [43, 43, 44, 44, 45]}) war_declarations = pd.DataFrame({"date": [datetime(2001, 9, 14), datetime(2003, 3, 3)], "name": ["War in Afghanistan", "Iraq War"]}) # Create an in-memory database conn = sqlite3.connect(':memory:') # Write dataframes to database terms.to_sql('terms', conn, index=False) presidents.to_sql('presidents', conn, index=False) war_declarations.to_sql('wars', conn, index=False) # Execute SQL query to merge and filter dataframes qry = ''' select start_date PresTermStart, end_date PresTermEnd, wars.date WarStart, presidents.name Pres from terms join wars on date between start_date and end_date join presidents on terms.president_id = presidents.president_id ''' df = pd.read_sql_query(qry, conn) print(df)</code>
Output:
PresTermStart PresTermEnd WarStart Pres 0 2001-01-31 00:00:00 2005-01-31 00:00:00 2001-09-14 00:00:00 Bush 1 2001-01-31 00:00:00 2005-01-31 00:00:00 2003-03-03 00:00:00 Bush
By performing the merge and filtering within SQL, we can avoid creating a potentially large intermediate dataframe, resulting in improved memory efficiency.
The above is the detailed content of How to Efficiently Merge Pandas DataFrames with Date Range Conditions?. For more information, please follow other related articles on the PHP Chinese website!