Home  >  Article  >  Backend Development  >  How to Efficiently Merge Pandas DataFrames with Date Range Conditions?

How to Efficiently Merge Pandas DataFrames with Date Range Conditions?

Barbara Streisand
Barbara StreisandOriginal
2024-11-02 13:49:02944browse

How to Efficiently Merge Pandas DataFrames with Date Range Conditions?

Merging Pandas Dataframes with Date Range Conditions

To merge two pandas dataframes where one value falls between two others, a common approach involves merging the dataframes unconditionally on an identifier and subsequently filtering based on the date condition. However, this method can lead to memory inefficiencies.

An alternative solution that leverages the power of SQL is to perform the merge and filtering within an SQL query, as demonstrated below:

<code class="python">import pandas as pd
import sqlite3

# Create dataframes
presidents = pd.DataFrame({"name": ["Bush", "Obama", "Trump"],
                           "president_id":[43, 44, 45]})
terms = pd.DataFrame({'start_date': pd.date_range('2001-01-20', periods=5, freq='48M'),
                      'end_date': pd.date_range('2005-01-21', periods=5, freq='48M'),
                      'president_id': [43, 43, 44, 44, 45]})
war_declarations = pd.DataFrame({"date": [datetime(2001, 9, 14), datetime(2003, 3, 3)],
                                 "name": ["War in Afghanistan", "Iraq War"]})

# Create an in-memory database
conn = sqlite3.connect(':memory:')

# Write dataframes to database
terms.to_sql('terms', conn, index=False)
presidents.to_sql('presidents', conn, index=False)
war_declarations.to_sql('wars', conn, index=False)

# Execute SQL query to merge and filter dataframes
qry = '''
    select  
        start_date PresTermStart,
        end_date PresTermEnd,
        wars.date WarStart,
        presidents.name Pres
    from
        terms join wars on
        date between start_date and end_date join presidents on
        terms.president_id = presidents.president_id
    '''
df = pd.read_sql_query(qry, conn)

print(df)</code>

Output:

         PresTermStart          PresTermEnd             WarStart  Pres
0  2001-01-31 00:00:00  2005-01-31 00:00:00  2001-09-14 00:00:00  Bush
1  2001-01-31 00:00:00  2005-01-31 00:00:00  2003-03-03 00:00:00  Bush

By performing the merge and filtering within SQL, we can avoid creating a potentially large intermediate dataframe, resulting in improved memory efficiency.

The above is the detailed content of How to Efficiently Merge Pandas DataFrames with Date Range Conditions?. For more information, please follow other related articles on the PHP Chinese website!

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn