Merging Pandas DataFrames on a Date Range Condition
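To merge two pandas DataFrames where a value in one must fall between two values in the other, a common approach is to merge the DataFrames unconditionally on an identifier and then filter the result on the date condition. However, this can be memory-inefficient, because the unconditional merge materializes every matching pair of rows before any of them are filtered out.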
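The merge-then-filter approach can be sketched roughly as follows. This is a minimal illustration rather than a reference implementation: the `events` frame and its `president_id` column are hypothetical, introduced only so that both frames share an identifier to merge on.

```python
from datetime import datetime

import pandas as pd

# Illustrative data (assumed): term boundaries keyed by president_id
terms = pd.DataFrame({
    "president_id": [43, 43, 44],
    "start_date": pd.to_datetime(["2001-01-20", "2005-01-20", "2009-01-20"]),
    "end_date": pd.to_datetime(["2005-01-20", "2009-01-20", "2013-01-20"]),
})
# Hypothetical events frame that also carries the identifier
events = pd.DataFrame({
    "president_id": [43, 43],
    "date": [datetime(2001, 9, 14), datetime(2003, 3, 3)],
    "name": ["War in Afghanistan", "Iraq War"],
})

# Step 1: unconditional merge on the identifier.
# Every term row is paired with every event row that shares the id.
merged = terms.merge(events, on="president_id")

# Step 2: keep only the pairs whose date falls inside the term.
in_range = merged["date"].between(merged["start_date"], merged["end_date"])
result = merged[in_range].reset_index(drop=True)

print(len(merged), "rows after the merge,", len(result), "rows after the filter")
```

The intermediate `merged` frame holds every id-matched pair, including the ones the filter later discards, and that is where the extra memory goes on large inputs.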
An alternative that leverages SQL is to perform the merge and the filter in a single SQL query, as shown below:
<code class="python">from datetime import datetime
import sqlite3

import pandas as pd

# Create dataframes
presidents = pd.DataFrame({"name": ["Bush", "Obama", "Trump"],
                           "president_id": [43, 44, 45]})
# freq='48M' steps 48 months at a time and snaps to month ends
# (recent pandas versions spell this alias '48ME')
terms = pd.DataFrame({'start_date': pd.date_range('2001-01-20', periods=5, freq='48M'),
                      'end_date': pd.date_range('2005-01-21', periods=5, freq='48M'),
                      'president_id': [43, 43, 44, 44, 45]})
war_declarations = pd.DataFrame({"date": [datetime(2001, 9, 14), datetime(2003, 3, 3)],
                                 "name": ["War in Afghanistan", "Iraq War"]})

# Create an in-memory database
conn = sqlite3.connect(':memory:')

# Write dataframes to database
terms.to_sql('terms', conn, index=False)
presidents.to_sql('presidents', conn, index=False)
war_declarations.to_sql('wars', conn, index=False)

# Execute SQL query to merge and filter dataframes
qry = '''
    select
        start_date PresTermStart,
        end_date PresTermEnd,
        wars.date WarStart,
        presidents.name Pres
    from terms
    join wars on date between start_date and end_date
    join presidents on terms.president_id = presidents.president_id
'''
df = pd.read_sql_query(qry, conn)
print(df)</code>
Output:
         PresTermStart          PresTermEnd             WarStart  Pres
0  2001-01-31 00:00:00  2005-01-31 00:00:00  2001-09-14 00:00:00  Bush
1  2001-01-31 00:00:00  2005-01-31 00:00:00  2003-03-03 00:00:00  Bush
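By performing the merge and the filter in SQL, we avoid creating a potentially large intermediate DataFrame in pandas, which improves memory efficiency.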
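One practical detail of the SQL route: with a plain `sqlite3` connection, the date columns come back from the query as text. If datetime values are needed afterwards, the `parse_dates` argument of `pd.read_sql_query` can convert them on read. The sketch below uses a small, assumed `terms` table purely for illustration.

```python
import sqlite3

import pandas as pd

# Small illustrative table (assumed data)
conn = sqlite3.connect(":memory:")
terms = pd.DataFrame({
    "start_date": pd.to_datetime(["2001-01-20"]),
    "end_date": pd.to_datetime(["2005-01-20"]),
})
terms.to_sql("terms", conn, index=False)

# Without parse_dates the date columns are returned as text
raw = pd.read_sql_query("select * from terms", conn)

# With parse_dates the listed columns are converted to datetime64
parsed = pd.read_sql_query(
    "select * from terms",
    conn,
    parse_dates=["start_date", "end_date"],
)
conn.close()

print(raw.dtypes)
print(parsed.dtypes)
```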