
Returning the previous row's value from a UNION-based CTE ordered by date is slow

<p>From the following table of sports matches:</p>
<table class="s-table"> <thead> <tr> <th>id_</th> <th>p1_id</th> <th>p2_id</th> <th>match_date</th> <th>p1_stat</th> <th>p2_stat</th> </tr> </thead> <tbody> <tr> <td>852666</td> <td>1</td> <td>2</td> <td>01/01/1997</td> <td>1301</td> <td>249</td> </tr> <tr> <td>852842</td> <td>1</td> <td>2</td> <td>13/01/1997</td> <td>2837</td> <td>2441</td> </tr> <tr> <td>853471</td> <td>2</td> <td>1</td> <td>05/05/1997</td> <td>1474</td> <td>952</td> </tr> <tr> <td>4760</td> <td>2</td> <td>1</td> <td>25/05/1998</td> <td>1190</td> <td>1486</td> </tr> <tr> <td>6713</td> <td>2</td> <td>1</td> <td>18/01/1999</td> <td>2084</td> <td>885</td> </tr> <tr> <td>9365</td> <td>2</td> <td>1</td> <td>01/11/1999</td> <td>2894</td> <td>2040</td> </tr> <tr> <td>11456</td> <td>1</td> <td>2</td> <td>15/05/2000</td> <td>2358</td> <td>1491</td> </tr> <tr> <td>13022</td> <td>1</td> <td>2</td> <td>14/08/2000</td> <td>2722</td> <td>2401</td> </tr> <tr> <td>29159</td> <td>1</td> <td>2</td> <td>26/08/2002</td> <td>431</td> <td>2769</td> </tr> <tr> <td>44915</td> <td>1</td> <td>2</td> <td>07/10/2002</td> <td>1904</td> <td>482</td> </tr> </tbody> </table>
<p>For a selected match id_, I want to return the stats of both players from their respective previous matches, regardless of whether the player was p1 or p2 in that previous match. See the expected output below for id_ = 11456:</p>
<table class="s-table"> <thead> <tr> <th>id_</th> <th>p1_id</th> <th>p2_id</th> <th>match_date</th> <th>p1_stat</th> <th>p2_stat</th> <th>p1_prev_stat</th> <th>p2_prev_stat</th> </tr> </thead> <tbody> <tr> <td>11456</td> <td>1</td> <td>2</td> <td>15/05/2000</td> <td>2358</td> <td>1491</td> <td>2040</td> <td>2894</td> </tr> </tbody> </table>
<p>On a table of this size, the following SQL runs perfectly well:</p>
<pre class="brush:php;toolbar:false;">WITH cte_1 AS (
    (
        SELECT id_, match_date, p1_id AS player_id, p1_stat AS stat
        FROM test.match_table
        UNION ALL
        SELECT id_, match_date, p2_id AS player_id, p2_stat AS stat
        FROM test.match_table
    )
),
cte_2 AS (
    SELECT
        id_,
        player_id,
        LAG(stat) OVER (PARTITION BY player_id ORDER BY match_date, id_) AS prev_stat
    FROM cte_1
)
SELECT
    m.*,
    cte_p1.prev_stat AS p1_prev_stat,
    cte_p2.prev_stat AS p2_prev_stat
FROM test.match_table AS m
JOIN cte_2 AS cte_p1 ON cte_p1.id_ = m.id_ AND cte_p1.player_id = m.p1_id
JOIN cte_2 AS cte_p2 ON cte_p2.id_ = m.id_ AND cte_p2.player_id = m.p2_id
WHERE m.id_ = 11456
ORDER BY m.match_date</pre>
<p>However, the real table has 1.3 million rows, and the query takes about 12 seconds. According to this answer, the problem seems to be that the CTE loads all or most of the table's rows rather than only the ones needed. However, the solution there does not cover this case.</p>
<p>Can anyone offer suggestions for improving performance?</p>
<p>Here is the SQL to create the small table:</p>
<pre class="brush:php;toolbar:false;">CREATE TABLE `match_table` (
    `id_` int NOT NULL AUTO_INCREMENT,
    `p1_id` int NOT NULL,
    `p2_id` int NOT NULL,
    `match_date` date NOT NULL,
    `p1_stat` int DEFAULT NULL,
    `p2_stat` int DEFAULT NULL,
    PRIMARY KEY (`id_`),
    KEY `ix__p1_id` (`p1_id`),
    KEY `ix__p2_id` (`p2_id`),
    KEY `ix__match_date` (`match_date`),
    KEY `ix__comp` (`p1_id`, `p2_id`, `match_date`)
);

INSERT INTO `match_table` VALUES
    (4760, 2, 1, '1998-05-25', 1190, 1486),
    (6713, 2, 1, '1999-01-18', 2084, 885),
    (9365, 2, 1, '1999-11-01', 2894, 2040),
    (11456, 1, 2, '2000-05-15', 2358, 1491),
    (13022, 1, 2, '2000-08-14', 2722, 2401),
    (29159, 1, 2, '2002-08-26', 431, 2769),
    (44915, 1, 2, '2002-10-07', 1904, 482),
    (852666, 1, 2, '1997-01-01', 1301, 249),
    (852842, 1, 2, '1997-01-13', 2837, 2441),
    (853471, 2, 1, '1997-05-05', 1474, 952);</pre>
P粉517814372 · 498 days ago · 386

All replies (1)

  • P粉274161593 · 2023-07-25 13:31:43

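    First, make sure the table has the right indexes. The columns id_, p1_id, p2_id, and match_date are all used by the query and should be indexed; the table definition in the question already covers them with the primary key and single-column keys.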
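Beyond the single-column keys, a composite index that leads with the player id and continues with match_date may let MySQL read one player's matches already in date order. The following is only a sketch: the index names ix__p1_date and ix__p2_date are made up for illustration, and whether the optimizer actually picks them depends on the data, so it is worth checking with EXPLAIN.

```sql
-- Hypothetical composite indexes: player id first, then the sort keys
-- used by the window function (match_date, id_).
ALTER TABLE test.match_table
    ADD INDEX ix__p1_date (p1_id, match_date, id_),
    ADD INDEX ix__p2_date (p2_id, match_date, id_);

-- Inspect the plan for a per-player, date-ordered lookup.
EXPLAIN
SELECT id_, match_date, p1_stat
FROM test.match_table
WHERE p1_id = 1
ORDER BY match_date DESC, id_ DESC
LIMIT 1;
```

Extra indexes add some write and storage cost, so they are only worth keeping if the plan shows they are used.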

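    Second, instead of two CTEs (common table expressions), apply LAG() directly to the match table. This avoids the UNION ALL that doubles the row count. Window functions are evaluated after the WHERE clause, so the filter on id_ has to sit in an outer query; otherwise LAG() sees only the one selected row and returns NULL:

    SELECT
        t.*
    FROM (
        SELECT
            m.*,
            LAG(m.p1_stat) OVER (PARTITION BY m.p1_id ORDER BY m.match_date, m.id_) AS p1_prev_stat,
            LAG(m.p2_stat) OVER (PARTITION BY m.p2_id ORDER BY m.match_date, m.id_) AS p2_prev_stat
        FROM
            test.match_table AS m
    ) AS t
    WHERE
        t.id_ = 11456;

    This query does the following:

    • Removes the UNION, so the row count is not doubled.
    • Skips the two joins back to the original table.
    • Applies LAG() directly to the match rows to fetch the previous statistic.

    PARTITION BY p1_id and PARTITION BY p2_id only look at earlier matches in which the player occupied the same slot, so a previous match where the player switched between p1 and p2 is not picked up.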
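The question asks for each player's previous match regardless of whether they were p1 or p2, and suspects that the CTE reads most of the table. One way to address both, sketched here for MySQL 8 and not benchmarked on the 1.3-million-row table, is to keep the unpivot-plus-LAG() idea from the question but restrict it to the two players of the selected match. The CTE names players, player_rows, and with_prev are illustrative.

```sql
WITH players AS (
    -- The two player ids of the selected match.
    SELECT p1_id AS player_id FROM test.match_table WHERE id_ = 11456
    UNION
    SELECT p2_id FROM test.match_table WHERE id_ = 11456
),
player_rows AS (
    -- Unpivot only the matches involving those two players.
    SELECT m.id_, m.match_date, m.p1_id AS player_id, m.p1_stat AS stat
    FROM players AS p
    JOIN test.match_table AS m ON m.p1_id = p.player_id
    UNION ALL
    SELECT m.id_, m.match_date, m.p2_id, m.p2_stat
    FROM players AS p
    JOIN test.match_table AS m ON m.p2_id = p.player_id
),
with_prev AS (
    -- Previous stat per player, in match order.
    SELECT
        id_,
        player_id,
        LAG(stat) OVER (PARTITION BY player_id ORDER BY match_date, id_) AS prev_stat
    FROM player_rows
)
SELECT
    m.*,
    a.prev_stat AS p1_prev_stat,
    b.prev_stat AS p2_prev_stat
FROM test.match_table AS m
JOIN with_prev AS a ON a.id_ = m.id_ AND a.player_id = m.p1_id
JOIN with_prev AS b ON b.id_ = m.id_ AND b.player_id = m.p2_id
WHERE m.id_ = 11456;
```

On the sample data in the question this should give p1_prev_stat = 2040 and p2_prev_stat = 2894, matching the expected output. The window function now runs over the matches of two players only, which the equality joins can fetch through the existing ix__p1_id and ix__p2_id keys.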

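    However, if this approach does not meet your performance needs, consider creating a summary table that tracks each player's statistic for each match, and update it whenever a new match result is added. This uses more storage and slows down writes, but it can make reads much faster.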
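A minimal sketch of such a summary table follows. The table name player_match_stat and its layout are assumptions, not something given in the question, and the primary key assumes a player never appears as both p1 and p2 in the same match.

```sql
-- Hypothetical summary table: one row per player per match.
CREATE TABLE test.player_match_stat (
    player_id  INT  NOT NULL,
    match_date DATE NOT NULL,
    id_        INT  NOT NULL,
    stat       INT  DEFAULT NULL,
    PRIMARY KEY (player_id, match_date, id_)
);

-- Initial load; the same statement filtered by id_ can add new matches.
INSERT INTO test.player_match_stat (player_id, match_date, id_, stat)
SELECT p1_id, match_date, id_, p1_stat FROM test.match_table
UNION ALL
SELECT p2_id, match_date, id_, p2_stat FROM test.match_table;

-- Previous stat for both players of one match: the latest earlier row per player.
SELECT
    m.*,
    (SELECT s.stat
     FROM test.player_match_stat AS s
     WHERE s.player_id = m.p1_id
       AND s.match_date <= m.match_date
       AND (s.match_date < m.match_date OR s.id_ < m.id_)
     ORDER BY s.match_date DESC, s.id_ DESC
     LIMIT 1) AS p1_prev_stat,
    (SELECT s.stat
     FROM test.player_match_stat AS s
     WHERE s.player_id = m.p2_id
       AND s.match_date <= m.match_date
       AND (s.match_date < m.match_date OR s.id_ < m.id_)
     ORDER BY s.match_date DESC, s.id_ DESC
     LIMIT 1) AS p2_prev_stat
FROM test.match_table AS m
WHERE m.id_ = 11456;
```

Because the primary key starts with player_id and continues with the sort columns, each subquery can in principle be answered by reading a single index entry, at the cost of keeping the summary table in sync on every insert.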
