Home >Database >Mysql Tutorial >Why is my SELECT DISTINCT query slow on a Postgres table with a composite primary key, and how can I improve its performance?

Why is my SELECT DISTINCT query slow on a Postgres table with a composite primary key, and how can I improve its performance?

Mary-Kate Olsen
Mary-Kate OlsenOriginal
2025-01-07 18:33:41340browse

Why is my SELECT DISTINCT query slow on a Postgres table with a composite primary key, and how can I improve its performance?

Optimizing Slow SELECT DISTINCT Queries in Postgres

This article addresses performance issues encountered when executing SELECT DISTINCT queries on a large Postgres table with a composite primary key. A specific scenario involving a table with nearly two million rows and a composite primary key (product_id, trade_id) is examined. While a SELECT DISTINCT product_id query should ideally be fast due to the primary key index, unexpectedly slow performance was observed.

Root Cause Analysis:

The query planner's choice of a sequential scan, rather than utilizing the index, was identified as the bottleneck. This is attributed to the table's data distribution: only 40 unique product IDs exist, leading to a high degree of index value repetition. This results in numerous index probes and inefficient sequential access.

Effective Solution: Recursive CTE

To circumvent this limitation and leverage indexing efficiently, a recursive common table expression (CTE) is proposed as a superior alternative to SELECT DISTINCT:

<code class="language-sql">WITH RECURSIVE cte AS (
   (   -- parentheses required
   SELECT product_id
   FROM   tickers
   ORDER  BY 1
   LIMIT  1
   )
   UNION ALL
   SELECT l.*
   FROM   cte c
   CROSS  JOIN LATERAL (
      SELECT product_id
      FROM   tickers t
      WHERE  t.product_id > c.product_id  -- lateral reference
      ORDER  BY 1
      LIMIT  1
      ) l
   )
TABLE  cte;</code>

This recursive CTE effectively mimics an index skip scan. It iteratively retrieves distinct product_id values in sorted order, thereby avoiding the performance penalty associated with the inefficient sequential scan. The use of an index on the product_id column is crucial for optimal performance with this approach.

Important Note: While Postgres's index skip scan feature is under development, this CTE-based workaround offers a robust and efficient solution for the described scenario, significantly improving query performance.

The above is the detailed content of Why is my SELECT DISTINCT query slow on a Postgres table with a composite primary key, and how can I improve its performance?. For more information, please follow other related articles on the PHP Chinese website!

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn