Home >Database >Mysql Tutorial >Why is my SELECT DISTINCT query slow on a Postgres table with a composite primary key, and how can I improve its performance?
Optimizing Slow SELECT DISTINCT Queries in Postgres
This article addresses performance issues encountered when executing SELECT DISTINCT
queries on a large Postgres table with a composite primary key. A specific scenario involving a table with nearly two million rows and a composite primary key (product_id, trade_id) is examined. While a SELECT DISTINCT product_id
query should ideally be fast due to the primary key index, unexpectedly slow performance was observed.
Root Cause Analysis:
The query planner's choice of a sequential scan, rather than utilizing the index, was identified as the bottleneck. This is attributed to the table's data distribution: only 40 unique product IDs exist, leading to a high degree of index value repetition. This results in numerous index probes and inefficient sequential access.
Effective Solution: Recursive CTE
To circumvent this limitation and leverage indexing efficiently, a recursive common table expression (CTE) is proposed as a superior alternative to SELECT DISTINCT
:
<code class="language-sql">WITH RECURSIVE cte AS ( ( -- parentheses required SELECT product_id FROM tickers ORDER BY 1 LIMIT 1 ) UNION ALL SELECT l.* FROM cte c CROSS JOIN LATERAL ( SELECT product_id FROM tickers t WHERE t.product_id > c.product_id -- lateral reference ORDER BY 1 LIMIT 1 ) l ) TABLE cte;</code>
This recursive CTE effectively mimics an index skip scan. It iteratively retrieves distinct product_id
values in sorted order, thereby avoiding the performance penalty associated with the inefficient sequential scan. The use of an index on the product_id
column is crucial for optimal performance with this approach.
Important Note: While Postgres's index skip scan feature is under development, this CTE-based workaround offers a robust and efficient solution for the described scenario, significantly improving query performance.
The above is the detailed content of Why is my SELECT DISTINCT query slow on a Postgres table with a composite primary key, and how can I improve its performance?. For more information, please follow other related articles on the PHP Chinese website!