How can I efficiently select random rows in PostgreSQL?

Linda Hamilton
2025-01-21 05:41:08

Efficient random row selection methods for PostgreSQL

PostgreSQL offers several ways to select random rows efficiently. The examples below pick 1000 random rows from a table of roughly 5.1 million rows with an integer id column.

Method 1: Use random() and the LIMIT clause

This method orders the rows by random() and keeps the first 1000 with the LIMIT clause:

<code class="language-sql">SELECT *
FROM   big             -- example table name
ORDER  BY random()
LIMIT  1000;</code>

However, this is slow on large tables: PostgreSQL has to read every row, compute a random value for each one, and sort on that value before it can return the first 1000.
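
One way to check this cost on your own data is to look at the query plan. The sketch below assumes the table is called big, as in the TABLESAMPLE example later in this article. Note that EXPLAIN with ANALYZE actually runs the query, so it takes about as long as the query itself.

```sql
-- ANALYZE executes the statement and reports real timings;
-- BUFFERS adds the number of pages read.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM   big
ORDER  BY random()
LIMIT  1000;
```

In the resulting plan you would typically expect a sequential scan over the whole table feeding a sort node, which is where most of the time goes.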

Method 2: Index-based method

This method generates random numbers within the range of the id column and joins them to the table, so the rows are fetched through the primary key index:

<code class="language-sql">WITH params AS (
   SELECT 1       AS min_id           -- minimum id (at or below the current minimum id)
        , 5100000 AS id_span          -- rounded up (max_id - min_id + buffer)
)
SELECT *
FROM  (
   SELECT p.min_id + trunc(random() * p.id_span)::integer AS id
   FROM   params p
        , generate_series(1, 1100) g  -- 1000 + buffer
   GROUP  BY 1                        -- remove duplicates
) r
JOIN   big USING (id)
LIMIT  1000;                          -- trim the surplus</code>

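This method is faster than Method 1 because it uses index lookups instead of a full table scan. It assumes the id column has few gaps: if too many of the generated ids do not exist in the table, the query returns fewer than 1000 rows, so size the buffer accordingly.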
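
The values for min_id and id_span have to come from somewhere. A minimal sketch of how they could be looked up, assuming the table is called big and has an indexed integer id column (both names are taken from the examples in this article, not from your schema):

```sql
-- Exact id range; cheap when id has a b-tree index such as a primary key.
SELECT min(id)           AS min_id
     , max(id)           AS max_id
     , max(id) - min(id) AS id_span   -- add a buffer before using it
FROM   big;

-- The planner's row estimate; much cheaper than count(*) on a large table.
-- It is only as fresh as the last VACUUM or ANALYZE of the table.
SELECT reltuples::bigint AS estimated_rows
FROM   pg_class
WHERE  oid = 'big'::regclass;
```

Comparing estimated_rows with id_span gives a rough idea of how many gaps the id column has, and therefore how large the buffer in generate_series should be.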

Method 3: Use recursive CTE

This method uses a recursive common table expression (CTE) to handle gaps in the id column. The recursive part keeps generating new random ids until enough rows have been collected:

<code class="language-sql">WITH RECURSIVE random_pick AS (
   SELECT *
   FROM  (
      SELECT 1 + trunc(random() * 5100000)::int AS id
      FROM   generate_series(1, 1030)  -- 1000 + a few percent; adjust as needed
      LIMIT  1030                      -- hint for the query planner
      ) r
   JOIN   big b USING (id)             -- drop ids that do not exist

   UNION                               -- remove duplicates
   SELECT b.*
   FROM  (
      SELECT 1 + trunc(random() * 5100000)::int AS id
      FROM   random_pick r             -- plus a few percent; adjust as needed
      LIMIT  999                       -- less than 1000, hint for the query planner
      ) r
   JOIN   big b USING (id)             -- drop ids that do not exist
)
TABLE  random_pick
LIMIT  1000;  -- actual limit</code>

Method 4: Use TABLESAMPLE SYSTEM (n)

PostgreSQL 9.5 introduced the TABLESAMPLE SYSTEM (n) syntax, where n is a percentage between 0 and 100:

<code class="language-sql">SELECT *
FROM big
TABLESAMPLE SYSTEM ((1000 * 100) / 5100000.0);</code>

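This method is fast, but the sample is not truly random: SYSTEM picks whole data pages (blocks), so rows stored on the same page tend to appear together (a clustering effect). The number of rows returned is also only approximately 1000.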
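
If the clustering or the inexact row count is a problem, two related sampling methods may be worth trying. This is a sketch, again assuming a table called big. BERNOULLI is built in alongside SYSTEM; SYSTEM_ROWS comes from the tsm_system_rows additional module, which has to be available on the server and installed in the database first.

```sql
-- BERNOULLI samples individual rows instead of whole blocks:
-- slower than SYSTEM because it scans the table, but less clustered.
SELECT *
FROM   big TABLESAMPLE BERNOULLI ((1000 * 100) / 5100000.0);

-- SYSTEM_ROWS takes a row count instead of a percentage.
-- It is still block-based, so clustering effects remain.
CREATE EXTENSION IF NOT EXISTS tsm_system_rows;

SELECT *
FROM   big TABLESAMPLE SYSTEM_ROWS(1000);
```

BERNOULLI trades speed for a more even spread of rows, while SYSTEM_ROWS keeps the speed of block sampling and returns the requested number of rows as long as the table holds that many.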

Comparison and suggestions

If the id column has few gaps and a primary key index is in place, Method 2 (index-based) is the best choice, as it offers the best combination of speed and accuracy.

For tables with many gaps in the id column, consider Method 3 (recursive CTE), which handles missing ids effectively.

Method 1 (random() and LIMIT) has lower performance and is best reserved for smaller tables.

Method 4 (TABLESAMPLE SYSTEM) is fast, but not as accurate as the other methods. It can be used for quick estimates on large tables.
