Home >Database >Mysql Tutorial >How to Efficiently Select Random Rows in PostgreSQL?

How to Efficiently Select Random Rows in PostgreSQL?

Susan Sarandon
Susan SarandonOriginal
2025-01-21 05:32:08757browse

How to Efficiently Select Random Rows in PostgreSQL?

PostgreSQL efficient random row selection method

To select random rows in PostgreSQL, the best method depends on the size of the table, available indexes, and the level of randomness required.

For a very large table with 500 million rows and a numeric ID column (e.g. id):

  • Fastest method:

    • Use CTE and random() functions to generate random IDs within the ID space.
    • Join the generated ID with the table using the id column.
    • Filter out duplicates and remove redundant IDs.
<code class="language-sql">WITH params AS (
   SELECT 1       AS min_id,           -- 最小id
        , 5100000 AS id_span          -- 四舍五入。(max_id - min_id + buffer)
)
SELECT *
FROM  (
   SELECT p.min_id + trunc(random() * p.id_span)::integer AS id
   FROM   params p
        , generate_series(1, 1100) g  -- 1000 + buffer
   GROUP  BY 1                        -- 去除重复项
) r
JOIN   big USING (id)
LIMIT  1000;                          -- 去除多余项</code>
  • Improvement method:

    • Use recursive CTE (random_pick) to eliminate any gaps in the ID space.
    • Merge recursive results to eliminate duplicates.
    • Apply externalLIMIT to satisfy constraints.
<code class="language-sql">WITH RECURSIVE random_pick AS (
   SELECT *
   FROM  (
      SELECT 1 + trunc(random() * 5100000)::int AS id
      FROM   generate_series(1, 1030)  -- 1000 + 百分之几 - 根据需要调整
      LIMIT  1030                      -- 查询规划器提示
      ) r
   JOIN   big b USING (id)             -- 消除缺失

   UNION                               -- 消除重复项
   SELECT b.*
   FROM  (
      SELECT 1 + trunc(random() * 5100000)::int AS id
      FROM   random_pick r             -- 加上百分之三 - 根据需要调整
      LIMIT  999                       -- 小于1000,查询规划器提示
      ) r
   JOIN   big b USING (id)             -- 消除缺失
   )
TABLE  random_pick
LIMIT  1000;  -- 实际限制</code>
  • General functions:

    • Wrap the above queries into a function so they can be reused for any table with unique integer columns.
<code class="language-sql">CREATE OR REPLACE FUNCTION f_random_sample(_tbl_type anyelement
                                         , _id text = 'id'
                                         , _limit int = 1000
                                         , _gaps real = 1.03)
  RETURNS SETOF anyelement
  LANGUAGE plpgsql VOLATILE ROWS 1000 AS
$func$
DECLARE
   _tbl text := pg_typeof(_tbl_type)::text;
   _estimate int := (...);
BEGIN
   RETURN QUERY EXECUTE format(
   $$
   WITH RECURSIVE random_pick AS (
      SELECT ...
      FROM  ...
     ...
   )
   TABLE  random_pick
   LIMIT  ;
   $$
 , _tbl, _id
   )
   USING (...);
END
$func$;</code>

For scenarios that don’t require precise randomness or repeated calls:

  • Materialized view:

    • Create a materialized view to store approximately randomly selected rows.
    • Refresh materialized views periodically.
  • TABLESAMPLE SYSTEM (n):

    • Introduced in PostgreSQL 9.5, TABLESAMPLE SYSTEM (n) provides a fast and inexact random sampling method.
    • The
    • n parameter represents the percentage of tables to be sampled.
<code class="language-sql">SELECT * FROM big TABLESAMPLE SYSTEM ((1000 * 100) / 5100000.0);</code>

Other notes:

  • For best performance, use an index on the ID column.
  • The random() functions in PostgreSQL are not cryptographically secure.
  • The proposed approach provides a high degree of randomness for most practical use cases.

The above is the detailed content of How to Efficiently Select Random Rows in PostgreSQL?. For more information, please follow other related articles on the PHP Chinese website!

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn