Efficient random row selection in PostgreSQL
The best way to select random rows in PostgreSQL depends on the size of the table, the available indexes, and how random the result needs to be.
The following methods assume a very large table (for example, 500 million rows) with an indexed numeric ID column, here called id, that has relatively few gaps:
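Fastest method:
Use the random() function to generate random IDs within the ID space and join them to the table. Generating a few more IDs than needed (1100 for 1000 rows) leaves room for duplicates and for IDs that fall into gaps, and the outer LIMIT trims any surplus. The values in params are examples: min_id must not be greater than the smallest existing id, and id_span should cover max_id - min_id plus a small buffer.
<code class="language-sql">WITH params AS (
   SELECT 1       AS min_id   -- minimum id, not greater than the current min(id)
        , 5100000 AS id_span  -- rounded up: max_id - min_id + buffer
   )
SELECT *
FROM  (
   SELECT p.min_id + trunc(random() * p.id_span)::integer AS id
   FROM   params p
        , generate_series(1, 1100) g  -- 1000 + buffer
   GROUP  BY 1                        -- remove duplicates
   ) r
JOIN   big USING (id)
LIMIT  1000;                          -- trim surplus</code>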
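The constants in params can also be derived from the table at query time, so they never go stale as rows are added. A sketch, assuming id is an integer column with a B-tree index (for example the primary key), which lets PostgreSQL answer min() and max() from the index instead of scanning the table:

<code class="language-sql">WITH params AS (
   SELECT min(id)               AS min_id
        , max(id) - min(id) + 1 AS id_span  -- exact size of the current ID range
   FROM   big
   )
SELECT *
FROM  (
   -- one random id per generated row, always between min(id) and max(id)
   SELECT p.min_id + trunc(random() * p.id_span)::integer AS id
   FROM   params p
        , generate_series(1, 1100) g  -- 1000 + buffer for duplicates and misses
   GROUP  BY 1                        -- remove duplicates
   ) r
JOIN   big USING (id)                 -- ids that fall into gaps drop out here
LIMIT  1000;                          -- trim surplus</code>

Because the range is exact, no extra buffer for newly inserted rows is needed; the cost is two index lookups per call.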
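Improved method:
Use a recursive CTE (random_pick) that keeps drawing new random IDs and joining them to the table, which compensates for misses caused by gaps in the ID space. Apply LIMIT only in the outer query: PostgreSQL evaluates a WITH query only as far as the parent query fetches rows, so the recursion stops as soon as enough rows have been found.
<code class="language-sql">WITH RECURSIVE random_pick AS (
   SELECT b.*
   FROM  (
      SELECT 1 + trunc(random() * 5100000)::int AS id
      FROM   generate_series(1, 1030)  -- 1000 + a few percent, adjust as needed
      LIMIT  1030                      -- hint for the query planner
      ) r
   JOIN   big b USING (id)             -- eliminate misses
   UNION                               -- eliminate duplicates
   SELECT b.*
   FROM  (
      SELECT 1 + trunc(random() * 5100000)::int AS id
      FROM   random_pick r             -- self-reference makes the CTE recursive
      LIMIT  999                       -- less than 1000, hint for the query planner
      ) r
   JOIN   big b USING (id)             -- eliminate misses
   )
TABLE  random_pick
LIMIT  1000;                           -- actual limit</code>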
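The surplus in the first branch (1030 draws for 1000 rows) should reflect how densely the ID space is filled. A sketch for estimating that, assuming big has been analyzed recently so that pg_class.reltuples holds a usable row estimate (it is 0 or -1 before the first ANALYZE) and that id is indexed:

<code class="language-sql">WITH s AS (
   SELECT c.reltuples::float8                     AS est_rows  -- planner's row estimate, no count(*)
        , (SELECT max(id) - min(id) + 1 FROM big) AS id_span   -- answered from the index on id
   FROM   pg_class c
   WHERE  c.oid = 'big'::regclass
   )
SELECT est_rows / id_span                AS fill_ratio      -- 1.0 means no gaps
     , ceil(1000 / (est_rows / id_span)) AS draws_for_1000  -- draws expected to yield 1000 hits
FROM   s;</code>

For example, a fill ratio of 0.97 gives 1000 / 0.97 ≈ 1030.9, so about 1031 draws, close to the 1030 used above.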
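Generic function:
The recursive query can be wrapped in a generic PL/pgSQL function that takes the table's row type, the name of the ID column, the number of rows to return, and a surplus factor for gaps (1.03 corresponds to 1030 draws for 1000 rows). It is shown here in abbreviated form:
<code class="language-sql">CREATE OR REPLACE FUNCTION f_random_sample(_tbl_type anyelement  -- typed NULL, e.g. NULL::big
                                         , _id    text = 'id'    -- name of the ID column
                                         , _limit int  = 1000    -- number of rows to return
                                         , _gaps  real = 1.03)   -- surplus factor for gaps
  RETURNS SETOF anyelement
  LANGUAGE plpgsql VOLATILE ROWS 1000 AS
$func$
DECLARE
   _tbl      text := pg_typeof(_tbl_type)::text;  -- table name from the row type
   _estimate int  := (...);
BEGIN
   RETURN QUERY EXECUTE format(
   $$
   WITH RECURSIVE random_pick AS (
      SELECT ... FROM ... ...
      )
   TABLE  random_pick
   LIMIT  ;
   $$
 , _tbl, _id
   )
   USING (...);
END
$func$;</code>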
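Because the function derives the table name with pg_typeof(), its first argument is a typed NULL of the table's row type. A quick check of that mechanism, assuming the table big from the examples above; the calls at the end are left as comments because they need the complete function body:

<code class="language-sql">-- pg_typeof() of a typed NULL returns the row type, whose text form is the
-- table name (quoted and schema-qualified only where needed).
SELECT pg_typeof(NULL::big)::text AS tbl;

-- Possible calls once the function body is complete:
-- SELECT * FROM f_random_sample(NULL::big);
-- SELECT * FROM f_random_sample(NULL::big, 'id', 500, 1.05);</code>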
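If you do not need strictly random results, or a fresh sample on every call, there are cheaper options:
Materialized view:
If the same random pick can be reused for a while, run one of the queries above once and store the result in a materialized view. Reads then cost almost nothing, and refreshing the view draws a new pick at intervals or events of your choosing.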
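A minimal sketch of the materialized-view approach, assuming the table big from above and reusing the fast query with its example constants; the view name random_big_sample is hypothetical:

<code class="language-sql">-- Store one random pick (random_big_sample is a hypothetical name).
CREATE MATERIALIZED VIEW random_big_sample AS
SELECT *
FROM  (
   SELECT 1 + trunc(random() * 5100000)::int AS id
   FROM   generate_series(1, 1100) g  -- 1000 + buffer
   GROUP  BY 1                        -- remove duplicates
   ) r
JOIN   big USING (id)
LIMIT  1000;

-- Reading the stored pick is a plain scan of at most 1000 rows.
SELECT * FROM random_big_sample;

-- Refreshing re-runs the stored query and saves a new random pick.
REFRESH MATERIALIZED VIEW random_big_sample;</code>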
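TABLESAMPLE SYSTEM (n):
TABLESAMPLE SYSTEM (n), available since PostgreSQL 9.5, provides fast but inexact random sampling. The parameter n is the percentage of the table to sample. Because SYSTEM selects whole data pages and returns every row on them, the number of rows varies from call to call and rows stored on the same page come back together. For a table of about 5.1 million rows, (1000 * 100) / 5100000.0 ≈ 0.0196 percent targets roughly 1000 rows.
<code class="language-sql">SELECT * FROM big TABLESAMPLE SYSTEM ((1000 * 100) / 5100000.0);</code>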
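A few related variants, sketched against the same table big: the REPEATABLE clause returns the same sample for the same seed as long as the table is unchanged; BERNOULLI chooses rows independently but has to read every page; and the contrib module tsm_system_rows adds SYSTEM_ROWS, which takes a row count instead of a percentage (it still samples whole pages, so rows remain clustered). Creating the extension assumes the contrib modules are installed on the server:

<code class="language-sql">-- Reproducible sample: seed 42 selects the same pages while the table is unchanged.
SELECT * FROM big TABLESAMPLE SYSTEM ((1000 * 100) / 5100000.0) REPEATABLE (42);

-- Row-level sampling: reads the whole table, but rows are picked independently.
SELECT * FROM big TABLESAMPLE BERNOULLI ((1000 * 100) / 5100000.0);

-- A fixed row count instead of a percentage (contrib module tsm_system_rows).
CREATE EXTENSION IF NOT EXISTS tsm_system_rows;
SELECT * FROM big TABLESAMPLE SYSTEM_ROWS (1000);</code>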
Other notes:
The random() function in PostgreSQL is not cryptographically secure, so these techniques should not be used where the selection must be unpredictable.