How Can I Efficiently Select Random Rows from a Large PostgreSQL Table?
Selecting random rows from a large PostgreSQL table can be expensive. This article looks at two common methods, discusses their advantages and disadvantages, and then presents faster alternatives for very large tables.
<code class="language-sql">select * from big where random() < 0.01;</code>
This method evaluates random() once per row and keeps only the rows whose value falls below the threshold, so it returns roughly 1% of the table rather than an exact number of rows. It requires a full table scan and can be slow for large data sets.
<code class="language-sql">select * from big order by random() limit 1000;</code>
This method assigns a random sort key to every row and returns the first n rows, so it yields exactly n rows. It also has to read the whole table, and on top of that it must sort, so it is generally no faster than the first method and scales poorly as the table grows.
For tables with a very large number of rows (for example, 500 million), the following approach provides an optimized solution:
<code class="language-sql">WITH params AS (
   SELECT 1       AS min_id           -- minimum id (less than or equal to the current minimum id)
        , 5100000 AS id_span          -- rounded up: (max_id - min_id + buffer)
)
SELECT *
FROM  (
   SELECT p.min_id + trunc(random() * p.id_span)::integer AS id
   FROM   params p
        , generate_series(1, 1100) g  -- 1000 + buffer
   GROUP  BY 1                        -- remove duplicates
) r
JOIN   big USING (id)
LIMIT  1000;                          -- trim surplus rows</code>
This query uses the index on the id column for efficient retrieval. It generates 1,100 random numbers within the id space (1,000 plus a buffer for duplicates and gaps), removes duplicates with GROUP BY, joins the result to the main table, and trims any surplus with LIMIT.
Boundary query:
The values of min_id and id_span come from the bounds of the id column. The id column should have relatively few gaps; otherwise many generated ids match no row, and a larger buffer is needed to still end up with enough rows.
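The bounds can be read from the table itself. The following is a minimal sketch, assuming the table is named big and has an indexed integer id column as in the query above. The row estimate from pg_class is only as fresh as the last ANALYZE or autovacuum run, so treat it as approximate.

<code class="language-sql">-- min() and max() on an indexed column are cheap index lookups
SELECT min(id)           AS min_id
     , max(id)           AS max_id
     , max(id) - min(id) AS id_span
FROM   big;

-- Planner's row estimate; avoids a slow count(*) on a huge table
SELECT reltuples::bigint AS estimated_rows
FROM   pg_class
WHERE  oid = 'big'::regclass;</code>

Comparing estimated_rows with id_span gives a rough idea of how many gaps the id column has, and therefore how generous the buffer should be.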
Materialized view:
If the same random sample is read repeatedly, consider storing it in a materialized view, so the sample is computed once and refreshed only when a new one is needed.
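As an illustration, a sample could be materialized roughly as follows. This is a sketch, not a prescribed setup: the view name mv_random_big is hypothetical, and the table big, the starting id of 1, and the id span of 5100000 are assumed from the query earlier in the article.

<code class="language-sql">-- Store one random sample of up to 1000 rows
CREATE MATERIALIZED VIEW mv_random_big AS
SELECT *
FROM  (
   SELECT 1 + trunc(random() * 5100000)::integer AS id
   FROM   generate_series(1, 1100) g  -- 1000 + buffer
   GROUP  BY 1                        -- remove duplicates
) r
JOIN   big USING (id)
LIMIT  1000;

-- Reading the stored sample is cheap
SELECT * FROM mv_random_big;

-- Draw a fresh sample when needed
REFRESH MATERIALIZED VIEW mv_random_big;</code>

Every reader sees the same rows until the next refresh, which suits cases where the sample does not have to change on each query.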
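TABLESAMPLE SYSTEM (PostgreSQL 9.5 and later):
Introduced in PostgreSQL 9.5, the TABLESAMPLE clause with the SYSTEM method quickly returns approximately the specified percentage of rows. It samples whole data pages rather than individual rows, which makes it fast, but the returned rows are clustered by page instead of being independently random.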
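A few hedged examples of the syntax, assuming a table named big as in the earlier query. The argument is a percentage, not a fraction, and the percentages shown are arbitrary illustrations. The last statement relies on the tsm_system_rows contrib extension, which must be available on the server.

<code class="language-sql">-- Roughly 0.02 % of the rows, chosen page by page (fast, clustered)
SELECT * FROM big TABLESAMPLE SYSTEM (0.02);

-- Row-level sampling: scans the whole table, but rows are chosen independently
SELECT * FROM big TABLESAMPLE BERNOULLI (0.02);

-- A fixed seed makes the sample reproducible while the table is unchanged
SELECT * FROM big TABLESAMPLE SYSTEM (0.02) REPEATABLE (42);

-- Request a row count instead of a percentage (contrib extension)
CREATE EXTENSION IF NOT EXISTS tsm_system_rows;
SELECT * FROM big TABLESAMPLE SYSTEM_ROWS (1000);</code>

SYSTEM is the usual choice when speed matters more than statistical quality; BERNOULLI trades speed for a more evenly spread sample.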