最近由于需要研究了一下MYSQL的随机抽取实现方法。举个例子,要从tablename表中随机提取一条记录,大家一般的写法就是: SELECT * FROM content ORDER BY RAND () LIMIT 1 【3万条记录查询花费 0.3745 秒(下同);从mysql slow query log看出“ORDER BY RAN
最近由于需要研究了一下MYSQL的随机抽取实现方法。举个例子,要从tablename表中随机提取一条记录,大家一般的写法就是:
<code><span>SELECT </span><span>*</span><span> FROM content ORDER BY RAND</span><span>()</span><span> LIMIT </span><span>1</span></code>
【3万条记录查询花费 0.3745 秒(下同);从mysql slow query log看出“ORDER BY RAND() ”全表扫描了2次!】
后来我查了一下MYSQL的官方手册,里面针对RAND()的提示大概意思就是,在ORDER BY从句里面不能使用RAND()函数,因为这样会导致数据列被多次扫描。但是在MYSQL 3.23版本中,仍然可以通过ORDER BY RAND()来实现随机。
但是真正测试一下才发现这样效率非常低。一个15万余条的库,查询5条数据,居然要8秒以上。查看官方手册,也说rand()放在ORDER BY 子句中会被执行多次,自然效率及很低。
搜索Google,采用JOIN,查询max(id) * rand()来随机获取数据。
<code><span>SELECT </span><span>*</span><span> FROM </span><span>`content`</span><span> AS t1 JOIN </span><span>(</span><span>SELECT ROUND</span><span>(</span><span>RAND</span><span>()</span><span>*</span><span>(</span><span>SELECT MAX</span><span>(</span><span>id</span><span>)</span><span> FROM </span><span>`content`</span><span>))</span><span> AS id</span><span>)</span><span> AS t2 WHERE t1</span><span>.</span><span>id </span><span>>=</span><span> t2</span><span>.</span><span>id ORDER BY t1</span><span>.</span><span>id ASC LIMIT </span><span>1</span><span>;</span></code>
【查询花费 0.0008 秒,飘易认为可以推荐使用这个语句!!】
但是这样会产生连续的5条记录。解决办法只能是每次查询一条,查询5次。即便如此也值得,因为15万条的表,查询只需要0.01秒不到。
有一个方法:
<code><span>SELECT </span><span>*</span><span> FROM </span><span>`content`</span><span> AS a JOIN </span><span>(</span><span> SELECT MAX</span><span>(</span><span> ID </span><span>)</span><span> AS ID FROM </span><span>`content`</span><span>)</span><span> AS b ON </span><span>(</span><span> a</span><span>.</span><span>ID </span><span>>=</span><span> FLOOR</span><span>(</span><span> b</span><span>.</span><span>ID </span><span>*</span><span> RAND</span><span>(</span><span>)</span><span>)</span><span>)</span><span> LIMIT </span><span>5</span><span>;</span></code>
上面这种方式保证了一定范围内的随机,查询花费 0.4265 秒,也不推荐。
下面的语句,mysql的论坛上有人使用
<code><span>SELECT </span><span>*</span><span> FROM </span><span>`content`</span><span> WHERE id </span><span>>=</span><span>(</span><span>SELECT FLOOR</span><span>(</span><span> MAX</span><span>(</span><span>id</span><span>)</span><span>*</span><span> RAND</span><span>())</span><span> FROM </span><span>`content`</span><span>)</span><span> ORDER BY id LIMIT </span><span>1</span><span>;</span></code>
【查询花费 1.2254 秒,飘易强烈不推荐!因为实测后,3万行的表,这个语句竟然会扫描500万行!!】
跟上面的语句还是有很大差距。总觉有什么地方不正常。于是我把语句改写了一下。
<code><strong><span><span>SELECT </span><span>*</span><span> FROM </span><span>`content`</span><span> WHERE id </span><span>>=</span><span>(</span><span>SELECT floor</span><span>(</span><span>RAND</span><span>()</span><span>*</span><span>(</span><span>SELECT MAX</span><span>(</span><span>id</span><span>)</span><span> FROM </span><span>`content`</span><span>)))</span><span> ORDER BY id LIMIT </span><span>1</span><span>;</span></span></strong></code>
【查询花费 0.0012 秒】
这下,效率又提高了,查询时间只有0.01秒
最后,再把语句完善一下,加上MIN(id)的判断。我在最开始测试的时候,就是因为没有加上MIN(id)的判断,结果有一半的时间总是查询到表中的前面几行。
完整查询语句是:
<code><span>SELECT </span><span>*</span><span> FROM </span><span>`content`</span><span> WHERE id </span><span>>=</span><span>(</span><span>SELECT floor</span><span>(</span><span> RAND</span><span>()</span><span>*</span><span>((</span><span>SELECT MAX</span><span>(</span><span>id</span><span>)</span><span> FROM </span><span>`content`</span><span>)-(</span><span>SELECT MIN</span><span>(</span><span>id</span><span>)</span><span> FROM </span><span>`content`</span><span>))</span><span>+</span><span>(</span><span>SELECT MIN</span><span>(</span><span>id</span><span>)</span><span> FROM </span><span>`content`</span><span>)))</span><span> ORDER BY id LIMIT </span><span>1</span><span>;</span></code>
【查询花费 0.0012 秒】
<code><span>SELECT </span><span>*</span><span> FROM </span><span>`content`</span><span> AS t1 JOIN </span><span>(</span><span>SELECT ROUND</span><span>(</span><span>RAND</span><span>()</span><span>*</span><span>((</span><span>SELECT MAX</span><span>(</span><span>id</span><span>)</span><span> FROM </span><span>`content`</span><span>)-(</span><span>SELECT MIN</span><span>(</span><span>id</span><span>)</span><span> FROM </span><span>`content`</span><span>))+(</span><span>SELECT MIN</span><span>(</span><span>id</span><span>)</span><span> FROM </span><span>`content`</span><span>))</span><span> AS id</span><span>)</span><span> AS t2 WHERE t1</span><span>.</span><span>id </span><span>>=</span><span> t2</span><span>.</span><span>id ORDER BY t1</span><span>.</span><span>id LIMIT </span><span>1</span><span>;</span></code>
【查询花费 0.0008 秒】
最后在php中对这两个语句进行分别查询10次,
前者花费时间 0.147433 秒
后者花费时间 0.015130 秒
看来采用JOIN的语法比直接在WHERE中使用函数效率还要高很多。(via)
======================================
【好了,最后飘易来总结下】:
第一种方案,即原始的 Order By Rand() 方法:
<code><span>$sql</span><span>=</span><span>"SELECT * FROM content ORDER BY rand() LIMIT 12"</span><span>;</span><span> $result</span><span>=</span><span>mysql_query</span><span>(</span><span>$sql</span><span>,</span><span>$conn</span><span>);</span><span> $n</span><span>=</span><span>1</span><span>;</span><span> $rnds</span><span>=</span><span>''</span><span>;</span><span>while</span><span>(</span><span>$row</span><span>=</span><span>mysql_fetch_array</span><span>(</span><span>$result</span><span>)){</span><span> $rnds</span><span>=</span><span>$rnds</span><span>.</span><span>$n</span><span>.</span><span>". <a href="show%22</span><span>.</span><span>%24row</span><span>%5B</span><span>" id><span>].</span><span>"-"</span><span>.</span><span>strtolower</span><span>(</span><span>trim</span><span>(</span><span>$row</span><span>[</span><span>'title'</span><span>])).</span><span>"'>"</span><span>.</span><span>$row</span><span>[</span><span>'title'</span><span>].</span><span>"</span></a><br>\n"</span><span>;</span><span> $n</span><span>++;</span><span>}</span></code>
3万条数据查12条随机记录,需要0.125秒,随着数据量的增大,效率越来越低。
第二种方案,改进后的 JOIN 方法:
<code><span>for</span><span>(</span><span>$n</span><span>=</span><span>1</span><span>;</span><span>$n</span><span><span>12</span><span>;</span><span>$n</span><span>++){</span><span> $sql</span><span>=</span><span>"SELECT * FROM `content` AS t1 JOIN (SELECT ROUND(RAND() * (SELECT MAX(id) FROM `content`)) AS id) AS t2 WHERE t1.id >= t2.id ORDER BY t1.id ASC LIMIT 1"</span><span>;</span><span> $result</span><span>=</span><span>mysql_query</span><span>(</span><span>$sql</span><span>,</span><span>$conn</span><span>);</span><span> $yi</span><span>=</span><span>mysql_fetch_array</span><span>(</span><span>$result</span><span>);</span><span> $rnds </span><span>=</span><span> $rnds</span><span>.</span><span>$n</span><span>.</span><span>". <a href="show%22</span><span>.</span><span>%24yi</span><span>%5B</span><span>" id><span>].</span><span>"-"</span><span>.</span><span>strtolower</span><span>(</span><span>trim</span><span>(</span><span>$yi</span><span>[</span><span>'title'</span><span>])).</span><span>"'>"</span><span>.</span><span>$yi</span><span>[</span><span>'title'</span><span>].</span><span>"</span></a><br>\n"</span><span>;</span><span>}</span></span></code>
3万条数据查12条随机记录,需要0.004秒,效率大幅提升,比第一种方案提升了约30倍。缺点:多次select查询,IO开销大。
第三种方案,SQL语句先随机好ID序列,用 IN 查询(飘易推荐这个用法,IO开销小,速度最快):
<code><span>$sql</span><span>=</span><span>"SELECT MAX(id),MIN(id) FROM content"</span><span>;</span><span> $result</span><span>=</span><span>mysql_query</span><span>(</span><span>$sql</span><span>,</span><span>$conn</span><span>);</span><span> $yi</span><span>=</span><span>mysql_fetch_array</span><span>(</span><span>$result</span><span>);</span><span> $idmax</span><span>=</span><span>$yi</span><span>[</span><span>0</span><span>];</span><span> $idmin</span><span>=</span><span>$yi</span><span>[</span><span>1</span><span>];</span><span> $idlist</span><span>=</span><span>''</span><span>;</span><span>for</span><span>(</span><span>$i</span><span>=</span><span>1</span><span>;</span><span>$i</span><span><span>20</span><span>;</span><span>$i</span><span>++){</span><span>if</span><span>(</span><span>$i</span><span>==</span><span>1</span><span>){</span><span> $idlist</span><span>=</span><span>mt_rand</span><span>(</span><span>$idmin</span><span>,</span><span>$idmax</span><span>);</span><span>}</span><span>else</span><span>{</span><span> $idlist</span><span>=</span><span>$idlist</span><span>.</span><span>','</span><span>.</span><span>mt_rand</span><span>(</span><span>$idmin</span><span>,</span><span>$idmax</span><span>);</span><span>}</span><span>}</span><span> $idlist2</span><span>=</span><span>"id,"</span><span>.</span><span>$idlist</span><span>;</span><span> $sql</span><span>=</span><span>"select * from content where id in ($idlist) order by field($idlist2) LIMIT 0,12"</span><span>;</span><span> $result</span><span>=</span><span>mysql_query</span><span>(</span><span>$sql</span><span>,</span><span>$conn</span><span>);</span><span> $n</span><span>=</span><span>1</span><span>;</span><span> $rnds</span><span>=</span><span>''</span><span>;</span><span>while</span><span>(</span><span>$row</span><span>=</span><span>mysql_fetch_array</span><span>(</span><span>$result</span><span>)){</span><span> $rnds</span><span>=</span><span>$rnds</span><span>.</span><span>$n</span><span>.</span><span>". <a href="show%22</span><span>.</span><span>%24row</span><span>%5B</span><span>" id><span>].</span><span>"-"</span><span>.</span><span>strtolower</span><span>(</span><span>trim</span><span>(</span><span>$row</span><span>[</span><span>'title'</span><span>])).</span><span>"'>"</span><span>.</span><span>$row</span><span>[</span><span>'title'</span><span>].</span><span>"</span></a><br>\n"</span><span>;</span><span> $n</span><span>++;</span><span>}</span></span></code>
3万条数据查12条随机记录,需要0.001秒,效率比第二种方法又提升了4倍左右,比第一种方法提升120倍。注,这里使用了 order by
field($idlist2) 是为了不排序,否则 IN 是自动会排序的。缺点:有可能遇到ID被删除的情况,所以需要多选几个ID。
测试方法:
<code><span>$t </span><span>=</span><span> microtime</span><span>(</span><span>true</span><span>);</span><span>//执行语句</span><span> echo microtime</span><span>(</span><span>true</span><span>)</span><span>-</span><span> $t</span><span>;</span></code>
参考:
http://blog.csdn.net/zxl315/article/details/2435368
http://jan.kneschke.de/projects/mysql/order-by-rand/

本篇文章给大家带来了关于mysql的相关知识,其中主要介绍了关于架构原理的相关内容,MySQL Server架构自顶向下大致可以分网络连接层、服务层、存储引擎层和系统文件层,下面一起来看一下,希望对大家有帮助。

在mysql中,可以利用char()和REPLACE()函数来替换换行符;REPLACE()函数可以用新字符串替换列中的换行符,而换行符可使用“char(13)”来表示,语法为“replace(字段名,char(13),'新字符串') ”。

方法:1、利用right函数,语法为“update 表名 set 指定字段 = right(指定字段, length(指定字段)-1)...”;2、利用substring函数,语法为“select substring(指定字段,2)..”。

mysql的msi与zip版本的区别:1、zip包含的安装程序是一种主动安装,而msi包含的是被installer所用的安装文件以提交请求的方式安装;2、zip是一种数据压缩和文档存储的文件格式,msi是微软格式的安装包。

转换方法:1、利用cast函数,语法“select * from 表名 order by cast(字段名 as SIGNED)”;2、利用“select * from 表名 order by CONVERT(字段名,SIGNED)”语句。

本篇文章给大家带来了关于mysql的相关知识,其中主要介绍了关于MySQL复制技术的相关问题,包括了异步复制、半同步复制等等内容,下面一起来看一下,希望对大家有帮助。

本篇文章给大家带来了关于mysql的相关知识,其中主要介绍了mysql高级篇的一些问题,包括了索引是什么、索引底层实现等等问题,下面一起来看一下,希望对大家有帮助。

在mysql中,可以利用REGEXP运算符判断数据是否是数字类型,语法为“String REGEXP '[^0-9.]'”;该运算符是正则表达式的缩写,若数据字符中含有数字时,返回的结果是true,反之返回的结果是false。


熱AI工具

Undresser.AI Undress
人工智慧驅動的應用程序,用於創建逼真的裸體照片

AI Clothes Remover
用於從照片中去除衣服的線上人工智慧工具。

Undress AI Tool
免費脫衣圖片

Clothoff.io
AI脫衣器

AI Hentai Generator
免費產生 AI 無盡。

熱門文章

熱工具

Dreamweaver CS6
視覺化網頁開發工具

禪工作室 13.0.1
強大的PHP整合開發環境

EditPlus 中文破解版
體積小,語法高亮,不支援程式碼提示功能

SublimeText3 英文版
推薦:為Win版本,支援程式碼提示!

ZendStudio 13.5.1 Mac
強大的PHP整合開發環境