Maison > Article > base de données > MYSQL索引
什么是索引? 举个例子:新华字典,有目录,有正文内容。索引就相当于目录,正文内容就相当于数据。 索引有什么用? 索引用于快速查找在某列中有一特定值的行。 一条查询语句,如果没有索引,将对全表进行扫描。 如果所有的数据页面都不在内存中,则需要从硬
举个例子:新华字典,有目录,有正文内容。索引就相当于目录,正文内容就相当于数据。
索引用于快速查找在某列中有一特定值的行。
一条查询语句,如果没有索引,将对全表进行扫描。
如果所有的数据页面都不在内存中,则需要从硬盘上读取这些页面,从而产生大量的I/O,每次I/O都会消耗一定时间。
最终,总的查询时间,会大的惊人。
若此时查询列有个索引,MYSQL 就能快速定位到具体位置,找出相关列,将指定数据页面读入内存,I/O 就会大大降低。
以字典为例,查找字母为 Z 开头的某个单词,先通过索引定位 Z 开头的单词的起始位置,从这里开始查询,从而节省了大量的时间。
一次查询只能使用一个索引。
like “%xxx” not in , != 对列进行函数运算的情况(如 where avg(age) = “20”)
explain select ...
假设,你有一个三列联合的索引:(col1, col2, col3)。
那么你将拥有三种索引使用方式:
(col1) (col1, col2) (col1, col2, col3)
上述说的就是最左前缀 – leftmost prefix。
So,当你有多列查询需求时,你可以考虑建一个合适的联合索引。
like 的参数不以非通配符 % 开头的字符常量,就能使用索引。
SELECT * FROM tbl_name WHERE key_col LIKE 'something%'; //匹配以something开头的字符串 SELECT * FROM tbl_name WHERE key_col LIKE '%something%'; //不使用索引 SELECT * FROM tbl_name WHERE key_col LIKE 'something'; //精确匹配,等效于 “ = ” 运算符
假如,你在看一本成语词典,目录是按成语拼音顺序建立。
查询需求是:你想找以 “一” 字开头的成语(“一%”),和你想找包含一字的成语(“%一%”)。
你觉得哪个会更快呢?
大多数情况下,索引都能大幅度提高查询效率。
数据的增、删、改操作都需要维护索引,索引一多,意味着维护成本高了。 更多的索引需要更多的存储空间。比如:20页的书,有15页的目录?这就不合理了。 小表建索引,往往适得其反。比如:读个2页的宣传手册,你还先去找目录?
更新非常频繁的列 列的值唯一性太小,比如性别,Enum 类型的字段等 太长的列 FROM:http://blog.segmentfault.com/vboy1010/1190000000461418
假设一高频查询如下
SELECT * FROM user WHERE area=’amoy’ AND sex=0 ORDER BY last_login DESC limit 30;
如何建立索引?描述考虑的过程
user表如下:
初始化100W条数据,其中,area要通过IP查询生成,sex为 0,1 随机
CREATE TABLE?user
?(
id
?int(10) NOT NULL AUTO_INCREMENT COMMENT ‘自增编号’,
username
?varchar(30) NOT NULL DEFAULT ’0′ COMMENT ‘用户名’,
password
?varchar(30) NOT NULL DEFAULT ’0′ COMMENT ‘密码’,
area
?varchar(30) NOT NULL COMMENT ‘地址’,
sex
?int(10) NOT NULL COMMENT ‘性别1,男;2,女。’,
last_login
?int(10) NOT NULL COMMENT ‘最近一次登录时间戳’,
PRIMARY KEY (id
)
) ENGINE=InnoDB AUTO_INCREMENT=892013 DEFAULT CHARSET=latin1
最终我的索引
(last_login,area)
user表
没有任何索引的查询相关日志:
SELECT * FROM user WHERE area=’美国ATT用户’ AND sex=0 ORDER BY last_login DESC limit 30; 0.57s
SELECT * FROM user WHERE area=’泰国’ AND sex=0 ORDER BY last_login DESC limit 30; 0.56s
SELECT * FROM user WHERE area=’台湾省台湾大宽频’ AND sex=0 ORDER BY last_login DESC limit 30; 0.55s
SELECT * FROM user WHERE area=’美国弗吉尼亚州’ AND sex=0 ORDER BY last_login DESC limit 30; 0.59s
SELECT * FROM user WHERE area=’德国奔驰汽车’ AND sex=0 ORDER BY last_login DESC limit 30; 0.55s
SELECT * FROM user WHERE area=’台湾省中华电信’ AND sex=0 ORDER BY last_login DESC limit 30; 0.55s
SELECT * FROM user WHERE area=’韩国’ AND sex=0 ORDER BY last_login DESC limit 30; 0.57s
SELECT * FROM user WHERE area=’拉美地区’ AND sex=0 ORDER BY last_login DESC limit 30; 0.58s
SELECT * FROM user WHERE area=’美国纽约(Prudential)’ AND sex=0 ORDER BY last_login DESC limit 30; 0.57s
SELECT * FROM user WHERE area=’印度尼西亚’ AND sex=0 ORDER BY last_login DESC limit 30; 0.57s
共花费时间:5.66s
建立索引area:
ALTER TABLE?user
?ADD INDEX?index_area
?(area
) ;
SELECT * FROM user WHERE area=’美国ATT用户’ AND sex=0 ORDER BY last_login DESC limit 30; 0.06s
SELECT * FROM user WHERE area=’泰国’ AND sex=0 ORDER BY last_login DESC limit 30; 0.02s
SELECT * FROM user WHERE area=’台湾省台湾大宽频’ AND sex=0 ORDER BY last_login DESC limit 30; 0.00s
SELECT * FROM user WHERE area=’美国弗吉尼亚州’ AND sex=0 ORDER BY last_login DESC limit 30; 0.10s
SELECT * FROM user WHERE area=’德国奔驰汽车’ AND sex=0 ORDER BY last_login DESC limit 30; 0.04s
SELECT * FROM user WHERE area=’台湾省中华电信’ AND sex=0 ORDER BY last_login DESC limit 30; 0.02s
SELECT * FROM user WHERE area=’韩国’ AND sex=0 ORDER BY last_login DESC limit 30; 0.11s
SELECT * FROM user WHERE area=’拉美地区’ AND sex=0 ORDER BY last_login DESC limit 30; 0.20s
SELECT * FROM user WHERE area=’美国纽约(Prudential)’ AND sex=0 ORDER BY last_login DESC limit 30; 0.07s
SELECT * FROM user WHERE area=’印度尼西亚’ AND sex=0 ORDER BY last_login DESC limit 30; 0.04s
共花费时间:0.66s
可见,建立area以后对性能的影响是巨大的(5.66/0.66 约为8.5758倍)
删除索引:ALTER TABLE?user
?DROP INDEX?index_area
;
删除area索引发现时间又变成了0.57s
建立last_login索引:
SELECT * FROM user WHERE area=’美国ATT用户’ AND sex=0 ORDER BY last_login DESC limit 30; 0.03s
SELECT * FROM user WHERE area=’泰国’ AND sex=0 ORDER BY last_login DESC limit 30; 0.09s
SELECT * FROM user WHERE area=’台湾省台湾大宽频’ AND sex=0 ORDER BY last_login DESC limit 30; 0.51s
SELECT * FROM user WHERE area=’美国弗吉尼亚州’ AND sex=0 ORDER BY last_login DESC limit 30; 0.01s
SELECT * FROM user WHERE area=’德国奔驰汽车’ AND sex=0 ORDER BY last_login DESC limit 30; 0.04s
SELECT * FROM user WHERE area=’台湾省中华电信’ AND sex=0 ORDER BY last_login DESC limit 30; 0.07s
SELECT * FROM user WHERE area=’韩国’ AND sex=0 ORDER BY last_login DESC limit 30; 0.01s
SELECT * FROM user WHERE area=’拉美地区’ AND sex=0 ORDER BY last_login DESC limit 30; 0.01s
SELECT * FROM user WHERE area=’美国纽约(Prudential)’ AND sex=0 ORDER BY last_login DESC limit 30; 0.04s
SELECT * FROM user WHERE area=’印度尼西亚’ AND sex=0 ORDER BY last_login DESC limit 30; 0.06s
共花费时间:0.87s
同样能够提升性能(5.66/0.87 约为6.5057倍)
建立sex索引:
ALTER TABLE?user
?ADD INDEX?index_sex
?(sex
) ;
SELECT * FROM user WHERE area=’美国ATT用户’ AND sex=0 ORDER BY last_login DESC limit 30; 0.87s
SELECT * FROM user WHERE area=’泰国’ AND sex=0 ORDER BY last_login DESC limit 30; 0.87s
SELECT * FROM user WHERE area=’台湾省台湾大宽频’ AND sex=0 ORDER BY last_login DESC limit 30; 0.87s
SELECT * FROM user WHERE area=’美国弗吉尼亚州’ AND sex=0 ORDER BY last_login DESC limit 30; 0.89s
SELECT * FROM user WHERE area=’德国奔驰汽车’ AND sex=0 ORDER BY last_login DESC limit 30; 0.88s
SELECT * FROM user WHERE area=’台湾省中华电信’ AND sex=0 ORDER BY last_login DESC limit 30; 0.87s
SELECT * FROM user WHERE area=’韩国’ AND sex=0 ORDER BY last_login DESC limit 30; 0.86s
SELECT * FROM user WHERE area=’拉美地区’ AND sex=0 ORDER BY last_login DESC limit 30; 0.88s
SELECT * FROM user WHERE area=’美国纽约(Prudential)’ AND sex=0 ORDER BY last_login DESC limit 30; 0.87s
SELECT * FROM user WHERE area=’印度尼西亚’ AND sex=0 ORDER BY last_login DESC limit 30; 0.87s
共花费时间:8.73s
同样能够提升性能(5.66s/8.73 约为0.6483倍)效率反而降低了??求解?
建立这个sex索引还不如不建。
删除索引:
ALTER TABLE?user
?DROP INDEX?index_sex
;
发现时间又变成了0.57s左右,
建立两个单独的索引:
ALTER TABLE?user
ADD INDEX?index_area
?(area
) ,
ADD INDEX?index_last_login
?(last_login
) ;
SELECT * FROM user WHERE area=’美国ATT用户’ AND sex=0 ORDER BY last_login DESC limit 30; 0.09s
SELECT * FROM user WHERE area=’泰国’ AND sex=0 ORDER BY last_login DESC limit 30; 0.33s
SELECT * FROM user WHERE area=’台湾省台湾大宽频’ AND sex=0 ORDER BY last_login DESC limit 30; 0.21s
SELECT * FROM user WHERE area=’美国弗吉尼亚州’ AND sex=0 ORDER BY last_login DESC limit 30; 0.01s
SELECT * FROM user WHERE area=’德国奔驰汽车’ AND sex=0 ORDER BY last_login DESC limit 30; 0.28s
SELECT * FROM user WHERE area=’台湾省中华电信’ AND sex=0 ORDER BY last_login DESC limit 30; 0.02s
SELECT * FROM user WHERE area=’韩国’ AND sex=0 ORDER BY last_login DESC limit 30; 0.02s
SELECT * FROM user WHERE area=’拉美地区’ AND sex=0 ORDER BY last_login DESC limit 30; 0.02s
SELECT * FROM user WHERE area=’美国纽约(Prudential)’ AND sex=0 ORDER BY last_login DESC limit 30; 0.03s
SELECT * FROM user WHERE area=’印度尼西亚’ AND sex=0 ORDER BY last_login DESC limit 30; 0.67s
发现建立两个单独的索引还不如只建立一个索引
删除索引:
发现时间又变成了0.57s左右,
建立一个的联合索引:
ALTER TABLE?user
ADD INDEX?index_last_login_area
?(last_login
,area
) ,
SELECT * FROM user WHERE area=’美国ATT用户’ AND sex=0 ORDER BY last_login DESC limit 30; 0.00s
SELECT * FROM user WHERE area=’泰国’ AND sex=0 ORDER BY last_login DESC limit 30; 0.00s
SELECT * FROM user WHERE area=’台湾省台湾大宽频’ AND sex=0 ORDER BY last_login DESC limit 30; 0.00s
SELECT * FROM user WHERE area=’美国弗吉尼亚州’ AND sex=0 ORDER BY last_login DESC limit 30; 0.00s
SELECT * FROM user WHERE area=’德国奔驰汽车’ AND sex=0 ORDER BY last_login DESC limit 30; 0.00s
SELECT * FROM user WHERE area=’台湾省中华电信’ AND sex=0 ORDER BY last_login DESC limit 30; 0.00s
SELECT * FROM user WHERE area=’韩国’ AND sex=0 ORDER BY last_login DESC limit 30; 0.00s
SELECT * FROM user WHERE area=’拉美地区’ AND sex=0 ORDER BY last_login DESC limit 30; 0.00s
SELECT * FROM user WHERE area=’美国纽约(Prudential)’ AND sex=0 ORDER BY last_login DESC limit 30; 0.00s
SELECT * FROM user WHERE area=’印度尼西亚’ AND sex=0 ORDER BY last_login DESC limit 30; 0.00s
额,第二条数据这是怎么了,我测试了5次都在这附近晃悠哈!
这尼玛,找对索引啦!就该这么建立,查询不出来需要的时间啦!估计就是我们需要的索引啦!!!!
删除索引:
发现时间又变成了0.57s左右,
建立一个的联合索引:
ALTER TABLE?user
ADD INDEX?index_sex_last_login_area
?(sex
,last_login
,area
)
SELECT * FROM user WHERE area=’美国ATT用户’ AND sex=0 ORDER BY last_login DESC limit 30; 0.18s
SELECT * FROM user WHERE area=’泰国’ AND sex=0 ORDER BY last_login DESC limit 30; 0.17s
SELECT * FROM user WHERE area=’台湾省台湾大宽频’ AND sex=0 ORDER BY last_login DESC limit 30; 0.81s
SELECT * FROM user WHERE area=’美国弗吉尼亚州’ AND sex=0 ORDER BY last_login DESC limit 30; 0.01s
SELECT * FROM user WHERE area=’德国奔驰汽车’ AND sex=0 ORDER BY last_login DESC limit 30; 0.02s
SELECT * FROM user WHERE area=’台湾省中华电信’ AND sex=0 ORDER BY last_login DESC limit 30; 0.04s
SELECT * FROM user WHERE area=’韩国’ AND sex=0 ORDER BY last_login DESC limit 30; 0.01s
SELECT * FROM user WHERE area=’拉美地区’ AND sex=0 ORDER BY last_login DESC limit 30; 0.01s
SELECT * FROM user WHERE area=’美国纽约(Prudential)’ AND sex=0 ORDER BY last_login DESC limit 30; 0.03s
SELECT * FROM user WHERE area=’印度尼西亚’ AND sex=0 ORDER BY last_login DESC limit 30; 0.04s
sex怎么总是你在拖后腿啊!把你调整到索引的最后一个吧!
删除索引:
发现时间又变成了0.57s左右,
建立一个的联合索引:
ALTER TABLE?user
ADD INDEX?index_last_login_area_sex
?(area
,last_login
,sex
)
SELECT * FROM user WHERE area=’美国ATT用户’ AND sex=0 ORDER BY last_login DESC limit 30; 0.03s
SELECT * FROM user WHERE area=’泰国’ AND sex=0 ORDER BY last_login DESC limit 30; 0.07s
SELECT * FROM user WHERE area=’台湾省台湾大宽频’ AND sex=0 ORDER BY last_login DESC limit 30; 0.50s
SELECT * FROM user WHERE area=’美国弗吉尼亚州’ AND sex=0 ORDER BY last_login DESC limit 30; 0.02s
SELECT * FROM user WHERE area=’德国奔驰汽车’ AND sex=0 ORDER BY last_login DESC limit 30; 0.05s
SELECT * FROM user WHERE area=’台湾省中华电信’ AND sex=0 ORDER BY last_login DESC limit 30; 0.06s
SELECT * FROM user WHERE area=’韩国’ AND sex=0 ORDER BY last_login DESC limit 30; 0.02s
SELECT * FROM user WHERE area=’拉美地区’ AND sex=0 ORDER BY last_login DESC limit 30; 0.02s
SELECT * FROM user WHERE area=’美国纽约(Prudential)’ AND sex=0 ORDER BY last_login DESC limit 30; 0.04s
SELECT * FROM user WHERE area=’印度尼西亚’ AND sex=0 ORDER BY last_login DESC limit 30; 0.06s
综上所述:1.建立索引不一定能够加快查询效率如sex这种给重复次数特别多的列增加索引如sex这种会降低查询效率,具体的原因有待查找
2.给重复次数比较少的列增加u讴吟还是能够大幅度提高效率
3.给where和orderby之后的字段添加索引才会加快查询效率
4.为每一个列单独建立索引,不能将索引的效率最大化,应该使用索引合并策略,即根据查询条件,建立联合索引
5.联合索引的顺序问题:将选择性高的索引放到前面
6.根据资料建立索引意味着索引按照最左列进行排序,然后事第二列,以此类推。如(last_login ,area)就会按照last_login进行排序,然后才是area
7.根据这次的这个查询条件来说最好的索引是:ALTER TABLE?user
ADD INDEX?index_last_login_area
(last_login
,area
)。
本文出自:http://blog.chedushi.com, 原文地址:http://blog.chedushi.com/archives/7536, 感谢原作者分享。