MySQL数据库SQL语句优化原理专题（四）-睿象云平台

MySQL数据库SQL语句优化原理专题（四）

本站部分文章、图片属于网络上可搜索到的公开信息，均用于学习和交流用途，不能代表睿象云的观点、立场或意见。我们接受网民的监督，如发现任何违法内容或侵犯了您的权益，请第一时间联系小编邮箱jiasou666@gmail.com 处理。

MySQL数据库SQL语句优化原理专题（四）

简介

做开发和运维的朋友，对MySQL数据库的复合索引一定不会陌生，可是要创建一个合理的，并且高效的复合索引，你就得了解复合索引的运行原理，你的了解什么是最左前缀原则。

复合索引运行原理

图中的复合索引的字段顺序为（age，first_name，second_name），从图中可以看出，B+tree索引树中的根节点和枝节点的数据，都是age，而叶子节点则存储了age，first_name，second_name3个字段信息，并且存储方式也按照age，first_name，second_name三个字段排好序的，age先排好，再按照first_name排序，最后按照second_name排好。

举个例子，如果应用想好查询(age=1 and first_name='黄' and second_name='安'）的记录，复合索引是如何搜索的呢，首先会从根节点开始寻找age=1的所有叶子节点，然后在叶子节点内部，通过二分法匹配出所有的age=1 and first_name='黄' and second_name='安'的记录，并找到对应的主键ID，最后回表返回完整的记录。

最左匹配原则

对于复合索引，总是从索引的最左边的字段开始，接着往后，中间不能跳过。例如创建了多列索引(age，first_name，second_name)，会先匹配age字段，再匹配first_name字段，再匹配second_name字段的，中间不能跳过。MySQL数据库会一直向右匹配直到遇到范围查询(>、<、between、like)就停止匹配。

估计有朋友看到上面的话，有点懵，没事，下面用实际案例才解释最左匹配原则。

创建测试表和记录

创建测试表t_test3

Create Table: CREATE TABLE `t_test3` ( `id` int(11) NOT NULL, `age` int(11) DEFAULT NULL, `first_name` char(20) DEFAULT NULL, `second_name` char(20) DEFAULT NULL, `address` char(20) DEFAULT NULL, `intime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, PRIMARY KEY (`id`)) ENGINE=InnoDB DEFAULT CHARSET=utf81 row in set (0.20 sec)insert into t_test3(id,age,first_name,second_name,address) values(1,1,'李','安','朝阳');insert into t_test3(id,age,first_name,second_name,address) values(2,1,'李','邦','朝阳2');insert into t_test3(id,age,first_name,second_name,address) values(3,1,'李','当','朝阳');insert into t_test3(id,age,first_name,second_name,address) values(4,1,'李','安','朝阳我说的是');insert into t_test3(id,age,first_name,second_name,address) values(5,1,'黄','安','朝阳');insert into t_test3(id,age,first_name,second_name,address) values(6,2,'黄','邦','朝阳');insert into t_test3(id,age,first_name,second_name,address) values(7,2,'李','安','朝阳d');insert into t_test3(id,age,first_name,second_name,address) values(8,2,'黄','当','朝阳');insert into t_test3(id,age,first_name,second_name,address) values(9,3,'李','安','朝阳');insert into t_test3(id,age,first_name,second_name,address) values(10,3,'李','邦','朝阳');insert into t_test3(id,age,first_name,second_name,address) values(11,4,'黄','安','朝阳');insert into t_test3(id,age,first_name,second_name,address) values(12,4,'李','当','朝阳');insert into t_test3(id,age,first_name,second_name,address) values(13,5,'黄','安','朝阳');insert into t_test3(id,age,first_name,second_name,address) values(14,6,'黄','当','朝阳');insert into t_test3(id,age,first_name,second_name,address) values(15,6,'李','邦','朝阳');insert into t_test3(id,age,first_name,second_name,address) values(16,6,'李','邦','朝阳');insert into t_test3(id,age,first_name,second_name,address) values(17,7,'黄','安','朝阳');insert into t_test3(id,age,first_name,second_name,address) values(18,7,null,'安','朝阳');

创建3个测试索引

alter table t_test3 add index idx_t_test3_age(age);alter table t_test3 add index idx_t_test3_age_first_name(age,first_name);alter table t_test3 add index idx_t_test3_age_first_name_second_name(age,first_name,second_name);

测试大于号（>）

测试SQL语句的执行计划（age=1 and first_name>'李' and second_name='安'）

从执行计划可以看出，最终使用的索引是idx_t_test3_age_first_name，这就证明当MySQL数据库开始搜索age=1之后，紧接着搜索first_name>'李'，由于碰到了大于号（>），索引搜索就停止了。

测试大于号（<）

测试SQL语句的执行计划（age=1 and first_name<'李' and second_name='安'）

从执行计划可以看出，最终使用的索引是idx_t_test3_age_first_name，这就证明当MySQL数据库开始搜索age=1之后，紧接着搜索first_name<'李'，由于碰到了小于号（<），索引搜索就停止了。

测试like

测试SQL语句的执行计划（age=1 and first_name like '李%' and second_name='安'）

从执行计划可以看出，最终使用的索引是idx_t_test3_age_first_name，这就证明当MySQL数据库开始搜索age=1之后，紧接着搜索first_name like '李%'，由于碰到了like，索引搜索就停止了，不过like用法会比较特殊，如果like后面的字符串中，通配符放在首位，则不会走索引，如果不放在首位，则会走索引。

下面来测试一下，通配符放在首位的情况

从执行计划中可以看出，最终使用的索引是idx_t_test3_age。

上面就是复合索引运行原理。

复合索引列选择原则

1.复合索引的索引列，尽量选择区分度高的列作为索引，区分度的公式是 count(distinct col)/count(*)，表示字段不重复的比例，比例越大我们扫描的记录数越少，唯一键的区分度是 1，而一些状态、性别字段可能在大数据面前区分度就是 0。一般需要 join 的字段都要求区分度 0.1 以上，即平均 1 条扫描 10 条记录。

2.尽可能的减少复合索引列的个数存在在线上系统表，开发人员在（orderid，merid，orderdate）上创建了复合索引，可实际上orderid基本就不会重复，实际上只需要在orderid上创建索引，即可满足应用查询要求。