速查1:eq_range_index_dive_limit决定的两种评估方式
对比 | Dive | Statistic | 真实 |
---|---|---|---|
idx1(t1,t2,k) | 5(误差0%) | 2(误差60%) | 5 |
Idx2(t1,k) | 1026(误差0%) | 985(误差%) | 1026(误差4%) |
statistics时长 | 0.188ms | 0.107ms |
速查2:内存限制
Equality Range Optimization of Many-Valued Comparisons
Consider these expressions, where col_name
is an indexed column:
col_name IN(val1, ..., valN)
col_name = val1 OR ... OR col_name = valN
Each expression is true if col_name
is equal to any of several values. These comparisons are equality range comparisons (where the “range” is a single value). The optimizer estimates the cost of reading qualifying rows for equality range comparisons as follows:
col_name
, the row estimate for each range is 1 because at most one row can have the given value.col_name
is nonunique and the optimizer can estimate the row count for each range using dives into the index or index statistics.With index dives, the optimizer makes a dive at each end of a range and uses the number of rows in the range as the estimate. For example, the expression col_name IN (10, 20, 30)
has three equality ranges and the optimizer makes two dives per range to generate a row estimate. Each pair of dives yields an estimate of the number of rows that have the given value.
例如IN (10, 20, 30)
每个值做两次dive获得三个dive对,可以精确估计行数
Index dives provide accurate row estimates, but as the number of comparison values in the expression increases, the optimizer takes longer to generate a row estimate. Use of index statistics is less accurate than index dives but permits faster row estimation for large value lists.
在《高性能MySQL》里面提及用in这种方式可以有效的替代一定的range查询,提升查询效率,因为在一条索引里面,range字段后面的部分是不生效的(in后面的点查还能生效的,但是order by无效,原因就是组合)。使用in这种方式其实MySQL优化器是转化成了n*m种组合方式来进行查询,最终将返回值合并,有点类似union但是更高效。同时它存在这一些问题:
老版本的MySQL在IN()组合条件过多的时候会发生很多问题。查询优化可能需要花很多时间,并消耗大量内存。新版本MySQL在组合数超过一定的数量就不进行计划评估了,这可能导致MySQL不能很好的利用索引。
这里的“一定数量”在MySQL5.6.5以及以后的版本中是由eq_range_index_dive_limit这个参数控制,一直到5.7以后的版本默认会修改成200,当然我们是可以手动设置的。
参数使用
eq_range_index_dive_limit = 0 只能使用index dive
N >= eq_range_index_dive_limit 使用index statistics
N < eq_range_index_dive_limit 只能使用index dive
index dive与index statistics是MySQL优化器对开销代价的估算方法,前者统计速度慢但是能得到精准的值,后者统计速度快但是数据未必精准。
the optimizer can estimate the row count for each range using dives into the index or index statistics.
在MySQL5.7版本中将默认值从10修改成200目的是为了尽可能的保证范围等值运算(IN())执行计划尽量精准,因为IN()list的数量很多时候都是超过10的。
下面全部四种情况的查询会跳过dive
FORCE INDEX
index hint is present. The idea is that if index use is forced, there is nothing to be gained from the additional overhead of performing dives into the index.FULLTEXT
index.DISTINCT
, GROUP BY
, or ORDER BY
clause is present.Those dive-skipping conditions apply only for single-table queries. Index dives are not skipped for multiple-table queries (joins).
测试表
-- 1000万数据量
CREATE TABLE `sbtest1` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`k` int(11) NOT NULL DEFAULT '0',
`c` char(120) NOT NULL DEFAULT '',
`pad` char(60) NOT NULL DEFAULT '',
`t1` int(11) NOT NULL,
`t2` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `idx1` (`t1`,`t2`,`k`),
KEY `idx2` (`t1`,`k`)
) ENGINE=InnoDB AUTO_INCREMENT=10000001 DEFAULT CHARSET=utf8mb4;
-- t1的选择性1W/1000W,t2的选择性1000/1000W
update sbtest1 set t1=FLOOR(RAND() * 10000);
update sbtest1 set t2=FLOOR(RAND() * 1000);
测试SQL
(选择走索引1点查或者走索引2排序)
-- KEY `idx1` (`t1`,`t2`,`k`)
-- KEY `idx2` (`t1`,`k`)
-- 选哪个?
-- 选了idx2避免排序
-- 0.00053350
explain select * from sbtest1 where t1=4949 and t2 in (7101, 384) order by k;
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: sbtest1
partitions: NULL
type: range
possible_keys: idx1,idx2
key: idx1
key_len: 8
ref: NULL
rows: 5
filtered: 100.00
Extra: Using index condition; Using filesort
-- 强制idx1走前两列索引
-- 0.00517950
explain select * from sbtest1 force index(idx2) where t1=4949 and t2 in (7101, 384) order by k;
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: sbtest1
partitions: NULL
type: ref
possible_keys: idx2
key: idx2
key_len: 4
ref: const
rows: 1026
filtered: 20.00
Extra: Using index condition; Using where
当前评估使用index dive
mysql> show variables like 'eq_range_index_dive_limit';
+---------------------------+-------+
| Variable_name | Value |
+---------------------------+-------+
| eq_range_index_dive_limit | 200 |
+---------------------------+-------+
确认trace走(t1,t2,k),看到使用了index_dives_for_eq_ranges,评估的行数是很准确的,走idx1用5行,走idx2用1026行
set optimizer_trace=1;
set optimizer_trace_max_mem_size=16384;
select * from sbtest1 where t1=4949 and t2 in (7101, 384) order by k;
select * from information_schema.optimizer_trace;
"range_scan_alternatives": [
{
"index": "idx1",
"ranges": [
"4949 <= t1 <= 4949 AND 384 <= t2 <= 384",
"4949 <= t1 <= 4949 AND 7101 <= t2 <= 7101"
],
"index_dives_for_eq_ranges": true,
"rowid_ordered": false,
"using_mrr": false,
"index_only": false,
"rows": 5,
"cost": 8.01,
"chosen": true
},
{
"index": "idx2",
"ranges": [
"4949 <= t1 <= 4949"
],
"index_dives_for_eq_ranges": true,
"rowid_ordered": false,
"using_mrr": false,
"index_only": false,
"rows": 1026,
"cost": 1232.2,
"chosen": false,
"cause": "cost"
}
],
那么把eq_range_index_dive_limit改小会发生什么?可以看到评估明显不准了
set eq_range_index_dive_limit=1;
select * from sbtest1 where t1=4949 and t2 in (7101, 384) order by k;
select * from information_schema.optimizer_trace;
"range_scan_alternatives": [
{
"index": "idx1",
"ranges": [
"4949 <= t1 <= 4949 AND 384 <= t2 <= 384",
"4949 <= t1 <= 4949 AND 7101 <= t2 <= 7101"
],
"index_dives_for_eq_ranges": false,
"rowid_ordered": false,
"using_mrr": false,
"index_only": false,
"rows": 2,
"cost": 4.41,
"chosen": true
},
{
"index": "idx2",
"ranges": [
"4949 <= t1 <= 4949"
],
"index_dives_for_eq_ranges": false,
"rowid_ordered": false,
"using_mrr": false,
"index_only": false,
"rows": 985,
"cost": 1183,
"chosen": false,
"cause": "cost"
}
],
对比
对比 | Dive | Statistic | 真实 |
---|---|---|---|
idx1(t1,t2,k) | 5(误差0%) | 2(误差60%) | 5 |
Idx2(t1,k) | 1026(误差0%) | 985(误差%) | 1026(误差4%) |
statistics时长 | 0.188ms | 0.107ms |
statistic
show profiles;
show profile for query 32;
...
| statistics | 0.000107 |
dive
show profiles;
show profile for query 30;
...
| statistics | 0.000188 |
Limiting Memory Use for Range Optimization
Memory capacity of *N* bytes for 'range_optimizer_max_mem_size' exceeded. Range optimization was not done for this query.
For a simple query such as the following, where there is one candidate key for the range access method, each predicate combined with OR
uses approximately 230 bytes:
SELECT COUNT(*) FROM t
WHERE a=1 OR a=2 OR a=3 OR .. . a=N;
Before 5.7.11, the number of bytes per predicate combined with OR
was higher, approximately 700 bytes.
https://dev.mysql.com/doc/refman/5.7/en/range-optimization.html#row-constructor-range-optimization
https://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html