涉及范围时,高基数列在索引中排在首位?
CREATE TABLE `files` ( `did` int(10) unsigned NOT NULL DEFAULT '0',
`filename` varbinary(200) NOT NULL,
`ext` varbinary(5) DEFAULT NULL,
`fsize` double DEFAULT NULL,
`filetime` datetime DEFAULT NULL,
PRIMARY KEY (`did`,`filename`),
KEY `fe` (`filetime`,`ext`), -- This?
KEY `ef` (`ext`,`filetime`) -- or This?
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ;
表中有一百万行。文件时间大部分是不同的。ext
值的数量有限。因此,filetime
具有高基数而ext
具有低得多的基数。
查询涉及ext
和filetime
:
WHERE ext = '...' AND filetime BETWEEN ... AND ...
这两个指数中哪个更好?又为什么呢
回答:
首先,让我们尝试FORCE INDEX
选择ef
或fe
。时间太短而无法清楚地知道哪个更快,但是`EXPLAIN显示出一个不同:
首先强制范围filetime
。(注意:顺序WHERE
不受影响。)
mysql> EXPLAIN SELECT COUNT(*), AVG(fsize) FROM files FORCE INDEX(fe)
WHERE ext = 'gif' AND filetime >= '2015-01-01'
AND filetime < '2015-01-01' + INTERVAL 1 MONTH;
+----+-------------+-------+-------+---------------+------+---------+------+-------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+-------+-----------------------+
| 1 | SIMPLE | files | range | fe | fe | 14 | NULL | 16684 | Using index condition |
+----+-------------+-------+-------+---------------+------+---------+------+-------+-----------------------+
首先强制低基数ext
:
mysql> EXPLAIN SELECT COUNT(*), AVG(fsize) FROM files FORCE INDEX(ef)
WHERE ext = 'gif' AND filetime >= '2015-01-01'
AND filetime < '2015-01-01' + INTERVAL 1 MONTH;
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
| 1 | SIMPLE | files | range | ef | ef | 14 | NULL | 538 | Using index condition |
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
显然,这种rows
说法ef
更好。但是,让我们检查一下Optimizer跟踪。输出相当庞大;我只展示有趣的部分。没有FORCE
需要;
跟踪将显示两个选项,然后选择更好的。
... "potential_range_indices": [
...
{
"index": "fe",
"usable": true,
"key_parts": [
"filetime",
"ext",
"did",
"filename"
]
},
{
"index": "ef",
"usable": true,
"key_parts": [
"ext",
"filetime",
"did",
"filename"
]
}
],
…
"analyzing_range_alternatives": { "range_scan_alternatives": [
{
"index": "fe",
"ranges": [
"2015-01-01 00:00:00 <= filetime < 2015-02-01 00:00:00"
],
"index_dives_for_eq_ranges": true,
"rowid_ordered": false,
"using_mrr": false,
"index_only": false,
"rows": 16684,
"cost": 20022, <-- Here's the critical number
"chosen": true
},
{
"index": "ef",
"ranges": [
"gif <= ext <= gif AND 2015-01-01 00:00:00 <= filetime < 2015-02-01 00:00:00"
],
"index_dives_for_eq_ranges": true,
"rowid_ordered": false,
"using_mrr": false,
"index_only": false,
"rows": 538,
"cost": 646.61, <-- Here's the critical number
"chosen": true
}
],
…
"attached_conditions_computation": [ {
"access_type_changed": {
"table": "`files`",
"index": "ef",
"old_type": "ref",
"new_type": "range",
"cause": "uses_more_keyparts" <-- Also interesting
}
}
使用fe
(第一个范围列),可以使用范围,但它估计扫描了16684行以获得ext='gif'
。
使用ef
(低基数ext
优先),它可以使用索引的两列并在BTree中更有效地向下钻取。然后,它发现估计有538行,所有这些行都对查询有用-
无需进一步过滤。
结论:
INDEX(filetime, ext)
仅使用第一列。INDEX(ext, filetime)
使用两列。- 不 考虑基数, 将涉及
=
测试的列放在索引的 第一位 。 __ - 查询计划不会超出第一个“范围”列。
- “基数”与 复合索引和这种查询 无关。
(“使用索引条件”表示存储引擎(InnoDB)将使用索引中用于过滤的列之外的列。”)
以上是 涉及范围时,高基数列在索引中排在首位? 的全部内容, 来源链接: utcz.com/qa/411611.html