涉及范围时，高基数列在索引中排在首位？

Z时代
2024-01-10
分类：问答

CREATE TABLE `files` (
  `did` int(10) unsigned NOT NULL DEFAULT '0',
  `filename` varbinary(200) NOT NULL,
  `ext` varbinary(5) DEFAULT NULL,
  `fsize` double DEFAULT NULL,
  `filetime` datetime DEFAULT NULL,
  PRIMARY KEY (`did`,`filename`),
  KEY `fe` (`filetime`,`ext`),          -- This?
  KEY `ef` (`ext`,`filetime`)           -- or This?
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ;

表中有一百万行。文件时间大部分是不同的。ext值的数量有限。因此，filetime具有高基数而ext具有低得多的基数。

查询涉及ext和filetime：

WHERE ext = '...' AND filetime BETWEEN ... AND ...

这两个指数中哪个更好？又为什么呢

回答：

首先，让我们尝试FORCE INDEX选择ef或fe。时间太短而无法清楚地知道哪个更快，但是`EXPLAIN显示出一个不同：

首先强制范围filetime。（注意：顺序WHERE不受影响。）

mysql> EXPLAIN SELECT COUNT(*), AVG(fsize)
    FROM files FORCE INDEX(fe)
    WHERE ext = 'gif' AND filetime >= '2015-01-01'
                      AND filetime <  '2015-01-01' + INTERVAL 1 MONTH;
+----+-------------+-------+-------+---------------+------+---------+------+-------+-----------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows  | Extra                 |
+----+-------------+-------+-------+---------------+------+---------+------+-------+-----------------------+
|  1 | SIMPLE      | files | range | fe            | fe   | 14      | NULL | 16684 | Using index condition |
+----+-------------+-------+-------+---------------+------+---------+------+-------+-----------------------+

首先强制低基数ext：

mysql> EXPLAIN SELECT COUNT(*), AVG(fsize)
    FROM files FORCE INDEX(ef)
    WHERE ext = 'gif' AND filetime >= '2015-01-01'
                      AND filetime <  '2015-01-01' + INTERVAL 1 MONTH;
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                 |
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
|  1 | SIMPLE      | files | range | ef            | ef   | 14      | NULL |  538 | Using index condition |
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+

显然，这种rows说法ef更好。但是，让我们检查一下Optimizer跟踪。输出相当庞大；我只展示有趣的部分。没有FORCE需要;

跟踪将显示两个选项，然后选择更好的。

... "potential_range_indices": [ ... { "index": "fe", "usable": true, "key_parts": [ "filetime", "ext", "did", "filename" ] }, { "index": "ef", "usable": true, "key_parts": [ "ext", "filetime", "did", "filename" ] } ],

…

"analyzing_range_alternatives": { "range_scan_alternatives": [ { "index": "fe", "ranges": [ "2015-01-01 00:00:00 <= filetime < 2015-02-01 00:00:00" ], "index_dives_for_eq_ranges": true, "rowid_ordered": false, "using_mrr": false, "index_only": false, "rows": 16684, "cost": 20022, <-- Here's the critical number "chosen": true }, { "index": "ef", "ranges": [ "gif <= ext <= gif AND 2015-01-01 00:00:00 <= filetime < 2015-02-01 00:00:00" ], "index_dives_for_eq_ranges": true, "rowid_ordered": false, "using_mrr": false, "index_only": false, "rows": 538, "cost": 646.61, <-- Here's the critical number "chosen": true } ],

…

"attached_conditions_computation": [ { "access_type_changed": { "table": "`files`", "index": "ef", "old_type": "ref", "new_type": "range", "cause": "uses_more_keyparts" <-- Also interesting } }

使用fe（第一个范围列），可以使用范围，但它估计扫描了16684行以获得ext='gif'。

使用ef（低基数ext优先），它可以使用索引的两列并在BTree中更有效地向下钻取。然后，它发现估计有538行，所有这些行都对查询有用-

无需进一步过滤。

结论：

INDEX(filetime, ext) 仅使用第一列。

INDEX(ext, filetime) 使用两列。

不 考虑基数， 将涉及=测试的列放在索引的 第一位 。 __

查询计划不会超出第一个“范围”列。

“基数”与 复合索引和这种查询 无关。

（“使用索引条件”表示存储引擎（InnoDB）将使用索引中用于过滤的列之外的列。”）

以上是涉及范围时，高基数列在索引中排在首位？的全部内容，来源链接： utcz.com/qa/411611.html

涉及范围时，高基数列在索引中排在首位？

回答：

其他人也看了：