涉及范围时,高基数列在索引中排在首位?

CREATE TABLE `files` (

`did` int(10) unsigned NOT NULL DEFAULT '0',

`filename` varbinary(200) NOT NULL,

`ext` varbinary(5) DEFAULT NULL,

`fsize` double DEFAULT NULL,

`filetime` datetime DEFAULT NULL,

PRIMARY KEY (`did`,`filename`),

KEY `fe` (`filetime`,`ext`), -- This?

KEY `ef` (`ext`,`filetime`) -- or This?

) ENGINE=InnoDB DEFAULT CHARSET=utf8 ;

表中有一百万行。文件时间大部分是不同的。ext值的数量有限。因此,filetime具有高基数而ext具有低得多的基数。

查询涉及extfiletime

WHERE ext = '...'

AND filetime BETWEEN ... AND ...

这两个指数中哪个更好?又为什么呢

回答:

首先,让我们尝试FORCE INDEX选择effe。时间太短而无法清楚地知道哪个更快,但是`EXPLAIN显示出一个不同:

首先强制范围filetime。(注意:顺序WHERE不受影响。)

mysql> EXPLAIN SELECT COUNT(*), AVG(fsize)

FROM files FORCE INDEX(fe)

WHERE ext = 'gif' AND filetime >= '2015-01-01'

AND filetime < '2015-01-01' + INTERVAL 1 MONTH;

+----+-------------+-------+-------+---------------+------+---------+------+-------+-----------------------+

| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |

+----+-------------+-------+-------+---------------+------+---------+------+-------+-----------------------+

| 1 | SIMPLE | files | range | fe | fe | 14 | NULL | 16684 | Using index condition |

+----+-------------+-------+-------+---------------+------+---------+------+-------+-----------------------+

首先强制低基数ext

mysql> EXPLAIN SELECT COUNT(*), AVG(fsize)

FROM files FORCE INDEX(ef)

WHERE ext = 'gif' AND filetime >= '2015-01-01'

AND filetime < '2015-01-01' + INTERVAL 1 MONTH;

+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+

| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |

+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+

| 1 | SIMPLE | files | range | ef | ef | 14 | NULL | 538 | Using index condition |

+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+

显然,这种rows说法ef更好。但是,让我们检查一下Optimizer跟踪。输出相当庞大;我只展示有趣的部分。没有FORCE需要;

跟踪将显示两个选项,然后选择更好的。

             ...

"potential_range_indices": [

...

{

"index": "fe",

"usable": true,

"key_parts": [

"filetime",

"ext",

"did",

"filename"

]

},

{

"index": "ef",

"usable": true,

"key_parts": [

"ext",

"filetime",

"did",

"filename"

]

}

],

              "analyzing_range_alternatives": {

"range_scan_alternatives": [

{

"index": "fe",

"ranges": [

"2015-01-01 00:00:00 <= filetime < 2015-02-01 00:00:00"

],

"index_dives_for_eq_ranges": true,

"rowid_ordered": false,

"using_mrr": false,

"index_only": false,

"rows": 16684,

"cost": 20022, <-- Here's the critical number

"chosen": true

},

{

"index": "ef",

"ranges": [

"gif <= ext <= gif AND 2015-01-01 00:00:00 <= filetime < 2015-02-01 00:00:00"

],

"index_dives_for_eq_ranges": true,

"rowid_ordered": false,

"using_mrr": false,

"index_only": false,

"rows": 538,

"cost": 646.61, <-- Here's the critical number

"chosen": true

}

],

          "attached_conditions_computation": [

{

"access_type_changed": {

"table": "`files`",

"index": "ef",

"old_type": "ref",

"new_type": "range",

"cause": "uses_more_keyparts" <-- Also interesting

}

}

使用fe(第一个范围列),可以使用范围,但它估计扫描了16684行以获得ext='gif'

使用ef(低基数ext优先),它可以使用索引的两列并在BTree中更有效地向下钻取。然后,它发现估计有538行,所有这些行都对查询有用-

无需进一步过滤。

结论:

  • INDEX(filetime, ext) 仅使用第一列。
  • INDEX(ext, filetime) 使用两列。
  • 考虑基数, 将涉及=测试的列放在索引的 第一位 。 __
  • 查询计划不会超出第一个“范围”列。
  • “基数”与 复合索引和这种查询 无关。

(“使用索引条件”表示存储引擎(InnoDB)将使用索引中用于过滤的列之外的列。”)

以上是 涉及范围时,高基数列在索引中排在首位? 的全部内容, 来源链接: utcz.com/qa/411611.html

回到顶部