允许在HAVING子句中使用别名的性能影响

今天早些时候我在这个问题上愚弄了自己。问题是使用SQL

Server,正确的答案涉及添加HAVING子句。我最初犯的错误是认为该SELECT语句中可以使用该语句中的别名HAVING,这在SQL

Server中是不允许的。我犯此错误是因为我假设SQL Server与MySQL具有相同的规则,该规则允许在HAVING子句中使用别名。

这让我感到好奇,然后我在Stack Overflow和其他地方四处闲逛,找到了大量材料来解释为什么在两个各自的RDBMS上强制执行这些 规则

。但是,我没有找到关于在子句中允许/禁止别名的 性能 含义的解释HAVING

举一个具体的例子,我将重复上述问题中出现的查询:

SELECT students.camID, campus.camName, COUNT(students.stuID) as studentCount

FROM students

JOIN campus

ON campus.camID = students.camID

GROUP BY students.camID, campus.camName

HAVING COUNT(students.stuID) > 3

ORDER BY studentCount

HAVING子句中使用别名而不是重新指定COUNT?的性能含义是什么?这个问题可以在MySQL中直接回答,希望有人可以深入了解如果在HAVING子句中支持别名,那么在SQL中会发生什么。

在极少数情况下,可以同时使用MySQL和SQL Server标记SQL问题,因此请在阳光下享受这一刻。

回答:

仅仅专注于特定查询,并在下面加载示例数据。这确实解决了其他一些查询,例如count(distinct ...)其他人提到的查询。

这使用一个预先存在的表,其中大约有500万行是通过我的回答快速创建的,耗时3至5分钟。

结果结构:

CREATE TABLE `ratings` (

`id` int(11) NOT NULL AUTO_INCREMENT,

`thing` int(11) NOT NULL,

PRIMARY KEY (`id`)

) ENGINE=InnoDB AUTO_INCREMENT=5046214 DEFAULT CHARSET=utf8;

但是使用INNODB代替。由于范围保留插入而创建预期的INNODB间隙异常。只是说而已,没有区别。470万行。

修改表以接近Tim的假定架构。

rename table ratings to students; -- not exactly instanteous (a COPY)

alter table students add column camId int; -- get it near Tim's schema

-- don't add the `camId` index yet

以下将花费一些时间。一次又一次地大块运行它,否则您的连接可能会超时。超时是由于update语句中有500万行没有LIMIT子句。注意,我们 确实

有一个LIMIT子句。

因此,我们将进行半百万次的行迭代。将列设置为1到20之间的随机数

update students set camId=floor(rand()*20+1) where camId is null limit 500000; -- well that took a while (no surprise)

继续运行以上内容,直到no camId为null。

我跑了10次(整个过程需要7到10分钟)

select camId,count(*) from students

group by camId order by 1 ;

1 235641

2 236060

3 236249

4 235736

5 236333

6 235540

7 235870

8 236815

9 235950

10 235594

11 236504

12 236483

13 235656

14 236264

15 236050

16 236176

17 236097

18 235239

19 235556

20 234779

select count(*) from students;

-- 4.7 Million rows

创建一个有用的索引(当然是在插入之后)。

create index `ix_stu_cam` on students(camId); -- takes 45 seconds

ANALYZE TABLE students; -- update the stats: http://dev.mysql.com/doc/refman/5.7/en/analyze-table.html

-- the above is fine, takes 1 second

创建校园表。

create table campus

( camID int auto_increment primary key,

camName varchar(100) not null

);

insert campus(camName) values

('one'),('2'),('3'),('4'),('5'),

('6'),('7'),('8'),('9'),('ten'),

('etc'),('etc'),('etc'),('etc'),('etc'),

('etc'),('etc'),('etc'),('etc'),('twenty');

-- ok 20 of them

运行两个查询:

SELECT students.camID, campus.camName, COUNT(students.id) as studentCount 

FROM students

JOIN campus

ON campus.camID = students.camID

GROUP BY students.camID, campus.camName

HAVING COUNT(students.id) > 3

ORDER BY studentCount;

-- run it many many times, back to back, 5.50 seconds, 20 rows of output

SELECT students.camID, campus.camName, COUNT(students.id) as studentCount 

FROM students

JOIN campus

ON campus.camID = students.camID

GROUP BY students.camID, campus.camName

HAVING studentCount > 3

ORDER BY studentCount;

-- run it many many times, back to back, 5.50 seconds, 20 rows of output

所以时代是一样的。每跑十遍。

两者的EXPLAIN输出相同

+----+-------------+----------+------+---------------+------------+---------+----------------------+--------+---------------------------------+

| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |

+----+-------------+----------+------+---------------+------------+---------+----------------------+--------+---------------------------------+

| 1 | SIMPLE | campus | ALL | PRIMARY | NULL | NULL | NULL | 20 | Using temporary; Using filesort |

| 1 | SIMPLE | students | ref | ix_stu_cam | ix_stu_cam | 5 | bigtest.campus.camID | 123766 | Using index |

+----+-------------+----------+------+---------------+------------+---------+----------------------+--------+---------------------------------+

使用AVG()函数,通过以下两个查询having(具有相同的EXPLAIN输出)中的别名,我的性能提高了约12%。

SELECT students.camID, campus.camName, avg(students.id) as studentAvg 

FROM students

JOIN campus

ON campus.camID = students.camID

GROUP BY students.camID, campus.camName

HAVING avg(students.id) > 2200000

ORDER BY students.camID;

-- avg time 7.5

explain

SELECT students.camID, campus.camName, avg(students.id) as studentAvg

FROM students

JOIN campus

ON campus.camID = students.camID

GROUP BY students.camID, campus.camName

HAVING studentAvg > 2200000

ORDER BY students.camID;

-- avg time 6.5

最后是DISTINCT

SELECT students.camID, count(distinct students.id) as studentDistinct 

FROM students

JOIN campus

ON campus.camID = students.camID

GROUP BY students.camID

HAVING count(distinct students.id) > 1000000

ORDER BY students.camID; -- 10.6 10.84 12.1 11.49 10.1 9.97 10.27 11.53 9.84 9.98

-- 9.9

SELECT students.camID, count(distinct students.id) as studentDistinct

FROM students

JOIN campus

ON campus.camID = students.camID

GROUP BY students.camID

HAVING studentDistinct > 1000000

ORDER BY students.camID; -- 6.81 6.55 6.75 6.31 7.11 6.36 6.55

-- 6.45

具有相同输出的别名持续运行

EXPLAIN。见下文。因此,相同的Explain输出已显示两次,但并不能得出相同的性能,但这只是一般的提示。

+----+-------------+----------+-------+---------------+------------+---------+----------------------+--------+----------------------------------------------+

| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |

+----+-------------+----------+-------+---------------+------------+---------+----------------------+--------+----------------------------------------------+

| 1 | SIMPLE | campus | index | PRIMARY | PRIMARY | 4 | NULL | 20 | Using index; Using temporary; Using filesort |

| 1 | SIMPLE | students | ref | ix_stu_cam | ix_stu_cam | 5 | bigtest.campus.camID | 123766 | Using index |

+----+-------------+----------+-------+---------------+------------+---------+----------------------+--------+----------------------------------------------+

以上是 允许在HAVING子句中使用别名的性能影响 的全部内容, 来源链接: utcz.com/qa/433046.html

回到顶部