检索每个组中的最后一条记录-MySQL

有一个messages包含数据的表,如下所示:

Id   Name   Other_Columns

-------------------------

1 A A_data_1

2 A A_data_2

3 A A_data_3

4 B B_data_1

5 B B_data_2

6 C C_data_1

如果我运行查询select * from messages group by name,我将得到的结果为:

1    A       A_data_1

4 B B_data_1

6 C C_data_1

什么查询将返回以下结果?

3    A       A_data_3

5 B B_data_2

6 C C_data_1

即,应返回每个组中的最后一条记录。

目前,这是我使用的查询:

SELECT

*

FROM (SELECT

*

FROM messages

ORDER BY id DESC) AS x

GROUP BY name

但这看起来效率很低。还有其他方法可以达到相同的结果吗?

回答:

MySQL 8.0现在支持窗口功能,就像几乎所有流行的SQL实现一样。使用这种标准语法,我们可以编写每组最多n个查询:

WITH ranked_messages AS (

SELECT m.*, ROW_NUMBER() OVER (PARTITION BY name ORDER BY id DESC) AS rn

FROM messages AS m

)

SELECT * FROM ranked_messages WHERE rn = 1;

以下是我在2009年为此问题写的原始答案:


我这样写解决方案:

SELECT m1.*

FROM messages m1 LEFT JOIN messages m2

ON (m1.name = m2.name AND m1.id < m2.id)

WHERE m2.id IS NULL;

关于性能,一种解决方案可能会更好,这取决于数据的性质。因此,您应该测试两个查询,并使用给定数据库性能最好的查询。

例如,我有一个StackOverflow

August数据转储的副本。我将其用于基准测试。该Posts表中有1,114,357行。它在Macbook

Pro 2.40GHz的MySQL 5.0.75上运行。

我将编写查询以查找给定用户ID(我的用户)的最新帖子。

SELECT p1.postid

FROM Posts p1

INNER JOIN (SELECT pi.owneruserid, MAX(pi.postid) AS maxpostid

FROM Posts pi GROUP BY pi.owneruserid) p2

ON (p1.postid = p2.maxpostid)

WHERE p1.owneruserid = 20860;

1 row in set (1 min 17.89 sec)

甚至EXPLAIN分析也要花费超过16秒的时间:

+----+-------------+------------+--------+----------------------------+-------------+---------+--------------+---------+-------------+

| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |

+----+-------------+------------+--------+----------------------------+-------------+---------+--------------+---------+-------------+

| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 76756 | |

| 1 | PRIMARY | p1 | eq_ref | PRIMARY,PostId,OwnerUserId | PRIMARY | 8 | p2.maxpostid | 1 | Using where |

| 2 | DERIVED | pi | index | NULL | OwnerUserId | 8 | NULL | 1151268 | Using index |

+----+-------------+------------+--------+----------------------------+-------------+---------+--------------+---------+-------------+

3 rows in set (16.09 sec)

SELECT p1.postid

FROM Posts p1 LEFT JOIN posts p2

ON (p1.owneruserid = p2.owneruserid AND p1.postid < p2.postid)

WHERE p2.postid IS NULL AND p1.owneruserid = 20860;

1 row in set (0.28 sec)

EXPLAIN分析表明,这两个表都能够使用他们的指标:

+----+-------------+-------+------+----------------------------+-------------+---------+-------+------+--------------------------------------+

| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |

+----+-------------+-------+------+----------------------------+-------------+---------+-------+------+--------------------------------------+

| 1 | SIMPLE | p1 | ref | OwnerUserId | OwnerUserId | 8 | const | 1384 | Using index |

| 1 | SIMPLE | p2 | ref | PRIMARY,PostId,OwnerUserId | OwnerUserId | 8 | const | 1384 | Using where; Using index; Not exists |

+----+-------------+-------+------+----------------------------+-------------+---------+-------+------+--------------------------------------+

2 rows in set (0.00 sec)


这是我的Posts桌子的DDL :

CREATE TABLE `posts` (

`PostId` bigint(20) unsigned NOT NULL auto_increment,

`PostTypeId` bigint(20) unsigned NOT NULL,

`AcceptedAnswerId` bigint(20) unsigned default NULL,

`ParentId` bigint(20) unsigned default NULL,

`CreationDate` datetime NOT NULL,

`Score` int(11) NOT NULL default '0',

`ViewCount` int(11) NOT NULL default '0',

`Body` text NOT NULL,

`OwnerUserId` bigint(20) unsigned NOT NULL,

`OwnerDisplayName` varchar(40) default NULL,

`LastEditorUserId` bigint(20) unsigned default NULL,

`LastEditDate` datetime default NULL,

`LastActivityDate` datetime default NULL,

`Title` varchar(250) NOT NULL default '',

`Tags` varchar(150) NOT NULL default '',

`AnswerCount` int(11) NOT NULL default '0',

`CommentCount` int(11) NOT NULL default '0',

`FavoriteCount` int(11) NOT NULL default '0',

`ClosedDate` datetime default NULL,

PRIMARY KEY (`PostId`),

UNIQUE KEY `PostId` (`PostId`),

KEY `PostTypeId` (`PostTypeId`),

KEY `AcceptedAnswerId` (`AcceptedAnswerId`),

KEY `OwnerUserId` (`OwnerUserId`),

KEY `LastEditorUserId` (`LastEditorUserId`),

KEY `ParentId` (`ParentId`),

CONSTRAINT `posts_ibfk_1` FOREIGN KEY (`PostTypeId`) REFERENCES `posttypes` (`PostTypeId`)

) ENGINE=InnoDB;

以上是 检索每个组中的最后一条记录-MySQL 的全部内容, 来源链接: utcz.com/qa/411794.html

回到顶部