P粉7864325792023-08-25 11:56:11
UPD:2017-03-31,版本5.7.5 MySQL 預設啟用 ONLY_FULL_GROUP_BY 開關(因此,非確定性 GROUP BY 查詢已停用)。此外,他們更新了 GROUP BY 實現,即使禁用了開關,該解決方案也可能無法按預期工作。需要檢查一下。
Bill Karwin's solution above works fine when item count within groups is rather small, but the performance of the query becomes bad when the groups are rather large, since the solution requires about ##n1 ## of only IS NULL
comparisons.
I made my tests on a InnoDB table of
rows with 1182
groups. The table contains testresults for functional tests and has the (test_id, request_id)# # as the primary key. Thus,
test_id is a group and I was searching for the last
request_id for each
test_id.
Bill's solution has already been running for several hours on my dell e4310 and I do not know when it is going to finish even though it operates on a coverage index (hence ##using indexPLA##. ##
我有幾個基於相同想法的其他解決方案:
if the underlying index is BTREE index (which is usually the case), the largest
(group_id, item_value)
group_id
, that is the first for each
如果我們讀取索引覆蓋的值,則按照索引的順序讀取值;
每個索引都隱含包含附加到該索引的主鍵列(即主鍵位於覆蓋索引中)。在下面的解決方案中,我直接對主鍵進行操作,在您的情況下,您只需要在結果中添加主鍵列。
在許多情況下,在子查詢中按所需的順序收集所需的行 ID 並將子查詢的結果連接到 ID 上要便宜得多。由於對於子查詢結果中的每一行,MySQL 將需要基於主鍵進行一次獲取,因此子查詢將首先放在連接中,並且行將按照子查詢中id 的順序輸出(如果我們省略顯式ORDER BY用於連接)# 這個速度快得令人難以置信,在我的 18M 行上大約需要 0.8 秒:
SELECT test_id, MAX(request_id) AS request_id FROM testresults GROUP BY test_id DESC;
如果您想將順序變更為 ASC,請將其放入子查詢中,僅傳回 ids 並將其用作子查詢來連接其餘列:
SELECT test_id, request_id
FROM (
SELECT test_id, MAX(request_id) AS request_id
FROM testresults
GROUP BY test_id DESC) as ids
ORDER BY test_id;
這對我的數據來說大約需要 1.2 秒。
這是另一個解決方案,對於我的表來說大約需要 19 秒:
SELECT test_id, request_id FROM testresults, (SELECT @group:=NULL) as init WHERE IF(IFNULL(@group, -1)=@group:=test_id, 0, 1) ORDER BY test_id DESC, request_id DESC
它也按降序返回測試。它要慢得多,因為它執行完整索引掃描,但它可以讓您了解如何為每個群組輸出 N 個最大行。 該查詢的缺點是查詢快取無法快取其結果。
P粉8484421852023-08-25 09:17:24
MySQL 8.0 現在支援視窗函數,例如幾乎所有流行的 SQL 實作。使用這個標準語法,我們可以寫出每組最大n個查詢:
WITH ranked_messages AS ( SELECT m.*, ROW_NUMBER() OVER (PARTITION BY name ORDER BY id DESC) AS rn FROM messages AS m ) SELECT * FROM ranked_messages WHERE rn = 1;
此方法和其他尋找的方法分組最大行數在 MySQL 手冊中進行了說明。
以下是我在2009年針對這個問題寫的原始答案:
我這樣寫解決方案:
SELECT m1.* FROM messages m1 LEFT JOIN messages m2 ON (m1.name = m2.name AND m1.id < m2.id) WHERE m2.id IS NULL;
關於效能,一種解決方案可能會更好,具體取決於資料的性質。因此,您應該測試這兩個查詢,並根據您的資料庫使用效能更好的查詢。
For example, I have a copy of the StackOverflow August data dump. I'll use that for benchmarking. There are 1,114,357 rows in the Posts
table. This is running on MySQL 5.0.75 on my Macbook Pro 2.40GHz.
我將編寫一個查詢來尋找給定用戶 ID(我的)的最新帖子。
First using the technique shown by @Eric with the GROUP BY
in a subquery:
SELECT p1.postid FROM Posts p1 INNER JOIN (SELECT pi.owneruserid, MAX(pi.postid) AS maxpostid FROM Posts pi GROUP BY pi.owneruserid) p2 ON (p1.postid = p2.maxpostid) WHERE p1.owneruserid = 20860; 1 row in set (1 min 17.89 sec)
Even the EXPLAIN
analysis takes over 16 seconds:
+----+-------------+------------+--------+----------------------------+-------------+---------+--------------+---------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+------------+--------+----------------------------+-------------+---------+--------------+---------+-------------+ | 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 76756 | | | 1 | PRIMARY | p1 | eq_ref | PRIMARY,PostId,OwnerUserId | PRIMARY | 8 | p2.maxpostid | 1 | Using where | | 2 | DERIVED | pi | index | NULL | OwnerUserId | 8 | NULL | 1151268 | Using index | +----+-------------+------------+--------+----------------------------+-------------+---------+--------------+---------+-------------+ 3 rows in set (16.09 sec)
Now produce the same query result using my technique with LEFT JOIN
:
SELECT p1.postid FROM Posts p1 LEFT JOIN posts p2 ON (p1.owneruserid = p2.owneruserid AND p1.postid < p2.postid) WHERE p2.postid IS NULL AND p1.owneruserid = 20860; 1 row in set (0.28 sec)
The EXPLAIN
analysis shows that both tables are able to use their indexes:
+----+-------------+-------+------+----------------------------+-------------+---------+-------+------+--------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+----------------------------+-------------+---------+-------+------+--------------------------------------+ | 1 | SIMPLE | p1 | ref | OwnerUserId | OwnerUserId | 8 | const | 1384 | Using index | | 1 | SIMPLE | p2 | ref | PRIMARY,PostId,OwnerUserId | OwnerUserId | 8 | const | 1384 | Using where; Using index; Not exists | +----+-------------+-------+------+----------------------------+-------------+---------+-------+------+--------------------------------------+ 2 rows in set (0.00 sec)
Here's the DDL for my Posts
table:
CREATE TABLE `posts` ( `PostId` bigint(20) unsigned NOT NULL auto_increment, `PostTypeId` bigint(20) unsigned NOT NULL, `AcceptedAnswerId` bigint(20) unsigned default NULL, `ParentId` bigint(20) unsigned default NULL, `CreationDate` datetime NOT NULL, `Score` int(11) NOT NULL default '0', `ViewCount` int(11) NOT NULL default '0', `Body` text NOT NULL, `OwnerUserId` bigint(20) unsigned NOT NULL, `OwnerDisplayName` varchar(40) default NULL, `LastEditorUserId` bigint(20) unsigned default NULL, `LastEditDate` datetime default NULL, `LastActivityDate` datetime default NULL, `Title` varchar(250) NOT NULL default '', `Tags` varchar(150) NOT NULL default '', `AnswerCount` int(11) NOT NULL default '0', `CommentCount` int(11) NOT NULL default '0', `FavoriteCount` int(11) NOT NULL default '0', `ClosedDate` datetime default NULL, PRIMARY KEY (`PostId`), UNIQUE KEY `PostId` (`PostId`), KEY `PostTypeId` (`PostTypeId`), KEY `AcceptedAnswerId` (`AcceptedAnswerId`), KEY `OwnerUserId` (`OwnerUserId`), KEY `LastEditorUserId` (`LastEditorUserId`), KEY `ParentId` (`ParentId`), CONSTRAINT `posts_ibfk_1` FOREIGN KEY (`PostTypeId`) REFERENCES `posttypes` (`PostTypeId`) ) ENGINE=InnoDB;
評論者請注意:如果您想要使用不同版本的 MySQL、不同的資料集或不同的表設計進行另一個基準測試,請隨意自己做。我已經展示了上面的技術。 Stack Overflow 在這裡向您展示如何進行軟體開發工作,而不是為您完成所有工作。