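A faster solution is to:
- find the first occurrence of each duplicated key,
- then delete every row with that key that is not the first occurrence.
It looks like this: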
DELETE FROM dups a USING (
SELECT MIN(ctid) as ctid, key
FROM dups
GROUP BY key HAVING COUNT(*) > 1
) b
WHERE a.key = b.key
AND a.ctid <> b.ctid;
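Note that with this solution you cannot control which row is kept: the survivor is simply the row with the smallest ctid (its physical location in the table), regardless of its column values.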
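If it matters which row survives, one possible variant is to rank the rows within each key and delete everything except the top-ranked one. The following is only a sketch under assumptions: the `updated_at` column is hypothetical (it does not appear in the original table) and stands in for whatever column decides which row to keep. This variant scans and ranks the whole table, so it may well be slower than the `MIN(ctid)` approach above.

```sql
-- Sketch: keep the most recently updated row per key.
-- ASSUMPTION: dups has a column updated_at (hypothetical, for illustration only).
DELETE FROM dups a USING (
    SELECT ctid,
           ROW_NUMBER() OVER (
               PARTITION BY key
               ORDER BY updated_at DESC   -- rank 1 = the row to keep
           ) AS rn
    FROM dups
) b
WHERE a.ctid = b.ctid   -- match each row to its own rank
  AND b.rn > 1;         -- delete everything but the top-ranked row
```

Changing the `ORDER BY` inside the window changes which row is kept; rows that tie on the ordering column are still ranked arbitrarily.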
Toy example
CREATE TABLE people (
name varchar(50) NOT NULL,
surname varchar(50) NOT NULL,
age integer NOT NULL
);
INSERT INTO people (name, surname, age) VALUES
('A.', 'Tom', 30),
('A.', 'Tom', 10),
('B.', 'Tom', 20),
('B', 'Chris', 20);
-- The inner query, which finds the first occurrence of each duplicate:
SELECT MIN(ctid) as ctid, name, surname
FROM people
GROUP BY (name, surname) HAVING COUNT(*) > 1;
DELETE FROM people a USING (
SELECT MIN(ctid) as ctid, name, surname
FROM people
GROUP BY (name, surname) HAVING COUNT(*) > 1
) b
WHERE a.name = b.name
AND a.surname = b.surname
AND a.ctid <> b.ctid;
SELECT * FROM people;
Output of the inner query:
| ctid  | name | surname |
|-------|------|---------|
| (0,1) | A.   | Tom     |
Output of the final query (after the delete):
| name | surname | age |
|------|---------|-----|
| A.   | Tom     | 30  |
| B.   | Tom     | 20  |
| B    | Chris   | 20  |
See the toy example on DB Fiddle: https://www.db-fiddle.com/f/4jyoMCicNSZpjMt4jFYoz5/8761
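As a quick sanity check (not part of the original fiddle), the same grouping used to find the duplicates can be run again after the delete. This is a minimal sketch against the toy `people` table; it should return no rows once every duplicated (name, surname) pair has been reduced to a single row.

```sql
-- Re-run the duplicate detection after the DELETE.
-- Any row returned here is a (name, surname) pair that still occurs more than once.
SELECT name, surname, COUNT(*) AS occurrences
FROM people
GROUP BY name, surname
HAVING COUNT(*) > 1;
```

Running the delete and this check inside one transaction makes it possible to roll back if the result is not what was expected.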