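I want to delete all but one row for every duplicated "external_id". On a table with 5,000,000 rows, the query below takes about two minutes to run, and I feel there must be a faster way to do this. "id" is the primary key and "external_id" is a btree-indexed column: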
    delete from posts p1
    using (select distinct on (1)
                  external_id, id
           from posts
           order by 1 desc, 2 desc) p_recent
    where p1.external_id = p_recent.external_id
      and p1.id != p_recent.id;
How can I improve the performance of this query?
Edit: the query plan is as follows:
    Delete on posts p1 (cost=2322413.28..2673548.11 rows=5583248 width=45) (actual time=148064.026..148064.026 rows=0 loops=1)
      -> Hash Join (cost=2322413.28..2673548.11 rows=5583248 width=45) (actual time=148064.025..148064.025 rows=0 loops=1)
            Hash Cond: ((p_recent.external_id)::text = (p1.external_id)::text)
            Join Filter: (p1.id <> p_recent.id)
            -> Subquery Scan on p_recent (cost=1565918.17..1649666.91 rows=5583249 width=54) (actual time=80975.573..98202.920 rows=5947083 loops=1)
                  -> Unique (cost=1565918.17..1593834.42 rows=5583249 width=15) (actual time=80975.561..95891.264 rows=5947083 loops=1)
                        -> Sort (cost=1565918.17..1579876.30 rows=5583249 width=15) (actual time=80975.560..93768.105 rows=5947083 loops=1)
                              Sort Key: posts.external_id, posts.id
                              Sort Method: external merge Disk: 153984kB
                              -> Seq Scan on posts (cost=0.00..653989.49 rows=5583249 width=15) (actual time=0.014..10314.089 rows=5947083 loops=1)
            -> Hash (cost=653989.49..653989.49 rows=5583249 width=21) (actual time=38966.573..38966.573 rows=5947083 loops=1)
                  Buckets: 4096 Batches: 256 Memory Usage: 1017kB
                  -> Seq Scan on posts p1 (cost=0.00..653989.49 rows=5583249 width=21) (actual time=0.028..35863.561 rows=5947083 loops=1)
    Total runtime: 148084.796 ms
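
A note on reading this plan: the sort spills to disk ("Sort Method: external merge Disk: 153984kB") and the hash is split into 256 batches with about 1 MB held in memory, which suggests a small work_mem setting. One thing that may be worth trying, as a sketch rather than a guaranteed fix, is raising work_mem for a single transaction around the original statement. The 256MB value is an assumption chosen for illustration; it has to fit the memory actually available on the server.

```sql
BEGIN;

-- Assumption: 256MB is an illustrative value, not a recommendation.
-- SET LOCAL only lasts until the end of this transaction.
SET LOCAL work_mem = '256MB';

-- The original statement from the question, unchanged.
DELETE FROM posts p1
USING (SELECT DISTINCT ON (1)
              external_id, id
       FROM posts
       ORDER BY 1 DESC, 2 DESC) p_recent
WHERE p1.external_id = p_recent.external_id
  AND p1.id != p_recent.id;

COMMIT;
```

Whether this helps depends on the server; comparing the EXPLAIN ANALYZE output before and after the change is the way to find out.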
    DELETE FROM posts del
    WHERE EXISTS (
        SELECT *
        FROM posts ex
        WHERE ex.external_id = del.external_id
          AND ex.id < del.id     -- if you want to keep the lowest id
          -- AND ex.id > del.id  -- if you want to keep the highest id
    );
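
Since the DELETE is destructive, it may help to try it inside a transaction first. The sketch below assumes the keep-lowest-id variant and the table and column names from the question. EXPLAIN ANALYZE really executes the statement, so the ROLLBACK at the end is what undoes the trial run.

```sql
BEGIN;

-- EXPLAIN ANALYZE runs the DELETE for real and reports actual timings.
EXPLAIN ANALYZE
DELETE FROM posts del
WHERE EXISTS (
    SELECT 1
    FROM posts ex
    WHERE ex.external_id = del.external_id
      AND ex.id < del.id
);

-- Should return no rows once every non-NULL external_id is unique.
-- NULLs are excluded because the EXISTS comparison never matches them.
SELECT external_id, count(*) AS copies
FROM posts
WHERE external_id IS NOT NULL
GROUP BY external_id
HAVING count(*) > 1;

-- Undo the trial run; use COMMIT instead once the result looks right.
ROLLBACK;
```

The timings reported here can be compared directly with the "Total runtime" line of the plan in the question.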