首先我们来解释一下膨胀
REFRESH MATERIALIZED CONCURRENTLY
实施于src/backend/commands/matview.c https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/commands/matview.c;h=e1eb7c374b833402c20255db384529dc0b7cee34;hb=09e99ce86e3dfb4716618b1dda4074b45ba56a09#l547,评论很有启发性:
/*
* refresh_by_match_merge
*
* Refresh a materialized view with transactional semantics, while allowing
* concurrent reads.
*
* This is called after a new version of the data has been created in a
* temporary table. It performs a full outer join against the old version of
* the data, producing "diff" results. This join cannot work if there are any
* duplicated rows in either the old or new versions, in the sense that every
* column would compare as equal between the two rows. It does work correctly
* in the face of rows which have at least one NULL value, with all non-NULL
* columns equal. The behavior of NULLs on equality tests and on UNIQUE
* indexes turns out to be quite convenient here; the tests we need to make
* are consistent with default behavior. If there is at least one UNIQUE
* index on the materialized view, we have exactly the guarantee we need.
*
* The temporary table used to hold the diff results contains just the TID of
* the old record (if matched) and the ROW from the new table as a single
* column of complex record type (if matched).
*
* Once we have the diff table, we perform set-based DELETE and INSERT
* operations against the materialized view, and discard both temporary
* tables.
*
* Everything from the generation of the new data to applying the differences
* takes place under cover of an ExclusiveLock, since it seems as though we
* would want to prohibit not only concurrent REFRESH operations, but also
* incremental maintenance. It also doesn't seem reasonable or safe to allow
* SELECT FOR UPDATE or SELECT FOR SHARE on rows being updated or deleted by
* this command.
*/
因此,通过删除行并从临时表中插入新行来刷新物化视图。这当然会导致死元组和表膨胀,这已由您证实VACUUM (VERBOSE)
output.
在某种程度上,这就是你付出的代价CONCURRENTLY
.
其次,让我们来揭穿这个神话:VACUUM
无法删除死元组
VACUUM
将删除死行,但它不能减少膨胀(可以通过VACUUM (FULL)
,但这会锁定视图,就像REFRESH MATERIALIZED VIEW
没有CONCURRENTLY
).
我怀疑您用来确定死元组数量的查询只是一个估计,导致死元组数量错误。
一个例子可以证明这一切
CREATE TABLE tab AS SELECT id, 'row ' || id AS val FROM generate_series(1, 100000) AS id;
-- make sure autovacuum doesn't spoil our demonstration
CREATE MATERIALIZED VIEW tab_v WITH (autovacuum_enabled = off)
AS SELECT * FROM tab;
-- required for CONCURRENTLY
CREATE UNIQUE INDEX ON tab_v (id);
Use the pgstattuple
精确测量表膨胀的扩展:
CREATE EXTENSION pgstattuple;
SELECT * FROM pgstattuple('tab_v');
-[ RECORD 1 ]------+--------
table_len | 4431872
tuple_count | 100000
tuple_len | 3788895
tuple_percent | 85.49
dead_tuple_count | 0
dead_tuple_len | 0
dead_tuple_percent | 0
free_space | 16724
free_percent | 0.38
现在让我们删除表中的一些行,刷新并再次测量:
DELETE FROM tab WHERE id BETWEEN 40001 AND 80000;
REFRESH MATERIALIZED VIEW CONCURRENTLY tab_v;
SELECT * FROM pgstattuple('tab_v');
-[ RECORD 1 ]------+--------
table_len | 4431872
tuple_count | 60000
tuple_len | 2268895
tuple_percent | 51.19
dead_tuple_count | 40000
dead_tuple_len | 1520000
dead_tuple_percent | 34.3
free_space | 16724
free_percent | 0.38
很多死元组。VACUUM
摆脱这些:
VACUUM tab_v;
SELECT * FROM pgstattuple('tab_v');
-[ RECORD 1 ]------+--------
table_len | 4431872
tuple_count | 60000
tuple_len | 2268895
tuple_percent | 51.19
dead_tuple_count | 0
dead_tuple_len | 0
dead_tuple_percent | 0
free_space | 1616724
free_percent | 36.48
死的元组消失了,但现在有很多空闲空间。