我使用 PostgreSQL 9.1 来查询分层树结构数据,其中包含与节点连接的边(或元素)。这些数据实际上是针对流网络的,但我已将问题抽象为简单的数据类型。考虑这个例子tree
桌子。每条边都有长度和面积属性,用于确定网络中的一些有用的度量。
CREATE TEMP TABLE tree (
edge text PRIMARY KEY,
from_node integer UNIQUE NOT NULL, -- can also act as PK
to_node integer REFERENCES tree (from_node),
mode character varying(5), -- redundant, but illustrative
length numeric NOT NULL,
area numeric NOT NULL,
fwd_path text[], -- optional ordered sequence, useful for debugging
fwd_search_depth integer,
fwd_length numeric,
rev_path text[], -- optional unordered set, useful for debugging
rev_search_depth integer,
rev_length numeric,
rev_area numeric
);
CREATE INDEX ON tree (to_node);
INSERT INTO tree(edge, from_node, to_node, mode, length, area) VALUES
('A', 1, 4, 'start', 1.1, 0.9),
('B', 2, 4, 'start', 1.2, 1.3),
('C', 3, 5, 'start', 1.8, 2.4),
('D', 4, 5, NULL, 1.2, 1.3),
('E', 5, NULL, 'end', 1.1, 0.9);
如下图所示,A-E 代表的边与节点 1-5 连接。空值to_node
(Ø)代表结束节点。这from_node
总是独一无二的,所以可以充当PK。如果这个网络像一个流域 https://en.wikipedia.org/wiki/Drainage_basin,从上到下,起始支流边为A、B、C,结束流出边为E。
The 的文档WITH http://www.postgresql.org/docs/9.1/static/queries-with.html提供了一个很好的示例,说明如何在递归查询中使用搜索图。因此,为了获取“向前”信息,查询从末尾开始,然后向后进行:
WITH RECURSIVE search_graph AS (
-- Begin at ending nodes
SELECT E.from_node, 1 AS search_depth, E.length
, ARRAY[E.edge] AS path -- optional
FROM tree E WHERE E.to_node IS NULL
UNION ALL
-- Accumulate each edge, working backwards (upstream)
SELECT o.from_node, sg.search_depth + 1, sg.length + o.length
, o.edge|| sg.path -- optional
FROM tree o, search_graph sg
WHERE o.to_node = sg.from_node
)
UPDATE tree SET
fwd_path = sg.path,
fwd_search_depth = sg.search_depth,
fwd_length = sg.length
FROM search_graph AS sg WHERE sg.from_node = tree.from_node;
SELECT edge, from_node, to_node, fwd_path, fwd_search_depth, fwd_length
FROM tree ORDER BY edge;
edge | from_node | to_node | fwd_path | fwd_search_depth | fwd_length
------+-----------+---------+----------+------------------+------------
A | 1 | 4 | {A,D,E} | 3 | 3.4
B | 2 | 4 | {B,D,E} | 3 | 3.5
C | 3 | 5 | {C,E} | 2 | 2.9
D | 4 | 5 | {D,E} | 2 | 2.3
E | 5 | | {E} | 1 | 1.1
以上是有道理的,并且对于大型网络来说可以很好地扩展。例如,我可以看到边缘B
距末端 3 条边,前进路径为{B,D,E}
从尖端到末端的总长度为3.5。
但是,我无法找出构建反向查询的好方法。也就是说,从每条边开始,累积的“上游”边、长度和面积是多少。使用WITH RECURSIVE
,我拥有的最好的是:
WITH RECURSIVE search_graph AS (
-- Begin at starting nodes
SELECT S.from_node, S.to_node, 1 AS search_depth, S.length, S.area
, ARRAY[S.edge] AS path -- optional
FROM tree S WHERE from_node IN (
-- Starting nodes have a from_node without any to_node
SELECT from_node FROM tree EXCEPT SELECT to_node FROM tree)
UNION ALL
-- Accumulate edges, working forwards
SELECT c.from_node, c.to_node, sg.search_depth + 1, sg.length + c.length, sg.area + c.area
, c.edge || sg.path -- optional
FROM tree c, search_graph sg
WHERE c.from_node = sg.to_node
)
UPDATE tree SET
rev_path = sg.path,
rev_search_depth = sg.search_depth,
rev_length = sg.length,
rev_area = sg.area
FROM search_graph AS sg WHERE sg.from_node = tree.from_node;
SELECT edge, from_node, to_node, rev_path, rev_search_depth, rev_length, rev_area
FROM tree ORDER BY edge;
edge | from_node | to_node | rev_path | rev_search_depth | rev_length | rev_area
------+-----------+---------+----------+------------------+------------+----------
A | 1 | 4 | {A} | 1 | 1.1 | 0.9
B | 2 | 4 | {B} | 1 | 1.2 | 1.3
C | 3 | 5 | {C} | 1 | 1.8 | 2.4
D | 4 | 5 | {D,A} | 2 | 2.3 | 2.2
E | 5 | | {E,C} | 2 | 2.9 | 3.3
我想将聚合构建到递归查询的第二项中,因为每个下游边缘连接到 1 个或多个上游边缘,但是递归查询不允许使用聚合 https://wiki.postgresql.org/wiki/CTEReadme。另外,我知道连接很草率,因为with recursive
结果有多个连接条件edge
.
反向/向后查询的预期结果是:
edge | from_node | to_node | rev_path | rev_search_depth | rev_length | rev_area
------+-----------+---------+-------------+------------------+------------+----------
A | 1 | 4 | {A} | 1 | 1.1 | 0.9
B | 2 | 4 | {B} | 1 | 1.2 | 1.3
C | 3 | 5 | {C} | 1 | 1.8 | 2.4
D | 4 | 5 | {A,B,D} | 3 | 3.5 | 3.5
E | 5 | | {A,B,C,D,E} | 5 | 6.4 | 6.8
我该如何编写这个更新查询?
请注意,我最终更关心累积准确的长度和面积总计,并且路径属性用于调试。在我的实际情况中,正向路径最多可达数百条,对于大型且复杂的流域,我预计反向路径可达数万条。