SQL 中的这些类型的树遍历问题可以使用特殊语法来解决WITH RECURSIVE https://www.postgresql.org/docs/current/static/queries-with.html.
为了测试该解决方案,让我们使用虚拟数据创建下表:
CREATE TABLE flight (
src TEXT
, dest TEXT
, stt TIMESTAMP
, endt TIMESTAMP);
INSERT INTO flight VALUES
('SIN', 'DAC', '2016-12-31 22:45:00', '2017-01-01 01:45:00'),
('DAC', 'SIN', '2017-01-01 16:30:00', '2017-01-01 21:30:00'),
('SIN', 'DAC', '2017-01-01 22:45:00', '2017-01-02 01:45:00'),
('DAC', 'DEL', '2017-01-01 10:00:00', '2017-01-01 11:30:00'),
('DEL', 'DAC', '2017-01-01 12:30:00', '2017-01-01 14:00:00'),
('DAC', 'CCU', '2017-01-01 10:30:00', '2017-01-01 11:15:00'),
('CCU', 'DAC', '2017-01-01 11:45:00', '2017-01-01 12:30:00'),
('SIN', 'DEL', '2017-01-01 11:00:00', '2017-01-01 15:00:00'),
('DEL', 'SIN', '2017-01-01 16:30:00', '2017-01-01 20:30:00'),
('CCU', 'DEL', '2017-01-01 12:00:00', '2017-01-01 12:45:00'),
('DEL', 'CCU', '2017-01-01 13:15:00', '2017-01-01 14:00:00');
递归 CTE 很难理解,所以我将一点一点地构建逻辑。
在递归 CTE 中,有两个部分。锚子查询和递归子查询。首先执行锚子查询,然后执行递归子查询,直到返回空结果集。棘手的部分(至少对我来说)是构造将执行从 1 个节点到下一个节点的遍历的连接。
锚点和递归子查询使用UNION ALL
(或者有时UNION
)并返回。
由于我们对航班路线感兴趣,因此首先使用一个简单的锚点子查询:
SELECT src, ARRAY[src] path, dest FROM flight
上面的查询有 3 列,分别是开始、路径和结束,在第二列中,我们将src
字段来自TEXT
to TEXT[]
。虽然这不是严格需要的,但它将大大简化其余步骤,因为数组很容易附加。
使用上面的锚查询,我们现在可以定义递归 cte。
WITH RECURSIVE flight_paths (src, path, dest) AS (
SELECT
src
, ARRAY[src]
, dest
FROM flight
UNION ALL
SELECT
fp.src
, fp.path || f.src -- appends another 'flight source'
, f.dest -- updates the dest to the new dest
FROM flight f
JOIN flight_paths fp ON f.src = fp.dest
-- this is the join that links the tree nodes
WHERE NOT f.src = ANY(fp.path)
-- this condition prevents infinite recursion by not visiting nodes that have already been visited.
) SELECT * FROM flight_paths
-- finally, we can select from the flight_paths recursive cte
上面返回了 136 行以及我的测试数据。前 15 行如下所示:
src path dest
SIN {SIN} DAC
DAC {DAC} SIN
SIN {SIN} DAC
DAC {DAC} DEL
DEL {DEL} DAC
DAC {DAC} CCU
CCU {CCU} DAC
SIN {SIN} DEL
DEL {DEL} SIN
CCU {CCU} DEL
DEL {DEL} CCU
DEL {DEL,CCU} DAC
DAC {DAC,CCU} DAC
DEL {DEL,CCU} DEL
DAC {DAC,CCU} DEL
DEL {DEL,CCU} DEL
DAC {DAC,CCU} DEL
这部分,树遍历的设置,是编写递归 cte 中最难的部分。现在,遍历已经设置完毕,我们可以修改上面的内容:
- 我们从源头开始,到目的地结束
- 到达目的地时停止迭代
- 仅当到达(A-B)
对于这个例子,让src := SIN
& dest := CCU
WITH RECURSIVE flight_paths (src, path, dest, stt, endt) AS (
SELECT
src
, ARRAY[src]
, dest
, stt
, endt
FROM flight
UNION ALL
SELECT
fp.src
, fp.path || f.src
, f.dest
, fp.stt
, f.endt -- update endt to the new endt
FROM flight f
JOIN flight_paths fp ON f.src = fp.dest
WHERE NOT f.src = ANY(fp.path)
AND NOT 'CCU' = ANY(fp.path)
-- (2) this new predicate stop iteration when the destination is reached
AND f.stt > fp.endt
-- (3) this new predicate only proceeds the iteration if the connecting flights are valid
)
SELECT *
FROM flight_paths
WHERE src = 'SIN' AND dest = 'CCU'
-- (1) specify the start & end
这给出了 2 条可能的路线,其中第一个航班的起飞时间为列stt
最后一班航班的到达时间为endt
:
src path dest stt endt
SIN {SIN,DAC} CCU 2016-12-31 22:45:00 2017-01-01 11:15:00
SIN {SIN,DAC,DEL} CCU 2016-12-31 22:45:00 2017-01-01 14:00:00
现在可以非常轻松地细化结果集。例如,最终查询可以修改为仅在中午之前到达目的地的返程航班,如下所示:
SELECT *
FROM flight_paths
WHERE src = 'SIN' AND dest = 'CCU'
AND endt::time < '12:00:00'::time
从现在开始,在递归 cte 中添加计算飞行时间和连接时间的列,然后筛选出符合这两个时间谓词的航班也相当容易。我希望我已经为您提供了足够的详细信息来生成这两个变体。
您还可以通过过滤长度来限制连接数path
数组,尽管一旦达到最大连接数,停止递归 cte 中的迭代可能会更有效。
最后一点:为了让事情变得简单,我之前使用了不包括最终目的地的路径,例如{SIN, DAC, DEL}
而不是航班顺序{SIN-DAC, DAC-DEL, DEL-CCU}
(或中途停留{DAC, DEL}
),但我们可以很容易地得到这些表示,如下所示:
WITH RECURSIVE flight_paths (src, flights, path, dest, stt, endt) AS (
SELECT
src
, ARRAY[src || '-' || dest]
, ARRAY[src]
, dest
, stt
, endt
FROM flight
UNION ALL
SELECT
fp.src
, fp.flights || (f.src || '-' || f.dest)
, fp.path || f.src
, f.dest
, fp.stt
, f.endt
FROM flight f
JOIN flight_paths fp ON f.src = fp.dest
WHERE NOT f.src = ANY(fp.path)
AND NOT 'CCU' = ANY(fp.path)
AND f.stt > fp.endt
)
SELECT flights, stt, endt, path[2:] stopovers
FROM flight_paths
WHERE src = 'SIN' AND dest = 'CCU'