您需要一种递归方法,但是您的WITH RECURSIVE
创建一个巨大的中间结果,这导致没有更多的线轴.
对于类似的过程,我使用了以下方法(最初在存储过程中使用 WHILE 循环):
CREATE MULTISET VOLATILE TABLE vt_tmp, NO Log AS
(
SELECT group_id, category_1, category_2,
-- assign a unique number to
Dense_Rank() Over (ORDER BY group_id, category_1) AS rnk
-- remove when you source data is unique
GROUP BY 1,2,3 -- same result as a DISTINCT, but processed before DENSE_RANK
FROM match_detail
)
WITH DATA
PRIMARY INDEX (category_2)
ON COMMIT PRESERVE ROWS;
现在重复以下更新,直到0 rows processed
:
-- find matching categories and assign them a common number
UPDATE vt_tmp FROM
( SELECT e2.group_id, e2.category_1, Min(e1.rnk) AS minrnk
FROM vt_tmp e1 JOIN vt_tmp e2
ON e1.category_2 = e2.category_2
AND e1.rnk < e2.rnk
GROUP BY e2.group_id, e2.category_1
) x
SET rnk = minrnk
WHERE
vt_tmp.group_id = x.group_id
AND vt_tmp.category_1 = x.category_1
;
要获取您最终需要的相关类别:
SELECT group_id, category_1 AS category, rnk AS related_categories
FROM vt_tmp
UNION
SELECT group_id, category_2, rnk
FROM vt_tmp
为了与您的预期结果完全匹配,您需要添加DENSE_RANK
:
SELECT group_id, category, Dense_Rank() Over (PARTITION BY group_id ORDER BY related_categories)
FROM
(
SELECT group_id, category_1 AS category, rnk AS related_categories
FROM vt_tmp
UNION
SELECT group_id, category_2, rnk
FROM vt_tmp
) AS dt