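I'd like to thank Jeff Breadner for the DDL with sample data.
You should run the query below step by step, CTE by CTE, and examine the intermediate results to understand what it does. It assumes that AcctNumber is unique in the given table.
First, I want to find the latest account for each customer. This is a simple top-n-per-group query, and I'm using the ROW_NUMBER approach here.
CTE_Customers builds a simple list of all individual customers by putting Cust1ID and Cust2ID together. CTE_RN assigns row numbers to them. CTE_LatestAccounts gives the latest account for each customer: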
+------------------+------------+--------+
| LatestAcctNumber | LatestDT | CustID |
+------------------+------------+--------+
| 10000 | 2016-02-01 | 1110 |
| 10050 | 2017-02-01 | 1111 |
| 10052 | 2017-02-02 | 1112 |
| 10052 | 2017-02-02 | 1113 |
| 7060 | 2017-02-04 | 1114 |
| 7060 | 2017-02-04 | 1115 |
| 10004 | 2016-02-02 | 1116 |
| 10067 | 2017-02-05 | 1117 |
| 10054 | 2017-02-03 | 1118 |
| 10101 | 2017-06-02 | 1119 |
| 10058 | 2017-02-03 | 1120 |
| 10058 | 2017-02-03 | 1121 |
| 10007 | 2016-02-01 | 1122 |
| 10057 | 2017-02-03 | 1123 |
| 10107 | 2017-06-02 | 1124 |
| 10107 | 2017-06-02 | 1125 |
+------------------+------------+--------+
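The task is complicated by the pairs of customers, which "propagate" the latest account to another customer.
The pairs are defined in the original table, so CTE_MaxLatestAccounts takes each row from the original table and joins the latest accounts to it twice, once for Cust1ID and once for Cust2ID. For each pair I pick one of the two latest accounts, the one that is more recent. So a customer that belongs to a pair can get an account from its partner.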
+---------+---------+-------------+---------------------+
| Cust1ID | Cust2ID | MaxLatestDT | MaxLatestAcctNumber |
+---------+---------+-------------+---------------------+
| 1110 | NULL | 2016-02-01 | 10000 |
| 1111 | NULL | 2017-02-01 | 10050 |
| 1111 | NULL | 2017-02-01 | 10050 |
| 1120 | NULL | 2017-02-03 | 10058 |
| 1120 | 1121 | 2017-02-03 | 10058 |
| 1112 | NULL | 2017-02-02 | 10052 |
| 1113 | 1112 | 2017-02-02 | 10052 |
| 1114 | 1115 | 2017-02-04 | 7060 |
| 1115 | 1114 | 2017-02-04 | 7060 |
| 1116 | 1117 | 2017-02-05 | 10067 |
| 1117 | NULL | 2017-02-05 | 10067 |
| 1118 | NULL | 2017-02-03 | 10054 |
| 1118 | 1119 | 2017-06-02 | 10101 |
| 1119 | NULL | 2017-06-02 | 10101 |
| 1122 | 1123 | 2017-02-03 | 10057 |
| 1123 | 1124 | 2017-06-02 | 10107 |
| 1124 | 1125 | 2017-06-02 | 10107 |
+---------+---------+-------------+---------------------+
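Here MaxLatestAcctNumber applies to both Cust1ID and Cust2ID. The same customer may be listed several times, so we again need to pick the entry with the latest account. This time it is the latest account for a pair, not for an individual customer.
The approach is the same as at the start. Put both Cust1ID and Cust2ID customers into one list: CTE_CustomersWithLatestAccountFromPair. Assign row numbers in CTE_CustomersWithLatestAccountFromPairRN and pick the final account in CTE_FinalAccounts.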
+---------------------+
| MaxLatestAcctNumber |
+---------------------+
| 10000 |
| 10050 |
| 10052 |
| 10052 |
| 7060 |
| 7060 |
| 10067 |
| 10067 |
| 10101 |
| 10101 |
| 10058 |
| 10058 |
| 10057 |
| 10107 |
| 10107 |
| 10107 |
+---------------------+
Now we just need to filter the original table and keep only those rows (accounts) that appear in this list. See the final result below.
Sample data
declare @ACCT table (
    AcctNumber int,
    dt date,
    Cust1ID int,
    Cust2ID int
);
insert into @ACCT values
    (10000, '2016-02-01', 1110, null),
    (10001, '2016-02-01', 1111, null),
    (10050, '2017-02-01', 1111, null),
    (10008, '2016-02-01', 1120, null),
    (10058, '2017-02-03', 1120, 1121),
    (10002, '2016-02-01', 1112, null),
    (10052, '2017-02-02', 1113, 1112),
    (10003, '2016-02-02', 1114, 1115),
    (7060, '2017-02-04', 1115, 1114),
    (10004, '2016-02-02', 1116, 1117),
    (10067, '2017-02-05', 1117, null),
    (10005, '2016-02-01', 1118, null),
    (10054, '2017-02-03', 1118, 1119),
    (10101, '2017-06-02', 1119, null),
    (10007, '2016-02-01', 1122, 1123),
    (10057, '2017-02-03', 1123, 1124),
    (10107, '2017-06-02', 1124, 1125);
Query
WITH
CTE_Customers
AS
(
    SELECT
        AcctNumber
        ,dt
        ,Cust1ID AS CustID
    FROM @ACCT
    WHERE Cust1ID IS NOT NULL
    UNION ALL
    SELECT
        AcctNumber
        ,dt
        ,Cust2ID AS CustID
    FROM @ACCT
    WHERE Cust2ID IS NOT NULL
)
,CTE_RN
AS
(
    SELECT
        AcctNumber
        ,dt
        ,CustID
        ,ROW_NUMBER() OVER (PARTITION BY CustID ORDER BY dt DESC) AS rn
    FROM CTE_Customers
)
,CTE_LatestAccounts
-- this gives one row per CustID
AS
(
    SELECT
        AcctNumber AS LatestAcctNumber
        ,dt AS LatestDT
        ,CustID
    FROM CTE_RN
    WHERE rn = 1
)
,CTE_MaxLatestAccounts
AS
(
    SELECT
        A.Cust1ID
        ,A.Cust2ID
        ,CASE WHEN ISNULL(A1.LatestDT, '2000-01-01') > ISNULL(A2.LatestDT, '2000-01-01')
            THEN A1.LatestDT ELSE A2.LatestDT END AS MaxLatestDT
        ,CASE WHEN ISNULL(A1.LatestDT, '2000-01-01') > ISNULL(A2.LatestDT, '2000-01-01')
            THEN A1.LatestAcctNumber ELSE A2.LatestAcctNumber END AS MaxLatestAcctNumber
    FROM
        @ACCT AS A
        LEFT JOIN CTE_LatestAccounts AS A1 ON A1.CustID = A.Cust1ID
        LEFT JOIN CTE_LatestAccounts AS A2 ON A2.CustID = A.Cust2ID
)
,CTE_CustomersWithLatestAccountFromPair
AS
(
    SELECT
        Cust1ID AS CustID
        ,MaxLatestDT
        ,MaxLatestAcctNumber
    FROM CTE_MaxLatestAccounts
    WHERE Cust1ID IS NOT NULL
    UNION ALL
    SELECT
        Cust2ID AS CustID
        ,MaxLatestDT
        ,MaxLatestAcctNumber
    FROM CTE_MaxLatestAccounts
    WHERE Cust2ID IS NOT NULL
)
,CTE_CustomersWithLatestAccountFromPairRN
AS
(
    SELECT
        CustID
        ,MaxLatestDT
        ,MaxLatestAcctNumber
        ,ROW_NUMBER() OVER (PARTITION BY CustID ORDER BY MaxLatestDT DESC) AS rn
    FROM CTE_CustomersWithLatestAccountFromPair
)
,CTE_FinalAccounts
AS
(
    SELECT MaxLatestAcctNumber
    FROM CTE_CustomersWithLatestAccountFromPairRN
    WHERE rn = 1
)
SELECT *
FROM @ACCT AS A
WHERE A.AcctNumber IN (SELECT MaxLatestAcctNumber FROM CTE_FinalAccounts)
;
Result
+------------+------------+---------+---------+
| AcctNumber | dt | Cust1ID | Cust2ID |
+------------+------------+---------+---------+
| 10000 | 2016-02-01 | 1110 | NULL |
| 10050 | 2017-02-01 | 1111 | NULL |
| 10058 | 2017-02-03 | 1120 | 1121 |
| 10052 | 2017-02-02 | 1113 | 1112 |
| 7060 | 2017-02-04 | 1115 | 1114 |
| 10067 | 2017-02-05 | 1117 | NULL |
| 10101 | 2017-06-02 | 1119 | NULL |
| 10057 | 2017-02-03 | 1123 | 1124 |
| 10107 | 2017-06-02 | 1124 | 1125 |
+------------+------------+---------+---------+
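This result matches your desired result, except for the last case, case 7.
My query doesn't try to follow chains of linked customers of arbitrary length; it is limited to processing one pair at a time. That's why the result for case 7 is not a single row.
The query will always pick the row/account with the latest date (10107), but it may also pick accounts from the middle of the chain. In this case it picked row 10057, not 10007, because that is the later account for customers 1122 and 1123.
When I looked at the execution plan, I saw that the query behind CTE_LatestAccounts is essentially run four times.
Overall performance would most likely be better if you saved the result of CTE_LatestAccounts into a temporary table with a proper index.
Something like this: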
DECLARE @LatestAccounts TABLE
    (LatestAcctNumber int, LatestDT date, CustID int PRIMARY KEY);
WITH
CTE_Customers
AS
(
    SELECT
        AcctNumber
        ,dt
        ,Cust1ID AS CustID
    FROM @ACCT
    WHERE Cust1ID IS NOT NULL
    UNION ALL
    SELECT
        AcctNumber
        ,dt
        ,Cust2ID AS CustID
    FROM @ACCT
    WHERE Cust2ID IS NOT NULL
)
,CTE_RN
AS
(
    SELECT
        AcctNumber
        ,dt
        ,CustID
        ,ROW_NUMBER() OVER (PARTITION BY CustID ORDER BY dt DESC) AS rn
    FROM CTE_Customers
)
,CTE_LatestAccounts
-- this gives one row per CustID
AS
(
    SELECT
        AcctNumber AS LatestAcctNumber
        ,dt AS LatestDT
        ,CustID
    FROM CTE_RN
    WHERE rn = 1
)
INSERT INTO @LatestAccounts (LatestAcctNumber, LatestDT, CustID)
SELECT LatestAcctNumber, LatestDT, CustID
FROM CTE_LatestAccounts;
WITH
CTE_MaxLatestAccounts
AS
(
    SELECT
        A.Cust1ID
        ,A.Cust2ID
        ,CASE WHEN ISNULL(A1.LatestDT, '2000-01-01') > ISNULL(A2.LatestDT, '2000-01-01')
            THEN A1.LatestDT ELSE A2.LatestDT END AS MaxLatestDT
        ,CASE WHEN ISNULL(A1.LatestDT, '2000-01-01') > ISNULL(A2.LatestDT, '2000-01-01')
            THEN A1.LatestAcctNumber ELSE A2.LatestAcctNumber END AS MaxLatestAcctNumber
    FROM
        @ACCT AS A
        LEFT JOIN @LatestAccounts AS A1 ON A1.CustID = A.Cust1ID
        LEFT JOIN @LatestAccounts AS A2 ON A2.CustID = A.Cust2ID
)
,CTE_CustomersWithLatestAccountFromPair
AS
(
    SELECT
        Cust1ID AS CustID
        ,MaxLatestDT
        ,MaxLatestAcctNumber
    FROM CTE_MaxLatestAccounts
    WHERE Cust1ID IS NOT NULL
    UNION ALL
    SELECT
        Cust2ID AS CustID
        ,MaxLatestDT
        ,MaxLatestAcctNumber
    FROM CTE_MaxLatestAccounts
    WHERE Cust2ID IS NOT NULL
)
,CTE_CustomersWithLatestAccountFromPairRN
AS
(
    SELECT
        CustID
        ,MaxLatestDT
        ,MaxLatestAcctNumber
        ,ROW_NUMBER() OVER (PARTITION BY CustID ORDER BY MaxLatestDT DESC) AS rn
    FROM CTE_CustomersWithLatestAccountFromPair
)
,CTE_FinalAccounts
AS
(
    SELECT MaxLatestAcctNumber
    FROM CTE_CustomersWithLatestAccountFromPairRN
    WHERE rn = 1
)
SELECT *
FROM @ACCT AS A
WHERE A.AcctNumber IN (SELECT MaxLatestAcctNumber FROM CTE_FinalAccounts)
;
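If you really need to merge/group all linked customers into one row when chains can have arbitrary length, you can do it with a recursive query, as shown, for example, here: How to find all connected subgraphs of an undirected graph https://stackoverflow.com/q/35254260/4116017
Once you have tagged each customer with some GroupID, find the latest account for each customer, as at the beginning of this query. Then find the latest account within the group (rather than within a simple pair, as in this query).
The query that finds all subgraphs of an undirected graph in the linked question can be quite slow for large data sets, and there are efficient non-set-based algorithms for doing this.
If you know that the maximum length of a chain can't exceed some number, this recursive query can be made more efficient.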
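To illustrate the non-set-based route, here is a minimal sketch in Python rather than T-SQL. It is not the query from the linked question. It assumes a union-find (disjoint set) structure is an acceptable way to tag customers with a group, and it assumes ties on date can be broken by the higher AcctNumber. The names find and latest_account_per_group are made up for this example; the rows are copied from the sample data above.

```python
from datetime import date

# Sample rows copied from @ACCT: (AcctNumber, dt, Cust1ID, Cust2ID)
ACCT = [
    (10000, date(2016, 2, 1), 1110, None),
    (10001, date(2016, 2, 1), 1111, None),
    (10050, date(2017, 2, 1), 1111, None),
    (10008, date(2016, 2, 1), 1120, None),
    (10058, date(2017, 2, 3), 1120, 1121),
    (10002, date(2016, 2, 1), 1112, None),
    (10052, date(2017, 2, 2), 1113, 1112),
    (10003, date(2016, 2, 2), 1114, 1115),
    (7060, date(2017, 2, 4), 1115, 1114),
    (10004, date(2016, 2, 2), 1116, 1117),
    (10067, date(2017, 2, 5), 1117, None),
    (10005, date(2016, 2, 1), 1118, None),
    (10054, date(2017, 2, 3), 1118, 1119),
    (10101, date(2017, 6, 2), 1119, None),
    (10007, date(2016, 2, 1), 1122, 1123),
    (10057, date(2017, 2, 3), 1123, 1124),
    (10107, date(2017, 6, 2), 1124, 1125),
]


def find(parent, x):
    """Return the representative of x's group, compressing the path."""
    parent.setdefault(x, x)
    root = x
    while parent[root] != root:
        root = parent[root]
    while parent[x] != root:
        parent[x], x = root, parent[x]
    return root


def latest_account_per_group(rows):
    """Group linked customers, then keep the latest account of each group."""
    parent = {}
    # Pass 1: every row with two customers links them into one group.
    for _acct, _dt, c1, c2 in rows:
        custs = [c for c in (c1, c2) if c is not None]
        for c in custs:
            find(parent, c)
        if len(custs) == 2:
            parent[find(parent, custs[1])] = find(parent, custs[0])
    # Pass 2: per group, keep the account with the greatest (dt, AcctNumber).
    latest = {}
    for acct, dt, c1, c2 in rows:
        custs = [c for c in (c1, c2) if c is not None]
        if not custs:
            continue
        group = find(parent, custs[0])
        if group not in latest or (dt, acct) > latest[group]:
            latest[group] = (dt, acct)
    return sorted(acct for _dt, acct in latest.values())


print(latest_account_per_group(ACCT))
# [7060, 10000, 10050, 10052, 10058, 10067, 10101, 10107]
```

On the sample data this returns the same accounts as the Result table above minus 10057: the chain 1122, 1123, 1124, 1125 collapses into one group whose latest account is 10107.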