我有一个大型数据集,我想对记录具有重叠时间的计数进行求和。例如,给定数据
[
{"id": 1, "name": 'A', "start": '2018-12-10 00:00:00', "end": '2018-12-20 00:00:00', count: 34},
{"id": 2, "name": 'B', "start": '2018-12-16 00:00:00', "end": '2018-12-27 00:00:00', count: 19},
{"id": 3, "name": 'C', "start": '2018-12-16 00:00:00', "end": '2018-12-20 00:00:00', count: 56},
{"id": 4, "name": 'D', "start": '2018-12-25 00:00:00', "end": '2018-12-30 00:00:00', count: 43}
]
您可以看到有 2 个活动重叠的时期。我想根据重叠涉及的活动返回这些“重叠”的总数。所以上面的输出会是这样的:
[
{start:'2018-12-16', end: '2018-12-20', overlap_ids:[1,2,3], total_count: 109},
{start:'2018-12-25', end: '2018-12-27', overlap_ids:[2,4], total_count: 62},
]
问题是,如何通过 postgres 查询生成这个?正在研究generate_series,然后计算出每个间隔内有哪些活动,但这不太正确,因为数据是连续的——我确实需要确定确切的重叠时间,然后对重叠活动进行求和。
EDIT Have added another example. As @SRack pointed out, since A,B,C overlap, this means B,C A,B and A,C also overlap. This doesn’t matter since the output I’m looking for is an array of date ranges that contain overlapping activities rather than all the unique combinations of overlaps. Also note the dates are timestamps, so will have millisecond precision and won’t necessarily all be at 00:00:00.
If it helps, there would probably be a WHERE condition on the total count. For example only want to see results where total count > 100