我在 athena 中有一个数据集,因此出于此目的,您可以将其视为 postgres 数据库。数据样本可以在这里看到sql小提琴 http://www.sqlfiddle.com/#!17/4b3732/2.
这是一个示例:
create table vals (
timestamp int,
type varchar(25),
val int
);
insert into vals(timestamp,type, val)
values (10, null, 1),
(20, null, 2),
(39, null, 1),
(40,'p',1),
(50,'p',2),
(60,'p',1),
(70,'v',5),
(80,'v',6),
(90,'v',6),
(100,'v',3),
(110,null,3),
(120,'v',6),
(130,null,3),
(140,'p',10),
(150,'p',8),
(160,null,3),
(170,'p',1),
(180,'p',2),
(190,'p',2),
(200,'p',1),
(210,null,3),
(220,'v',1),
(230,'v',1),
(240,'v',3),
(250,'v',41)
我想要得到的是一个包含所有值但突出显示“p”的最高值和连续“v”的最低值的数据集。
所以最终我会得到:
timestamp, type, value, is_peak
(10, null, 1, null),
(20, null, 2, null),
(39, null, 1, null),
(40,'p',1, null),
(50,'p',2, 1),
(60,'p',1, null),
(70,'v',5, null),
(80,'v',6, null),
(90,'v',6, null),
(100,'v',3, 1),
(110,null,3, null),
(120,'v',6, 1),
(130,null,3, null),
(140,'p',10, 1),
(150,'p',8, null),
(160,null,3, null),
(170,'p',1, null),
(180,'p',2, 1),
(190,'p',2, null), -- either this record or 180 would be fine
(200,'p',1, null),
(210,null,3, null),
(220,'v',1, 1), -- again either this or 230
(230,'v',1, null),
(240,'v',3, null),
(250,'v',41, null)
is Peak 有很多类型选择,如果它是某种密集等级或递增数字那就可以了。这样我就可以确信在连续的集合中,“标记”的值是最高或最低值。
祝你好运,感谢帮助
注意:峰值的最大值或谷值的最小值可以位于连续集中的任意位置,但是一旦类型发生变化,我们就会重新开始。