如何找到最长的连续日期序列?

2023-12-22

我有一个数据库,其时间戳访问时间如下

ID, time
1, 1493596800
1, 1493596900
1, 1493432800
2, 1493596800
2, 1493596850
2, 1493432800

我使用 Spark SQL,我需要为每个 ID 提供最长的连续日期序列,例如

ID, longest_seq (days)
1, 2
2, 5
3, 1

我尝试调整这个答案使用 SQL 检测连续日期范围 https://stackoverflow.com/questions/20402089/detect-consecutive-dates-ranges-using-sql就我而言,但我没有达到我的预期。

 SELECT ID, MIN (d), MAX(d)
    FROM (
      SELECT ID, cast(from_utc_timestamp(cast(time as timestamp), 'CEST') as date) AS d, 
                ROW_NUMBER() OVER(
         PARTITION BY ID ORDER BY cast(from_utc_timestamp(cast(time as timestamp), 'CEST') 
                                                           as date)) rn
      FROM purchase
      where ID is not null
      GROUP BY ID, cast(from_utc_timestamp(cast(time as timestamp), 'CEST') as date) 
    )
    GROUP BY ID, rn
    ORDER BY ID

如果有人知道如何解决此请求,或者其中有什么问题,我将不胜感激 谢谢

[编辑] 更明确的输入/输出

ID, time
1, 1
1, 2
1, 3
2, 1
2, 3
2, 4
2, 5
2, 10
2, 11
3, 1
3, 4
3, 9
3, 11

结果是:

ID, MaxSeq (in days)
1,3
2,3
3,1

所有访问都在时间戳中,但我需要连续的天,然后每天的每次访问都按天计算一次


我的以下回答改编自https://dzone.com/articles/how-to-find-the-longest-consecutive-series-of-even https://dzone.com/articles/how-to-find-the-longest-consecutive-series-of-even用于 Spark SQL 中。您将使用以下内容包装 SQL 查询:

spark.sql("""
SQL_QUERY
""")

因此,对于第一个查询:

CREATE TABLE intermediate_1 AS
SELECT 
    id,
    time,
    ROW_NUMBER() OVER (PARTITION BY id ORDER BY time) AS rn,
    time - ROW_NUMBER() OVER (PARTITION BY id ORDER BY time) AS grp
FROM purchase

这会给你:

id, time, rn, grp
1,  1,    1,  0
1,  2,    2,  0
1,  3,    3,  0
2,  1,    1,  0
2,  3,    2,  1
2,  4,    3,  1
2,  5,    4,  1
2,  10,   5,  5
2,  11,   6,  5
3,  1,    1,  0
3,  4,    2,  2
3,  9,    3,  6
3,  11,   4,  7

我们可以看到连续的行具有相同的 grp 值。然后我们将使用 GROUP BY 和 COUNT 来获取连续时间的数量。

CREATE TABLE intermediate_2 AS
SELECT 
    id,
    grp,
    COUNT(*) AS num_consecutive
FROM intermediate_1
GROUP BY id, grp

这将返回:

id, grp, num_consecutive
1,  0,   3
2,  0,   1
2,  1,   3
2,  5,   2
3,  0,   1
3,  2,   1
3,  6,   1
3,  7,   1

现在我们只需使用 MAX 和 GROUP BY 即可获得最大连续次数。

CREATE TABLE final AS
SELECT 
    id,
    MAX(num_consecutive) as max_consecutive
FROM intermediate_2
GROUP BY id

这会给你:

id, max_consecutive
1,  3
2,  3
3,  1

希望这可以帮助!

本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系:hwhale#tublm.com(使用前将#替换为@)

如何找到最长的连续日期序列? 的相关文章

随机推荐