【数据库】窗口函数实战(三)

2023-11-06

窗口函数实战(三)

本篇文章是笔者在牛客网上摘选的几道比较有挑战性的SQL窗口函数编程题。

1.近三个月未完成试卷数为0的用户完成情况【中等】(原题链接

首先来分析一下题目的查询要求。

  • 用户近三个有试卷作答记录的月份(可以用窗口函数解决)。
  • 并且这三个月中,不能有试卷是未完成状态
  • 统计这三个月的试卷完成数

下面先给出一种解法。

with a as (
    select uid, exam_id, start_time, score,
           dense_rank() over (partition by uid order by date_format(start_time,'%Y-%m') desc) as start_time_rank
    from exam_record
)
select uid, count(score) as exam_complete_cnt
from a
where uid not in (
    select uid
    from (
        select *
        from a
        where start_time_rank <= 3
    ) b
    where score is null
) and start_time_rank <= 3
group by uid
order by exam_complete_cnt desc, uid desc

可以看到,上面这种解法在寻找满足条件的 uid 比较麻烦。其实,一个 having count(score) = count(uid) 就可以解决。

select uid, count(score) as exam_complete_cnt
from (
    select uid, start_time, score,
           dense_rank() over(partition by uid order by date_format(start_time, '%Y%m') desc) as start_time_rank
    from exam_record
) a
where start_time_rank <= 3
group by uid
having count(score) = count(uid)
order by exam_complete_cnt desc, uid desc

2.连续两次作答试卷的最大时间窗【较难】(原题链接

我们来分析一下题目的要求。

  • 2021年,至少有 2 天作答过试卷的用户。
  • 该年,连续两次作答试卷的最大时间窗 days_window
  • 根据该年的历史规律,他在 days_window 里平均会做多少套试卷。

用户 1006 分别在 20210901、20210906、20210907 作答过 3 次试卷,连续两次作答最大时间窗为 6 天(1 号到 6 号),他 1 号到 7 号这 7 天里共做了 3 张试卷,平均每天 3/7=0.428571 张,那么 6 天里平均会做 0.428571*6=2.57 张试卷(保留两位小数);

下面给出一种解法。

with a as (
    select uid, start_time,
           DATEDIFF( LAG(start_time, 1) over ( partition by uid order by start_time DESC ), start_time ) + 1 as days_win
    from exam_record
    where year(start_time) = 2021
)
select uid, days_window, round(days_window * avg_exam, 2) as avg_exam_cnt
from (
    select uid, max(days_win) as days_window,
           count(start_time) / (DATEDIFF(max(start_time),min(start_time)) + 1) as avg_exam
    from a
    group by uid
    having days_window >= 2
) b
order by days_window desc, avg_exam_cnt desc
  • 首先利用 LAG 函数得到上一次答题的时间;
  • 再利用 DATEDIFF 得到两次答题之间的窗口大小(注意 +1);
  • 剩下的计算就比较简单了。

3.第二快/慢用时之差大于试卷时长一半的试卷【较难】(原题链接

题目的要求很简单:找到 第二快和第二慢用时之差 大于 试卷时长的一半 的试卷信息,按试卷ID降序排序。

以下是我给出的一种解法。

SELECT distinct exam_id, duration, release_time
from (
    select exam_id, duration, release_time, 
           nth_value(duration_time, 2) over (partition by exam_id order by duration_time desc) as max2_duration_time,
           nth_value(duration_time, 2) over (partition by exam_id order by duration_time) as min2_duration_time
    from (
        select exam_record.exam_id, duration, release_time, 
               (unix_timestamp(submit_time) - unix_timestamp(start_time)) / 60 as duration_time
        from exam_record
        left join examination_info using(exam_id)
        where submit_time is not null
    ) a
) b
where max2_duration_time is not null and min2_duration_time is not NULL
      and (max2_duration_time - min2_duration_time) > 0.5 * duration
order by exam_id desc

另外给出一种解法。

SELECT exam_id, duration, release_time
from (
    select exam_id, duration, release_time,
           sum( case when duration_time_asc = 2 then duration_time 
                     when duration_time_desc = 2 then -duration_time 
                     else 0 end ) as sub
    from (
        select exam_record.exam_id, duration, release_time, 
               timestampdiff(second, start_time, submit_time) as duration_time,
               row_number() over (partition by exam_id order by timestampdiff(second, start_time, submit_time) desc) as duration_time_asc,
               row_number() over (partition by exam_id order by timestampdiff(second, start_time, submit_time)) as duration_time_desc
        from exam_record
        left join examination_info using(exam_id)
        where submit_time is not null
    ) a
    group by exam_id
) b
where sub > duration * 60 * 0.5
order by exam_id desc
  • 对比第一种方法用的 nth_value,上面这种做法使用的是 row_number
  • 在计算 第二快和第二慢用时之差 的处理也不一样。

4.每类试卷得分前3名【中等】(原题链接

此题难度不大,要求也不复杂:找到每类试卷得分的前 3 名,如果两人最大分数相同,选择最小分数大者,如果还相同,选择 uid 大者。

有作答得分记录的试卷 tag 有 SQL 和算法,SQL 试卷用户 1001、1002、1003、1004 有作答得分,最高得分分别为 81、81、89、85,最低得分分别为 78、81、86、40,因此先按最高得分排名再按最低得分排名取前三为 1003、1004、1002。

select * 
from (
    select tag as tid, uid,
           row_number() over (partition by tag order by tag, max(score) desc, min(score) desc, uid desc) as ranking
    from exam_record
    left join examination_info on exam_record.exam_id = examination_info.exam_id
    group by examination_info.tag, exam_record.uid
) a
where ranking <=3
本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系:hwhale#tublm.com(使用前将#替换为@)

【数据库】窗口函数实战(三) 的相关文章

随机推荐