Return only minutes with activity
Shortest
SELECT DISTINCT
date_trunc('minute', "when") AS minute
, count(*) OVER (ORDER BY date_trunc('minute', "when")) AS running_ct
FROM mytable
ORDER BY 1;
Use date_trunc() https://www.postgresql.org/docs/current/functions-datetime.html#FUNCTIONS-DATETIME-TRUNC, it returns exactly what you need.
Don't include id in the query, since you want to GROUP BY minute slices.
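count() is typically used as a plain aggregate function https://www.postgresql.org/docs/current/functions-aggregate.html. Appending an OVER clause makes it a window function https://www.postgresql.org/docs/current/functions-window.html. Omit PARTITION BY in the window definition - you want a running count over all rows. By default, that counts from the first row to the last peer of the current row, as defined by ORDER BY. The manual https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS: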
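"The default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. With ORDER BY, this sets the frame to be all rows from the partition start up through the current row's last ORDER BY peer."
And that happens to be exactly what you need.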
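To make the peer behavior concrete, here is a minimal sketch on three made-up timestamps (the VALUES list and the column name ts are assumptions for illustration, not part of the question). The two rows falling into the same minute are ORDER BY peers, so both get the same running count:

```sql
-- Hypothetical sample data: two events in minute 10:00, one in minute 10:02.
SELECT ts
     , date_trunc('minute', ts) AS minute
     , count(*) OVER (ORDER BY date_trunc('minute', ts)) AS running_ct
FROM  (VALUES
         (timestamp '2024-01-01 10:00:05')
       , (timestamp '2024-01-01 10:00:40')
       , (timestamp '2024-01-01 10:02:10')
      ) t(ts)
ORDER  BY ts;
-- running_ct is 2, 2, 3: the frame extends to the last peer of the current row.
```

Because peers share the same minute and the same running_ct, SELECT DISTINCT can later collapse them to one row per minute.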
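Use count(*) rather than count(id). It better fits your question ("count of rows"). It is generally slightly faster than count(id). And, while we might assume that id is NOT NULL, that has not been specified in the question, so count(id) is wrong, strictly speaking, because NULL values are not counted with count(id).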
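The difference only shows up when id can be NULL. A small sketch with made-up values (the VALUES list is an assumption for illustration):

```sql
-- Three rows, one of them with a NULL id.
SELECT count(*)  AS ct_rows  -- counts every row: 3
     , count(id) AS ct_ids   -- skips the NULL:   2
FROM  (VALUES (1), (NULL), (3)) t(id);
```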
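You can't GROUP BY minute slices at the same query level. Aggregate functions are applied before window functions, so the window function count(*) would only see 1 row per minute this way.
You can, however, SELECT DISTINCT, because DISTINCT is applied after window functions.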
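As a rough illustration of that order of evaluation, the following sketch adds GROUP BY at the same query level, again on made-up timestamps (the VALUES list and the column name ts are assumptions). The window function then runs over the grouped rows, so it counts minutes rather than events:

```sql
-- Hypothetical sample data: two events in minute 10:00, one in minute 10:02.
SELECT date_trunc('minute', ts) AS minute
     , count(*) OVER (ORDER BY date_trunc('minute', ts)) AS running_ct
FROM  (VALUES
         (timestamp '2024-01-01 10:00:05')
       , (timestamp '2024-01-01 10:00:40')
       , (timestamp '2024-01-01 10:02:10')
      ) t(ts)
GROUP  BY date_trunc('minute', ts)
ORDER  BY 1;
-- running_ct is 1, 2 (one per grouped minute), not the desired 2, 3.
```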
ORDER BY 1 is just shorthand for ORDER BY date_trunc('minute', "when") here. 1 is a positional reference to the first expression in the SELECT list.
Use to_char() https://www.postgresql.org/docs/current/functions-formatting.html if you need to format the result. Like:
SELECT DISTINCT
to_char(date_trunc('minute', "when"), 'DD.MM.YYYY HH24:MI') AS minute
, count(*) OVER (ORDER BY date_trunc('minute', "when")) AS running_ct
FROM mytable
ORDER BY date_trunc('minute', "when");
Fastest
SELECT minute, sum(minute_ct) OVER (ORDER BY minute) AS running_ct
FROM (
SELECT date_trunc('minute', "when") AS minute
, count(*) AS minute_ct
FROM tbl
GROUP BY 1
) sub
ORDER BY 1;
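Much like the above, but:
I use a subquery to aggregate and count rows per minute. This way we get 1 row per minute without DISTINCT in the outer SELECT.
Use sum() as window aggregate function now to add up the counts from the subquery.
I found this to be substantially faster with many rows per minute.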
Include minutes without activity
Shortest
@GabiMe asked in a comment https://stackoverflow.com/questions/8193688/postgresql-running-count-of-rows-for-a-query-by-minute/8194088#comment10143564_8194088 how to get one row for every minute in the time frame, including those where no event occurred (no row in the base table):
SELECT DISTINCT
minute, count(c.minute) OVER (ORDER BY minute) AS running_ct
FROM (
SELECT generate_series(date_trunc('minute', min("when"))
, max("when")
, interval '1 min')
FROM tbl
) m(minute)
LEFT JOIN (SELECT date_trunc('minute', "when") FROM tbl) c(minute) USING (minute)
ORDER BY 1;
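Generate a row for every minute in the time frame between the first and the last event with generate_series() https://www.postgresql.org/docs/current/functions-srf.html - here directly based on aggregated values from the subquery.
LEFT JOIN to all timestamps truncated to the minute and count. NULL values (where no row exists) do not add to the running count.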
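A minimal sketch of what generate_series() produces on its own, using made-up bounds in place of the aggregated min and max. It also suggests why only the start value is truncated in the query above: the series stops at the last step that does not exceed the upper bound.

```sql
-- Hypothetical bounds standing in for date_trunc('minute', min("when")) and max("when").
SELECT generate_series(timestamp '2024-01-01 10:00'
                     , timestamp '2024-01-01 10:03:30'
                     , interval '1 min') AS minute;
-- Returns four rows: 10:00, 10:01, 10:02 and 10:03.
```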
Fastest
With a CTE:
WITH cte AS (
SELECT date_trunc('minute', "when") AS minute, count(*) AS minute_ct
FROM tbl
GROUP BY 1
)
SELECT m.minute
, COALESCE(sum(cte.minute_ct) OVER (ORDER BY m.minute), 0) AS running_ct
FROM (
SELECT generate_series(min(minute), max(minute), interval '1 min')
FROM cte
) m(minute)
LEFT JOIN cte USING (minute)
ORDER BY 1;
Again, aggregate and count rows per minute in the first step; that removes the need for a later DISTINCT.
Unlike count(), sum() can return NULL. Default to 0 with COALESCE.
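The NULL behavior can be checked in isolation. A small sketch on a single made-up NULL value (the VALUES list and the column name x are assumptions for illustration):

```sql
-- One row holding NULL: count() yields 0, sum() yields NULL.
SELECT count(x)              AS ct             -- 0
     , sum(x)                AS total          -- NULL
     , COALESCE(sum(x), 0)   AS total_or_zero  -- 0
FROM  (VALUES (NULL::int)) t(x);
```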
With many rows and an index on "when", this version with a subquery was the fastest among several variants I tested with Postgres 9.1 - 9.4:
SELECT m.minute
, COALESCE(sum(c.minute_ct) OVER (ORDER BY m.minute), 0) AS running_ct
FROM (
SELECT generate_series(date_trunc('minute', min("when"))
, max("when")
, interval '1 min')
FROM tbl
) m(minute)
LEFT JOIN (
SELECT date_trunc('minute', "when") AS minute
, count(*) AS minute_ct
FROM tbl
GROUP BY 1
) c USING (minute)
ORDER BY 1;