Suppose I have the following table definition:
CREATE TABLE x (i serial primary key, value integer not null);
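I want to compute the median of value (not the AVG). The median is the value that splits a set into two subsets containing the same number of elements. If the number of elements is even, the median is the average of the largest value in the lower half and the smallest value in the upper half. (See Wikipedia for more details.)
This is how I currently compute the median, but I suppose there must be a better way: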
SELECT AVG(values_around_median) AS median
FROM (
  SELECT
    DISTINCT(CASE WHEN FIRST_VALUE(above) OVER w2 THEN MIN(value) OVER w3 ELSE MAX(value) OVER w2 END)
      AS values_around_median
  FROM (
    SELECT LAST_VALUE(value) OVER w AS value,
           SUM(COUNT(*)) OVER w > (SELECT count(*)/2 FROM x) AS above
    FROM x
    GROUP BY value
    WINDOW w AS (ORDER BY value)
    ORDER BY value
  ) AS find_if_values_are_above_or_below_median
  WINDOW w2 AS (PARTITION BY above ORDER BY value DESC),
         w3 AS (PARTITION BY above ORDER BY value ASC)
) AS find_values_around_median
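Any ideas?
Yes. As of PostgreSQL 9.4, you can use the newly introduced inverse distribution function PERCENTILE_CONT() (http://www.postgresql.org/docs/9.4/static/functions-aggregate.html#FUNCTIONS-ORDEREDSET-TABLE), an ordered-set aggregate function that is also specified in the SQL standard.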
WITH t(value) AS (
SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 100
)
SELECT
percentile_cont(0.5) WITHIN GROUP (ORDER BY value)
FROM
t;
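To connect this back to the definition in the question, here is a minimal sketch of the even-count case, followed by the same aggregate applied to the table x. The four sample values are made up for illustration only. With an even number of rows, percentile_cont(0.5) interpolates between the two middle values, which matches the "average of the two middle values" rule.

```sql
-- Even number of rows: the two middle values are 2 and 3 (illustrative data).
WITH t(value) AS (
  VALUES (1), (2), (3), (100)
)
SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY value) AS median
FROM t;
-- median = 2.5

-- The same aggregate applied to the table x from the question.
SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY value) AS median
FROM x;
```

Note that percentile_cont returns double precision here even though value is an integer column, so the result may need a cast if an integer is expected.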
This emulation of MEDIAN() via PERCENTILE_CONT() is also documented here: http://blog.jooq.org/2015/01/06/how-to-emulate-the-median-aggregate-function-using-inverse-distribution-functions/