I am using SQL Server 2008 R2. I need to compute a percentile for each group, along the lines of:
SELECT id,
PCTL(0.9, x) -- for the 90th percentile
FROM my_table
GROUP BY id
ORDER BY id
For example, given this DDL (fiddle: http://sqlfiddle.com/#!3/a3661/1) ---
CREATE TABLE my_table (id INT, x REAL);
INSERT INTO my_table
VALUES (7, 0.164595), (5, 0.671311), (7, 0.0118385), (6, 0.704592), (3, 0.633521), (3, 0.337268), (0, 0.54739), (6, 0.312282), (0, 0.220618), (7, 0.214973), (6, 0.410768), (7, 0.151572), (7, 0.0639506), (5, 0.339075), (1, 0.284094), (2, 0.126722), (2, 0.870079), (3, 0.369366), (1, 0.6687), (5, 0.199456), (5, 0.0296715), (1, 0.330339), (9, 0.0000459612), (5, 0.391947), (3, 0.753965), (8, 0.334207), (7, 0.583357), (3, 0.326951), (4, 0.207057), (2, 0.258463), (2, 0.0532811), (1, 0.751584), (7, 0.592624), (7, 0.673506), (5, 0.44764), (6, 0.733737), (5, 0.141215), (7, 0.222452), (3, 0.597019), (1, 0.293901), (4, 0.516213), (7, 0.498336), (6, 0.410461), (2, 0.32211), (1, 0.466735), (5, 0.720456), (8, 0.000428383), (3, 0.46085), (0, 0.402963), (7, 0.677002), (0, 0.400122), (1, 0.762357), (9, 0.158455), (7, 0.359723), (4, 0.225914), (7, 0.795345), (6, 0.902261), (2, 0.69533), (8, 0.593605), (6, 0.266233), (0, 0.917188), (9, 0.96353), (2, 0.577035), (8, 0.945236), (3, 0.257776), (4, 0.560569), (0, 0.838326), (2, 0.660338), (2, 0.537372), (8, 0.33806), (0, 0.545107), (1, 0.616673), (5, 0.30411), (0, 0.434737), (2, 0.588249), (9, 0.991362), (8, 0.772253), (6, 0.705396), (5, 0.323255), (8, 0.830319), (3, 0.679546), (4, 0.399748), (4, 0.440115), (6, 0.938154), (8, 0.333143), (9, 0.923541), (7, 0.19552), (4, 0.869822), (7, 0.620006), (4, 0.833529), (4, 0.297515), (4, 0.19906), (5, 0.540905), (9, 0.33313), (5, 0.200515), (5, 0.900481), (6, 0.02665), (3, 0.495421), (0, 0.96582), (9, 0.847218);
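--- I would like to get approximately the following (give or take the differences among the commonly used percentile methods, see http://en.wikipedia.org/wiki/Percentile):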
id x
----------
0 0.9658
1 0.7624
2 0.6953
3 0.6795
4 0.8335
5 0.7205
6 0.9023
7 0.677
8 0.9452
9 0.9914
The real input set has about 2 million rows, and each real id group has from a few dozen to a few hundred (or possibly more) rows.
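I have searched SO and other sites for solutions, but the few dozen pages I checked all seem to compute a percentile over the entire row set rather than for each group/partition of it. (I am relatively inexperienced with SQL, so I may have overlooked something.)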
I have also looked at the documentation for the ranking functions (http://msdn.microsoft.com/en-us/library/ms189798(v=sql.105).aspx), but I could not put a working query together.
I would like to use PERCENTILE_DISC (http://msdn.microsoft.com/en-us/library/hh231327(v=sql.110).aspx) or PERCENTILE_CONT (http://msdn.microsoft.com/en-us/library/hh231473(v=sql.110).aspx), but those are only available from SQL Server 2012 onward, and for now I am stuck on 2008 R2.
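
To pin down what I mean by "percentile": the expected output above is consistent with the nearest-rank definition, where the p-th percentile of a group of n rows is the value at rank CEILING(p * n) when the group is sorted ascending. For instance, id 7 has 15 rows, so the 90th percentile is the value at rank CEILING(0.9 * 15) = 14, which is 0.677. Below is a minimal sketch of that definition using only ROW_NUMBER() and COUNT(*) OVER (PARTITION BY ...), which as far as I can tell are both available on 2008 R2. The CTE name ranked and the column aliases rn and n are made up for illustration, and I have not checked how this behaves on the full 2 million rows.

```sql
-- Sketch only: nearest-rank 90th percentile per id.
-- "ranked", "rn" and "n" are illustrative names, not part of my schema.
WITH ranked AS (
    SELECT id,
           x,
           -- position of each row within its id group, ascending by x
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY x) AS rn,
           -- number of rows in the id group
           COUNT(*) OVER (PARTITION BY id) AS n
    FROM my_table
)
SELECT id, x
FROM ranked
-- keep the single row whose rank equals CEILING(p * n), here with p = 0.9
WHERE rn = CEILING(0.9 * n)
ORDER BY id;
```

This returns an actual value from each group (like PERCENTILE_DISC) rather than an interpolated one (like PERCENTILE_CONT), so other percentile methods may give slightly different numbers.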