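My data set has 3 columns:
- Monetary
- Recency
- Frequency
Using SQL, I want to create 3 more columns, e.g. M_Q, R_Q and F_Q, holding the percentile value of each row's Monetary, Recency and Frequency.
Thanks in advance.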
Customer_ID  Frequency  Recency  Monetary  R_Q    F_Q     M_Q
112          1          39       7.05      0.398  0.789   0.85873
143          1          23       0.1833    0.232  0.7895  0.1501
164          1          52       0.416     0.508  0.789   0.295
123          1          118      1.1       0.98   0.789   0.52
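The function you are looking for is the ANSI standard function ntile(), which assigns each row to one of n roughly equal-sized buckets: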
select t.*,
       ntile(100) over (order by monetary) as percentile_monetary,
       ntile(100) over (order by recency) as percentile_recency,
       ntile(100) over (order by frequency) as percentile_frequency
from t;
It is available in most databases.
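One thing to watch with ntile() is how it treats ties: it splits the ordered rows into buckets by position, so rows with the same value can land in different buckets. The sketch below is only an illustration, not part of the original answer. It loads the four sample rows from the question into an in-memory SQLite database through Python's built-in sqlite3 module (assuming SQLite 3.25 or later for window functions) and uses ntile(4) instead of ntile(100) because there are only four rows.

```python
import sqlite3

# The four sample rows from the question:
# (customer_id, frequency, recency, monetary).
SAMPLE = [
    (112, 1, 39, 7.05),
    (143, 1, 23, 0.1833),
    (164, 1, 52, 0.416),
    (123, 1, 118, 1.1),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (customer_id integer, frequency integer, "
    "recency integer, monetary real)"
)
conn.executemany("insert into t values (?, ?, ?, ?)", SAMPLE)

# ntile(4) stands in for ntile(100) because the sample has only four rows.
ntile_rows = conn.execute(
    """
    select customer_id,
           ntile(4) over (order by recency)   as bucket_recency,
           ntile(4) over (order by frequency) as bucket_frequency
    from t
    order by recency
    """
).fetchall()

for row in ntile_rows:
    print(row)
```

Every frequency in the sample is 1, yet bucket_frequency still takes all four values 1 to 4 (which row gets which bucket is arbitrary). If tied rows must share a percentile, as F_Q does in the question's sample, a rank-based calculation is likely the better fit.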
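You can also calculate the percentile yourself using rank() and count(). Depending on how you want to handle ties, and whether you want values between 1-100 or 0-100, the following should be a good starting point: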
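-- rank_* is zero-based, so rank_* * 100.0 / cnt falls in [0, 100);
-- the leading "1 +" shifts that to start at 1. Drop it for a 0-based scale.
select t.*,
       (1 + rank_monetary * 100.0 / cnt) as percentile_monetary,
       (1 + rank_recency * 100.0 / cnt) as percentile_recency,
       (1 + rank_frequency * 100.0 / cnt) as percentile_frequency
from (select t.*,
             count(*) over () as cnt,  -- total number of rows
             -- rank() gives tied rows the same rank
             rank() over (order by monetary) - 1 as rank_monetary,
             rank() over (order by recency) - 1 as rank_recency,
             rank() over (order by frequency) - 1 as rank_frequency
      from t
     ) t;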
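The sample in the question shows fractions between 0 and 1 rather than values on a 1-100 scale. Two other standard window functions, percent_rank() and cume_dist(), return fractions directly; they are not part of the answer above and are shown here only as an alternative to try. percent_rank() is (rank - 1) / (rows - 1), and cume_dist() is the share of rows less than or equal to the current row. The sketch below again uses Python's built-in sqlite3 module (assuming SQLite 3.25 or later) with just the four sample rows, so its output will not match the R_Q and F_Q values in the question, which presumably came from the full data set.

```python
import sqlite3

# The four sample rows from the question:
# (customer_id, frequency, recency, monetary).
SAMPLE = [
    (112, 1, 39, 7.05),
    (143, 1, 23, 0.1833),
    (164, 1, 52, 0.416),
    (123, 1, 118, 1.1),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (customer_id integer, frequency integer, "
    "recency integer, monetary real)"
)
conn.executemany("insert into t values (?, ?, ?, ?)", SAMPLE)

# Column aliases below are illustrative names, not taken from the question.
fraction_rows = conn.execute(
    """
    select customer_id,
           percent_rank() over (order by recency)   as recency_percent_rank,
           cume_dist()    over (order by recency)   as recency_cume_dist,
           cume_dist()    over (order by frequency) as frequency_cume_dist
    from t
    order by recency
    """
).fetchall()

for row in fraction_rows:
    print(row)
```

Both functions give tied rows the same value: every frequency in the sample is 1, so frequency_cume_dist is 1.0 on every row. Which of the two is closer to the definition behind the question's columns depends on how the percentile is meant to treat the current row and its ties.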