Here is some dummy data:
user_id date category
27 2016-01-01 apple
27 2016-01-03 apple
27 2016-01-05 pear
27 2016-01-07 plum
27 2016-01-10 apple
27 2016-01-14 pear
27 2016-01-16 plum
11 2016-01-01 apple
11 2016-01-03 pear
11 2016-01-05 pear
11 2016-01-07 pear
11 2016-01-10 apple
11 2016-01-14 apple
11 2016-01-16 apple
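For each user_id, I want to count the number of distinct categories within a given time window (e.g. the past 7 or 14 days), including the current order.
The desired result looks like this: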
user_id date category distinct_7 distinct_14
27 2016-01-01 apple 1 1
27 2016-01-03 apple 1 1
27 2016-01-05 pear 2 2
27 2016-01-07 plum 3 3
27 2016-01-10 apple 3 3
27 2016-01-14 pear 3 3
27 2016-01-16 plum 3 3
11 2016-01-01 apple 1 1
11 2016-01-03 pear 2 2
11 2016-01-05 pear 2 2
11 2016-01-07 pear 2 2
11 2016-01-10 apple 2 2
11 2016-01-14 apple 2 2
11 2016-01-16 apple 1 2
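I have asked similar questions here https://stackoverflow.com/questions/41615967/r-calculate-the-number-of-occurrences-of-a-specific-event-in-the-past-and-futur and here https://stackoverflow.com/questions/41020670/r-calculate-cumulative-sums-and-counts-since-the-last-occurrence-of-a-value, but neither covers counting cumulative unique values within a specified time window. Thanks a lot for your help!
I suggest using the runner https://cran.r-project.org/web/packages/runner/index.html package. With its runner function you can apply any R function over running windows.
The code below produces the requested output, i.e. the past 7 days + the current day and the past 14 days + the current day (8 and 15 days in total):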
df <- read.table(
text = " user_id date category
27 2016-01-01 apple
27 2016-01-03 apple
27 2016-01-05 pear
27 2016-01-07 plum
27 2016-01-10 apple
27 2016-01-14 pear
27 2016-01-16 plum
11 2016-01-01 apple
11 2016-01-03 pear
11 2016-01-05 pear
11 2016-01-07 pear
11 2016-01-10 apple
11 2016-01-14 apple
11 2016-01-16 apple", header = TRUE, colClasses = c("integer", "Date", "character"))
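library(dplyr)
library(runner)

df %>%
  group_by(user_id) %>%
  mutate(
    # k = 7 + 1 with idx = date: the window spans the current date and the 7 days before it
    distinct_7  = runner(category, k = 7 + 1, idx = date,
                         f = function(x) length(unique(x))),
    # k = 14 + 1: the current date and the 14 days before it
    distinct_14 = runner(category, k = 14 + 1, idx = date,
                         f = function(x) length(unique(x)))
  )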
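If installing runner is not an option, the same windowed count can be sketched in base R. This is an alternative approach, not part of the runner answer: `count_distinct_window` is a hypothetical helper that, for every row, scans all rows of the same user whose date falls in `[date - days, date]` and counts the unique categories. It assumes `date` is a `Date` column and that "past 7 days including the current order" means dates from `date - 7` through `date`, which is what the expected table above implies (e.g. user 27 on 2016-01-14 still counts the plum from 2016-01-07).

```r
# Hypothetical base-R helper (no runner/dplyr needed): for row i, count the
# distinct categories of the same user with dates in [date[i] - days, date[i]].
count_distinct_window <- function(user, date, category, days) {
  vapply(seq_along(date), function(i) {
    in_window <- user == user[i] &
      date <= date[i] &          # only the current and earlier orders
      date >= date[i] - days     # lower bound is inclusive: date - days
    length(unique(category[in_window]))
  }, integer(1))
}

# The dummy data from the question, built without read.table
df <- data.frame(
  user_id  = rep(c(27L, 11L), each = 7),
  date     = as.Date(rep(c("2016-01-01", "2016-01-03", "2016-01-05", "2016-01-07",
                           "2016-01-10", "2016-01-14", "2016-01-16"), 2)),
  category = c("apple", "apple", "pear", "plum", "apple", "pear", "plum",
               "apple", "pear", "pear", "pear", "apple", "apple", "apple"),
  stringsAsFactors = FALSE
)

df$distinct_7  <- count_distinct_window(df$user_id, df$date, df$category, 7)
df$distinct_14 <- count_distinct_window(df$user_id, df$date, df$category, 14)
print(df)
```

Because every row scans the whole table, this is quadratic in the number of rows and only suitable for small data; on the other hand it does not depend on the row order.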
More information is available in the package https://gogonzo.github.io/runner/ and function https://gogonzo.github.io/runner/reference/runner.html documentation.