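My raw data looks like the table below. I want to sort it by visitor and time, compute the time difference between consecutive rows for each visitor, and then save the result to a new file.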
  visitor v_time         payment items
1 Jack    1/2/2018 16:07      35     3
2 Jack    1/2/2018 16:09     160     1
3 David   1/2/2018 16:12      25     2
4 Kate    1/2/2018 16:16       3     3
5 David   1/2/2018 16:21      25     5
6 Jack    1/2/2018 16:32      85     5
7 Kate    1/2/2018 16:33     639     3
8 Jack    1/2/2018 16:55       6     2
Grouping and sorting work fine. However, the code below neither computes the time difference nor saves the result to the file.
library(dplyr)

# Sample data
visitor <- c("Jack", "Jack", "David", "Kate", "David", "Jack", "Kate", "Jack")
v_time <- c("1/2/2018 16:07", "1/2/2018 16:09", "1/2/2018 16:12", "1/2/2018 16:16",
            "1/2/2018 16:21", "1/2/2018 16:32", "1/2/2018 16:33", "1/2/2018 16:55")
payment <- c(35, 160, 25, 3, 25, 85, 639, 6)
items <- c(3, 1, 2, 3, 5, 5, 3, 2)
df <- data.frame(visitor, v_time, payment, items)

# Sort, group by visitor, and try to compute the gap to the previous visit
df %>%
  arrange(visitor, v_time) %>%
  group_by(visitor) %>%
  mutate(diff = strptime(v_time, "%d/%m/%Y %H:%M") - lag(strptime(v_time, "%d/%m/%Y %H:%M")),
         diff_secs = as.numeric(diff, units = 'secs'))

# Try to save the result
write.csv(df, "C:/output.csv", row.names = F)
What is my mistake, and what is the correct way to do this? This is the output I currently get:
# A tibble: 8 x 6
# Groups:   visitor [3]
  visitor v_time         payment items diff   diff_secs
  <fct>   <fct>            <dbl> <dbl> <time>     <dbl>
1 David   1/2/2018 16:12   25.0   2.00 NA            NA
2 David   1/2/2018 16:21   25.0   5.00 NA            NA
3 Jack    1/2/2018 16:07   35.0   3.00 NA            NA
4 Jack    1/2/2018 16:09  160     1.00 NA            NA
5 Jack    1/2/2018 16:32   85.0   5.00 NA            NA
6 Jack    1/2/2018 16:55    6.00  2.00 NA            NA
7 Kate    1/2/2018 16:16    3.00  3.00 NA            NA
8 Kate    1/2/2018 16:33  639     3.00 NA            NA
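
One possible fix, offered as a minimal sketch rather than a definitive answer. It rests on three observations. First, the pipeline result is never assigned, so write.csv() writes the original, unmodified df. Second, strptime() returns POSIXlt, which dplyr verbs generally do not handle well inside a data frame; this is the likely cause of the NA values, and converting once to POSIXct with as.POSIXct() avoids it. Third, v_time is sorted as text, which is not guaranteed to be chronological. The names result and out_path, the relative path "output.csv", and the tz = "UTC" setting are assumptions made for illustration; the day/month order follows the "%d/%m/%Y" format used in the question.

```r
library(dplyr)

visitor <- c("Jack", "Jack", "David", "Kate", "David", "Jack", "Kate", "Jack")
v_time <- c("1/2/2018 16:07", "1/2/2018 16:09", "1/2/2018 16:12", "1/2/2018 16:16",
            "1/2/2018 16:21", "1/2/2018 16:32", "1/2/2018 16:33", "1/2/2018 16:55")
payment <- c(35, 160, 25, 3, 25, 85, 639, 6)
items <- c(3, 1, 2, 3, 5, 5, 3, 2)
df <- data.frame(visitor, v_time, payment, items, stringsAsFactors = FALSE)

result <- df %>%
  # Parse the text timestamps once into POSIXct (tz = "UTC" is an assumption)
  mutate(v_time = as.POSIXct(v_time, format = "%d/%m/%Y %H:%M", tz = "UTC")) %>%
  # Now the sort on v_time is chronological, not alphabetical
  arrange(visitor, v_time) %>%
  group_by(visitor) %>%
  # Seconds since the same visitor's previous row; NA for each visitor's first row
  mutate(diff_secs = as.numeric(difftime(v_time, lag(v_time), units = "secs"))) %>%
  ungroup()

# Hypothetical output path; save the assigned result, not the original df
out_path <- "output.csv"
write.csv(result, out_path, row.names = FALSE)
```

With this data, David's two visits at 16:12 and 16:21 should come out 540 seconds apart, and the first row of each visitor stays NA because there is no earlier visit to compare against.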