Here is a sample of my dataset:
```r
df = data.frame(id = c("9","9","9","5","5","5","4","4","4","4","4","20","20"),
                Date = c("11/29/2018","11/29/2018","11/29/2018","5/25/2018","2/13/2019","2/13/2019","6/7/2018",
                         "6/15/2018","6/20/2018","8/17/2018","8/20/2018","12/25/2018","12/25/2018"),
                Buyer = c("John","John","John","Maria","Maria","Maria","Sandy","Sandy","Sandy","Sandy","Sandy","Paul","Paul"))
```
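I needed to calculate the difference in days between consecutive dates in the dataset (row by row over the whole dataset, not within each buyer), which I have already done. The result looks like this: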
| id | Date | Buyer | diff |
|----|:----------:|------:|------|
| 9 | 11/29/2018 | John | NA |
| 9 | 11/29/2018 | John | 0 |
| 9 | 11/29/2018 | John | 0 |
| 5 | 5/25/2018 | Maria | -188 |
| 5 | 2/13/2019 | Maria | 264 |
| 5 | 2/13/2019 | Maria | 0 |
| 4 | 6/7/2018 | Sandy | -251 |
| 4 | 6/15/2018 | Sandy | 8 |
| 4 | 6/20/2018 | Sandy | 5 |
| 4 | 8/17/2018 | Sandy | 58 |
| 4 | 8/20/2018 | Sandy | 3 |
| 20 | 12/25/2018 | Paul | 127 |
| 20 | 12/25/2018 | Paul | 0 |
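For reference, a minimal sketch that reproduces the `diff` column above. It assumes `Date` is parsed as month/day/year with `as.Date()` and that the difference is taken over the whole dataset in row order rather than within each buyer (values such as -188, -251 and 127 only come out that way). `stringsAsFactors = FALSE` is an addition so the columns stay character on older R versions.

```r
library(dplyr)

df <- data.frame(
  id = c("9","9","9","5","5","5","4","4","4","4","4","20","20"),
  Date = c("11/29/2018","11/29/2018","11/29/2018","5/25/2018","2/13/2019","2/13/2019",
           "6/7/2018","6/15/2018","6/20/2018","8/17/2018","8/20/2018","12/25/2018","12/25/2018"),
  Buyer = c("John","John","John","Maria","Maria","Maria",
            "Sandy","Sandy","Sandy","Sandy","Sandy","Paul","Paul"),
  stringsAsFactors = FALSE
)

df <- df %>%
  mutate(
    Date = as.Date(Date, format = "%m/%d/%Y"),  # parse month/day/year strings into Date
    diff = c(NA, as.numeric(diff(Date)))       # days since the previous row, ungrouped
  )
```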
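Now, if the value of the `diff` column in the second row of a group is greater than or equal to 5, I need to remove the first row of that group. For example, for buyer "Maria" with ID "5", the second row's diff value, 264, is greater than 5, so I want to remove the first row of that group, i.e. the row for buyer "Maria" with ID "5", date '5/25/2018' and diff '-188'.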
Here is my code:
```r
library(dplyr)

df1 = df %>% group_by(Buyer, id) %>%
  mutate(diff = c(NA, diff(Date))) %>%
  filter(!(diff >= 5 & row_number() == 1))
```
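The problem is that the code above tests the `diff` value of the first row instead of the second row, and I don't know how to refer to the second row of each group, whose diff value must be greater than or equal to 5.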
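One possible way to refer to the second row of each group, sketched under the same assumptions as the table above (dates parsed with `as.Date()`, `diff` computed over the whole dataset before grouping): on a group's first row, dplyr's `lead(diff)` holds the second row's `diff`, so the condition can be tested there. Wrapping it in `coalesce(..., FALSE)` is a choice made here so that a group with only one row (where `lead()` returns `NA`) is kept instead of being silently dropped by `filter()`.

```r
library(dplyr)

df <- data.frame(
  id = c("9","9","9","5","5","5","4","4","4","4","4","20","20"),
  Date = c("11/29/2018","11/29/2018","11/29/2018","5/25/2018","2/13/2019","2/13/2019",
           "6/7/2018","6/15/2018","6/20/2018","8/17/2018","8/20/2018","12/25/2018","12/25/2018"),
  Buyer = c("John","John","John","Maria","Maria","Maria",
            "Sandy","Sandy","Sandy","Sandy","Sandy","Paul","Paul"),
  stringsAsFactors = FALSE
)

df1 <- df %>%
  mutate(
    Date = as.Date(Date, format = "%m/%d/%Y"),  # character -> Date, so diff() works
    diff = c(NA, as.numeric(diff(Date)))       # computed before grouping, as in the table
  ) %>%
  group_by(Buyer, id) %>%
  # On the first row of a group, lead(diff) is the diff of the group's second row.
  # coalesce(..., FALSE) turns the NA from one-row groups into "keep".
  filter(!(row_number() == 1 & coalesce(lead(diff) >= 5, FALSE))) %>%
  ungroup()
```

With the sample data this drops only Maria's 5/25/2018 row and Sandy's 6/7/2018 row, which matches the expected output below; the `Date` column prints in `YYYY-MM-DD` form because it is now a `Date`.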
My expected output should look like this:
| id | Date | Buyer | diff |
|----|:----------:|------:|------|
| 9 | 11/29/2018 | John | NA |
| 9 | 11/29/2018 | John | 0 |
| 9 | 11/29/2018 | John | 0 |
| 5 | 2/13/2019 | Maria | 264 |
| 5 | 2/13/2019 | Maria | 0 |
| 4 | 6/15/2018 | Sandy | 8 |
| 4 | 6/20/2018 | Sandy | 5 |
| 4 | 8/17/2018 | Sandy | 58 |
| 4 | 8/20/2018 | Sandy | 3 |
| 20 | 12/25/2018 | Paul | 127 |
| 20 | 12/25/2018 | Paul | 0 |