Here's a base-R approach:
do.call("rbind",
by(df, INDICES=df$ID, FUN=function(DF) DF[which.max(DF$week), ]))
#   ID week outcome
# 1  1    6      42
# 4  4   12      85
# 9  9   12      84
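To try the snippet above without the original data, a small stand-in data frame is enough. The `df` below is hypothetical: its values are invented for illustration and are not the data the answer was written against. The sketch builds it and applies the same `by()` / `do.call("rbind", ...)` pattern.

```r
# Hypothetical sample data (an assumption, not the original df).
df <- data.frame(
  ID      = c(1, 1, 1, 4, 4, 4, 9, 9, 9),
  week    = c(2, 4, 6, 2, 6, 12, 2, 6, 12),
  outcome = c(14, 28, 42, 14, 46, 85, 14, 51, 84)
)

# by() splits df into one piece per ID; the function keeps the row with the
# largest week in each piece; rbind stacks the one-row pieces back together.
res <- do.call("rbind",
               by(df, INDICES = df$ID,
                  FUN = function(DF) DF[which.max(DF$week), ]))
print(res)
```

The result has one row per distinct `ID`. Note that `which.max()` returns only the first position of the maximum, so a group with tied maxima still contributes a single row.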
Alternatively, the data.table package offers a concise and expressive language for performing this kind of data frame manipulation:
library(data.table)
dt <- data.table(df, key="ID")
dt[, .SD[which.max(outcome), ], by=ID]
#      ID week outcome
# [1,]  1    6      42
# [2,]  4   12      85
# [3,]  9   12      84
# Same result, but much faster.
# (Only identical as long as there are no ties for max(outcome);
# with ties, this keeps every tied row.)
dt[ dt[, outcome == max(outcome), by=ID][[2]] ]
# If there are ties for max(outcome), the following will still produce
# the same results as the method using .SD, but will be faster.
# (It relies on dt being sorted by ID, which key="ID" guarantees.)
i1 <- dt[,which.max(outcome), by=ID][[2]]
i2 <- dt[,.N, by=ID][[2]]
dt[i1 + cumsum(i2) - i2,]
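The index arithmetic in the last variant can be unpacked as follows. This is a minimal sketch on a hypothetical keyed table (the values and the `offset`, `mask`, and `first_max` names are invented for illustration) in which ID 4 reaches its maximum `outcome` twice, so the two fast variants give different answers.

```r
library(data.table)

# Hypothetical data with a tie: ID 4 has outcome 85 in two rows.
dt <- data.table(
  ID      = c(1, 1, 4, 4, 4, 9, 9),
  week    = c(2, 6, 2, 6, 12, 6, 12),
  outcome = c(14, 42, 85, 46, 85, 51, 84),
  key     = "ID"
)

# Logical-mask variant: keeps every row equal to its group's maximum,
# so the tied group contributes two rows.
mask <- dt[, outcome == max(outcome), by = ID][[2]]
all_max <- dt[mask]

# Index variant: which.max() gives the position of the first maximum
# *within* each group, so it must be shifted by the number of rows that
# precede the group in the (sorted) table.
i1 <- dt[, which.max(outcome), by = ID][[2]]  # within-group positions
i2 <- dt[, .N, by = ID][[2]]                  # group sizes
offset <- cumsum(i2) - i2                     # rows before each group starts
first_max <- dt[i1 + offset, ]

# Reference result from the .SD approach, for comparison.
sd_max <- dt[, .SD[which.max(outcome), ], by = ID]
```

Here `all_max` has four rows while `first_max` and `sd_max` have three, one per `ID`. The offset trick only works when each group occupies a contiguous block of rows, which is why the table is keyed by `ID`.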
Finally, here's a plyr-based solution:
library(plyr)
ddply(df, .(ID), function(X) X[which.max(X$week), ])
#   ID week outcome
# 1  1    6      42
# 2  4   12      85
# 3  9   12      84