Use ave
and seq_along
:
> x <- c(1, 2, 3, 4, 5, 1, 2, 2, 3)
> ave(x, x, FUN = seq_along)
[1] 1 1 1 1 1 2 2 3 2
另一个需要考虑的选择是data.table
。尽管这需要更多的工作,但它可能会在很长的向量上得到回报。
这是你的例子——绝对看起来有点矫枉过正!
library(data.table)
x <- c(1, 2, 3, 4, 5, 1, 2, 2, 3)
DT <- data.table(id = sequence(length(x)), x, key = "id")
DT[, y := sequence(.N), by = x][, y]
# [1] 1 1 1 1 1 2 2 3 2
但是对于 10,000,000 个项目长的向量怎么样?
set.seed(1)
x2 <- sample(100, 1e7, replace = TRUE)
funAve <- function() {
ave(x2, x2, FUN = seq_along)
}
funDT <- function() {
DT <- data.table(id = sequence(length(x2)), x2, key = "id")
DT[, y := sequence(.N), by = x2][, y]
}
identical(funAve(), funDT())
# [1] TRUE
library(microbenchmark)
# Unit: seconds
# expr min lq median uq max neval
# funAve() 6.727557 6.792743 6.827117 6.992609 7.352666 20
# funDT() 1.967795 2.029697 2.053886 2.070462 2.123531 20