Using the non-equi joins newly implemented in the current development version, this can be done in a straightforward way:
require(data.table) # v1.9.7+
DT[, row := .I] # add row numbers
DT[DT, x.row-i.row, on = .(row > row, Temp < Temp), mult="first"]
# [1] 5 1 3 2 1 NA 3 1 1 NA
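The row number is necessary because we need to find indices below the current index (that is, later rows), so it has to be one of the join conditions. We perform a self-join: for each row of DT (the inner one), based on the conditions supplied to the on argument, we find the first matching row index in DT (the outer one). We then subtract the row indices to get the position relative to the current row. x.row refers to the row index of the outer DT and i.row to that of the inner DT.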
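To make the join conditions concrete, here is a minimal brute-force sketch in base R of what the text above describes: for each position, the offset to the first later position with a smaller Temp. The function name first_offset and the toy vector are hypothetical and are not the data from the question; this is only an illustration of the semantics, not how data.table implements the join.

```r
# Hypothetical toy data, chosen only for illustration
temp <- c(10, 8, 12, 7, 6, 9)

# Brute-force reference: for each position i, find the first later
# position j (j > i) whose value is smaller (temp[j] < temp[i]),
# and return the offset j - i, or NA if no such position exists.
# This mirrors on = .(row > row, Temp < Temp) with mult = "first",
# where the left side of each condition is the outer (x) column.
first_offset <- function(temp) {
  n <- length(temp)
  vapply(seq_len(n), function(i) {
    later <- which(seq_len(n) > i & temp < temp[i])
    if (length(later)) later[1L] - i else NA_integer_
  }, integer(1))
}

first_offset(temp)
# [1]  1  2  1  1 NA NA
```

This version is quadratic in the number of rows, so it is only suitable for checking results on small inputs.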
To get the development version, see the installation instructions here: https://github.com/Rdatatable/data.table/wiki/Installation.
On 1e5 rows:
set.seed(123)
DT <- data.table(Temp = runif(1e5L, 0L, 20L))
DT[, row := .I]
system.time({
ans = DT[DT, x.row-i.row, on = .(row > row, Temp < Temp), mult="first", verbose=TRUE]
})
# Non-equi join operators detected ...
# forder took ... 0.001 secs
# Generating non-equi group ids ... done in 0.452 secs
# Recomputing forder with non-equi ids ... done in 0.001 secs
# Found 623 non-equi group(s) ...
# Starting bmerge ...done in 8.118 secs
# Detected that j uses these columns: x.row,i.row
# user system elapsed
# 8.492 0.038 8.577
head(ans)
# [1] 5 1 3 2 1 12
tail(ans)
# [1] 2 1 1 2 1 NA