Another reshape problem with data.table
library(data.table)
set.seed(1234)
DT <- data.table(x = rep(c(1, 2, 3), each = 4), y = c("A", "B"), v = sample(1:100, 12))
# x y v
# 1: 1 A 12
# 2: 1 B 62
...
#11: 3 A 63
#12: 3 B 49
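I would like to compute a cumulative sum of x and v by y, but with the result presented as follows: the number of rows always stays the same, and when y==A the SUM.*.A columns are incremented, while when y==B the SUM.*.B columns are incremented. (As usual, y could have many levels; there are 2 in this example.)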
# SUM.x.A SUM.x.B SUM.v.A SUM.v.B
# 1: 1 NA 12 NA
# 2: 1 1 12 62
...
#11: 12 9 318 289
#12: 12 12 318 338
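To pin down the target layout described above, here is a minimal sketch that computes each SUM.*.{level} column directly as a masked cumulative sum. It is only an illustration under assumptions: `masked_cumsum` is a hypothetical helper name (not part of data.table), `v` is replaced by fixed values so the numbers do not depend on the random seed, and the values are assumed to contain no NAs.

```r
library(data.table)

# Same shape as the example; v is fixed (assumption) so results are reproducible
DT <- data.table(x = rep(c(1, 2, 3), each = 4),
                 y = rep(c("A", "B"), times = 6),
                 v = 1:12)

# Hypothetical helper: running total of `val` over the rows where grp == lvl,
# carried forward on all other rows, and NA until `lvl` first appears
masked_cumsum <- function(val, grp, lvl) {
  hit <- grp == lvl
  out <- cumsum(ifelse(hit, val, 0))
  out[cumsum(hit) == 0] <- NA
  out
}

res <- copy(DT)
for (col in c("x", "v")) {
  for (lvl in unique(DT$y)) {
    # Adds one column per (value column, level) pair, e.g. SUM.x.A
    set(res, j = paste("SUM", col, lvl, sep = "."),
        value = masked_cumsum(DT[[col]], DT$y, lvl))
  }
}
print(res)
```

The row count is unchanged, and each level of y gets its own pair of columns, so this should generalise to more than two levels at the cost of one pass per column and level.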
EDIT: here is my poor solution, which is obviously overcomplicated
# First step: create the per-group cumulative-sum columns
colNames <- c("x", "v"); newColNames <- paste0("SUM.", colNames)
DT[, (newColNames) := lapply(.SD, cumsum), by = y, .SDcols = colNames]
# Now reshape each SUM.* into SUM.*.{yvalue}
DT[, N := .I]; setkey(DT, N)
g <- function(DT, SD) {
  cols <- c("N", grep("SUM", colnames(SD), value = TRUE))
  Yval <- unique(SD[, y])
  # Left join on the row index; columns coming from SD get the ".{yvalue}" suffix
  merge(DT, SD[, cols, with = FALSE], by = "N", suffixes = c("", paste0(".", Yval)), all.x = TRUE)
}
DT <- Reduce(f = g, init = DT, x = split(DT, DT$y))
# Last observation carried forward: fill NAs with the previous non-NA value
# (leading NAs are left as NA)
locf <- function(x) {
  ind <- which(!is.na(x))
  if (is.na(x[1])) ind <- c(1, ind)
  rep(x[ind], times = diff(c(ind, length(x) + 1)))
}
newColNames <- grep("SUM", colnames(DT), value = TRUE)
DT <- DT[, (newColNames) := lapply(.SD, locf), .SDcols = newColNames]
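
For reference, a small check of what the locf helper above does, using a made-up input vector (the vector is an assumption for illustration only). The function is repeated so the snippet runs on its own.

```r
# Last observation carried forward; leading NAs stay NA
locf <- function(x) {
  ind <- which(!is.na(x))
  if (is.na(x[1])) ind <- c(1, ind)
  rep(x[ind], times = diff(c(ind, length(x) + 1)))
}

filled <- locf(c(NA, 1, NA, NA, 4, NA))
print(filled)
# [1] NA  1  1  1  4  4
```

This is why the first row of the SUM.*.B columns stays NA in the desired output: no B row has been seen yet, so there is nothing to carry forward.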