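I have two-level hierarchical data and I am trying to perform a nonparametric bootstrap at the highest level, i.e., randomly sampling the highest-level clusters with replacement while keeping the original within-cluster data.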
I would like to use the boot() function from the {boot} package for this, because I want to build BCa confidence intervals with boot.ci(), which requires a boot object.
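A minimal sketch of one possible approach, not a tested answer: boot() resamples the elements (or rows) of whatever is passed as `data`, so passing the cluster positions instead of the data rows makes `i` select whole subjects. The toy data set (20 subjects, a subject-level random effect) and the names `clusters` and `cluster_mean` are illustrative assumptions, not part of the question.

```r
library(boot)

# Illustrative data (an assumption, not the question's data set):
# 20 subjects ("clusters"), 5 trials each, with a subject-level random effect.
set.seed(1)
dat <- expand.grid(trial = factor(1:5), subject = factor(1:20))
dat$value <- rnorm(nrow(dat)) + rep(rnorm(20), each = 5)

# One list element per subject, holding all of that subject's values.
clusters <- split(dat$value, dat$subject)

# boot() resamples the elements of `data`. Passing the cluster positions
# 1..20 means `i` selects whole subjects (with replacement), and each
# selected subject contributes all of its original trials.
cluster_mean <- function(ids, i) {
  mean(unlist(clusters[ids[i]], use.names = FALSE))
}

set.seed(17)
b <- boot(data = seq_along(clusters), statistic = cluster_mean, R = 999)

# BCa interval computed from the cluster-level bootstrap replicates.
ci <- boot.ci(b, type = "bca")
ci
```

Because `clusters` is looked up from the enclosing environment, the statistic only ever sees cluster positions; the pooled mean it returns is the same statistic as `mean(sub.2$value)` in the attempt below.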
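Below is my unsuccessful attempt. Stepping through the boot() call with the debugger shows that the random sampling does not happen at the cluster level (= subject).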
### create a very simple two-level dataset with 'subject' as clustering variable
rho <- 0.4
dat <- expand.grid(
trial=factor(1:5),
subject=factor(1:3)
)
sig <- rho * tcrossprod(model.matrix(~ 0 + subject, dat))
diag(sig) <- 1
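# t(chol(sig)) is the lower-triangular factor L with L %*% t(L) == sig,
# so the simulated values have covariance matrix sig
set.seed(17); dat$value <- as.vector(t(chol(sig)) %*% rnorm(15, 0, 1))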
### my statistic function (adapted from here: http://biostat.mc.vanderbilt.edu/wiki/Main/HowToBootstrapCorrelatedData)
resamp.mean <- function(data, i){
cluster <- c('subject', 'trial')
# sample the clustering factor
cls <- unique(data[[cluster[1]]])[i]
# subset on the sampled clustering factors
sub <- lapply(cls, function(b) subset(data, data[[cluster[1]]]==b))
sub.2 <- do.call(rbind, sub) # join and return samples
mean((sub.2$value)) # calculate the statistic
}
debugonce(boot)
set.seed(17); dat.boot <- boot(data = dat, statistic = resamp.mean, R = 4)
### stepping through the debugger until object 'i' was assigned
### investigating 'i'
# Browse[2]> head(i)
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15]
# [1,]    3    7   12   13   10   14   14   15   12    12    12     4     5     9    10
# [2,]   15    9    3   13    4   10    2    4    6    11    10     4     9     4     3
# [3,]    8    4    7   15   10   12    9    8    9    12     4    15    14    10     4
# [4,]   12    3    1   15    8   13    9    1    4    13     9    13     2    11     2
### which is not what I was hoping for.
### I would like something that looks like this, supposing indices = c(2, 2, 1) for the first resample:
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15]
# [1,]    6    7    8    9   10    6    7    8    9    10     1     2     3     4     5
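The desired index row above can be produced from cluster-level draws with a small helper. `cluster_rows` is a hypothetical name introduced for illustration; it assumes the clusters are identified by a single column (here `subject`) and that cluster k means the k-th distinct subject in order of first appearance.

```r
# Same design as in the question: 5 trials nested in each of 3 subjects.
dat <- expand.grid(trial = factor(1:5), subject = factor(1:3))

# Hypothetical helper: turn cluster-level draws into row indices of the data,
# keeping each drawn cluster's rows together and in their original order.
cluster_rows <- function(cluster, picked) {
  lev <- unique(cluster)
  unlist(lapply(picked, function(k) which(cluster == lev[k])))
}

idx <- cluster_rows(dat$subject, c(2, 2, 1))
idx
# [1]  6  7  8  9 10  6  7  8  9 10  1  2  3  4  5
```

Inside a statistic function, `data[cluster_rows(data$subject, i), ]` would then give the resampled data set, but only if `i` holds cluster positions (1 to 3 here) rather than the row positions 1 to 15 that boot() generates when it is given `dat` directly.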
Any help would be much appreciated.