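I'm writing this answer because both the question and the accepted answer (before it was edited) show poor R programming style: they grow a vector inside a for loop. (See Circle 2 of Patrick Burns' The R Inferno.)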
A simple benchmark makes the effect obvious. The task is to create a vector x containing the integers 1 to k:
k <- 10000L
microbenchmark::microbenchmark(
  grow = {
    x <- integer(0)
    for (i in seq.int(k)) x <- c(x, i)
    x
  },
  subscript = {
    x <- integer(k)
    for (i in seq.int(k)) x[i] <- i
    x
  },
  colon_operator = {
    x <- 1L:k
    x
  },
  times = 10L
)
#Unit: microseconds
#           expr       min        lq        mean    median        uq        max neval
#           grow 93491.676 96127.568 104219.0140 97123.627 99459.343 165545.063    10
#      subscript  9067.607  9215.996   9483.0962  9551.288  9771.795   9938.307    10
# colon_operator     5.664     7.552      7.9675     8.307     8.685      9.063    10
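Clearly, even for a small vector of only 10,000 elements, appending elements one at a time is an order of magnitude slower than preallocating a vector of the required length and filling it by subscript. The colon operator timings are included to show the advantage of built-in vectorized functions: it is more than a thousand times faster than even the subscript loop.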
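Why growing is so slow: each c(x, i) call allocates a new vector and copies every existing element into it, so building a length-k vector performs 1 + 2 + ... + k = k(k + 1)/2 element writes in total, whereas the preallocated version writes each of the k elements once. A minimal sketch of that count (the helper names copies_grow and copies_prealloc are hypothetical, for illustration only):

```r
# Hypothetical helpers: count element writes needed to build a length-k vector.
# Growing: iteration i creates a new vector of length i via c(x, i),
# copying i - 1 old elements plus the new one, i.e. i writes.
copies_grow <- function(k) sum(as.numeric(seq_len(k)))  # 1 + 2 + ... + k

# Preallocating: x <- integer(k), then one write per x[i] <- i.
copies_prealloc <- function(k) k

copies_grow(10000L)      # 50005000, i.e. k * (k + 1) / 2
copies_prealloc(10000L)  # 10000
```

The growing approach is quadratic in k while preallocation is linear, which is why the gap widens as vectors get longer.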
So the code in the question and in the answer should be rewritten to use subscripting for efficiency:
# initialize the random number generator for reproducible results
set.seed(1234L)
# k is still 10000L from the benchmark above: the number of replications
# allocate memory for the vectors beforehand
theta1_10 = numeric(k)
theta1_100 = numeric(k)
theta1_1000 = numeric(k)
theta1_10000 = numeric(k)
# Method 1
for (i in seq.int(k)) {
  N10 = runif(10)
  N100 = runif(100)
  N1000 = runif(1000)
  N10000 = runif(10000)
  # update by subscripting
  theta1_10[i] = (1/10)*4*sum(sqrt(1-N10^2))
  theta1_100[i] = (1/100)*4*sum(sqrt(1-N100^2))
  theta1_1000[i] = (1/1000)*4*sum(sqrt(1-N1000^2))
  theta1_10000[i] = (1/10000)*4*sum(sqrt(1-N10000^2))
}
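For reference, each update in the loop is a Monte Carlo estimate of π. With U uniform on (0, 1), as produced by runif(), the term 4·√(1 − U²) has expected value equal to four times the area of a quarter of the unit circle, so by the law of large numbers the average over n draws (n = 10, 100, 1000, 10000 for theta1_10 through theta1_10000) approaches π:

```latex
\mathbb{E}\!\left[4\sqrt{1-U^2}\right] = 4\int_0^1 \sqrt{1-u^2}\,du = 4\cdot\frac{\pi}{4} = \pi,
\qquad
\hat\theta_n = \frac{4}{n}\sum_{i=1}^{n}\sqrt{1-U_i^2} \;\to\; \pi \quad (n \to \infty).
```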
However, the whole computation can be written much more concisely, without explicit for loops:
library(data.table)
set.seed(1234)
k <- 1000L
N <- 10^(1:4)
rbindlist(
  lapply(N, function(i) {
    theta1 <- replicate(k, 4 / i * sum(sqrt(1 - runif(i)^2)))
    data.table(N = i, mean = mean(theta1), sd = sd(theta1))
  }))
#       N     mean         sd
#1:    10 3.144974 0.27238683
#2:   100 3.140716 0.09040696
#3:  1000 3.141791 0.02654225
#4: 10000 3.141585 0.00886737
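The sd column shrinks by roughly a factor of √10 each time N grows tenfold, as expected for a Monte Carlo mean. A small sketch of the theoretical standard error, assuming U ~ Uniform(0, 1) as in runif() and using E[1 − U²] = 2/3 and E[√(1 − U²)] = π/4:

```r
# Variance of a single term 4 * sqrt(1 - U^2), U ~ Uniform(0, 1):
# Var = 16 * (E[1 - U^2] - E[sqrt(1 - U^2)]^2) = 16 * (2/3 - (pi/4)^2)
var_single <- 16 * (2 / 3 - (pi / 4)^2)

N <- 10^(1:4)
# Standard error of the mean of N independent terms
sd_theory <- sqrt(var_single / N)
round(sd_theory, 5)
```

These come out at about 0.282, 0.0893, 0.0282 and 0.00893, in line with the simulated sd values in the table above (which are themselves estimates from k = 1000 replications).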