The wordspace function dist.matrix() (https://rdrr.io/cran/wordspace/man/dist_matrix.html) supports parallel computation of bipartite distances.
Benchmarking wordspace against parallelDist
# Two 100 x 100000 test matrices; rnorm(1000) is recycled to fill each one
matrix1 <- abs(matrix(rnorm(1000),100,100000))
matrix2 <- abs(matrix(rnorm(1000),100,100000))
library(rbenchmark)
library(parallelDist)
library(wordspace)
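bipartiteDist_parallelDist <- function(matrix1, matrix2) {
  # parallelDist() only computes all pairwise distances within one matrix,
  # so stack both matrices and keep the matrix1-vs-matrix2 block
  matrix12 <- rbind(matrix1, matrix2)
  d <- parallelDist(matrix12, method = "euclidean")
  rows1 <- seq_len(nrow(matrix1))
  rows2 <- nrow(matrix1) + seq_len(nrow(matrix2))
  as.matrix(d)[rows1, rows2]
}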
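bipartiteDist_wordspace <- function(matrix1, matrix2) {
  # Use the maximum number of OpenMP threads that wordspace reports
  wordspace.openmp(threads = wordspace.openmp()$max)
  # Distances between the rows of matrix1 and the rows of matrix2
  dist.matrix(matrix1, matrix2, byrow = TRUE, method = "euclidean", convert = FALSE)
}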
benchmark("parallelDist" = {
            bd1 <- bipartiteDist_parallelDist(matrix1, matrix2)
          },
          "wordspace" = {
            bd2 <- bipartiteDist_wordspace(matrix1, matrix2)
          },
          replications = 1,
          columns = c("test", "replications", "elapsed",
                      "relative", "user.self", "sys.self"))
plot(bd1,bd2) # yes, both methods give near-identical results
Benchmark results:
          test replications elapsed relative user.self sys.self
1 parallelDist            1   2.120   12.184   126.145    0.523
2    wordspace            1   0.174    1.000     3.749    0.252
I used 80 threads.
A framework for further speed improvements
The author of wordspace acknowledges having prioritized low memory load over speed, so further speed gains are possible (source: https://r.789695.n4.nabble.com/dist-function-in-R-is-very-slow-td4738317.html).
For example, here is a general framework for Euclidean distance:
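bipartiteDist3 <- function(matrix1, matrix2) {
  # Cross products between every row of matrix1 and every row of matrix2
  m1tm2 <- tcrossprod(matrix1, matrix2)
  # Squared Euclidean norm of each row
  sq1 <- rowSums(matrix1^2)
  sq2 <- rowSums(matrix2^2)
  # |x - y|^2 = |x|^2 + |y|^2 - 2 * x.y
  out0 <- outer(sq1, sq2, "+") - 2 * m1tm2
  # Clamp tiny negative values caused by rounding so sqrt() does not return NaN
  sqrt(pmax(out0, 0))
}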
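To see that this framework reproduces ordinary Euclidean distances, here is a minimal base-R check on small random matrices. The helper name cross_dist and the matrix sizes are made up for illustration; it uses the same algebra as bipartiteDist3 and clamps negative rounding residue to zero before taking the square root.

```r
# Hypothetical helper (illustration only): Euclidean distances between the
# rows of a and the rows of b via |x - y|^2 = |x|^2 + |y|^2 - 2 * x.y
cross_dist <- function(a, b) {
  d2 <- outer(rowSums(a^2), rowSums(b^2), "+") - 2 * tcrossprod(a, b)
  sqrt(pmax(d2, 0))  # clamp rounding residue below zero
}

set.seed(42)
a <- matrix(rnorm(5 * 8), nrow = 5)  # 5 observations, 8 features
b <- matrix(rnorm(7 * 8), nrow = 7)  # 7 observations, 8 features

# Reference: base R dist() on the stacked rows, keeping the a-vs-b block
ref <- as.matrix(dist(rbind(a, b)))[1:5, 6:12]

# Compare values only; dist() attaches dimnames that cross_dist() does not
check <- all.equal(cross_dist(a, b), ref, check.attributes = FALSE)
print(check)
```

The heavy step is the tcrossprod() call, which is a plain matrix product, so its speed should depend largely on the BLAS library that R is linked against.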
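I would be very interested in a parallel solution optimized for sparse matrices. As far as I know, wordspace is not optimized for sparsity. For example, tcrossprod, rowSums, and equivalents of outer have sparse-matrix implementations that can be parallelized.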
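As a starting point, the same framework can be written against sparse inputs. The sketch below assumes the Matrix package, which is my choice here and is not used anywhere above, and the helper name bipartiteDist_sparse is hypothetical. It is single-threaded and only shows the sparse building blocks; it is not the parallel solution asked for.

```r
library(Matrix)

# Hypothetical helper (illustration only): Euclidean distances between the
# rows of two sparse matrices, using sparse tcrossprod() and rowSums()
bipartiteDist_sparse <- function(m1, m2) {
  cross <- as.matrix(tcrossprod(m1, m2))  # sparse product, then densified
  sq1 <- rowSums(m1^2)                    # squared row norms, sparse-aware
  sq2 <- rowSums(m2^2)
  d2 <- outer(sq1, sq2, "+") - 2 * cross
  sqrt(pmax(d2, 0))                       # clamp rounding residue below zero
}

set.seed(1)
s1 <- rsparsematrix(20, 500, density = 0.05)  # 20 x 500, about 5% non-zero
s2 <- rsparsematrix(30, 500, density = 0.05)  # 30 x 500, about 5% non-zero

d_sparse <- bipartiteDist_sparse(s1, s2)

# Dense reference with base R dist() on the stacked rows
d_dense <- as.matrix(dist(rbind(as.matrix(s1), as.matrix(s2))))[1:20, 21:50]
max_abs_diff <- max(abs(d_sparse - d_dense))
print(max_abs_diff)
```

The result is a dense matrix either way, because distances between distinct rows are generally non-zero; sparsity only helps with the cross product and the row norms.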