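I'm getting some efficiency test results that I can't explain.
I want to assemble a matrix B whose i-th entry is B[i,:,:] = A[i,:,:].dot(x), where each A[i,:,:] is a 2D matrix, and so is x.
I can do this in three ways. To test performance, I generated random (numpy.random.randn) matrices A with shape (10, 1000, 1000) and x with shape (1000, 1200). I get the following timings: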
(1) A single multi-dimensional dot product
B = A.dot(x)
total time: 102.361 s
(2) Looping over i and performing 2D dot products
B = np.zeros([dim1, dim2, dim3])  # initialize the result
for i in range(A.shape[0]):
    B[i,:,:] = A[i,:,:].dot(x)
total time: 0.826 s
(3) numpy.einsum
B3 = np.einsum("ijk, kl -> ijl", A, x)
total time: 8.289 s
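So option (2) is by far the fastest. But considering just (1) and (2), I don't see a big difference between them. How can looping over 2D dot products be about 124 times faster? Both use numpy.dot. Any insights?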
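To rule out the possibility that the three options compute different things, here is a minimal sanity-check sketch. The small shapes, the fixed seed, and the helper name assemble_three_ways are my own assumptions chosen so that it runs quickly; they are not the sizes used for the timings above.

```python
import numpy as np

def assemble_three_ways(A, x):
    """Return B computed by (1) a single dot, (2) a loop of 2D dots, (3) einsum."""
    # (1) single multi-dimensional dot product
    B1 = A.dot(x)
    # (2) loop over the first axis, one 2D dot product per slice
    B2 = np.zeros((A.shape[0], A.shape[1], x.shape[1]))
    for i in range(A.shape[0]):
        B2[i, :, :] = A[i, :, :].dot(x)
    # (3) einsum with the same contraction over the shared index k
    B3 = np.einsum("ijk,kl->ijl", A, x)
    return B1, B2, B3

# Hypothetical small test sizes, much smaller than in the timings above
rng = np.random.RandomState(0)
A_small = rng.randn(3, 40, 40)
x_small = rng.randn(40, 50)

B1, B2, B3 = assemble_three_ways(A_small, x_small)
print(B1.shape, np.allclose(B1, B2), np.allclose(B2, B3))
```

If all three agree up to floating-point tolerance, the timing gap comes from how each call performs the computation, not from what it computes.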
The code used for the results above is included below:
import numpy as np
import numpy.random as npr
import time
dim1, dim2, dim3 = 10, 1000, 1200
A = npr.randn(dim1, dim2, dim2)
x = npr.randn(dim2, dim3)
# consider three ways of assembling the same matrix B: B1, B2, B3
# (1) a single multi-dimensional dot product
t = time.time()
B1 = np.dot(A,x)
td1 = time.time() - t
print("a single dot product of A [shape = (%d, %d, %d)] with x [shape = (%d, %d)] completes in %.3f s"
      % (A.shape[0], A.shape[1], A.shape[2], x.shape[0], x.shape[1], td1))
# (2) loop over the first axis and take one 2D dot product per slice
# each slice A[i,:,:].dot(x) has shape (A.shape[1], x.shape[1])
B2 = np.zeros([A.shape[0], A.shape[1], x.shape[1]])
t = time.time()
for i in range(A.shape[0]):
    B2[i,:,:] = np.dot(A[i,:,:], x)
td2 = time.time() - t
print("taking %d 2D dot products of A[i,:,:] [shape = (%d, %d)] with x [shape = (%d, %d)] completes in %.3f s"
      % (A.shape[0], A.shape[1], A.shape[2], x.shape[0], x.shape[1], td2))
t = time.time()
B3 = np.einsum("ijk, kl -> ijl", A, x)
td3 = time.time() - t
print("using np.einsum, it completes in %.3f s" % td3)
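Since each number above comes from a single time.time() measurement, a repeated measurement may be more trustworthy. Below is a rough sketch that takes the best of several runs with the standard-library timeit module. The helper names best_time and loop_dot and the small demo shapes are assumptions for illustration only; the sizes would need to be raised to (10, 1000, 1000) and (1000, 1200) to reproduce the comparison above.

```python
import timeit
import numpy as np

def best_time(fn, repeat=3):
    """Return the minimum wall-clock time over `repeat` single calls to fn."""
    return min(timeit.repeat(fn, number=1, repeat=repeat))

def loop_dot(A, x):
    """Option (2): one 2D dot product per slice along the first axis."""
    B = np.empty((A.shape[0], A.shape[1], x.shape[1]))
    for i in range(A.shape[0]):
        B[i] = A[i].dot(x)
    return B

# Hypothetical small demo sizes so the sketch finishes quickly
rng = np.random.RandomState(0)
A_demo = rng.randn(4, 100, 100)
x_demo = rng.randn(100, 120)

timings = {
    "single dot": best_time(lambda: A_demo.dot(x_demo)),
    "loop of 2D dots": best_time(lambda: loop_dot(A_demo, x_demo)),
    "einsum": best_time(lambda: np.einsum("ijk,kl->ijl", A_demo, x_demo)),
}
for name, seconds in sorted(timings.items()):
    print("%-16s %.6f s" % (name, seconds))
```

Taking the minimum over a few runs reduces the influence of one-off effects such as memory allocation or other processes, although it does not by itself explain the gap between (1) and (2).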