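If you have to make a Python call to produce each of the matrices you want to multiply, then you are basically stuck performance-wise. But if you can vectorize the transform_step_to_4by4 function so that it returns a single array of shape (n, 4, 4), then you can save some time using matrix_multiply: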
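import numpy as np
# matrix_multiply lives in numpy.core.umath_tests in older NumPy releases;
# newer releases offer np.matmul for the same stacked matrix product.
from numpy.core.umath_tests import matrix_multiply

# 64 random 4x4 matrices stacked in one (64, 4, 4) array
matrices = np.random.rand(64, 4, 4) - 0.5

def mat_loop_reduce(m):
    # Plain left-to-right product: one np.dot call per matrix
    ret = m[0]
    for x in m[1:]:
        ret = np.dot(ret, x)
    return ret

def mat_reduce(m):
    # While the count is even, multiply neighbouring pairs in one vectorized call
    while len(m) % 2 == 0:
        m = matrix_multiply(m[::2], m[1::2])
    # Finish whatever is left (an odd number of matrices) with the plain loop
    return mat_loop_reduce(m)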
In [2]: %timeit mat_reduce(matrices)
1000 loops, best of 3: 287 us per loop
In [3]: %timeit mat_loop_reduce(matrices)
1000 loops, best of 3: 721 us per loop
In [4]: np.allclose(mat_loop_reduce(matrices), mat_reduce(matrices))
Out[4]: True
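You now make about log2(n) Python calls instead of n, which is good for a 2.5x speed-up here and should get close to 10x for n = 1024. Apparently matrix_multiply is a ufunc, and as such has a .reduce method, which would let your code run with no loop in Python at all. I have not been able to make it work, though; I keep getting a cryptic error: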
In [7]: matrix_multiply.reduce(matrices)
------------------------------------------------------------
Traceback (most recent call last):
File "<ipython console>", line 1, in <module>
RuntimeError: Reduction not defined on ufunc with signature
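
Since the error says reduction is not defined for ufuncs with a signature, the pairwise approach still seems to be the practical route. Here is a minimal sketch of the same idea written with np.matmul, assuming a NumPy version recent enough to have it. The name mat_tree_reduce and the handling of odd counts are my own additions for illustration, not part of the code above: instead of falling back to the plain loop as soon as the count is odd, it sets the last matrix aside and keeps pairing.

```python
import numpy as np

def mat_tree_reduce(m):
    """Multiply a stack of square matrices in order, pairing neighbours on each pass."""
    m = np.asarray(m)
    if len(m) == 0:
        raise ValueError("need at least one matrix")
    while len(m) > 1:
        if len(m) % 2:
            # Odd count: pair up all but the last matrix, then re-append the last one
            # so the left-to-right order of the product is preserved.
            tail = m[-1:]
            m = np.concatenate([np.matmul(m[:-1:2], m[1:-1:2]), tail])
        else:
            # Even count: multiply matrices (0,1), (2,3), ... in one vectorized call.
            m = np.matmul(m[::2], m[1::2])
    return m[0]
```

This relies only on matrix multiplication being associative, so the result should agree with the plain loop up to floating-point rounding. I have not timed it, so whether it beats mat_reduce for a given n is something you would have to measure.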