Keep the code simple instead of optimizing it
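If you know which algorithm you want, write a simple reference implementation first. From there, Python gives you two ways forward: you can try to vectorize the code, or you can compile it to get good performance.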
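Here is a minimal sketch of what such a reference implementation can look like. It uses the scatter-add pattern that np.add.at performs as the example, and the function name scatter_add is hypothetical. A plain loop makes the semantics explicit: repeated indices accumulate, unlike a buffered out[idx] += vals in NumPy. A loop of this shape is also the kind of code that can later be vectorized or compiled with Numba.

```python
def scatter_add(size, indices, values):
    """Reference loop: out[indices[t]] += values[t] for every t.

    Mirrors the semantics of np.add.at on a zero-initialized array of
    length `size`: repeated indices accumulate.
    """
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")
    out = [0.0] * size
    for idx, val in zip(indices, values):
        out[idx] += val
    return out


# Index 0 appears twice, so both contributions are summed.
print(scatter_add(3, [0, 2, 0], [1.0, 2.0, 3.0]))  # [4.0, 0.0, 2.0]
```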
Even if np.einsum or np.add.at were implemented in Numba, it would be hard for any compiler to generate efficient machine code from your example.
The only part I rewrote is a more efficient way to digitize scalar values.
Edit
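The Numba source code contains an even more efficient implementation of digitize for scalar values, which uses a binary search instead of a linear scan.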
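The gain comes from replacing a linear scan over the bins (O(n) comparisons per value) with a binary search (O(log n)). For monotonically increasing bins, the standard library's bisect.bisect_right returns the same index as np.digitize(x, bins) with right=False, that is the i with bins[i-1] <= x < bins[i]. bisect_left corresponds to right=True. The sketch below compares both strategies in plain Python. The names digitize_linear and digitize_binary are hypothetical, and NaN handling is deliberately omitted.

```python
from bisect import bisect_left, bisect_right


def digitize_linear(x, bins):
    """O(n) scan, same logic as the simple digitize in this answer."""
    if x < bins[0]:
        return 0
    for i in range(1, len(bins)):
        if x < bins[i]:
            return i
    return len(bins)


def digitize_binary(x, bins, right=False):
    """O(log n) binary search; bins must be monotonically increasing."""
    # right=False: bins[i-1] <= x < bins[i]; right=True: bins[i-1] < x <= bins[i]
    return bisect_left(bins, x) if right else bisect_right(bins, x)


bins = [1.0, 2.0, 4.0, 8.0]
values = [0.5, 1.0, 3.0, 8.0, 9.0]
for x in values:
    assert digitize_linear(x, bins) == digitize_binary(x, bins)
print([digitize_binary(x, bins) for x in values])  # [0, 1, 2, 4, 4]
```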
Code
import numba as nb
import numpy as np

# From Numba source
# Copyright (c) 2012, Anaconda, Inc.
# All rights reserved.
# Note: fastmath=True lets LLVM assume there are no NaNs, so the np.isnan
# branches below may be optimized away; drop fastmath if NaN inputs matter.
@nb.njit(fastmath=True)
def digitize(x, bins, right=False):
    # bins are monotonically-increasing
    n = len(bins)
    lo = 0
    hi = n
    if right:
        if np.isnan(x):
            # Find the first nan (i.e. the last from the end of bins,
            # since there shouldn't be many of them in practice)
            for i in range(n, 0, -1):
                if not np.isnan(bins[i - 1]):
                    return i
            return 0
        while hi > lo:
            mid = (lo + hi) >> 1
            if bins[mid] < x:
                # mid is too low => narrow to upper bins
                lo = mid + 1
            else:
                # mid is too high, or is a NaN => narrow to lower bins
                hi = mid
    else:
        if np.isnan(x):
            # NaNs end up in the last bin
            return n
        while hi > lo:
            mid = (lo + hi) >> 1
            if bins[mid] <= x:
                # mid is too low => narrow to upper bins
                lo = mid + 1
            else:
                # mid is too high, or is a NaN => narrow to lower bins
                hi = mid
    return lo
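# Simple linear-search version (my first rewrite). Keep only one of the two
# digitize definitions: a later definition replaces the earlier one.
@nb.njit(fastmath=True)
def digitize(value, bins):
    if value < bins[0]:
        return 0
    if value >= bins[bins.shape[0] - 1]:
        return bins.shape[0]
    for l in range(1, bins.shape[0]):
        if value >= bins[l - 1] and value < bins[l]:
            return l
    # Only reached for NaN inputs; NumPy's digitize also puts NaNs in the last bin
    return bins.shape[0]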
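@nb.njit(fastmath=True, parallel=True)
def inner_loop(boost_factor, freq_bins, es):
    res = np.zeros((boost_factor.shape[0], freq_bins.shape[0]), dtype=np.float64)
    # Each parallel iteration i writes only to its own row res[i, :]
    for i in nb.prange(boost_factor.shape[0]):
        for j in range(boost_factor.shape[1]):
            for k in range(freq_bins.shape[0]):
                ind = nb.int64(digitize(boost_factor[i, j] * freq_bins[k], freq_bins))
                # Caution: ind == freq_bins.shape[0] if the product is >= freq_bins[-1];
                # that index is out of range, and Numba does not bounds-check by default.
                res[i, ind] += boost_factor[i, j] * es[j, k] * freq_bins[ind]
    return res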
@nb.njit(fastmath=True)
def calc_nb(division, freq_division, cd, boost_factor, freq_bins, es):
    final_emit = np.empty((division, division, freq_division), np.float64)
    for i in range(division):
        final_emit[i, :, :] = inner_loop(boost_factor[i], freq_bins, es)
    return final_emit
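To check the compiled version on small inputs, it helps to keep the vectorized route mentioned at the start of this answer as a reference. The sketch below computes the same quantity as inner_loop using np.digitize and np.add.at. The name inner_loop_numpy is hypothetical. It assumes every boost_factor[i, j] * freq_bins[k] stays below freq_bins[-1], and it is meant for cross-checking results, not as a faster alternative.

```python
import numpy as np


def inner_loop_numpy(boost_factor, freq_bins, es):
    """Vectorized NumPy counterpart of inner_loop (a sketch, not benchmarked).

    boost_factor: shape (n_rows, n_j); freq_bins: shape (n_bins,), increasing;
    es: shape (n_j, n_bins). Assumes every boost_factor[i, j] * freq_bins[k]
    is below freq_bins[-1]; otherwise np.digitize returns n_bins and the
    fancy indexing below raises IndexError.
    """
    n_rows = boost_factor.shape[0]
    n_bins = freq_bins.shape[0]
    res = np.zeros((n_rows, n_bins), dtype=np.float64)
    for i in range(n_rows):
        # shifted[j, k] = boost_factor[i, j] * freq_bins[k]
        shifted = boost_factor[i, :, None] * freq_bins[None, :]
        ind = np.digitize(shifted, freq_bins)  # shape (n_j, n_bins)
        weights = boost_factor[i, :, None] * es * freq_bins[ind]
        # np.add.at is unbuffered, so repeated bin indices accumulate in row i
        np.add.at(res[i], ind.ravel(), weights.ravel())
    return res


freq_bins = np.array([1.0, 2.0, 4.0])
es = np.ones((1, 3))
print(inner_loop_numpy(np.array([[0.5]]), freq_bins, es))
```

Comparing np.allclose(inner_loop_numpy(b, f, e), inner_loop(b, f, e)) on small random inputs that satisfy the assumption above is a quick way to catch indexing mistakes in the Numba code.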
Performance (quad-core i7)
original_code: 118.5 s
calc_nb: 4.14 s
calc_nb with the digitize implementation from the Numba source: 2.66 s
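The answer does not say how these times were measured. Numba compiles a function on its first call, so a fair comparison usually times calls made after a warm-up call. Below is a minimal standard-library sketch of that idea. best_of is a hypothetical helper and is not part of the answer.

```python
import time


def best_of(fn, *args, repeat=3, warmup=True):
    """Return the fastest wall-clock time (seconds) over `repeat` timed calls.

    With warmup=True, one untimed call is made first; for a Numba function
    that call triggers JIT compilation, so compile time is excluded.
    """
    if repeat < 1:
        raise ValueError("repeat must be at least 1")
    if warmup:
        fn(*args)
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


# Hypothetical usage with the functions above (the inputs are not shown in the answer):
# t = best_of(calc_nb, division, freq_division, cd, boost_factor, freq_bins, es)
```

Taking the minimum over several runs reduces noise from other processes. Reporting the warm-up time separately would show how much of a single run is compilation.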