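As a follow-up to my earlier question here https://stackoverflow.com/questions/45882166/performance-of-updating-multiple-key-value-pairs-in-a-dict (thanks to MSeifert for the help), I ran into the problem that I have to mask a numpy array new_values with an index array new_vals_idx before passing the masked array on to update val_dict.
I tried to apply array masking to the solutions proposed in MSeifert's answer to the old post, but the performance is not satisfying.
The arrays and dicts that I use for the following examples are: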
import numpy as np
val_dict = {'a': 5.0, 'b': 18.8, 'c': -55/2}
for i in range(200):
    val_dict[str(i)] = i
    val_dict[i] = i**2
keys = ('b', 123, '89', 'c') # dict keys to update
new_values = np.arange(1, 51, 1) / 1.0 # array with new values which has to be masked
new_vals_idx = np.array((0, 3, 5, -1)) # masking array
valarr = np.zeros((new_vals_idx.shape[0])) # preallocation for masked array
length = new_vals_idx.shape[0]
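
For reference, a minimal sketch of the "masking" step on its own. What is called masking in this post is NumPy integer-array (fancy) indexing: it picks the elements of new_values at the positions listed in new_vals_idx and returns a new array. The name `masked` is only illustrative and does not appear in the original code.

```python
import numpy as np

new_values = np.arange(1, 51, 1) / 1.0   # 1.0, 2.0, ..., 50.0
new_vals_idx = np.array((0, 3, 5, -1))   # positions to pick; -1 is the last element

# Integer-array indexing allocates and returns a new array (a copy, not a view)
masked = new_values[new_vals_idx]
print(masked.tolist())  # [1.0, 4.0, 6.0, 50.0]
```

Because the result is a fresh array on every call, this step carries a fixed allocation cost regardless of how few elements are selected.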
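To make my code snippets easier to compare with my old question, I will stick to the function naming of MSeifert's answer. These are my attempts to get the best performance out of Python/Cython (the other answers were left out because they were too slow):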
def old_for(val_dict, keys, new_values, new_vals_idx, length):
    # index new_values one element at a time inside the loop
    for i in range(length):
        val_dict[keys[i]] = new_values[new_vals_idx[i]]

%timeit old_for(val_dict, keys, new_values, new_vals_idx, length)
# 1000000 loops, best of 3: 1.6 µs per loop

def old_for_w_valarr(val_dict, keys, new_values, valarr, new_vals_idx, length):
    # note: this rebinds the local name valarr to a newly created array;
    # the preallocated array passed in is not written to
    valarr = new_values[new_vals_idx]
    for i in range(length):
        val_dict[keys[i]] = valarr[i]

%timeit old_for_w_valarr(val_dict, keys, new_values, valarr, new_vals_idx, length)
# 100000 loops, best of 3: 2.33 µs per loop

def new2_w_valarr(val_dict, keys, new_values, valarr, new_vals_idx, length):
    # same rebinding as above, then conversion to a list of Python floats
    valarr = new_values[new_vals_idx].tolist()
    for key, val in zip(keys, valarr):
        val_dict[key] = val

%timeit new2_w_valarr(val_dict, keys, new_values, valarr, new_vals_idx, length)
# 100000 loops, best of 3: 2.01 µs per loop
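
As a sanity check before comparing timings, the variants can be verified to produce the same dictionary contents. The sketch below uses trimmed copies of old_for and new2_w_valarr (the unused valarr and length parameters are dropped from the second one for brevity, which is a simplification of the code above) and small empty dicts `d_loop` and `d_list`, which are illustrative names. It also shows one difference between the variants: the loop stores NumPy scalars, while `.tolist()` stores plain Python floats.

```python
import numpy as np

def old_for(val_dict, keys, new_values, new_vals_idx, length):
    for i in range(length):
        val_dict[keys[i]] = new_values[new_vals_idx[i]]

def new2_w_valarr(val_dict, keys, new_values, new_vals_idx):
    # simplified signature: valarr and length are omitted here
    for key, val in zip(keys, new_values[new_vals_idx].tolist()):
        val_dict[key] = val

keys = ('b', 123, '89', 'c')
new_values = np.arange(1, 51, 1) / 1.0
new_vals_idx = np.array((0, 3, 5, -1))

d_loop, d_list = {}, {}
old_for(d_loop, keys, new_values, new_vals_idx, len(keys))
new2_w_valarr(d_list, keys, new_values, new_vals_idx)

print(d_loop == d_list)             # True: same keys, equal values
print(type(d_loop['b']).__name__)   # float64 (NumPy scalar)
print(type(d_list['b']).__name__)   # float (Python float)
```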
Cython functions:
%load_ext cython

%%cython
import numpy as np
cimport numpy as np

cpdef new3_cy(dict val_dict, tuple keys, double[:] new_values, int[:] new_vals_idx, Py_ssize_t length):
    cdef Py_ssize_t i
    cdef double val  # this gives about 10 µs speed boost compared to directly assigning it to val_dict
    for i in range(length):
        val = new_values[new_vals_idx[i]]
        val_dict[keys[i]] = val

%timeit new3_cy(val_dict, keys, new_values, new_vals_idx, length)
# 1000000 loops, best of 3: 1.38 µs per loop

cpdef new3_cy_mview(dict val_dict, tuple keys, double[:] new_values, int[:] new_vals_idx, Py_ssize_t length):
    cdef Py_ssize_t i
    cdef int[:] mview_idx = new_vals_idx
    cdef double [:] mview_vals = new_values
    for i in range(length):
        val_dict[keys[i]] = mview_vals[mview_idx[i]]

%timeit new3_cy_mview(val_dict, keys, new_values, new_vals_idx, length)
# 1000000 loops, best of 3: 1.38 µs per loop
# NOT WORKING:
cpdef new2_cy_mview(dict val_dict, tuple keys, double[:] new_values, int[:] new_vals_idx, Py_ssize_t length):
    cdef double [new_vals_idx] masked_vals = new_values
    for key, val in zip(keys, masked_vals.tolist()):
        val_dict[key] = val

cpdef new2_cy_mask(dict val_dict, tuple keys, double[:] new_values, valarr, int[:] new_vals_idx, Py_ssize_t length):
    valarr = new_values[new_vals_idx]
    for key, val in zip(keys, valarr.tolist()):
        val_dict[key] = val
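The Cython functions new3_cy and new3_cy_mview do not seem to be considerably faster than old_for. Passing valarr to avoid constructing an array inside the function (as it is going to be called several million times) even seems to slow it down.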
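
One possible way to actually write into a preallocated array, sketched under the assumption that the goal is to avoid a new allocation per call: `np.take` accepts an `out=` argument, so the selected values land in the existing buffer. The helper name `fill_valarr` is hypothetical. Whether this is any faster for arrays this small would have to be measured; the NumPy documentation notes that `out` is buffered in the default `mode='raise'`.

```python
import numpy as np

new_values = np.arange(1, 51, 1) / 1.0
new_vals_idx = np.array((0, 3, 5, -1))
valarr = np.zeros(new_vals_idx.shape[0])  # preallocated once, outside the hot loop

def fill_valarr(new_values, new_vals_idx, valarr):
    # Hypothetical helper: writes the selected values into valarr in place.
    # The default mode='raise' still accepts negative indices such as -1;
    # mode='clip' would not, so it is deliberately not used here.
    np.take(new_values, new_vals_idx, out=valarr)

fill_valarr(new_values, new_vals_idx, valarr)
print(valarr.tolist())  # [1.0, 4.0, 6.0, 50.0]
```

By contrast, an assignment such as `valarr = new_values[new_vals_idx]` inside a function only rebinds the local name and leaves the preallocated array untouched.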
Masking with the new_vals_idx array in new2_cy_mask gives me the Cython error "Invalid index for memoryview specified, type int[:]". Is there a type like Py_ssize_t for arrays of indices?
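
On the index type, a small sketch of the NumPy side, offered as background rather than as a fix. `np.intc` corresponds to C `int` (the element type an `int[:]` memoryview expects), while `np.intp` is NumPy's index-sized integer, which on common platforms has the same size as `Py_ssize_t`. The default dtype of `np.array((0, 3, 5, -1))` is platform dependent. As far as I know, typed memoryviews accept only integers and slices as indices, so changing the element type alone would not make `new_values[new_vals_idx]` valid on a memoryview. The names `idx_c_int` and `idx_intp` are illustrative.

```python
import struct
import numpy as np

new_vals_idx = np.array((0, 3, 5, -1))

# np.intc matches C `int`; np.intp is the integer type NumPy uses for indexing
idx_c_int = new_vals_idx.astype(np.intc)
idx_intp = new_vals_idx.astype(np.intp)

# The printed dtypes depend on the platform and NumPy version
print(new_vals_idx.dtype, idx_c_int.dtype, idx_intp.dtype)

# struct format 'n' is C ssize_t, which Py_ssize_t is normally defined as
print(idx_intp.itemsize == struct.calcsize('n'))
```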
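Trying to create a masked memoryview in new2_cy_mview gives me the error "Cannot assign type 'double[:]' to 'double [__pyx_v_new_vals_idx]'". Is there such a thing as a masked memoryview? I was not able to find any information on this topic...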
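Comparing these timings with the ones from my old question, I guess that the array masking is the step that takes up most of the time. Since it is most likely already highly optimized in numpy, there is probably not much that can be done. But the slowdown is so large that there must (hopefully) be a better way to do it.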
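
The guess that the indexing step dominates could be checked by timing the two steps separately. A minimal sketch using the standard-library `timeit` module (so it also runs outside IPython); the helper names `index_only` and `update_only`, the reduced dict, and the repetition count `n` are assumptions made for illustration. No timings are quoted here because they depend on the machine.

```python
import timeit
import numpy as np

val_dict = {'b': 18.8, 123: 123**2, '89': 89, 'c': -55 / 2}
keys = ('b', 123, '89', 'c')
new_values = np.arange(1, 51, 1) / 1.0
new_vals_idx = np.array((0, 3, 5, -1))
picked = new_values[new_vals_idx].tolist()  # values selected once, up front

def index_only():
    # only the fancy-indexing step, no dict access
    return new_values[new_vals_idx]

def update_only():
    # only the dict update, using the values selected beforehand
    for key, val in zip(keys, picked):
        val_dict[key] = val

n = 100_000
t_index = timeit.timeit(index_only, number=n) / n
t_update = timeit.timeit(update_only, number=n) / n
print(f"indexing only:    {t_index * 1e6:.2f} µs per call")
print(f"dict update only: {t_update * 1e6:.2f} µs per call")
```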
Any help is appreciated! Thanks in advance!