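This is a follow-up question to my solution for the question below:
How to apply a function in parallel to multiple images in a numpy array? https://stackoverflow.com/a/48374491/2956066
My suggested solution (https://stackoverflow.com/questions/48373944/how-to-apply-a-function-in-parallel-to-multiple-images-in-a-numpy-array/48374491#48374491) works fine if the function process_image() has to return the result, which we can then cache in a list for later processing.
Since I want to do this kind of preprocessing for more than 100K images (with array shape (100000, 32, 32, 3)), I want my solution to be very efficient. But my list-based approach will hog a lot of memory, so it will also be inefficient for further processing. So I want the array to be updated in-place inside the process_image() function when this function is called many times using joblib (https://pythonhosted.org/joblib/).
However, I'm having trouble updating the original batched image array in-place. I tried the suggestion by Eric (https://stackoverflow.com/questions/48373944/how-to-apply-a-function-in-parallel-to-multiple-images-in-a-numpy-array/48374491#comment83738800_48374491), but it fails to update the original array in-place. I checked whether the array memory is indeed shared among the worker processes by printing the flags of the array inside the process_image function. Here is my code for doing so: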
import numpy as np
from skimage import exposure
from joblib import Parallel, delayed

# number of processes
nprocs = 10

# batched image array
img_arr = np.random.randint(0, 255, (1000, 32, 32, 3)).astype(np.float32)

# for verification
img_arr_copy = img_arr.copy()

# function to be applied on all images (in parallel)
# note: this function fails to update the original array in-place
# but, I want in-place updation of original array with the result of `equalize_hist`
def process_image(img, idx):
    """
    update original array in-place since all worker processes share
    original memory! i.e. they don't make copy while processing it.
    """
    print("\n processing image: ", idx)
    img[...] = exposure.equalize_hist(img)
    print("array metadata: \n", img.flags)
    print("======================= \n")

# run `process_image()` in parallel
Parallel(n_jobs=nprocs)(delayed(process_image)(img_arr[idx], idx) for idx in range(img_arr.shape[0]))
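I even tried initializing an empty array with np.empty() of the same shape as the original batched image array and updating that instead, but that failed too. I don't know where it is going wrong.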
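For comparison, here is a minimal sketch of a thread-based variant in which the in-place write does reach the original array, since all threads live in the same process. It is not the joblib call from my code: it swaps in `concurrent.futures.ThreadPoolExecutor` from the standard library, and `process_image_inplace` is a hypothetical stand-in that rescales each image instead of calling `exposure.equalize_hist`. The smaller batch size is also an assumption, chosen to keep the example quick.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

# Small stand-in batch (assumed size; the real array is much larger).
rng = np.random.default_rng(0)
img_arr = rng.integers(0, 255, (16, 32, 32, 3)).astype(np.float32)
img_arr_copy = img_arr.copy()  # snapshot for verification


def process_image_inplace(img):
    # Hypothetical stand-in for exposure.equalize_hist:
    # rescale the pixel values to [0, 1], writing into `img` itself.
    img /= 255.0


# Each img_arr[idx] is a view, and threads share the parent's memory,
# so the writes land directly in img_arr.
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(process_image_inplace,
                  (img_arr[idx] for idx in range(img_arr.shape[0]))))

changed = not np.array_equal(img_arr, img_arr_copy)
matches_expected = np.allclose(img_arr, img_arr_copy / 255.0)
print(changed, matches_expected)
```

joblib offers a comparable `backend="threading"` option on `Parallel`. Whether threads give any real speed-up here would depend on how much of the per-image work releases the GIL, which I have not measured.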
To check whether the array was actually updated, I used:
np.all(result_arr == img_arr)
where result_arr was initialized as:
result_arr = np.empty(img_arr.shape, dtype=np.float32)
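Since `np.empty()` returns uninitialized memory, comparing against it may not say much on its own. A more direct check is probably to compare the array against the snapshot taken before processing (`img_arr_copy` in my code). Below is a minimal sketch of that idea. The helper name `was_modified`, the small batch size, and the `+= 1.0` stand-in update are assumptions for illustration only.

```python
import numpy as np

# Small stand-in batch (assumed size) and a snapshot taken before processing.
rng = np.random.default_rng(0)
img_arr = rng.integers(0, 255, (8, 32, 32, 3)).astype(np.float32)
img_arr_copy = img_arr.copy()


def was_modified(current, snapshot):
    """Return True if `current` differs anywhere from `snapshot`."""
    return not np.array_equal(current, snapshot)


# Nothing has touched the array yet, so it still equals the snapshot.
before = was_modified(img_arr, img_arr_copy)

# Stand-in for an in-place update of the first image.
img_arr[0] += 1.0
after = was_modified(img_arr, img_arr_copy)

print(before, after)
```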
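Where am I going wrong, and what is the bug in my code? All suggestions are highly appreciated!
Printed output from the above code, used to check whether the memory is shared: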
processing image: 914
array metadata:
C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : False #<=========== so memory is shared
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
=======================
processing image: 614
array metadata:
C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : False #<=========== so memory is shared
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
=======================
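One caveat about reading the flags this way: `OWNDATA : False` only says that the array is a view onto memory owned by some other object. By itself it does not show that the memory is shared between processes. The sketch below illustrates this in a single process. The pickle round trip is an assumption standing in for how a process-based backend might ship each small slice to a worker, and the batch size is made up for the example.

```python
import pickle

import numpy as np

# Small stand-in for the batched image array (assumed size).
batch = np.zeros((4, 32, 32, 3), dtype=np.float32)

# Slicing out one image gives a view, so OWNDATA is False
# even though no second process is involved.
view = batch[0]
view_owns_data = bool(view.flags.owndata)
view_shares = np.shares_memory(view, batch)

# Simulate sending the slice to a worker: pickle, then unpickle.
received = pickle.loads(pickle.dumps(view))
received_shares = np.shares_memory(received, batch)

# Writing into the unpickled array leaves the original batch untouched.
received[...] = 1.0
batch_unchanged = not batch.any()

print(view_owns_data, view_shares, received_shares, batch_unchanged)
# prints: False True False True
```

If the worker receives an unpickled copy like `received`, then `img[...] = ...` inside `process_image` would modify only that copy, which would be consistent with the original array staying unchanged.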