I have a Snakemake recipe that contains a very expensive preparation step which is common to all invocations. Here is a pseudo-rule for demonstration:
    rule sample:
        input:
            "{name}.config"
        output:
            "{name}.npz"
        run:
            import numpy as np
            import somemodule
            data = somemodule.Loader("some_big_data")  # expensive
            np.savez(output[0], data.process(input[0]))  # also expensive
Currently, data is loaded de novo for every target, which is far from optimal. How can I make it load only once?
I am looking for something that would allow the rule to be rewritten as:
    rule sample:
        input:
            "{name}.config"
        output:
            "{name}.npz"
        setup:  # wished-for directive, run once for all jobs
            import numpy as np
            import somemodule
            data = somemodule.Loader("some_big_data")  # expensive
        run:
            np.savez(output[0], data.process(input[0]))  # also expensive
or:
    rule sample:
        input:
            "{name}.config"
        output:
            "{name}.npz"
        run:
            import numpy as np
            import somemodule
            data = somemodule.Loader("some_big_data")  # expensive
            for job in jobs:  # wished-for: all jobs of this rule in one run
                np.savez(job.output[0],
                         data.process(job.input[0]))  # also expensive
In another question I have described the code that Loader.__init__() is based on: https://stackoverflow.com/questions/68694729/how-can-i-load-fenics-objects-faster
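One possible solution is to create a pickled object that holds the data of interest. Please review the security considerations for pickled objects (https://docs.python.org/3/library/pickle.html) to check whether this approach is appropriate for your case. If it is, the rule would look something along these lines: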
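    rule sample:
        input:
            "{name}.config"
        output:
            pickle = "{name}.pickle",
        run:
            import pickle
            import somemodule
            data = somemodule.Loader("some_big_data")  # expensive
            # dump the loaded object (not the pickle module) to the output file
            with open(output.pickle, "wb") as handle:
                pickle.dump(data, handle)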
In downstream rules you refer to the pickle file like any other input file; just make sure to read it with pickle.load.
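To make the dump/load round trip concrete, here is a minimal plain-Python sketch of the idea, outside of Snakemake. The names expensive_setup, write_cache and process_with_cache are hypothetical and not part of somemodule or Snakemake. The stand-in returns a plain dict so that it pickles without trouble; whether the real Loader object can be pickled at all is an assumption you would need to verify.

```python
import pickle
import tempfile
from pathlib import Path


def expensive_setup(source):
    """Hypothetical stand-in for somemodule.Loader(source)."""
    # A plain dict pickles without trouble; the real object may not.
    return {"source": source, "squares": {i: i * i for i in range(5)}}


def write_cache(path, source):
    """Upstream step: run the expensive setup once and pickle the result."""
    with open(path, "wb") as handle:
        pickle.dump(expensive_setup(source), handle)


def process_with_cache(path, key):
    """Downstream step: restore the cached object instead of rebuilding it."""
    with open(path, "rb") as handle:
        data = pickle.load(handle)  # only unpickle files you trust
    return data["squares"][key]


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "loader.pickle"
    write_cache(cache, "some_big_data")
    # Each "job" reads the cache; none of them repeats the setup.
    results = [process_with_cache(cache, key) for key in (2, 3)]
    print(results)
```

In a workflow, write_cache would correspond to the body of the rule that produces the pickle file, and process_with_cache to the body of each downstream rule that lists that file as an input. Note that every downstream job still pays the cost of unpickling, so this only helps if loading the pickle is much cheaper than the original setup.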