Preprocessing for wildcarded Snakemake rules

2024-01-26

I have a Snakemake recipe that contains a very expensive preparation step which is common to all invocations. Here is a pseudo-rule for demonstration:

rule sample:
    input:
        "{name}.config"
    output:
        "{name}.npz"
    run:
        import numpy as np
        import somemodule

        data = somemodule.Loader("some_big_data")  # expensive
        # input and output are file lists, so pick the single file of each
        np.savez(output[0], data.process(input[0]))  # also expensive

At the moment data is loaded de novo for every target, which is far from optimal. How can I have it loaded only once?

I am looking for something that would allow the rule to be rewritten as:

rule sample:
    input:
        "{name}.config"
    output:
        "{name}.npz"
    setup:
        import somemodule
        
        data = somemodule.Loader("some_big_data")  # expensive
    run:
        np.savez(output, data.process(input))  # also expensive

or:

rule sample:
    input:
        "{name}.config"
    output:
        "{name}.npz"
    run:
        import somemodule

        data = somemodule.Loader("some_big_data")  # expensive
        
        for job in jobs:
            np.savez(job.output,
                     data.process(job.input))  # also expensive

I have described the code that Loader.__init__() is based on in another question: https://stackoverflow.com/questions/68694729/how-can-i-load-fenics-objects-faster


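One possible solution is to create a pickled object holding the data of interest. Please study the security considerations of using pickled objects (https://docs.python.org/3/library/pickle.html) to check whether this is appropriate for your case. If it is, then it would go along these lines: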

rule sample:
    input:
        "{name}.config"
    output:
        pickle = "{name}.pickle",
    run:
        import somemodule
        import pickle
        
        data = somemodule.Loader("some_big_data")  # expensive
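        # Serialize the loaded object; pickle.dump needs a file opened in binary mode
        with open(output.pickle, "wb") as f:
            pickle.dump(data, f)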

In downstream rules you refer to the pickled file like any other input file; just make sure to read it with pickle.load.
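As a rough illustration of the dump-once, load-per-job pattern, here is a minimal plain-Python sketch. The names expensive_load, save_prepared, load_prepared and run_job are hypothetical helpers invented for this example, and a plain dict stands in for whatever somemodule.Loader actually returns. The real object must itself be picklable for this approach to work.

```python
import pickle
import tempfile
from pathlib import Path


def expensive_load(source):
    """Hypothetical stand-in for somemodule.Loader(...); returns plain data."""
    return {"source": source, "scale": 2.0}


def save_prepared(data, pickle_path):
    """Upstream step: serialize the prepared data once."""
    with open(pickle_path, "wb") as f:
        pickle.dump(data, f)


def load_prepared(pickle_path):
    """Downstream step: read the prepared data back.

    Only unpickle files you produced yourself (see the pickle security notes).
    """
    with open(pickle_path, "rb") as f:
        return pickle.load(f)


def run_job(pickle_path, config_path):
    """What the body of a downstream rule could do for a single target."""
    data = load_prepared(pickle_path)
    value = float(Path(config_path).read_text())  # assumed config: one number
    return data["scale"] * value


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pickle_path = Path(tmp) / "prepared.pickle"
        config_path = Path(tmp) / "a.config"
        config_path.write_text("1.5\n")

        save_prepared(expensive_load("some_big_data"), pickle_path)  # done once
        print(run_job(pickle_path, config_path))  # cheap load, then the real work
```

If the pickled object is an instance of a custom class, that class has to be importable in the downstream job as well, because pickle stores a reference to the class rather than its code. Giving the pickle output a fixed file name without wildcards should also make Snakemake build it a single time and reuse it for every target.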
