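Using an input function is probably the best solution. It works as follows:
- Pass the wildcards to an input function.
- Using the known YAML values, build the theoretical file names for that sample name.
- Use a Python function to check which files (technically, which file suffixes) actually exist.
- Build a list of the valid files.
- Return and unpack the list of valid files.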
Notes:
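- The input and output should use the same wildcards; if they don't, it will cause problems.
- Make sure the input function cannot return an empty string, because Snakemake interprets that as a "no input" requirement, which is not what you want.
- If you adopt these suggestions, please update the rule name. I forgot to.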
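The second note can be checked outside Snakemake. Below is a minimal sketch in plain Python that mirrors the lookup logic with a fallback, so the result is never empty. The helper name `candidate_inputs`, the temporary directory, and the sample `mySample9` are made up for illustration; `SimpleNamespace` only stands in for Snakemake's wildcards object.

```python
import os
import tempfile
from types import SimpleNamespace


def candidate_inputs(fq_in_path, wildcards):
    """Return the FASTQ files that exist for a sample, or a sentinel name if none do."""
    found = []
    for suffix in (".fq", ".fq.gz"):
        for read in (".1", ".2"):
            path = os.path.join(fq_in_path, wildcards.sample + read + suffix)
            if os.path.exists(path):
                found.append(path)
    # Never return an empty result: a sentinel file name that does not exist
    # turns "no input found" into a visible missing-input error.
    return found or ["trim_galore_input_determination_FAILURE" + wildcards.sample]


# Throwaway directory standing in for config["fq_in_path"] (assumption for the demo)
with tempfile.TemporaryDirectory() as fq_dir:
    for name in ("mySample1.1.fq", "mySample1.2.fq"):
        open(os.path.join(fq_dir, name), "w").close()
    present = candidate_inputs(fq_dir, SimpleNamespace(sample="mySample1"))
    missing = candidate_inputs(fq_dir, SimpleNamespace(sample="mySample9"))

print([os.path.basename(p) for p in present])
print(missing)
```

For a sample with no files on disk, the function returns the sentinel name rather than an empty list, so the job fails loudly on a missing input instead of running with nothing.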
Snakefile:
configfile: "config.yaml"

from os.path import join
from os.path import exists

rule all:
    input:
        expand("{trim_out_path}/{sample}.{readDirection}.fq.gz",
            trim_out_path=config["trim_out_path"],
            sample=config["sampleList"],
            readDirection=['1','2'])

def trim_galore_input_determination(wildcards):
    potential_file_path_list = []
    # Cycle through both suffix possibilities
    for fastqSuffix in [".fq", ".fq.gz"]:
        # Cycle through both read directions
        for readDirection in ['.1','.2']:
            # Build the candidate path for each suffix
            potential_file_path = config["fq_in_path"] + "/" + wildcards.sample + readDirection + fastqSuffix
            # Check if this file actually exists
            if exists(potential_file_path):
                # If the file exists, add it to the list of acceptable files
                potential_file_path_list.append(potential_file_path)
    # Guard against returning an empty list
    if len(potential_file_path_list):
        return potential_file_path_list
    else:
        return ["trim_galore_input_determination_FAILURE" + wildcards.sample]

rule trim_galore_unzipped_PE:
    input:
        unpack(trim_galore_input_determination)
    output:
        expand("{trim_out_path}/{{sample}}.{readDirection}.fq.gz",
            trim_out_path=config["trim_out_path"],
            readDirection=['1','2'])
    params:
        out_path=config['trim_out_path'],
    conda:
        'envs/biotools.yaml',
    shell:
        'trim_galore --gzip -o {params.out_path} --paired {input}'
config.yaml:
fq_in_path: input/fq
trim_out_path: output
sampleList: ["mySample1", "mySample2"]
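To see which targets `rule all` ends up requesting from these config values, here is a rough plain-Python approximation of the `expand(...)` call. It uses `itertools.product` and `str.format` instead of Snakemake itself, so treat it as an illustration only: the order in which Snakemake lists the targets may differ.

```python
from itertools import product

# Values copied from the config.yaml shown above
config = {
    "fq_in_path": "input/fq",
    "trim_out_path": "output",
    "sampleList": ["mySample1", "mySample2"],
}

pattern = "{trim_out_path}/{sample}.{readDirection}.fq.gz"

# Every combination of output path, sample and read direction gives one target
targets = [
    pattern.format(trim_out_path=out, sample=sample, readDirection=read)
    for out, sample, read in product(
        [config["trim_out_path"]], config["sampleList"], ["1", "2"]
    )
]

for target in targets:
    print(target)
```

Two samples times two read directions gives four target files, which matches the four inputs of `all` in the dry runs.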
$tree:
|-- [tboyarsk 1540 Sep 6 15:17] Snakefile
|-- [tboyarsk 82 Sep 6 15:17] config.yaml
|-- [tboyarsk 512 Sep 6 8:55] input
|   |-- [tboyarsk 512 Sep 6 8:33] fq
|   |   |-- [tboyarsk 0 Sep 6 7:50] mySample1.1.fq
|   |   |-- [tboyarsk 0 Sep 6 8:24] mySample1.2.fq
|   |   |-- [tboyarsk 0 Sep 6 7:50] mySample2.1.fq
|   |   `-- [tboyarsk 0 Sep 6 8:24] mySample2.2.fq
|   `-- [tboyarsk 512 Sep 6 8:55] fqgz
|       |-- [tboyarsk 0 Sep 6 7:50] mySample1.1.fq.gz
|       |-- [tboyarsk 0 Sep 6 8:32] mySample1.2.fq.gz
|       |-- [tboyarsk 0 Sep 6 8:33] mySample2.1.fq.gz
|       `-- [tboyarsk 0 Sep 6 8:32] mySample2.2.fq.gz
`-- [tboyarsk 512 Sep 6 7:55] output
$ snakemake --dryrun (input: fq)
rule trim_galore_unzipped_PE:
    input: input/fq/mySample1.1.fq, input/fq/mySample1.2.fq
    output: output/mySample1.1.fq.gz, output/mySample1.2.fq.gz
    jobid: 1
    wildcards: sample=mySample1
rule trim_galore_unzipped_PE:
    input: input/fq/mySample2.1.fq, input/fq/mySample2.2.fq
    output: output/mySample2.1.fq.gz, output/mySample2.2.fq.gz
    jobid: 2
    wildcards: sample=mySample2
localrule all:
    input: output/mySample1.1.fq.gz, output/mySample2.1.fq.gz, output/mySample1.2.fq.gz, output/mySample2.2.fq.gz
    jobid: 0
Job counts:
    count   jobs
    1       all
    2       trim_galore_unzipped_PE
    3
$ snakemake --dryrun (input: fqgz)
rule trim_galore_unzipped_PE:
    input: input/fqgz/mySample1.1.fq.gz, input/fqgz/mySample1.2.fq.gz
    output: output/mySample1.1.fq.gz, output/mySample1.2.fq.gz
    jobid: 1
    wildcards: sample=mySample1
rule trim_galore_unzipped_PE:
    input: input/fqgz/mySample2.1.fq.gz, input/fqgz/mySample2.2.fq.gz
    output: output/mySample2.1.fq.gz, output/mySample2.2.fq.gz
    jobid: 2
    wildcards: sample=mySample2
localrule all:
    input: output/mySample1.1.fq.gz, output/mySample1.2.fq.gz, output/mySample2.1.fq.gz, output/mySample2.2.fq.gz
    jobid: 0
Job counts:
    count   jobs
    1       all
    2       trim_galore_unzipped_PE
    3
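There are ways to make this more generic, but since you declare and use the YAML config to build most of the file names, I'll avoid discussing them in this answer. I'll just say that it is possible and somewhat encouraged.
The "--paired {input}" will expand to provide both files. Because of the order of the for loops, read 1 always comes before read 2.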
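The last point can be illustrated in plain Python. This is only a sketch of the string that the shell ends up seeing, not Snakemake's own formatting code: the sample name and paths are taken from the dry run above, and joining with single spaces imitates how a list-valued `{input}` is rendered in a shell command.

```python
inputs = []
for suffix in [".fq"]:            # only the suffix that exists on disk in the first dry run
    for read in [".1", ".2"]:     # inner loop: read 1 is appended before read 2
        inputs.append("input/fq/" + "mySample1" + read + suffix)

# A list-valued {input} is rendered as its items separated by single spaces
command = "trim_galore --gzip -o {out} --paired {inp}".format(
    out="output", inp=" ".join(inputs)
)
print(command)
# trim_galore --gzip -o output --paired input/fq/mySample1.1.fq input/fq/mySample1.2.fq
```

Note that if both the `.fq` and the `.fq.gz` versions of a sample sat in the same input directory, the list would contain four paths rather than two, so it is worth keeping only one format per directory.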