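Your function doesn't use the wildcards here. It should look like this:
def samtools_merge_inputs(wildcards):
    files = expand(wildcards.run_id + '_L{line}_realigned.bam', line=LINES)
    return files
That is, of course, assuming all of your samples are on all lanes. When the function is called, all wildcards are passed as an object to the wildcards argument of your function.
You can also do it this way:
    files = expand('{run_id}_L{line}_realigned.bam', run_id=wildcards.run_id, line=LINES)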
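To see what the input function ends up returning, here is a minimal plain-Python sketch that runs without Snakemake. `expand_lanes` is a hypothetical stand-in for the way `expand` is used above (it is not Snakemake's implementation), `SimpleNamespace` stands in for the wildcards object that Snakemake passes in, and the lane values are assumed to be zero-padded as in your file names.

```python
from types import SimpleNamespace

# Assumed lane identifiers, zero-padded as in the BAM file names.
LINES = ["001", "002", "003", "004"]


def expand_lanes(pattern, run_id, lines):
    """Hypothetical stand-in for expand(): fill the pattern once per lane."""
    return [pattern.format(run_id=run_id, line=line) for line in lines]


def samtools_merge_inputs(wildcards):
    # wildcards.run_id holds the value matched by {run_id} in the output file.
    return expand_lanes('{run_id}_L{line}_realigned.bam', wildcards.run_id, LINES)


# Stand-in for the object Snakemake would pass to the input function.
fake_wildcards = SimpleNamespace(run_id="OVCA-1-FRESH-1_S16")
for path in samtools_merge_inputs(fake_wildcards):
    print(path)
```

The point is only that the function receives the matched run_id and returns one file per lane for that sample.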
There are several things in your Snakefile that won't work.
First, a ' is missing in your samtools_merge rule:
rule samtools_merge:
    input:
        samtools_merge_inputs
    output:
        '{run_id}_realigned.bam'  # <-----
    shell:
        'samtools merge -h {input} {output}'
Also watch your variable names (LINE vs. LINES).
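Second, the glob_wildcards() function returns a list of every value it finds, one entry per matching file, which means your two variables will look like this: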
RUN_ID, LINES = glob_wildcards('{run_id}_L{line}_realigned.bam')
print(RUN_ID)
['OVCA-2-FRESH-1_S16', 'OVCA-2-FRESH-1_S16', 'OVCA-1-FRESH-1_S16', 'OVCA-1-FRESH-1_S16', 'OVCA-1-FRESH-1_S16', 'OVCA-1-FRESH-1_S16', 'OVCA-2-FRESH-1_S16', 'OVCA-2-FRESH-1_S16']
print(LINES)
['002', '001', '001', '002', '004', '003', '003', '004']
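I'm sure this is not what you want. The solution is to use a proper structure to describe your samples. For example (again, if all samples are on all lanes):
RUN_ID = ["OVCA-1-FRESH-1_S16", "OVCA-2-FRESH-1_S16"]
LINES = ["001", "002", "003", "004"]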
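If you would rather derive these lists from the glob_wildcards() result than type them by hand, one option is to deduplicate the paired values. The sketch below is plain Python using the values printed above. The `lanes_by_run` dictionary is a hypothetical name, and it would also cover the case where not every sample is on every lane.

```python
from collections import defaultdict

# Values as returned by glob_wildcards() in the output above (paired by position).
run_ids = ['OVCA-2-FRESH-1_S16', 'OVCA-2-FRESH-1_S16', 'OVCA-1-FRESH-1_S16',
           'OVCA-1-FRESH-1_S16', 'OVCA-1-FRESH-1_S16', 'OVCA-1-FRESH-1_S16',
           'OVCA-2-FRESH-1_S16', 'OVCA-2-FRESH-1_S16']
lines = ['002', '001', '001', '002', '004', '003', '003', '004']

# Collect the set of lanes seen for each run.
lanes_by_run = defaultdict(set)
for run_id, line in zip(run_ids, lines):
    lanes_by_run[run_id].add(line)

RUN_ID = sorted(lanes_by_run)   # unique run IDs
LINES = sorted(set(lines))      # unique lanes across all runs
print(RUN_ID)
print(LINES)
print({run_id: sorted(lanes) for run_id, lanes in lanes_by_run.items()})
```

With a per-run mapping like this, the input function could look up the lanes for wildcards.run_id instead of assuming the same LINES for every sample.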
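One last thing: your input and output cannot be told apart by their wildcards. {run_id} can also match a name such as OVCA-1-FRESH-1_S16_L001, so every per-lane input file matches the output pattern as well. This means you will end up with one of these errors:
Cyclic dependency on rule samtools_merge
RecursionError: maximum recursion depth exceeded in comparison
I suggest choosing a different name for the output. Putting it all together: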
# Map start input files
RUN_ID = ["OVCA-1-FRESH-1_S16", "OVCA-2-FRESH-1_S16"]
LINES = ["001", "002", "003", "004"]

rule all:
    input:
        expand('{run_id}_realignedFinal.bam', run_id=RUN_ID)

# Map input files for merging. This function should collect all
# BAM files that match the {run_id} wildcard.
def samtools_merge_inputs(wildcards):
    files = expand('{run_id}_L{line}_realigned.bam', run_id=wildcards.run_id, line=LINES)
    return files

# Perform BAM merging.
rule samtools_merge:
    input:
        samtools_merge_inputs
    output:
        '{run_id}_realignedFinal.bam'
    shell:
        'samtools merge -h {input} {output}'
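The cyclic dependency mentioned earlier can be reproduced outside Snakemake. The sketch below assumes a wildcard behaves like the regex `.+` (Snakemake's documented default) and uses a hypothetical helper, `pattern_to_regex`. It is not how Snakemake matches files internally, only an illustration of why the original output name was ambiguous and the renamed one is not.

```python
import re


def pattern_to_regex(pattern):
    """Hypothetical helper: turn '{name}' placeholders into named '.+' groups."""
    parts = re.split(r'\{(\w+)\}', pattern)
    regex = ''
    for i, part in enumerate(parts):
        # Odd positions are wildcard names, even positions are literal text.
        regex += f'(?P<{part}>.+)' if i % 2 else re.escape(part)
    return re.compile(regex + '$')


lane_file = 'OVCA-1-FRESH-1_S16_L001_realigned.bam'

old_output = pattern_to_regex('{run_id}_realigned.bam')
new_output = pattern_to_regex('{run_id}_realignedFinal.bam')

# The per-lane input also matches the old output pattern, so the rule
# would be asked to produce its own input.
match = old_output.match(lane_file)
print(match.group('run_id'))

# With the renamed output, the per-lane file no longer matches.
print(new_output.match(lane_file))
```

Renaming the output is the simplest way out; restricting what {run_id} may match would be another route to the same result.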
I haven't checked your shell command, but my documentation says:
Usage: samtools merge [-nurlf] [-h inh.sam] [-b <bamlist.fofn>] <out.bam> <in1.bam> [<in2.bam> ... <inN.bam>]
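Going by that usage line, the output file comes first and -h expects a header file, so the shell line in the rule probably has its arguments in the wrong order. Here is a minimal sketch of the argument order, assuming the usage line matches your samtools version and that you do not need a custom header (so -h is dropped). `build_merge_command` is a hypothetical helper, used only to make the order explicit.

```python
import shlex


def build_merge_command(output, inputs):
    """Hypothetical helper: output BAM first, then the input BAMs."""
    return ['samtools', 'merge', output, *inputs]


inputs = [f'OVCA-1-FRESH-1_S16_L{line}_realigned.bam'
          for line in ['001', '002', '003', '004']]
command = build_merge_command('OVCA-1-FRESH-1_S16_realignedFinal.bam', inputs)

# Print the command as it would appear on the shell.
print(shlex.join(command))
```

In the Snakefile this would correspond to a shell line of 'samtools merge {output} {input}'.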