参考文章:
使用MACS2进行差异peak分析
重点推荐:Call differential binding events
MACS2作为使用最广泛的peak calling软件,在v2版本中添加了差异peak分析的功能,它的子命令功能如下:
1. 软件说明
usage: macs2 bdgdiff [-h] --t1 T1BDG --t2 T2BDG --c1 C1BDG --c2 C2BDG
[-C CUTOFF] [-l MINLEN] [-g MAXGAP] [--d1 DEPTH1]
[--d2 DEPTH2] [--outdir OUTDIR]
(--o-prefix OPREFIX | -o OFILE OFILE OFILE)
optional arguments:
2. 操作记录
Step 1: Generate pileup tracks using callpeak module
- 使用predictd模式对不同样品的片段长度进行预测,以设定相同的extsize大小。
操作记录如下:
(base) zexing@DNA:~/projects/daizhongye/ChIP_seq/2020_10_29/bam_sort$ macs2 predictd -i Scr.bam.sort
INFO @ Fri, 06 Nov 2020 16:24:45:
INFO @ Fri, 06 Nov 2020 16:24:45:
INFO @ Fri, 06 Nov 2020 16:24:45: Detected format is: BAM
INFO @ Fri, 06 Nov 2020 16:24:45: * Input file is gzipped.
INFO @ Fri, 06 Nov 2020 16:24:50: 1000000
INFO @ Fri, 06 Nov 2020 16:24:56: 2000000
INFO @ Fri, 06 Nov 2020 16:25:01: 3000000
INFO @ Fri, 06 Nov 2020 16:25:07: 4000000
INFO @ Fri, 06 Nov 2020 16:25:12: 5000000
INFO @ Fri, 06 Nov 2020 16:25:16: 5565504 reads have been read.
INFO @ Fri, 06 Nov 2020 16:25:16: tag size is determined as 150 bps
INFO @ Fri, 06 Nov 2020 16:25:16:
INFO @ Fri, 06 Nov 2020 16:25:16:
INFO @ Fri, 06 Nov 2020 16:25:16:
INFO @ Fri, 06 Nov 2020 16:25:16:
INFO @ Fri, 06 Nov 2020 16:25:19:
INFO @ Fri, 06 Nov 2020 16:25:19: start model_add_line...
INFO @ Fri, 06 Nov 2020 16:25:19: start X-correlation...
INFO @ Fri, 06 Nov 2020 16:25:19: end of X-cor
INFO @ Fri, 06 Nov 2020 16:25:19:
INFO @ Fri, 06 Nov 2020 16:25:19:
INFO @ Fri, 06 Nov 2020 16:25:19:
INFO @ Fri, 06 Nov 2020 16:25:19:
(base) zexing@DNA:~/projects/daizhongye/ChIP_seq/2020_10_29/bam_sort$ macs2 predictd -i shTgm1-2.bam.sort shTgm2-1.bam.sort
INFO @ Fri, 06 Nov 2020 16:28:59:
INFO @ Fri, 06 Nov 2020 16:28:59:
INFO @ Fri, 06 Nov 2020 16:28:59: Detected format is: BAM
INFO @ Fri, 06 Nov 2020 16:28:59: * Input file is gzipped.
INFO @ Fri, 06 Nov 2020 16:29:04: 1000000
INFO @ Fri, 06 Nov 2020 16:29:09: 2000000
INFO @ Fri, 06 Nov 2020 16:29:15: 3000000
INFO @ Fri, 06 Nov 2020 16:29:20: 4000000
INFO @ Fri, 06 Nov 2020 16:29:25: 5000000
INFO @ Fri, 06 Nov 2020 16:29:30: 6000000
INFO @ Fri, 06 Nov 2020 16:29:36: 6730717 reads have been read.
INFO @ Fri, 06 Nov 2020 16:29:36: Detected format is: BAM
INFO @ Fri, 06 Nov 2020 16:29:36: * Input file is gzipped.
INFO @ Fri, 06 Nov 2020 16:29:41: 1000000
INFO @ Fri, 06 Nov 2020 16:29:46: 2000000
INFO @ Fri, 06 Nov 2020 16:29:51: 3000000
INFO @ Fri, 06 Nov 2020 16:29:57: 4000000
INFO @ Fri, 06 Nov 2020 16:30:02: 5000000
INFO @ Fri, 06 Nov 2020 16:30:07: 5895422 reads have been read.
INFO @ Fri, 06 Nov 2020 16:30:07: tag size is determined as 150 bps
INFO @ Fri, 06 Nov 2020 16:30:07:
INFO @ Fri, 06 Nov 2020 16:30:07:
INFO @ Fri, 06 Nov 2020 16:30:07:
INFO @ Fri, 06 Nov 2020 16:30:07:
INFO @ Fri, 06 Nov 2020 16:30:11:
INFO @ Fri, 06 Nov 2020 16:30:11: start model_add_line...
INFO @ Fri, 06 Nov 2020 16:30:12: start X-correlation...
INFO @ Fri, 06 Nov 2020 16:30:12: end of X-cor
INFO @ Fri, 06 Nov 2020 16:30:12:
INFO @ Fri, 06 Nov 2020 16:30:12:
INFO @ Fri, 06 Nov 2020 16:30:12:
INFO @ Fri, 06 Nov 2020 16:30:12:
对于大多数的转录因子chip_seq数据,推荐值为200, 对于大部分组蛋白修饰的chip_seq数据,推荐值为147,本次预测的均值为(275+258)/2=260bp。
-
根据上边预测的插入片段长度进行peak calling,并提取每个样本过滤之后的reads数目(使用-B参数,不能使用–SPMR参数)
vim 新建macs2_bdgdiff_callpeak_script脚本如下:
#! /bin/bash
dir=/f/xudonglab/zexing/projects/daizhongye/ChIP_seq/2020_10_29/macs2_bdgdiff
for i in Scr shTgm1-2 shTgm2-1
do
macs2 callpeak -t /f/xudonglab/zexing/projects/daizhongye/ChIP_seq/2020_10_29/bam_sort/${i}.bam.sort --outdir ${dir}/ -n ${i} -q 0.05 -g mm -B --nomodel --extsize 260
egrep "tags after filtering in treatment|tags after filtering in control" ${dir}/${i}_peaks.xls > ${dir}/${i}_tags
done
后台运行macs2_bdgdiff_callpeak_script脚本如下:
(base) zexing@DNA:~/projects/daizhongye/ChIP_seq/2020_10_29/scripts_log$ nohup bash macs2_bdgdiff_callpeak_script > macs2_bdgdiff_callpeak_script_log &
Step 2: Call differential regions
vim 新建macs2_bdgdiff_script脚本如下:
#! /bin/bash
dir=/f/xudonglab/zexing/projects/daizhongye/ChIP_seq/2020_10_29/macs2_bdgdiff
macs2 bdgdiff --t1 ${dir}/Scr_treat_pileup.bdg --c1 ${dir}/Scr_control_lambda.bdg --d1 4236187 \
--t2 ${dir}/shTgm1-2_treat_pileup.bdg --c2 ${dir}/shTgm1-2_control_lambda.bdg --d2 5103633 \
-g 60 -l 260 --o-prefix ${dir}/diff_Scr_vs_shTgm1-2
macs2 bdgdiff --t1 ${dir}/Scr_treat_pileup.bdg --c1 ${dir}/Scr_control_lambda.bdg --d1 4236187 \
--t2 ${dir}/shTgm2-1_treat_pileup.bdg --c2 ${dir}/shTgm2-1_control_lambda.bdg --d2 4491626 \
-g 60 -l 260 --o-prefix ${dir}/diff_Scr_vs_shTgm2-1
后台运行macs2_bdgdiff_script脚本如下:
(base) zexing@DNA:~/projects/daizhongye/ChIP_seq/2020_10_29/scripts_log$ nohup bash macs2_bdgdiff_script > macs2_bdgdiff_script_log &
其中-d1和-d2的值就是第二步运行时输出的reads数目,-o参数指定输出文件的前缀。运行成功后,会产生3个文件
diff_Scr_vs_shTgm1-2_c3.0_cond1.bed # 保存的是在Scr中上调的peak即:shTgm1-2后下调的peak。
diff_Scr_vs_shTgm1-2_c3.0_cond2.bed # 保存的是在shTgm1-2中上调的peak。
diff_Scr_vs_shTgm1-2_c3.0_common.bed # 保存的是没有达到阈值的,非显著差异peak。
上述3个文件格式是完全相同的,最后一列的内容为log10 likehood ratio值,用来衡量两个条件之间的差异,默认阈值为3,大于阈值的peak为组间差异显著的peak, 这个阈值可以通过-c参数进行调整。
Step 3: Annotate peaks from macs2_bdgdiff
此部分使用上一步生成的bed文件在本地机Rstudio中,利用ChIPseeker进行操作
利用下述脚本对提取出来的cond1.bed和cond2.bed进行注释
参考文章:CHIP-seq流程学习笔记(6)-peak注释软件ChIPseeker
setwd("G:/daizhongye/ChIP-seq/2020_10_29/macs2_bdgdiff/")
library(ChIPseeker)
Scr_vs_shTgm1_2_cond1 <- readPeakFile("G:/daizhongye/ChIP-seq/2020_10_29/macs2_bdgdiff/diff_Scr_vs_shTgm1-2_c3.0_cond1.bed")
Scr_vs_shTgm1_2_cond2 <- readPeakFile("G:/daizhongye/ChIP-seq/2020_10_29/macs2_bdgdiff/diff_Scr_vs_shTgm1-2_c3.0_cond2.bed")
Scr_vs_shTgm2_1_cond1 <- readPeakFile("G:/daizhongye/ChIP-seq/2020_10_29/macs2_bdgdiff/diff_Scr_vs_shTgm2-1_c3.0_cond1.bed")
Scr_vs_shTgm2_1_cond2 <- readPeakFile("G:/daizhongye/ChIP-seq/2020_10_29/macs2_bdgdiff/diff_Scr_vs_shTgm2-1_c3.0_cond2.bed")
require(TxDb.Mmusculus.UCSC.mm10.knownGene)
txdb <- TxDb.Mmusculus.UCSC.mm10.knownGene
Scr_vs_shTgm1_2_cond1_peakAnno <- annotatePeak(Scr_vs_shTgm1_2_cond1, tssRegion = c(-3000, 3000), TxDb = txdb)
Scr_vs_shTgm1_2_cond2_peakAnno <- annotatePeak(Scr_vs_shTgm1_2_cond2, tssRegion = c(-3000, 3000), TxDb = txdb)
Scr_vs_shTgm2_1_cond1_peakAnno <- annotatePeak(Scr_vs_shTgm2_1_cond1, tssRegion = c(-3000, 3000), TxDb = txdb)
Scr_vs_shTgm2_1_cond2_peakAnno <- annotatePeak(Scr_vs_shTgm2_1_cond2, tssRegion = c(-3000, 3000), TxDb = txdb)
write.table(Scr_vs_shTgm1_2_cond1_peakAnno, file = "Scr_vs_shTgm1_2_cond1_peakAnno.csv",sep = ',', quote = TRUE, row.names = FALSE)
write.table(Scr_vs_shTgm1_2_cond2_peakAnno, file = "Scr_vs_shTgm1_2_cond2_peakAnno.csv",sep = ',', quote = TRUE, row.names = FALSE)
write.table(Scr_vs_shTgm2_1_cond1_peakAnno, file = "Scr_vs_shTgm2_1_cond1_peakAnno.csv",sep = ',', quote = TRUE, row.names = FALSE)
write.table(Scr_vs_shTgm2_1_cond2_peakAnno, file = "Scr_vs_shTgm2_1_cond2_peakAnno.csv",sep = ',', quote = TRUE, row.names = FALSE)
本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系:hwhale#tublm.com(使用前将#替换为@)