Learning samtools (Part 1)
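1. This post covers the usage of the samtools view command and its main options.
2. The main purpose of view is to inspect the contents of BAM and SAM files.
3. Usage and commonly used options of view.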
samtools view -h
samtools view: No input provided or missing option argument.
Usage: samtools view [options] <in.bam>|<in.sam>|<in.cram> [region ...]
# If no region is given, alignments from all regions are output.
Options:
  -b       output BAM    # the default output is SAM; this option writes BAM instead
  -C       output CRAM (requires -T)
  -1       use fast BAM compression (implies -b)
  -u       uncompressed BAM output (implies -b)
  -h       include header in SAM output    # by default SAM output has no header; this option adds it
  -H       print SAM header only (no alignments)    # print only the header section
  -c       print only the count of matching records
  -o FILE  output file name [stdout]    # write the result to FILE instead of standard output
  -U FILE  output reads not selected by filters to FILE [null]
  -t FILE  FILE listing reference names and lengths (see long help) [null]
  -L FILE  only include reads overlapping this BED FILE [null]
  -r STR   only include reads in read group STR [null]
  -R FILE  only include reads with read group listed in FILE [null]
  -q INT   only include reads with mapping quality >= INT [0]    # minimum mapping quality to keep
  -l STR   only include reads in library STR [null]
  -m INT   only include reads with number of CIGAR operations consuming
           query sequence >= INT [0]
  -f INT   only include reads with all of the FLAGs in INT present [0]    # FLAG bits a record must have set
  -F INT   only include reads with none of the FLAGS in INT present [0]
  -G INT   only EXCLUDE reads with all of the FLAGs in INT present [0]
  -s FLOAT subsample reads (given INT.FRAC option value, 0.FRAC is the
           fraction of templates/read pairs to keep; INT part sets seed)
  -M       use the multi-region iterator (increases the speed, removes
           duplicates and outputs the reads as they are ordered in the file)
  -x STR   read tag to strip (repeatable) [null]
  -B       collapse the backward CIGAR operation
  -?       print long help, including note about region specification
  -S       ignored (input format is auto-detected)
      --input-fmt-option OPT[=VAL]
               Specify a single input file format option in the form
               of OPTION or OPTION=VALUE
  -O, --output-fmt FORMAT[,OPT[=VAL]]...
               Specify output format (SAM, BAM, CRAM)
      --output-fmt-option OPT[=VAL]
               Specify a single output file format option in the form
               of OPTION or OPTION=VALUE
  -T, --reference FILE
               Reference sequence FASTA FILE [null]    # use a reference FASTA file as the source of header information
  -@, --threads INT
               Number of additional threads to use [0]
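The INT.FRAC value taken by -s is easy to misread, so here is a minimal sketch of how one value splits into a seed and a fraction, following the help text above. The function name split_subsample is hypothetical and only illustrates the notation; it is not part of samtools.

```shell
# Hypothetical helper: split an INT.FRAC value as described for -s.
# The integer part is the seed, 0.FRAC is the fraction of templates to keep.
split_subsample() {
  value=$1
  seed=${value%%.*}        # text before the first "."
  frac=0.${value#*.}       # "0." plus the text after the first "."
  echo "seed=$seed fraction=$frac"
}

split_subsample 42.1     # as in: samtools view -s 42.1 ...
split_subsample 0.25     # as in: samtools view -s 0.25 ...
```

Under this reading, -s 42.1 would keep roughly 10% of the read pairs using seed 42, and -s 0.25 would keep roughly 25% using seed 0.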
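4. Convert a SAM file to a BAM file. The two commands below are equivalent; as the help text notes, -S is ignored by current versions because the input format is auto-detected.
samtools view -bS sort.sam > sort.bam
samtools view -b -S sort.sam > sort.bam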
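5. Extract the alignments that mapped to the reference sequence (-F 4 excludes records with the "unmapped" FLAG bit set).
samtools view -bF 4 sort.bam > sort.F.bam
Extract the alignments that did not map to the reference sequence (-f 4 keeps only records with the "unmapped" bit set; write to a new file, not back to the input).
samtools view -bf 4 sort.bam > sort.f.bam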
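The difference between -f and -F comes down to bit tests on the FLAG field. The sketch below mimics that logic in plain shell arithmetic, based on the option descriptions in the help text; flag_filter is a hypothetical helper for illustration, not a samtools command.

```shell
# Hypothetical helper: decide whether a record is kept.
# Arguments: FLAG of the record, value given to -f, value given to -F.
flag_filter() {
  flag=$1; require=$2; exclude=$3
  # -f: every bit in "require" must be set in FLAG
  if [ $(( flag & require )) -ne "$require" ]; then echo drop; return; fi
  # -F: no bit in "exclude" may be set in FLAG
  if [ $(( flag & exclude )) -ne 0 ]; then echo drop; return; fi
  echo keep
}

flag_filter 99 0 4   # FLAG 99 (1+2+32+64), bit 4 not set, with -F 4
flag_filter 77 0 4   # FLAG 77 (1+4+8+64), bit 4 set, with -F 4
flag_filter 77 4 0   # the same unmapped read with -f 4
```

A record with FLAG 77 has the unmapped bit (4) set, so -F 4 drops it and -f 4 keeps it, which is why the two commands above differ only in the case of that one letter.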
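6. Extract the alignments that overlap the 30k to 100k region of scaffold1 (the BAM file must be coordinate-sorted and indexed, for example with samtools index).
samtools view sort.bam scaffold1:30000-100000 > scaffold1_30k-100k.sam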
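A region string has the form name:start-end, with 1-based coordinates and both ends included. The sketch below splits such a string with shell parameter expansion; parse_region is a hypothetical helper, and it assumes the reference name itself contains no colon.

```shell
# Hypothetical helper: split "name:start-end" and report the region length.
parse_region() {
  region=$1
  name=${region%%:*}       # text before the first ":"
  range=${region#*:}       # text after the first ":"
  start=${range%-*}        # text before the "-"
  end=${range#*-}          # text after the "-"
  # both ends are inclusive, hence the +1
  echo "$name $start $end $(( end - start + 1 ))"
}

parse_region scaffold1:30000-100000
```

For the region used above this prints the name, the two coordinates, and a length of 70001 bases.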
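7. Extract the BAM records for a subset of chromosomes.
samtools view -b -h A.bam chr2 chr3 chr5 > A_others.bam
List the chromosomes to extract separated by spaces. This writes the alignments mapped to chr2, chr3, and chr5 to A_others.bam.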
8. samtools faidx builds an index file with the .fai suffix for a FASTA file. With this .fai file and the original FASTA file, the sequence of any region can be extracted quickly.
samtools faidx reference.fasta
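The first two columns of a .fai file are the sequence name and its length. As a rough illustration of what those columns hold, the sketch below computes them from a FASTA file with awk. It is not a replacement for samtools faidx (it omits the offset and line-width columns), and fasta_lengths and the demo sequences are made up for this example.

```shell
# Hypothetical helper: print "name<TAB>length" for each FASTA record.
fasta_lengths() {
  awk '
    /^>/ {
      if (name != "") print name "\t" len   # flush the previous record
      name = substr($1, 2); len = 0; next   # name = first word without ">"
    }
    { len += length($0) }                   # sequence lines add to the length
    END { if (name != "") print name "\t" len }
  ' "$1"
}

# Small demo FASTA with invented sequences.
tmp=$(mktemp)
printf '>scaffold1 demo\nACGTACGT\nACG\n>scaffold2\nTTTT\n' > "$tmp"
fasta_lengths "$tmp"
rm -f "$tmp"
```

A two-column listing of names and lengths like this is also the kind of file the -t option in the help text refers to.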
9. View the header.
samtools view -H *bam ## header of the BAM file; the glob must match exactly one file
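In SAM text, header lines are the ones that start with "@", and every other line is an alignment record. The sketch below uses grep on a tiny hand-written SAM file to show roughly what -H and -c report. It works only on plain-text SAM, not on BAM, and the function names and demo records are invented for this example.

```shell
# Hypothetical helpers that imitate -H and -c on a plain-text SAM file.
sam_header_only() { grep '^@' "$1"; }      # header lines start with "@"
sam_record_count() { grep -vc '^@' "$1"; } # count the remaining lines

# Tiny demo SAM: two header lines, one mapped and one unmapped read.
sam=$(mktemp)
printf '@HD\tVN:1.6\tSO:coordinate\n' > "$sam"
printf '@SQ\tSN:scaffold1\tLN:200000\n' >> "$sam"
printf 'read1\t0\tscaffold1\t30001\t60\t4M\t*\t0\t0\tACGT\tIIII\n' >> "$sam"
printf 'read2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n' >> "$sam"

sam_header_only "$sam"     # prints the @HD and @SQ lines
sam_record_count "$sam"    # prints the number of alignment records
rm -f "$sam"
```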