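1. This article explains how to use the samtools view command and what its options do.

2. The main job of the view command is to display the contents of BAM and SAM files.

3. Usage and common options of the view command.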

samtools view -h
samtools view: No input provided or missing option argument.

Usage: samtools view [options] <in.bam>|<in.sam>|<in.cram> [region ...]
# By default, when no region is given, records from all regions are output.
Options:
  -b       output BAM  # output is SAM by default; this option switches the output to BAM
  -C       output CRAM (requires -T)
  -1       use fast BAM compression (implies -b)
  -u       uncompressed BAM output (implies -b)
  -h       include header in SAM output  # SAM output carries no header by default; this option includes it
  -H       print SAM header only (no alignments)  # print only the header section
  -c       print only the count of matching records
  -o FILE  output file name [stdout]  # write the result to FILE; the default is standard output
  -U FILE  output reads not selected by filters to FILE [null]
  -t FILE  FILE listing reference names and lengths (see long help) [null]
  -L FILE  only include reads overlapping this BED FILE [null]
  -r STR   only include reads in read group STR [null]
  -R FILE  only include reads with read group listed in FILE [null]
  -q INT   only include reads with mapping quality >= INT [0]  # minimum mapping quality allowed
  -l STR   only include reads in library STR [null]
  -m INT   only include reads with number of CIGAR operations consuming
           query sequence >= INT [0]
  -f INT   only include reads with all  of the FLAGs in INT present [0]  # FLAG bits that a record must have set
  -F INT   only include reads with none of the FLAGS in INT present [0]
  -G INT   only EXCLUDE reads with all  of the FLAGs in INT present [0]
  -s FLOAT subsample reads (given INT.FRAC option value, 0.FRAC is the
           fraction of templates/read pairs to keep; INT part sets seed)
  -M       use the multi-region iterator (increases the speed, removes
           duplicates and outputs the reads as they are ordered in the file)
  -x STR   read tag to strip (repeatable) [null]
  -B       collapse the backward CIGAR operation
  -?       print long help, including note about region specification
  -S       ignored (input format is auto-detected)
      --input-fmt-option OPT[=VAL]
               Specify a single input file format option in the form
               of OPTION or OPTION=VALUE
  -O, --output-fmt FORMAT[,OPT[=VAL]]...
               Specify output format (SAM, BAM, CRAM)
      --output-fmt-option OPT[=VAL]
               Specify a single output file format option in the form
               of OPTION or OPTION=VALUE
  -T, --reference FILE
               Reference sequence FASTA FILE [null]  # use the reference FASTA file as the source of header information
  -@, --threads INT
               Number of additional threads to use [0]
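
The INT.FRAC value taken by -s is easy to misread, so here is a minimal shell sketch of how such a value splits into a seed and a fraction. The helper name split_subsample is hypothetical and is not part of samtools; it only illustrates the notation described in the help text above.

```shell
# Split a -s value such as 42.1 into its seed (INT) and fraction (0.FRAC) parts.
# split_subsample is a hypothetical helper used only for illustration.
split_subsample() {
    value=$1
    case "$value" in
        *.*)
            seed=${value%%.*}    # text before the first dot
            frac=${value#*.}     # text after the first dot
            ;;
        *)
            seed=$value          # no dot: only a seed was given
            frac=0
            ;;
    esac
    [ -n "$seed" ] || seed=0     # ".5" means seed 0
    printf 'seed=%s fraction=0.%s\n' "$seed" "$frac"
}

split_subsample 42.1
split_subsample .25
```

With the value 42.1, a command such as samtools view -b -s 42.1 sort.bam > sub.bam would keep roughly 10% of the templates, using 42 as the seed.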

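4. Convert a SAM file to a BAM file. The two commands below are equivalent; as the help text above notes, -S is ignored because the input format is auto-detected.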

samtools view -bS sort.sam > sort.bam
samtools view -b -S sort.sam > sort.bam
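
To try the conversion end to end, the sketch below writes a tiny SAM file and converts it with -b and -o. The file contents, the temporary directory, and the read name are made up for illustration, and the samtools calls only run if samtools is found on the PATH.

```shell
# Build a tiny, hypothetical SAM file in a temporary directory.
workdir=$(mktemp -d)
sam="$workdir/sort.sam"
bam="$workdir/sort.bam"

{
    printf '@HD\tVN:1.6\tSO:coordinate\n'
    printf '@SQ\tSN:scaffold1\tLN:200000\n'
    printf 'read1\t0\tscaffold1\t30001\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\n'
} > "$sam"

if command -v samtools >/dev/null 2>&1; then
    # -b selects BAM output and -o names the output file.
    samtools view -b -o "$bam" "$sam"
    # -c prints the number of records, a quick check that nothing was lost.
    samtools view -c "$bam"
else
    echo "samtools not found; skipping the conversion" >&2
fi
```

Writing with -o instead of shell redirection is a matter of taste; both forms produce the same BAM file.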

5. Extract the records that aligned to the reference sequence.

samtools view -bF 4 sort.bam > sort.F.bam

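Extract the records that did not align to the reference sequence (note the lowercase -f, and an output name that differs from the input so the input is not overwritten):
samtools view -bf 4 sort.bam > sort.f.bam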
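
The value 4 in these commands is the "read unmapped" FLAG bit: -F 4 keeps records that do not have it set, while -f 4 keeps records that do. The sketch below imitates that test with shell arithmetic; keep_record is a hypothetical helper, not a samtools function, and it covers only the -f and -F rules quoted in the help text.

```shell
# keep_record FLAG REQUIRE EXCLUDE
# Succeeds when FLAG has every bit of REQUIRE set (like -f REQUIRE)
# and none of the bits of EXCLUDE set (like -F EXCLUDE).
keep_record() {
    flag=$1
    require=$2
    exclude=$3
    [ $(( flag & require )) -eq "$require" ] && [ $(( flag & exclude )) -eq 0 ]
}

# FLAG 99 = 1 + 2 + 32 + 64: a mapped read, so -F 4 keeps it.
keep_record 99 0 4 && echo "99 passes -F 4"
# FLAG 77 = 1 + 4 + 8 + 64: an unmapped read, so -f 4 keeps it.
keep_record 77 4 0 && echo "77 passes -f 4"
```

Because every record either has bit 4 set or not, the counts from samtools view -c -F 4 and samtools view -c -f 4 on the same file should add up to the total record count.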

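6. Extract the records on scaffold1 that overlap the region from 30k to 100k. Region queries need a coordinate-sorted BAM file that has been indexed with samtools index.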

samtools view sort.bam scaffold1:30000-100000 > scaffold1_30k-100k.sam
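
A region string such as scaffold1:30000-100000 has the form name:start-end, with 1-based inclusive coordinates, and a record is reported when its aligned span overlaps that interval. The following sketch illustrates both ideas in plain shell; parse_region and overlaps are hypothetical helpers, and the sketch assumes the simple name:start-end form only.

```shell
# Split "name:start-end" into three tab-separated fields.
parse_region() {
    region=$1
    name=${region%:*}       # everything before the last colon
    range=${region##*:}     # everything after the last colon
    start=${range%-*}
    end=${range#*-}
    printf '%s\t%s\t%s\n' "$name" "$start" "$end"
}

# overlaps READ_START READ_END REGION_START REGION_END
# Succeeds when the two closed intervals share at least one position.
overlaps() {
    [ "$1" -le "$4" ] && [ "$2" -ge "$3" ]
}

parse_region scaffold1:30000-100000
# A read aligned to 29990..30010 straddles the region start, so it is reported.
overlaps 29990 30010 30000 100000 && echo "overlapping read is reported"
```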

7. Extract the BAM records for a subset of chromosomes.

samtools view -b -h A.bam chr2 chr3 chr5 >A_others.bam
Separate the chromosomes to extract with spaces. This yields the part of the BAM file that aligned to chr2, chr3 and chr5, saved in A_others.bam.

8. samtools faidx builds an index file with the .fai suffix for a FASTA file. With the .fai file and the original FASTA file, the sequence of any region can be extracted quickly.

samtools faidx reference.fasta
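
To show why the index makes extraction fast, the sketch below recomputes the five .fai columns (name, length, byte offset of the first base, bases per line, bytes per line) for a tiny FASTA file with awk. The FASTA contents and the fai_sketch helper are made up for illustration; this is not how samtools implements it, and it assumes Unix line endings and a uniform line length within each sequence.

```shell
# Hypothetical tiny FASTA, written to a temporary directory.
fa_dir=$(mktemp -d)
fasta="$fa_dir/reference.fasta"
printf '>scaffold1 test\nACGTACGTAC\nGTACGTACGT\nACGT\n>scaffold2\nTTTTGGGG\n' > "$fasta"

# Print NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH for each sequence.
fai_sketch() {
    awk '
    /^>/ {
        if (name != "") print name "\t" len "\t" off "\t" lb "\t" lw
        name = substr($1, 2); len = 0; lb = 0; lw = 0
        pos += length($0) + 1      # header line plus its newline
        off = pos                  # byte offset of the first base
        next
    }
    {
        if (lb == 0) { lb = length($0); lw = lb + 1 }
        len += length($0)
        pos += length($0) + 1
    }
    END { if (name != "") print name "\t" len "\t" off "\t" lb "\t" lw }
    ' "$1"
}

fai_sketch "$fasta"

if command -v samtools >/dev/null 2>&1; then
    samtools faidx "$fasta"                   # writes reference.fasta.fai
    samtools faidx "$fasta" scaffold1:5-12    # prints bases 5 to 12 of scaffold1
fi
```

With the offset and line widths known, the byte position of any base can be computed directly, so a region can be read without scanning the whole file.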

9. View the header information.

samtools view -H *bam  ## header of the BAM file (the glob must match exactly one file)
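
The header is plain tab-separated text, so it can be piped into ordinary tools. As a small sketch, the hypothetical helper sq_table below lists each reference name and length from the @SQ lines; the sample header fed to it is made up.

```shell
# Read a SAM header on stdin and print "name<TAB>length" for every @SQ line.
sq_table() {
    awk -F'\t' '$1 == "@SQ" {
        name = ""; len = ""
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^SN:/) name = substr($i, 4)
            if ($i ~ /^LN:/) len = substr($i, 4)
        }
        print name "\t" len
    }'
}

# Made-up header used in place of real samtools output.
printf '@HD\tVN:1.6\n@SQ\tSN:chr2\tLN:1000\n@SQ\tSN:chr3\tLN:500\n' | sq_table
```

On a real file the same helper would be used as samtools view -H A.bam | sq_table.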

 
