

RNA-sequencing (RNA-seq) is the state-of-the-art technique for transcriptome analysis that takes advantage of high-throughput next-generation sequencing. Although it is a powerful approach, RNA-seq poses major challenges at every step, with numerous caveats. There are currently many experimental options available, and a thorough understanding of each step is critical to making the right decisions and avoiding inconclusive results. A complete workflow consists of: (1) experimental design, (2) sample and library preparation, (3) sequencing, and (4) data analysis. RNA-seq enables a wide range of applications such as the discovery of novel genes, gene/transcript quantification, and differential expression and functional analysis.

Using our yeast_pe.sort.bam file, let's do some quality filtering.

When counting reads per gene, a read that maps to several genes can be split among them: if a read maps to 5 genes, it can be counted as 1/5 for each of the genes.
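The fractional counting rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the read-to-gene assignments are hypothetical toy data, and fractional_counts is a made-up helper name, not part of any quantification tool.

```python
from collections import defaultdict


def fractional_counts(read_to_genes):
    """Split each read's single count evenly across all genes it maps to."""
    counts = defaultdict(float)
    for genes in read_to_genes.values():
        if not genes:
            continue  # a read assigned to no gene contributes nothing
        share = 1.0 / len(genes)  # e.g. 1/5 when the read maps to 5 genes
        for gene in genes:
            counts[gene] += share
    return dict(counts)


# Hypothetical toy data: read1 is unique, read2 maps to 5 genes.
alignments = {
    "read1": ["geneA"],
    "read2": ["geneA", "geneB", "geneC", "geneD", "geneE"],
}

counts = fractional_counts(alignments)
for gene in sorted(counts):
    print(gene, round(counts[gene], 2))
```

With this scheme the counts over all genes still add up to the number of assigned reads, which is one reason to prefer it over counting a multi-mapped read once per gene.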

We have seen how samtools view can be used to convert binary-format BAM files into text format for viewing. But samtools view also has options that let you do powerful filtering of the output. We focus on this filtering capability in this set of exercises.

The most common samtools view filtering options are:

  -q N: only report alignment records with mapping quality of at least N (>= N).
  -f 0xXX: only report alignment records where the specified flags XX are all set (are all 1). You can provide the flags in decimal or, as here, in hexadecimal.
  -F 0xXX: only report alignment records where the specified flags XX are all cleared (are all 0).

For example, this command counts (-c) the alignment records overlapping chrI:1000-2000 that are mapped (-F 0x4) and properly paired (-f 0x2):

    samtools view -c -F 0x4 -f 0x2 yeast_pe.sort.bam chrI:1000-2000

About mapping quality

Mapping qualities are a measure of how likely it is that a given sequence alignment to its reported location is correct. If a read's mapping quality is low (especially if it is zero, or mapQ 0 for short), the read maps to multiple locations on the genome (such reads are called multi-hit or multi-mapping reads), and we can't be sure whether the reported location is the correct one.

Aligners also differ in whether they report alternate alignments for multi-hit reads. Alternate locations for a mapped read are flagged as secondary (flag 0x100). While they often provide valuable information, secondary reads must be filtered for some downstream applications, e.g., ChIP-seq peak calling and variant analysis with GATK.

When secondary reads are reported, the total number of alignment records in the BAM file is greater than the number of reads in the input FASTQ files! This affects how the true mapping rate must be calculated:

    true mapping rate = (primary mapped reads) / (total BAM file sequences - secondary mapped reads)

Here are some examples of how different aligners handle reporting of multi-hit reads and their mapping qualities:

- bwa aln (global alignment) and bowtie2 with default parameters (in both local and global alignment modes) report at most one location for a read that maps.
  - This will be the location with the best mapping quality and alignment.
  - If a given read maps equally well to multiple locations, these aligners pick one location at random.
    - bwa aln will always report a 0 mapping quality for these multi-hit reads.
    - bowtie2 will report a low mapping quality.
- bwa mem (local alignment) can always report more than one location for a mapped read.
  - Its definition of a secondary alignment is different (and a bit non-standard): if one part of a read maps to one location and another part maps somewhere else (e.g. because of RNA splicing), the longer alignment is marked as primary and the shorter as secondary.
  - There is no way to disable reporting of secondary alignments with bwa mem, but they can be filtered from the sorted BAM with -F 0x100 (secondary alignment flag = 0).
- bowtie2 can be configured to report more than one location for a mapped read.
  - The -k N option says to report up to N alignments for a read.
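To make the -q, -f and -F semantics concrete, here is a minimal Python sketch of the same bitwise tests applied to one alignment record at a time. It does not call samtools; passes_filter and the (flag, mapping quality) pairs are hypothetical names and toy values chosen for illustration, while the flag bit values (0x2, 0x4, 0x100) are the standard SAM ones.

```python
# SAM flag bits used in this section.
FLAG_PROPER_PAIR = 0x2   # read mapped in a proper pair
FLAG_UNMAPPED = 0x4      # read is unmapped
FLAG_SECONDARY = 0x100   # secondary alignment


def passes_filter(flag, mapq, require=0, exclude=0, min_mapq=0):
    """Mimic the logic of -f require, -F exclude and -q min_mapq."""
    all_required_set = (flag & require) == require  # -f: every bit is 1
    all_excluded_clear = (flag & exclude) == 0      # -F: every bit is 0
    return all_required_set and all_excluded_clear and mapq >= min_mapq


# Hypothetical (flag, mapping quality) pairs.
records = [
    (0x63, 42),   # 99: paired, proper pair, mate reverse, first in pair
    (0x4D, 0),    # 77: paired, unmapped, mate unmapped, first in pair
    (0x163, 0),   # 355: same as 99 plus the secondary bit
    (0x41, 30),   # 65: paired, first in pair, mapped but not a proper pair
]

# Same selection logic as: -F 0x4 -f 0x2
kept = [r for r in records
        if passes_filter(*r, require=FLAG_PROPER_PAIR, exclude=FLAG_UNMAPPED)]

# Additionally drop secondary alignments and require mapQ >= 20,
# i.e. the logic of: -F 0x104 -f 0x2 -q 20
strict = [r for r in records
          if passes_filter(*r, require=FLAG_PROPER_PAIR,
                           exclude=FLAG_UNMAPPED | FLAG_SECONDARY, min_mapq=20)]

print(len(kept), len(strict))
```

Several flags are combined in one option by OR-ing their bits together, which is why excluding both unmapped (0x4) and secondary (0x100) records becomes -F 0x104.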

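The true mapping rate formula given earlier can be worked through on a toy example. The sketch below assumes we already have the flag value of every record in a BAM file as a plain list of integers (in practice these would be extracted from the file); the list and the function name true_mapping_rate are hypothetical.

```python
FLAG_UNMAPPED = 0x4     # read is unmapped
FLAG_SECONDARY = 0x100  # secondary alignment


def true_mapping_rate(flags):
    """primary mapped reads / (total records - secondary records)."""
    total = len(flags)
    secondary = sum(1 for f in flags if f & FLAG_SECONDARY)
    primary_mapped = sum(
        1 for f in flags if not (f & (FLAG_UNMAPPED | FLAG_SECONDARY))
    )
    reads = total - secondary  # one primary (or unmapped) record per read
    if reads == 0:
        raise ValueError("no primary or unmapped records to compute a rate from")
    return primary_mapped / reads


# Hypothetical flags: 3 primary mapped, 1 unmapped, 2 secondary records.
flags = [0, 16, 256, 4, 0, 272]

rate = true_mapping_rate(flags)
naive = sum(1 for f in flags if not (f & FLAG_UNMAPPED)) / len(flags)
print(rate, naive)
```

Here the naive rate (mapped records over all records) comes out higher than the true rate because the two secondary records are counted as if they were extra reads.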
As we have seen, the SAMTools suite allows you to manipulate the SAM/BAM files produced by most aligners. There are many sub-commands in this suite, but the most common and useful are:

- Convert text-format SAM files into binary BAM files (samtools view) and vice versa.
- Sort BAM files by reference coordinates (samtools sort).
- Index BAM files that have been sorted (samtools index).
- Filter alignment records based on BAM flags, mapping quality or location (samtools view).

Since BAM files are binary, they can't be viewed directly using standard Unix file viewers such as more, less and head.
