mmquant VERONIKA CHALUPOVÁ & HANA BOHÁČOVÁ mmquant • A tool to quantify gene expression • Published on: 15 September 2017 by Matthias Zytnicki Last upload: 8 months and 28 days ago version: 1.3 Language used: C++ Operating system: Linux; Mac OS X How does it work - this tool counts with duplicated genes: if read maps to different positions, corresponding genes are duplicated -> this tool then creates a merged gene - by default, the method supposes that reads have been sorted beforehand - if not: genes are sorted into a vector, cutted into non-overlapping bins and index is given to the first gene in bin; then for each read genes are scaned starting from first gene in bin 1. step of genes quantification = searching for reads matching genes The way a read R is mapped to a gene A depends on the -l n value set by user: - if read is mapped to several locations, the tool sets NH tag of SAM/BAM file to value >1 htseq-count: union / intersection-strict / intersection-nonempty mmquant: -l 1 / -l -1 / no alternative (ambiguous reads are discarded) 2. step of genes quantification = resolving ambiguities - when read matches several genes, some can be discarded depending on number of overlapping base pairs -d n computes the differences of overlapping nucleotides (N_A, N_B). If N_A ≥ N_B + n, then the read will be attributed to gene A only. -D m compares the ratio of overlapping nucleotides. If N_A / N_B ≥ m, then the read will be attributed to gene A only. ◦- featureCounts: option largestOverlap (assigns to the gene read with largest number of overlapping bases) ◦- mmquant: emulates this strategy by –d and –D parameters Fig. 3 Fig. 3 Input Output Compulsory options: ◦annotation file in GTF format ◦reads in BAM/SAM format The output is a tab-separated file. It also provides output stats on hits. Comparison with other tools • time: -the fastest: featureCounts -also fast: mmquant -slowest: htseq-count • number of expressed genes given by each tool is comparable -but multi-mapping genes could provide up to 25% of new genes -without them the results could be biased Thank you for your attention