DNA re-sequencing analysis
PřF: Bi7420
Vojta Bystry
vojtech.bystry@ceitec.muni.cz

NGS data analysis
• Experiment design
• De-multiplexing → raw data (.fastq) → QC
• Known genome/transcriptome reference: mapping (.bam) → QC
‒ Interaction analysis: ChIP-seq
‒ Expression analysis: RNA-seq
‒ Variant analysis: WES
• Reference not known: reference assembly
• Not a "classic" reference
‒ Metagenomics
‒ Immunogenetics: VDJ genes
‒ CRISPR: sgRNA
‒ Methylation: bisulfite-seq
‒ …

DNA re-sequencing
• Variant calling
‒ Medical purposes (molecular medicine)
• Cancer genomics
• Small variants (SNVs and small indels) vs. structural variants
• Germline vs. somatic

Mapping
• Computationally the most demanding step
• More or less standardized
• Output: .bam
‒ .bam = binary (compressed) .sam
‒ .sam = Sequence Alignment/Map
• Tools
‒ BWA: DNA
‒ STAR: RNA

Mapping QC
[Figure: mapping QC chart]

Mapping QC
[Table]

Mapping QC – coverage
[Figures: coverage line charts]

Mapping QC – cumulative coverage
[Figures: cumulative coverage line charts]

Mapping QC
[Figure: screenshot of a QC application]

Mapping QC
[Figures: line chart; screenshot of a QC application]

Small variant calling

Variant calling – germline
• Variants present from birth
• Family trio sequencing
• Predispositions

Variant calling – somatic
• Diagnostic / prognostic / therapy decisions
• Tumor–normal pairs
‒ Somatic variant calling without a matched normal needs high coverage (> 200x)
‒ Without a matched normal, not all germline variants will be filtered out
• Expected variant heterogeneity
• Expected variant allele frequency (VAF)
‒ Histopathology estimates tend to overestimate tumor load
‒ The lower the expected VAF, the higher the coverage needed

Variant calling – somatic
• Multiple tools
‒ Strelka2, VarDict, Mutect2, SomaticSniper, LoFreq, MuSE, VarScan
• Ensemble callers
‒ SomaticSeq
‒ Uses machine learning to separate true positives (TP) from false positives (FP)
• Sensitivity vs. specificity
‒ Sensitivity is generally preferred
‒ Accuracy is preferred for derived information

Small variant annotation
• VEP (Variant Effect Predictor)
• Transcript "selection"
‒ RefSeq vs. Ensembl
• Population frequency
‒ 1000 Genomes Project
‒ gnomAD
• Many clinical variant databases
‒ Gene-based vs. variant-based
‒ dbSNP
‒ COSMIC
‒ ClinVar
‒ CGC (Cancer Gene Census)

Small variant annotation – functional prediction
• General variant consequence
‒ Based on the position
‒ Impact
• Effect of the variant on protein structure
‒ PolyPhen
‒ SIFT
[Figure: screenshot]

Small variant interpretation
• Hardest part
• Clinical interpretation
‒ Usually manual work
• Clinical genetics
• Select the probable causal variant
‒ Select a few out of ~1000
‒ Bioinformatics can help
• Quantitative interpretation
‒ Clinical classification
• Breast cancer subtype classification

Variant interpretation – gene networks
• Gene Ontology
• Biological pathway databases
‒ KEGG
‒ Reactome
‒ WikiPathways

Variant interpretation – derived information
• Tumor mutational burden (TMB)
‒ Several definitions
‒ Mutations per million bases
• Mutational signatures
‒ COSMIC
‒ Exposure to ultraviolet light
‒ Tobacco smoking
‒ Defective DNA damage repair

CEITEC (www.ceitec.eu, @CEITEC_Brno)
Thank you for your attention!
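Coverage QC sketch. The coverage and cumulative-coverage QC slides summarize per-base sequencing depth over the target region. The Python sketch below computes those two statistics. It assumes the per-base depths have already been extracted, for example from `samtools depth` output. The depth values are made up for illustration.

```python
def mean_coverage(depths):
    """Average sequencing depth over all target positions."""
    if not depths:
        raise ValueError("depths must not be empty")
    return sum(depths) / len(depths)


def cumulative_coverage(depths, thresholds):
    """Fraction of target positions covered by at least t reads, for each threshold t."""
    if not depths:
        raise ValueError("depths must not be empty")
    n = len(depths)
    return {t: sum(1 for d in depths if d >= t) / n for t in thresholds}


# Hypothetical per-base depths over a tiny target region
depths = [0, 5, 12, 30, 30, 45, 60, 100]
print(mean_coverage(depths))                         # 35.25
print(cumulative_coverage(depths, [1, 10, 30, 50]))  # {1: 0.875, 10: 0.75, 30: 0.625, 50: 0.25}
```

A cumulative coverage curve is this function evaluated over a range of thresholds. It shows what a mean alone hides. Here the mean depth is about 35x, yet a quarter of the positions have fewer than 10 reads.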
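Somatic VAF and coverage sketch. The somatic calling slide links expected VAF, overestimated tumor load and required coverage. One simplified way to see that link is to model ALT-read counts as Binomial(depth, VAF). The sketch makes several assumptions:
• The variant is clonal and heterozygous.
• It sits in a diploid region, so expected VAF is purity / 2.
• Sequencing errors, mapping bias and copy-number changes are ignored.
• The "at least 5 ALT reads" threshold is illustrative, not any caller's default.

```python
from math import comb


def expected_vaf(purity):
    """Expected VAF of a clonal, heterozygous somatic SNV in a diploid region
    (assumption) for a sample with the given tumor purity (fraction of tumor cells)."""
    return purity / 2


def detection_power(depth, vaf, min_alt_reads):
    """P(at least min_alt_reads ALT reads) when ALT reads ~ Binomial(depth, vaf)."""
    p_fewer = sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1 - p_fewer


def min_depth_for_power(vaf, min_alt_reads, power=0.95, max_depth=10_000):
    """Smallest depth whose detection power reaches `power`, or None if not reached."""
    for depth in range(min_alt_reads, max_depth + 1):
        if detection_power(depth, vaf, min_alt_reads) >= power:
            return depth
    return None


# Pathology reports 60 % tumor cells, but the true purity is only 30 %
for purity in (0.6, 0.3):
    vaf = expected_vaf(purity)
    print(purity, vaf, min_depth_for_power(vaf, min_alt_reads=5))
```

Halving the true purity halves the expected VAF, which raises the depth needed for the same detection power. A coverage plan based on an overestimated tumor load is therefore underpowered.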
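Ensemble voting sketch. The ensemble-caller and sensitivity-vs-specificity bullets can be illustrated with simple vote counting across callers. This is a deliberately simpler technique than SomaticSeq, which trains a machine-learning classifier. The caller names and variant keys below are hypothetical.

```python
from collections import Counter


def ensemble_calls(calls_by_tool, min_tools):
    """Keep variants reported by at least `min_tools` callers.

    calls_by_tool maps a caller name to an iterable of variant keys,
    here (chrom, pos, ref, alt) tuples.
    """
    support = Counter()
    for calls in calls_by_tool.values():
        support.update(set(calls))  # count each caller at most once per variant
    return {variant for variant, n in support.items() if n >= min_tools}


# Hypothetical call sets from three callers
calls = {
    "caller_A": [("chr1", 1000, "C", "T"), ("chr2", 2000, "G", "A")],
    "caller_B": [("chr1", 1000, "C", "T"), ("chr3", 3000, "A", "G")],
    "caller_C": [("chr1", 1000, "C", "T"), ("chr2", 2000, "G", "A")],
}
union = ensemble_calls(calls, min_tools=1)         # most sensitive
majority = ensemble_calls(calls, min_tools=2)      # compromise
intersection = ensemble_calls(calls, min_tools=3)  # most specific
```

Lowering `min_tools` trades specificity for sensitivity. The union keeps every call, and the intersection keeps only calls that all tools agree on.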
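TMB sketch. The derived-information slide defines tumor mutational burden as mutations per million bases and notes that several definitions exist. The sketch below makes that difference explicit. The `counted_classes` parameter and the consequence labels are simplified assumptions. The panel size and variant list are made up.

```python
def tumor_mutational_burden(variant_classes, callable_bases, counted_classes=None):
    """Mutations per megabase of callable sequence.

    variant_classes: one consequence label per somatic variant.
    counted_classes: labels that count towards TMB; None counts every variant.
    """
    if callable_bases <= 0:
        raise ValueError("callable_bases must be positive")
    if counted_classes is None:
        n = len(variant_classes)
    else:
        n = sum(1 for c in variant_classes if c in counted_classes)
    return n / (callable_bases / 1_000_000)


# Hypothetical panel: 1.5 Mb callable, 12 somatic variants
classes = ["missense"] * 8 + ["nonsense"] + ["synonymous"] * 3
print(tumor_mutational_burden(classes, 1_500_000))                            # 8.0
print(tumor_mutational_burden(classes, 1_500_000, {"missense", "nonsense"}))  # 6.0
```

The same sample yields different TMB values depending on which variants are counted and which denominator is used. TMB values are only comparable under a shared definition.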
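SBS-96 sketch. COSMIC single-base-substitution (SBS) signatures are defined over 96 channels: 6 substitution types times 16 flanking-base contexts. Each SNV is reported on the strand where the reference base is a pyrimidine. For example, UV exposure is associated with C>T changes at dipyrimidine contexts. The sketch below shows only the classification step that turns SNVs into a 96-channel profile. Fitting signatures to that profile is not shown.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def sbs96_channel(five_prime, ref, alt, three_prime):
    """Return the trinucleotide substitution label, e.g. 'A[C>T]G',
    reported on the pyrimidine (C or T) strand of the reference base."""
    bases = [b.upper() for b in (five_prime, ref, alt, three_prime)]
    if any(len(b) != 1 or b not in BASES for b in bases):
        raise ValueError("expected single A/C/G/T bases")
    five_prime, ref, alt, three_prime = bases
    if ref == alt:
        raise ValueError("ref and alt must differ")
    if ref in "AG":
        # Purine reference: describe the same event on the opposite strand,
        # which swaps and complements the flanking bases
        five_prime, three_prime = three_prime.translate(COMPLEMENT), five_prime.translate(COMPLEMENT)
        ref, alt = ref.translate(COMPLEMENT), alt.translate(COMPLEMENT)
    return f"{five_prime}[{ref}>{alt}]{three_prime}"


def sbs96_profile(snvs):
    """Count SNVs given as (5' base, ref, alt, 3' base) tuples per SBS-96 channel."""
    return Counter(sbs96_channel(*snv) for snv in snvs)


# A G>A change in context T_G_C is the same event as C>T in G_C_A on the other strand
print(sbs96_channel("T", "G", "A", "C"))  # G[C>T]A
```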
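Pathway enrichment sketch. Gene Ontology and pathway databases such as KEGG or Reactome are often used in an over-representation test. The test asks whether genes carrying candidate variants fall into one pathway more often than chance predicts. The sketch below is a plain hypergeometric test, one common choice among several. All gene counts in the example are hypothetical.

```python
from math import comb


def overrepresentation_p(k, N, K, n):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background, K: background genes in the pathway,
    n: genes with candidate variants, k: candidate genes in the pathway.
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    # math.comb returns 0 when asked for more items than available
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total


# Hypothetical: 20,000 background genes, a 100-gene pathway,
# 50 genes with candidate variants, 5 of them in the pathway
print(overrepresentation_p(5, 20_000, 100, 50))
```

In practice many pathways or GO terms are tested at once, so these p-values need a multiple-testing correction before interpretation.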