Modern Methods of Genome Analysis - Analysis
Mgr. Nikola Tom
Brno, 22.4.2015

Deep-seq Workflow/Pipeline
[Figure: workflow_deep_seq.png]

GATK Workflow/Pipeline
[Figure: GATK_best_practices.png]
https://www.broadinstitute.org/gatk/guide/best-practices

Raw sequence = fastq
•Biological sequence
•Corresponding quality scores
•One ASCII character per quality score
•(fasta + qual, csfasta + csqual, sff)

@SEQ_ID
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65

FastQC

Cutadapt
•Adaptor trimming (miRNA)
•Quality filtering
•Length filtering

Read mapping => SAM, BAM
•Reads are usually mapped to a reference
•miRNA - special case
–Group reads and annotate them against miRBase
•DNA
–BWA, Bowtie, Bfast, SHRiMP, CLC
•RNA
–TopHat (splice-aware aligner with de novo junction discovery)
•Commercial
–CLC Genomics Workbench
•De novo assembly
–Unknown genomes

Alignment SAM

Mapping, Coverage reports
•Important check of the lab protocol
•Specificity of PCR
•Normalization
•Settings of the variant calling threshold, CNV

SNV and small InDel Calling
•Coverage
•Frequency
•Base quality
•!!!
•Genomic context (homopolymers)
•Nucleotide type
•Position in read (errors at the read end)
•Alignment errors

Structural variations
•Mate-pair library
•Long InDel
•Translocation

Annotating and filtering
•Gene
•Transcript
•dbSNP
•Regulation
•Comparative genomics
•Repeats
•Functional
•Gene ontology
•miRNA targets
•Etc.
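
The FASTQ record on the "Raw sequence = fastq" slide can be read with a few lines of Python. The sketch below is illustrative only: it assumes the common four-line layout with no wrapped sequences, and it assumes Phred+33 (Sanger) quality encoding, which the slides do not state. The names `FastqRecord`, `parse_fastq`, `phred_scores` and `passes_filter`, and the default thresholds, are hypothetical. The filter only mimics the idea of the quality and length filtering listed under Cutadapt; it is not Cutadapt's algorithm.

```python
from typing import Iterable, Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    seq_id: str
    seq: str
    qual: str


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from four-line FASTQ text (assumes unwrapped sequences)."""
    rows = [ln.rstrip("\r\n") for ln in lines if ln.strip()]
    if len(rows) % 4 != 0:
        raise ValueError("FASTQ input must contain a multiple of 4 lines")
    for i in range(0, len(rows), 4):
        header, seq, plus, qual = rows[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        yield FastqRecord(header[1:].strip(), seq, qual)


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode one ASCII character per base into a Phred score.

    offset=33 is an assumption (Sanger / Phred+33 encoding).
    """
    return [ord(ch) - offset for ch in qual]


def passes_filter(rec: FastqRecord, min_len: int = 18,
                  min_mean_q: float = 20.0) -> bool:
    """Toy length and mean-quality filter; thresholds are hypothetical."""
    if len(rec.seq) < min_len:
        return False
    scores = phred_scores(rec.qual)
    return sum(scores) / len(scores) >= min_mean_q


# The record shown on the slide.
EXAMPLE = [
    "@SEQ_ID",
    "GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT",
    "+",
    "!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65",
]

if __name__ == "__main__":
    record = next(parse_fastq(EXAMPLE))
    scores = phred_scores(record.qual)
    print(record.seq_id, len(record.seq), min(scores), max(scores))
    print("passes filter:", passes_filter(record))
```

A score Q corresponds to an error probability of 10^(-Q/10), so the leading `!` (Q = 0 under Phred+33) marks a base call that carries no confidence at all.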