Modern Methods of Genome Analysis - Analysis
Mgr. Nikola Tom
Brno, 2.4.2014

Workflow

Raw sequence = FASTQ
• Biological sequence
• Corresponding quality scores
• Encoded as ASCII characters
• (Other formats: FASTA + QUAL, CSFASTA + CSQUAL, SFF)

    @SEQ_ID
    GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
    +
    !''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65

FastQC

Cutadapt
• Adapter trimming (miRNA)
• Quality filtering
• Length filtering

Read mapping => SAM, BAM
• Reads are usually mapped to a reference
• miRNA is a special case
  – Reads are grouped and annotated against miRBase
• DNA
  – BWA, Bowtie, BFAST, SHRiMP, CLC
• RNA
  – TopHat (de novo splice aligner)
• Commercial
  – CLC Genomics Workbench

SAM

Alignment

Mapping and coverage reports
• Important check of the lab protocol
• Specificity of PCR
• Normalization
• Setting of variant-calling thresholds, CNV

SNV and small InDel calling
• Coverage
• Frequency
• Base quality

!!!
• Genomic context (homopolymers)
• Nucleotide type
• Position in read (errors at the read end)
• Alignment errors

Copy-number variations (CNV)

Structural variations
• Mate-pair library
• Long InDels
• Translocations

Annotation and filtering
• Gene
• Transcript
• dbSNP
• Regulation
• Comparative genomics
• Repeats
• Functional
• Gene ontology
• miRNA targets
• Etc.
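A FASTQ record has four lines (`@` header, sequence, `+` separator, qualities), and each quality character encodes a Phred score. The sketch below assumes unwrapped four-line records and the Phred+33 offset used by Sanger and Illumina 1.8+ FASTQ (older Illumina pipelines used +64). `read_fastq`, `phred_scores` and `error_probability` are illustrative helper names, not part of any library.

```python
import io
from dataclasses import dataclass
from typing import Iterator, List, TextIO


@dataclass
class FastqRecord:
    seq_id: str    # header text after the leading "@"
    sequence: str  # bases
    quality: str   # one ASCII-encoded quality character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream, assuming exactly four lines per record."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of file (or a trailing blank line)
        sequence = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode ASCII quality characters to Phred scores (Phred+33 by default)."""
    return [ord(c) - offset for c in quality]


def error_probability(q: int) -> float:
    """Phred definition: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    example = (
        "@SEQ_ID\n"
        "GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT\n"
        "+\n"
        "!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65\n"
    )
    record = next(read_fastq(io.StringIO(example)))
    scores = phred_scores(record.quality)
    print(record.seq_id, scores[:4], max(scores))
```

Run on the example record, the script prints `SEQ_ID [0, 6, 6, 9] 37`: `!` decodes to Q0 (error probability 1) and `F` to Q37, the best base in this read.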
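The Cutadapt step combines adapter trimming, quality trimming and length filtering (roughly what Cutadapt's `-a`, `-q` and `-m` options control). The following is a simplified re-implementation for illustration only, not Cutadapt's code. Quality trimming uses the running-sum rule that Cutadapt's documentation attributes to BWA; adapter removal here uses exact matching, whereas Cutadapt tolerates errors (its `-e` option). The adapter string, the cutoff of 20 and the 18 nt minimum length are hypothetical values for a miRNA-like library.

```python
from typing import List, Optional, Tuple

# Hypothetical placeholder: substitute the 3' adapter of your library kit.
ADAPTER = "TGGAATTCTCGG"


def quality_trim_index(qualities: List[int], cutoff: int) -> int:
    """BWA-style 3' quality trimming: return how many leading bases to keep.

    Walk from the 3' end accumulating (cutoff - q); cut where the running sum
    peaks, and stop as soon as it drops below zero.
    """
    running = 0
    best = 0
    cut = len(qualities)
    for i in range(len(qualities) - 1, -1, -1):
        running += cutoff - qualities[i]
        if running < 0:
            break
        if running > best:
            best = running
            cut = i
    return cut


def find_3prime_adapter(read: str, adapter: str, min_overlap: int = 3) -> int:
    """Return the start of the adapter in `read`, or len(read) if absent.

    Exact matching only (no mismatches or indels): a full adapter occurrence
    anywhere, otherwise the longest adapter prefix (>= min_overlap) at the 3' end.
    """
    full = read.find(adapter)
    if full != -1:
        return full
    for overlap in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return len(read) - overlap
    return len(read)


def trim_read(sequence: str, quality: str, adapter: str = ADAPTER,
              cutoff: int = 20, min_length: int = 18) -> Optional[Tuple[str, str]]:
    """Quality-trim, remove the adapter, then length-filter (None = discard read)."""
    keep = quality_trim_index([ord(c) - 33 for c in quality], cutoff)  # Phred+33
    sequence, quality = sequence[:keep], quality[:keep]
    cut = find_3prime_adapter(sequence, adapter)
    sequence, quality = sequence[:cut], quality[:cut]
    if len(sequence) < min_length:
        return None
    return sequence, quality
```

For small RNA reads the length filter matters: once the adapter is removed, what remains should be roughly miRNA-sized, and very short fragments are discarded rather than mapped.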
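Coverage reports are derived from the alignments: each SAM line stores the 1-based leftmost reference position (POS, column 4) and a CIGAR string (column 6) describing how the read aligns. A minimal sketch, assuming the alignments for a single reference sequence have already been read as `(POS, CIGAR)` pairs; `depth` and `mean_depth` are hypothetical helpers. Real tools such as `samtools depth` additionally handle flags, mapping-quality filters and indexed BAM input.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Tuple

CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
ALIGNED = {"M", "=", "X"}   # consume both read and reference
REF_ONLY = {"D", "N"}       # consume reference only (deletion, skipped region)


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a CIGAR string into (length, operation) pairs."""
    if not CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"invalid CIGAR string: {cigar!r}")
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def depth(alignments: Iterable[Tuple[int, str]]) -> Dict[int, int]:
    """Count aligned bases per 1-based reference position.

    Only M/=/X bases are counted; deletions and N gaps advance the
    reference position without adding coverage.
    """
    cov: Counter = Counter()
    for pos, cigar in alignments:
        if cigar == "*":  # unmapped read or CIGAR unavailable
            continue
        ref = pos
        for length, op in parse_cigar(cigar):
            if op in ALIGNED:
                for p in range(ref, ref + length):
                    cov[p] += 1
                ref += length
            elif op in REF_ONLY:
                ref += length
            # I, S, H and P do not move along the reference
    return dict(cov)


def mean_depth(cov: Dict[int, int], start: int, end: int) -> float:
    """Mean depth over the inclusive 1-based interval [start, end]."""
    return sum(cov.get(p, 0) for p in range(start, end + 1)) / (end - start + 1)
```

Whether deletions should count towards depth is a design choice; this sketch counts only bases actually present in the read, which is the quantity relevant for variant-allele frequencies.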
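The SNV criteria listed for calling (coverage, frequency, base quality) and two of the error sources (homopolymer context, position in read) can be combined into a toy filter. This is an illustrative heuristic, not a real variant caller and not the author's method: `call_snv`, `PileupBase` and every threshold are hypothetical, and the pileup at the position is assumed to be already extracted from the alignments.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PileupBase:
    base: str      # base reported by one read at the reference position
    quality: int   # Phred base quality
    read_pos: int  # 0-based offset of that base within the read
    read_len: int  # length of the read


def homopolymer_run(reference: str, pos: int) -> int:
    """Length of the single-nucleotide run in `reference` covering 0-based `pos`."""
    base = reference[pos]
    left = pos
    while left > 0 and reference[left - 1] == base:
        left -= 1
    right = pos
    while right + 1 < len(reference) and reference[right + 1] == base:
        right += 1
    return right - left + 1


def call_snv(reference: str, pos: int, pileup: List[PileupBase],
             min_depth: int = 10, min_base_q: int = 20, min_vaf: float = 0.2,
             end_margin: int = 5, max_homopolymer: int = 4) -> Optional[dict]:
    """Return a candidate SNV at 0-based `pos`, or None if it fails the filters."""
    ref_base = reference[pos]
    good = [b for b in pileup if b.quality >= min_base_q]  # base-quality filter
    if len(good) < min_depth:                               # coverage filter
        return None
    alt_counts = Counter(b.base for b in good if b.base != ref_base)
    if not alt_counts:
        return None
    alt, alt_n = alt_counts.most_common(1)[0]
    vaf = alt_n / len(good)                                 # allele frequency
    if vaf < min_vaf:
        return None
    # Alt-supporting bases lying within end_margin of either read end.
    near_end = sum(
        1 for b in good
        if b.base == alt and min(b.read_pos, b.read_len - 1 - b.read_pos) < end_margin
    )
    flags = []
    if homopolymer_run(reference, pos) > max_homopolymer:
        flags.append("homopolymer")
    if near_end > alt_n / 2:
        flags.append("read_end_bias")
    return {"pos": pos, "ref": ref_base, "alt": alt,
            "depth": len(good), "vaf": vaf, "flags": flags}
```

Flagged candidates are kept rather than dropped, so that the homopolymer and read-end warnings can be reviewed during annotation and filtering.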
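One common way to detect CNVs from coverage data is the read-depth approach: normalise per-region read counts for library size, compare them with a control, and inspect the log2 ratio. A minimal sketch, assuming per-region read counts for a sample and a control are already available; region names and thresholds are hypothetical. For a pure diploid sample a single-copy gain gives log2(3/2) ≈ 0.58 and a single-copy loss gives log2(1/2) = -1; the defaults sit between those values and 0 to allow for noise.

```python
import math
from typing import Dict


def log2_ratios(sample: Dict[str, int], control: Dict[str, int],
                pseudocount: float = 0.5) -> Dict[str, float]:
    """Per-region log2(sample / control) after scaling each library to its total.

    Only regions present in both inputs are used; the pseudocount avoids
    division by zero and log(0) in regions without reads.
    """
    regions = sorted(sample.keys() & control.keys())
    s_total = sum(sample[r] + pseudocount for r in regions)
    c_total = sum(control[r] + pseudocount for r in regions)
    return {
        r: math.log2(((sample[r] + pseudocount) / s_total)
                     / ((control[r] + pseudocount) / c_total))
        for r in regions
    }


def classify(ratio: float, gain: float = 0.4, loss: float = -0.6) -> str:
    """Label a log2 ratio using hypothetical gain/loss thresholds."""
    if ratio >= gain:
        return "gain"
    if ratio <= loss:
        return "loss"
    return "neutral"
```

Scaling to the total is the simplest normalisation; because a large gain inflates the total, unaffected regions drift slightly below zero, so a more robust statistic such as the median is a common alternative.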