Lecture 2: Raw NGS data quality control
Vojta Bystry, vojtech.bystry@ceitec.muni.cz
Modern methods for genome analysis (PřF:Bi7420)

NGS data analysis
[Figure: workflow overview. Labels: Experiment design; de-multiplexing; Raw data (.fastq); QC; Genome/Transcriptome reference; Mapping (.bam); QC. Analyses: Interaction analysis (ChIP-seq); Expression analysis (RNA-seq); Variant analysis (WES). Not known reference: Metagenomics; Reference assembly. Not "classic" reference: Immunogenetic VDJ genes; CRISPR sgRNA; Methylation (Bisulfite-seq)…]

De-multiplexing

Primary data – fastq file

Fastq format - quality
• Fastq - q stands for quality – coded as phred score
  Q = −10 · log10(P), where P is the probability that the base call is wrong

  Quality (Q)   Error probability (P)
  5             31%
  10            10%
  20            1%
  30            0.1%

• What the machine thinks the quality is
• Only accounts for sequencing errors
• Very good for early problem detection

Example quality string:
CFFFFEFFGCEEGECFGGGGAFF87@E:++6C<++3:,8,33,,:,,,:,,:,,,

Fastq – quality control
• How can we summarize this?
• What QC can be done?

FastQC Report

Fastq – quality control
• FastQC - tool
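
The Phred formula and the example quality string from the "Fastq format - quality" slide can be tried out with a few lines of Python. This is only a minimal sketch, not part of the original slides: it assumes the Phred+33 ASCII encoding (Sanger / Illumina 1.8+), which the slides do not state, and the function names are hypothetical helpers chosen for illustration.

```python
import math

# Assumption: quality characters use the Phred+33 offset (Sanger / Illumina 1.8+).
# Older Illumina pipelines used an offset of 64; the slides do not say which applies.
PHRED_OFFSET = 33


def phred_to_error_prob(q):
    """Invert Q = -10 * log10(P): return the error probability P for a score Q."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Apply Q = -10 * log10(P) to an error probability P."""
    return -10 * math.log10(p)


def decode_quality(qual, offset=PHRED_OFFSET):
    """Turn a FASTQ quality string into a list of integer Phred scores."""
    return [ord(ch) - offset for ch in qual]


# Quality string shown on the slide.
qual = "CFFFFEFFGCEEGECFGGGGAFF87@E:++6C<++3:,8,33,,:,,,:,,:,,,"
scores = decode_quality(qual)

# One simple per-read summary: mean score and expected number of wrong bases.
mean_q = sum(scores) / len(scores)
expected_errors = sum(phred_to_error_prob(q) for q in scores)

print("first five scores:", scores[:5])
print("min / max score:", min(scores), max(scores))
print("mean score: %.1f" % mean_q)
print("expected errors in read: %.2f" % expected_errors)
```

Under the Phred+33 assumption the read starts at high quality (scores in the mid to high 30s) and drops to about 10 towards the end. FastQC summarizes this kind of per-position decay across all reads.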