NGS data analysis introduction
Bi7420: Moderní metody pro analýzu genomu (Modern Methods for Genome Analysis)
Vojta Bystry
vojtech.bystry@ceitec.muni.cz

Plan for Bi7420
•NGS data analysis for non-bioinformaticians
‒Focus on experiment planning and result interpretation
1.Introduction to NGS technology
2.Introduction to analysis; NGS overview
3.DNA resequencing
4.miRNA, lncRNA in cancer - Marek Mráz
5.DNA resequencing, ChIP-seq (CLIP-seq)
6.RNA-seq
7.RNA-seq – single cell sequencing
•The plan is open to change

What is NGS?
•Next generation sequencing
‒New generation sequencing
‒HTP = High throughput
‒Massively parallel sequencing
•Contrast to Sanger sequencing

What is NGS?
•Illumina – sequencing by synthesis
•Oxford Nanopore – Nanopore sequencing
•Pacific Biosciences – Single Molecule, Real-Time (SMRT)

Raw data
•10^5 – 10^10 reads
•75 – 300 bp
•Can be paired-end

Basic workflow
•Experimental design
•Library preparation
•Sequencing
•Data analysis
•Questions behind the workflow: how we sequence, what we sequence, why we sequence
•Consultation regarding data analysis is highly advisable.
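As a rough aid for experiment planning, the read counts and read lengths from the Raw data slide can be turned into a total yield in bases. The sketch below is only illustrative: the function names are hypothetical, it assumes that the read count refers to sequenced fragments (so a paired-end run reads each fragment twice), and the example genome size of about 3.1 Gb (roughly a human genome) is an assumption that is not part of the lecture.

```python
def total_bases(n_fragments: int, read_length: int, paired_end: bool = False) -> int:
    """Total number of sequenced bases.

    Assumption: n_fragments counts sequenced fragments; with paired-end
    sequencing each fragment contributes two reads of read_length bases.
    """
    reads_per_fragment = 2 if paired_end else 1
    return n_fragments * read_length * reads_per_fragment


def mean_depth(n_fragments: int, read_length: int, genome_size: int,
               paired_end: bool = False) -> float:
    """Average number of times each reference base is covered.

    This is an idealised estimate: it ignores duplicates, unmapped reads
    and uneven coverage.
    """
    return total_bases(n_fragments, read_length, paired_end) / genome_size


# Extremes of the ranges quoted on the Raw data slide.
smallest_run = total_bases(10**5, 75)
largest_run = total_bases(10**10, 300, paired_end=True)

# Hypothetical whole-genome experiment: 400 million fragments, 2 x 150 bp,
# on an assumed genome of about 3.1 Gb.
example_depth = mean_depth(400_000_000, 150, 3_100_000_000, paired_end=True)
print(f"smallest run: {smallest_run:.2e} bases")
print(f"largest run:  {largest_run:.2e} bases")
print(f"example mean depth: {example_depth:.1f}x")
```

An estimate like this is only a starting point for the consultation recommended above; the depth actually needed depends on the type of analysis.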
NGS library preparation
[Diagram elements: living material; DNA; RNA; select some parts]

NGS library overview
[Figure: NGS library overview]

NGS data analysis
[Diagram: overview of NGS data analysis. Elements: experiment design; de-multiplexing; raw data (.fastq); QC; mapping to a genome/transcriptome reference (.bam); QC; variant analysis (WES); expression analysis (RNA-seq); interaction analysis (ChIP-seq); not known reference; not "classic" reference; metagenomics; reference assembly; immunogenetics (VDJ genes); CRISPR (sgRNA); methylation (bisulfite-seq); …]

Metagenomics
•Environmental statistics about populations
‒Alpha, beta, gamma diversity
‒Identify known bacterial species
‒Eventually functional profiling
•E.g. antimicrobial resistance genes
•Sequencing techniques
‒16S rRNA sequencing
‒Shotgun metagenomic sequencing

Metagenomics – 16S rRNA vs. Shotgun

| Factors | 16S rRNA sequencing | Shotgun metagenomic sequencing |
|---|---|---|
| Cost | ~$50 USD | Starting at ~$150, but the price depends on the sequencing depth required |
| Sample preparation | Similar complexity to shotgun sequencing | Similar complexity to 16S rRNA sequencing |
| Functional profiling (profile microbial genes) | No (but "predicted" functional profiling is possible) | Yes (but it only reveals information on functional potential) |
| Taxonomic resolution (genus, species, strain?) | Bacterial genus (sometimes species); dependent on region(s) targeted | Bacterial species (sometimes strains and single nucleotide variants, if sequencing is deep enough) |
| Taxonomic coverage | Bacteria and archaea | All taxa, including viruses |
| Bioinformatics requirements | Beginner to intermediate expertise | Intermediate to advanced expertise |
| Databases | Established, well-curated | Relatively new, still growing |
| Sensitivity to host DNA contamination | Low (but PCR success depends on the absence of inhibitors and the presence of a detectable microbiome) | High, varies with sample type (but this can be mitigated by calibrating the sequencing depth) |
| Bias | Medium to high (retrieved taxonomic composition depends on the selected primers and the targeted variable region) | Lower (while metagenomics is "untargeted", experimental and analytical biases can be introduced at various stages) |

Metagenomics – 16S rRNA vs. Shotgun
•Study examples
‒Assessment of the bacterial microbiome of Amazonian soil
•16S rRNA sequencing may provide more taxonomic resolution
‒Changes in microbiome composition and antimicrobial gene carriage following fecal transplant
•Shotgun sequencing to assess both compositional and functional differences
‒Daily fluctuations in gut microbiome following a 2-week dietary fiber intervention
•Shotgun sequencing to assess both compositional and functional differences

Reference Assembly
•Genome – DNA – very hard and costly
•Transcriptome – RNA
•Multiple sequencing types highly beneficial
‒Paired-end
‒Long reads
‒Mate-pairs
•Similar reference helpful – assembly by homology
Immunogenetics
•T-cell receptor, immunoglobulin (B-cell)
•Gene rearrangement during cell maturation
‒VDJ recombination
•Different cell populations
‒Clonal studies
‒Repertoire usage
•Main usage – blood malignancies (leukemias)

Genome-wide CRISPR-Cas9 knockout screens
[Figure source: GenScript, "How to use CRISPR Libraries for Screening", CRISPR/Cas9 Applications]
•Cas9 (CRISPR associated protein 9) is a protein which plays a vital role in the immunological defense of certain bacteria against DNA viruses
•sgRNA libraries
‒Each sgRNA knocks out a specific gene
‒76,000 guide RNAs (sgRNAs) with four highly active guides per gene, targeting about 19,000 genes, as well as non-targeting sgRNA controls
[Figure label: Lentivirus]

Genome-wide CRISPR-Cas9 knockout screens
•Screen: selection + expansion/enrichment of surviving cells
•NGS sequencing

Genome-wide CRISPR-Cas9 knockout screens
•NGS data analysis
‒Counting cells with different gene knockouts
‒Counting sgRNA fragments
‒Comparing conditions

Genome-wide CRISPR-Cas9 knockout screens
•Example study
[Figure 1 from Wei et al.]
Wei, L., Lee, D., Law, CT. et al. Genome-wide CRISPR/Cas9 library screening identified PHGDH as a critical driver for Sorafenib resistance in HCC. Nat Commun 10, 4681 (2019). https://doi.org/10.1038/s41467-019-12606-7

De-multiplexing
•bcl2fastq tool
‒Needs a sample sheet with indexes
‒Number of barcode mismatches
•Check the undetermined reads

Primary data – fastq file

Fastq format - quality
•Fastq - q stands for quality – encoded Phred score

| Quality | Error probability |
|---|---|
| 5 | ~32% |
| 10 | 10% |
| 20 | 1% |
| 30 | 0.1% |

•Very good for early problem detection
•Reasonable for trimming and read filtering
•RNA-seq - above Phred score 5
•Example quality string: CFFFFEFFGCEEGECFGGGGAFF87@E:++6C<++3:,8,33,,:,,,:,,:,,,

Fastq – quality control
•FastQC tool

FastQC Report
[Figures: FastQC report screenshots]

Thank you for your attention!
Vojta Bystry
vojtech.bystry@ceitec.muni.cz
www.ceitec.eu
CEITEC @CEITEC_Brno
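
Supplement to the Fastq quality slide: a Phred score Q corresponds to an error probability of 10^(-Q/10), and a FASTQ file stores each score as one ASCII character. The sketch below reproduces the quality table and decodes the example quality string from the slide. It assumes the common Phred+33 encoding (the lecture does not state the offset), and the function names are illustrative, not taken from any tool.

```python
from typing import List


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to the probability of a wrong base call."""
    return 10 ** (-q / 10)


def decode_quality(quality_string: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality line into per-base Phred scores.

    Assumption: Phred+33 encoding, i.e. score = ASCII code - 33.
    """
    return [ord(char) - offset for char in quality_string]


def fraction_below(scores: List[int], threshold: int) -> float:
    """Fraction of bases whose Phred score is below the given threshold."""
    if not scores:
        return 0.0
    return sum(score < threshold for score in scores) / len(scores)


# Reproduce the table from the slide.
for q in (5, 10, 20, 30):
    print(f"Q{q}: error probability {phred_to_error_prob(q):.2%}")

# Quality string shown on the slide.
example_quality = "CFFFFEFFGCEEGECFGGGGAFF87@E:++6C<++3:,8,33,,:,,,:,,:,,,"
example_scores = decode_quality(example_quality)
print("per-base scores:", example_scores)
print("fraction of bases below Q20:", round(fraction_below(example_scores, 20), 2))
```

The decoded scores drop towards the end of the read, which is the kind of pattern that the trimming and read filtering mentioned on the slide are meant to handle.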