NGS data analysis introduction
Bi7420: Modern Methods for Genome Analysis (Moderní metody pro analýzu genomu)
Vojta Bystry
vojtech.bystry@ceitec.muni.cz

Plan for Bi7420
• Next generation sequencing methods overview
  ‒ Focus on experiment planning and result interpretation
1. Introduction to NGS technology
2. miRNA, lncRNA in cancer (Marek Mráz)
3. Basic QC, DNA resequencing
4. DNA resequencing, ChIP-seq (CLIP-seq)
5. ChIP-seq (CLIP-seq)
6. RNA-seq
7. Single-cell RNA-seq, spatial transcriptomics

What is NGS?
• Next generation sequencing
  ‒ New generation sequencing
  ‒ HTP = high throughput
  ‒ Massively parallel sequencing
• In contrast to Sanger sequencing

What is NGS?
• Illumina: sequencing by synthesis, short-read sequencing
• Oxford Nanopore: nanopore sequencing
• Pacific Biosciences: Single Molecule, Real-Time (SMRT) sequencing

Short-read sequencing result
• 10^5 – 10^10 reads
• 75 – 300 bp per read
• Can be paired-end

NGS experiment workflow
• Experimental design: why we sequence
• Library preparation: what we sequence
• Sequencing: how we sequence
• Data analysis
Consulting on the data analysis already at the design stage is highly advisable.
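The read counts and read lengths quoted for short-read sequencing feed directly into experimental design, because together with the genome size they determine the expected mean coverage. The sketch below is only a rough planning aid, not part of the original slides: it uses the standard estimate "total sequenced bases divided by genome size", the function names are hypothetical, and the human genome size of about 3.1 Gb is an assumed round figure.

```python
import math


def expected_coverage(n_fragments, read_length_bp, genome_size_bp, paired_end=False):
    """Mean depth = total sequenced bases / genome size.

    n_fragments is the number of sequenced fragments; with paired-end
    sequencing each fragment yields two reads.
    """
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    reads_per_fragment = 2 if paired_end else 1
    total_bases = n_fragments * reads_per_fragment * read_length_bp
    return total_bases / genome_size_bp


def fragments_needed(target_coverage, read_length_bp, genome_size_bp, paired_end=False):
    """Smallest number of fragments that reaches the target mean depth."""
    reads_per_fragment = 2 if paired_end else 1
    bases_per_fragment = read_length_bp * reads_per_fragment
    return math.ceil(target_coverage * genome_size_bp / bases_per_fragment)


# Assumed round figure for the human genome, for illustration only.
HUMAN_GENOME_BP = 3_100_000_000

if __name__ == "__main__":
    depth = expected_coverage(400_000_000, 150, HUMAN_GENOME_BP, paired_end=True)
    print(f"Expected mean coverage: {depth:.1f}x")
    print("Fragments for 30x:", fragments_needed(30, 150, HUMAN_GENOME_BP, paired_end=True))
```

The estimate ignores duplicates, unmapped reads and uneven coverage, so real experiments usually plan for some margin above the computed number.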
NGS library preparation: what we sequence
• Biological material
  ‒ DNA
  ‒ RNA (cDNA)
• Select some parts
• Note: direct RNA sequencing is possible with Oxford Nanopore

NGS library overview
[figure]

NGS data analysis
• Experiment design
• De-multiplexing
  ‒ Raw data (.fastq)
  ‒ QC
• Mapping to a genome/transcriptome reference (.bam)
  ‒ QC
  ‒ Variant analysis (WES)
  ‒ Expression analysis (RNA-seq)
  ‒ Interaction analysis (ChIP-seq)
• Unknown reference
  ‒ Metagenomics
  ‒ Reference assembly
• Not a "classic" reference
  ‒ Immunogenetics (VDJ genes)
  ‒ CRISPR (sgRNA)
  ‒ Methylation (bisulfite-seq), …

Metagenomics

Metagenomics results
• Environmental statistics about populations
  ‒ alpha, beta, gamma diversity
  ‒ identify known bacterial species
    • taxonomy profiling
  ‒ eventually functional profiling
    • e.g. antimicrobial resistance genes
• Sequencing techniques
  ‒ 16S rRNA sequencing
  ‒ Shotgun metagenomic sequencing

Metagenomics – 16S rRNA vs. Shotgun
• Cost
  ‒ 16S rRNA: ~$50 USD
  ‒ Shotgun: starting at ~$150, but the price depends on the sequencing depth required
• Sample preparation
  ‒ 16S rRNA: similar complexity to shotgun sequencing
  ‒ Shotgun: similar complexity to 16S rRNA sequencing
• Functional profiling (profiling microbial genes)
  ‒ 16S rRNA: no (but 'predicted' functional profiling is possible)
  ‒ Shotgun: yes (but it only reveals information on functional potential)
• Taxonomic resolution (genus, species, strain?)
  ‒ 16S rRNA: bacterial genus (sometimes species); depends on the region(s) targeted
  ‒ Shotgun: bacterial species (sometimes strains and single nucleotide variants, if sequencing is deep enough)
• Taxonomic coverage
  ‒ 16S rRNA: bacteria and archaea
  ‒ Shotgun: all taxa, including viruses
• Bioinformatics requirements
  ‒ 16S rRNA: beginner to intermediate expertise
  ‒ Shotgun: intermediate to advanced expertise
• Databases
  ‒ 16S rRNA: established, well-curated
  ‒ Shotgun: relatively new, still growing
• Sensitivity to host DNA contamination
  ‒ 16S rRNA: low (but PCR success depends on the absence of inhibitors and the presence of a detectable microbiome)
  ‒ Shotgun: high, varies with sample type (but this can be mitigated by calibrating the sequencing depth)
• Bias
  ‒ 16S rRNA: medium to high (the retrieved taxonomic composition depends on the selected primers and the targeted variable region)
  ‒ Shotgun: lower (while metagenomics is "untargeted", experimental and analytical biases can be introduced at various stages)

Metagenomics – 16S rRNA vs. Shotgun
• Study examples
  ‒ Assessment of the bacterial microbiome of Amazonian soil
    • 16S rRNA sequencing may provide more taxonomic resolution
  ‒ Changes in microbiome composition and antimicrobial gene carriage following fecal transplant
    • shotgun sequencing, to assess both compositional and functional differences
  ‒ Daily fluctuations in the gut microbiome following a 2-week dietary fiber intervention
    • shotgun sequencing or 16S rRNA
      ‒ shotgun: assesses both compositional and functional differences
      ‒ 16S rRNA: cheaper, and in this case 'predicted' functional profiling can be used

Reference assembly
• Problematic with short reads
[figure]

Genome assembly
• Very hard and costly (in eukaryotes)
• Multiple sequencing types needed
  ‒ Paired-end short reads
  ‒ Long reads
  ‒ Long-range linkage data (mate-pairs, Hi-C)
• Example: T2T-CHM13

Transcriptome assembly
• Assemble RNA fragments
  ‒ A reference from a similar organism is helpful
• Genome-guided assembly
  ‒ Good for poorly annotated organisms with a known genomic reference

Immunogenetics
• T-cell receptor, immunoglobulin (B cells)
• Gene rearrangement during cell maturation
  ‒ VDJ recombination
• Different cell populations
  ‒ Clonal studies
  ‒ Repertoire usage
• Main usage: blood malignancies (leukemias)

Genome-wide CRISPR-Cas9 knockout screens
• Cas9 (CRISPR-associated protein 9) is a protein that plays a vital role in the immunological defense of certain bacteria against DNA viruses
• sgRNA libraries
  ‒ Each sgRNA knocks out a specific gene
  ‒ About 76,000 guide RNAs (sgRNAs), with four highly active guides per gene, targeting about 19,000 genes, plus non-targeting sgRNA controls
  ‒ Delivered into cells by lentivirus
• Screen: selection + expansion/enrichment of surviving cells
• NGS sequencing
• NGS data analysis
  ‒ Counting cells with different genes knocked out
  ‒ Done by counting sgRNA fragments
  ‒ Compare conditions
• Example study
  ‒ Wei, L., Lee, D., Law, CT. et al. Genome-wide CRISPR/Cas9 library screening identified PHGDH as a critical driver for Sorafenib resistance in HCC. Nat Commun 10, 4681 (2019). https://doi.org/10.1038/s41467-019-12606-7

Thank you for your attention!
www.ceitec.eu, CEITEC @CEITEC_Brno
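As a footnote to the CRISPR knockout screen analysis (counting sgRNA fragments, then comparing conditions), here is a minimal sketch of those two steps on toy data. It is an illustration under simplifying assumptions, not the method used in the cited study: the guide is assumed to sit at a fixed position in each read and is matched exactly, the comparison is a plain log2 ratio of counts per million with a pseudocount, and all names and sequences are hypothetical. Dedicated tools such as MAGeCK add proper normalisation and statistical testing on top of this idea.

```python
from collections import Counter
import math


def count_sgrnas(reads, library, guide_start=0, guide_len=20):
    """Count reads whose guide window exactly matches a library sgRNA.

    library maps guide sequence -> sgRNA name; guides that are never
    observed stay in the result with a count of 0.
    """
    counts = Counter({name: 0 for name in library.values()})
    for read in reads:
        guide = read[guide_start:guide_start + guide_len]
        name = library.get(guide)
        if name is not None:
            counts[name] += 1
    return counts


def counts_per_million(counts):
    """Scale raw counts so that samples of different depth are comparable."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads matched the sgRNA library")
    return {name: 1e6 * value / total for name, value in counts.items()}


def log2_fold_change(treated, control, pseudocount=1.0):
    """Per-sgRNA log2 ratio of normalised counts, treated vs. control."""
    t, c = counts_per_million(treated), counts_per_million(control)
    return {
        name: math.log2((t.get(name, 0.0) + pseudocount) / (c[name] + pseudocount))
        for name in c
    }


# Hypothetical two-guide library and toy reads, for illustration only.
LIBRARY = {
    "ACGTACGTACGTACGTACGT": "GENE_A_sg1",
    "TTTTCCCCGGGGAAAATTTT": "GENE_B_sg1",
}
CONTROL_READS = ["ACGTACGTACGTACGTACGT"] * 3 + ["TTTTCCCCGGGGAAAATTTT"] * 3
TREATED_READS = ["ACGTACGTACGTACGTACGT"] * 5 + ["TTTTCCCCGGGGAAAATTTT"] * 1

if __name__ == "__main__":
    control = count_sgrnas(CONTROL_READS, LIBRARY)
    treated = count_sgrnas(TREATED_READS, LIBRARY)
    for name, lfc in log2_fold_change(treated, control).items():
        print(f"{name}\t{lfc:+.2f}")
```

In this toy example the guide for GENE_A is enriched after selection and the guide for GENE_B is depleted; in a real screen the several guides per gene would be combined before calling a gene a hit.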