Vendula Pospíchalová, PhD (pospich@sci.muni.cz)
Department of Experimental Biology
Animal Physiology and Immunology
Bi5599 Applied Biochemistry and Cell Biology Methods
2019-10-30

Omics technologies: genomics, transcriptomics, metabolomics, databases and big data

Schematic representation of omics technologies, their corresponding analysis targets, and assessment methods. Taken from Wu RD et al. JDR 2011; 90:561-572.

Contents
1. Introduction: what are –omics technologies + history
2. What big data looks like and how to approach it
3. From –omics technologies to biomarkers and personalized medicine
4. Genomics: genomes vs exomes vs genotypes + DTC service
5. Cancer databases: COSMIC, TCGA, Oncomine and others
6. Transcriptomics: microarrays vs. RNA sequencing
7. Gene set analysis
8. Metabolomics
9. Cutting edge: single cell –omics and single cell multi-omics
10. Summary and take home messages

What are „-omics“ technologies
• Omics refers to a field of study in biology ending in -omics, such as genomics, proteomics or metabolomics
• The related suffix -ome is used to address the objects of study of such fields, such as the genome, proteome or metabolome
• -ome = many/collectivity in Latin, -omics = study of large sets of biomolecules
• High-throughput experimental technologies characterized by automation, miniaturized assays and large-scale data analysis
• The analytic part of the experiment usually takes much longer than the experiment itself – bioinformatics skills needed
• Raw data is the „gem“ but is usually in a user-unfriendly format
• Interpreting the functional consequences of millions of discovered events is one of the biggest challenges

Big –omics data challenges
https://www.laboratory-journal.com/science/information-technology-it/big-data-genomics-challenges-and-solutions
Figure labels: EASY, RELATIVELY EASY, HARD, HARDEST
Only skilled bioinformaticians can process raw data
Integrated analyses of –omics studies are only possible in large consortia (e.g. TCGA).
Consequently, the author list of such articles can be more than 2 pages long, with a substantial part of the authors being bioinformaticians. Bioinformaticians are also necessary among the reviewers …

Data sharing policy
• The concepts of data sharing and open data are becoming increasingly important in science
• Funding bodies, journals and societies are now encouraging or mandating data sharing (usually the raw data)
• Sharing data publicly is an important way of improving reproducibility and showing that researchers are confident in their work
• Studies with raw data shared in a repository also receive more citations than those without publicly available data
• But raw -omics data are hard to analyse, so many platforms gather the publicly available data, thoroughly analyze it, curate it and share it in a user-friendly format

Leveraging Public Databases to Identify Actionable Targets

DIKW pyramid
Jennifer Rowley, published 2007 in J. Information Science, DOI:10.1177/0165551506070706
„Data is not information, information is not knowledge, knowledge is not understanding, understanding is not wisdom.“ – Clifford Stoll

What is the aim of OMICS technologies
http://www.jpathinformatics.org/text.asp?2015/6/1/46/163985
The DIKW pyramid metaphor: “know-nothing” (Data) ↓ “know-what” (Information) ↓ “know-how” (Knowledge) ↓ “know-why” (Wisdom) Zeleny (2005)
In Genomics: sequence of 3 billion letters in a .txt file ↓ information on where the individual's genome varies from the reference sequence ↓ CYP2C9 or TPMT genotype, which has known pharmacogenomic associations ↓ individualized dose of a new warfarin prescription

What is personalized health care?
https://pharma.bayer.com/en/research-and-development/research-focus/oncology/personalized-medicine/index.php

Value of personalized medicine
And many more examples, see http://www.personalizedmedicinecoalition.org for more detailed information on PM
https://invivo.pharmaintelligence.informa.com/

What is a biomarker?
-OMICS technologies and their integration are crucial for biomarker discovery and validation

History of „-omics“ technologies
• Genome – central part of all –omics technologies
• NGS = next generation sequencing
© slideshare.net

Sanger vs next generation sequencing
• Sanger sequencing
• https://www.youtube.com/watch?v=e2G5zx-OJIw
• Next generation sequencing (Illumina is shown as an example)
• https://www.youtube.com/watch?v=9YxExTSwgPM
https://slideplayer.com/slide/5799907/
(1.5%)
Used to confirm NGS results

Seeing is believing
• www.illumina.com/technology/next-generation-sequencing.html

Illumina NGS overview

Cost of sequencing over time
© slideshare.net

DTC (direct-to-consumer) genetic testing
http://ancestry.com/

Genotyping vs Sequencing
• Genotyping - determining which genetic variants an individual possesses through a variety of different methods, especially genotyping chips (based mostly on SNPs – single nucleotide polymorphisms) - cheap, but requires prior identification of the variants of interest
https://www.23andme.com/

How SNP genotyping works
• https://www.youtube.com/watch?v=Naona1y_I2U
• For more information see YouTube Channel Useful Genetics: https://www.youtube.com/channel/UCtXCrx28msMBQ-vFUIOIReA
https://www.jax.org/news-and-insights/jax-blog/2016/september/genomes-versus-exomes-versus-genotypes

How SNP genotyping works
There are two types of microarray commonly used in multiplexing SNP analysis: allele-specific oligonucleotide (ASO) hybridization and allele-specific primer (ASP) extension. (A) ASO hybridization: The allele-specific oligonucleotide for every SNP is synthesized and separately immobilized onto the glass plate. Fluorescence-labeled targets containing SNP sites are produced from a PCR reaction and plotted separately into each well to conduct the hybridization reaction. A mismatched base pair between target and oligonucleotide decreases the binding strength, so the fluorescence-labeled target is removed by stringent washing.
A fluorescence signal is detected on a perfectly matched base pair; (B) Allele-specific primer (ASP) extension: The specific primer for the SNP location is designed and separately immobilized onto a microarray. A different fluorescence-labeled dNTP is individually used in an extension reaction. The extended fragment showing a fluorescence signal can only be found when the 3′ end of the primer pair is perfectly matched (AA type in this case), in contrast to the mismatched primer pair (GG type in this case); (C) The SNP genotype can be determined according to the fluorescence intensity from the products/target DNA. https://doi.org/10.3390/microarrays4040570

DTC genome sequencing by popular demand
https://www.dantelabs.com

Coverage (or depth) in sequencing

Sequencing – WGS and WES
~3,000,000,000,000
https://www.mygenefood.com/finding-best-dna-test-genotype-sequence/
• Determining the exact DNA sequence

Genomes vs exomes vs genotypes
https://2wordspm.wordpress.com/2017/10/30/ngs-%EA%B2%80%EC%82%AC-whole-genome-exome-targeted-sequencing-%EB%B9%84%EA%B5%90/
• Most sensitive – able to detect rare tumor cells in biopsy
• Results are sometimes challenging to interpret
• Good alternative to WGS in terms of clinical use

What to expect
• Genetic testing provided by most of the companies is more or less for fun (ancestry, health and wellness, nutrigenetics, skincare, sports,…)
• More expensive and more complete sequencing, like that provided by Illumina, can be used for medical investigation
• Do not expect your genome sequence to tell you your life expectancy, whether you are likely to get cancer and so on
• So far our knowledge of the “implications” of the genome is quite limited
• What we can already do in health care is to look at the genome once you have been diagnosed with a specific ailment and look for specific genes that would make one treatment more effective than another (this has become normal practice in some forms of cancer treatment)
Based on
http://sites.ieee.org/futuredirections/2017/12/26/did-you-get-your-genome-sequenced-for-christmas/

Example of genetic testing in clinical practice
• BRCA gene testing for PARP inhibitor treatment
• https://www.youtube.com/watch?v=ilwMGRH276M

PARP inhibitors
• In December 2014, the drug olaparib (Lynparza) became the first of a new class of treatments known as PARP (poly(ADP-ribose) polymerase) inhibitors to be licensed for clinical use, heralding a new era of personalised, targeted treatment and turning the promise of ‘synthetic lethality’ into reality.
More info on PARPi: https://www.youtube.com/watch?v=mgW30YyaJz4
https://doi.org/10.1016/j.ygyno.2015.02.017

Synthetic lethality concept
BER = base excision repair

BRCAness ovarian tumors and PARPi
• Only women with mutations in BRCA genes are currently eligible for PARPi treatment
• BRCA mutations underlie only a small portion of tumors defective in homologous recombination (HR)
• But 50% of ovarian tumors are HR deficient = ‚BRCAness‘ phenotype
• PARPi are also effective in BRCAness tumors
• Can we identify the BRCAness tumors/patients and also offer them this novel and highly promising treatment option?
DOI: 10.1158/2159-8290.CD-15-0714 Homologous Recombination Deficiency: Exploiting the Fundamental Vulnerability of Ovarian Cancer

The Present and Future of Genome Sequencing
• Genomics England - 100,000 patients with rare diseases, their families, and cancer patients
• Precision Medicine Initiative (PMI) - 1-million-volunteer health study, data including genetics and lifestyle factors
• GenomeAsia 100K - genomic data for Asian populations
• … and many more initiatives
https://labiotech.eu/features/genome-sequencing-review-projects/
• How to handle such a huge amount of data and the ethical implications?
• In the US, there is the Genetic Information Nondiscrimination Act (2008), but most other countries have no such act and the legal position in Europe is somewhat grey

COSMIC: Catalogue of Somatic Mutations in Cancer
https://cancer.sanger.ac.uk/cosmic/about
• How to use the COSMIC database: https://www.youtube.com/watch?v=2FD5RabgK6o, https://www.youtube.com/watch?v=k477uAiKx74

TCGA: The Cancer Genome Atlas
https://cancergenome.nih.gov/
https://www.youtube.com/watch?time_continue=249&v=epsZjJ_A1y4

TCGA: Overview
• Initiated in 2005
• A joint effort of the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI).
• 27 participating institutes in the US and Canada.
• The overarching goal of TCGA is to improve our ability to diagnose, treat and prevent cancer, through the application of genome analysis technologies, including large-scale genome sequencing.
• The Cancer Genome Atlas Network has published more than 20 papers since the project began (https://tcga-data.nci.nih.gov/docs/publications/)

TCGA Data Portal

TCGA: A Valuable Resource for Research Community

TCGA Data Types
• Clinical data
• DNA sequencing
• miRNA sequencing
• Protein expression
• mRNA sequencing
• Total RNA sequencing
• Array-based expression
• DNA methylation
• Copy number variations
+ Computational tools
How to use TCGA: https://www.youtube.com/playlist?list=PL-hYJ1isbXhURdasc-RmwDRLhrHzdzKtN

Oncomine database
www.Oncomine.org
How to use Oncomine: https://www.youtube.com/watch?v=b8ckDiVNrFE

Transcriptomics
• Study of the transcriptome, the sum of all RNA transcripts
• Two most widely studied types of RNA
• mRNA - transcripts of the expressed genes, usually carrying a poly(A) tail.
• miRNA - small non-coding RNA (about 21-25 nucleotides), important in gene regulation.
• Array-based Expression Profiling:
• https://www.youtube.com/watch?v=6ZzFihESjp0

Microarrays vs RNA-seq
• While methods for analyzing microarray data are fully mature and straightforward, there is no consensus on which pipelines (series of computational steps) to use to analyze RNA-seq data.
https://www.the-scientist.com/lab-tools/an-array-of-options-35381

Overview of RNA-seq
By Malachi Griffith, Jason R. Walker, Nicholas C. Spies, Benjamin J. Ainscough, Obi L. Griffith http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004393, CC BY 2.5, https://commons.wikimedia.org/w/index.php?curid=53055894

RNA sequencing downstream analysis
• https://www.youtube.com/watch?v=tlf6wYJrwKY (from 13:10)
• More info about microarray vs. RNA-seq at: https://www.youtube.com/watch?v=2c3t3tDEmsU
• More info on RNA-seq at:
• https://www.youtube.com/watch?v=MFRkwXq6v_I
• Useful detailed info about anything connected to RNA-seq
• https://www.rna-seqblog.com

Examples of transcriptomics data outputs
• Volcano Plot
• Heat map

Cellular/functional/pathway analysis
Usually hundreds to thousands of genes
• Useful info in Czech language: https://portal.matematickabiologie.cz
• Cellular/functional/pathway analysis is a valuable tool to summarize high-dimensional gene expression data in terms of biologically relevant sets.
• Genes are aggregated into gene sets on the basis of shared biological or functional properties as defined by a reference knowledge base.
• Knowledge bases are database collections of molecular knowledge which may include molecular interactions, regulation, molecular product(s) and even phenotype associations.
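
Once genes are aggregated into gene sets, a common way to score a set is an over-representation test. The following is a minimal Python sketch, assuming a one-sided hypergeometric test (one widely used choice, not necessarily what any particular tool or database does). The gene names, set names and sizes are invented for illustration, and the helper names `hypergeom_enrichment_p` and `enrich` are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, set_size, hits_total, hits_in_set):
    """One-sided p-value P(X >= hits_in_set) under the hypergeometric null.

    universe_size: number of genes measured (the background)
    set_size:      background genes annotated to the gene set
    hits_total:    genes in the study list (e.g. up-regulated genes)
    hits_in_set:   study genes that are annotated to the gene set
    """
    max_overlap = min(set_size, hits_total)
    denom = comb(universe_size, hits_total)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, hits_total - k)
        for k in range(hits_in_set, max_overlap + 1)
    )
    return tail / denom


def enrich(study_genes, gene_sets, background):
    """Return (set_name, overlap, p_value) tuples sorted by p-value."""
    background = set(background)
    study = set(study_genes) & background
    results = []
    for name, members in gene_sets.items():
        members = set(members) & background
        overlap = len(study & members)
        p = hypergeom_enrichment_p(
            len(background), len(members), len(study), overlap
        )
        results.append((name, overlap, p))
    return sorted(results, key=lambda r: r[2])


# Hypothetical toy data: 20 measured genes, two invented gene sets.
background = [f"gene{i}" for i in range(1, 21)]
gene_sets = {
    "set_A": ["gene1", "gene2", "gene3", "gene4"],
    "set_B": ["gene10", "gene11", "gene12", "gene13", "gene14"],
}
study_genes = ["gene1", "gene2", "gene3", "gene10", "gene20"]

for name, overlap, p in enrich(study_genes, gene_sets, background):
    print(f"{name}: overlap={overlap}, p={p:.4f}")
```

In a real analysis thousands of gene sets are tested at once, so the raw p-values would normally be corrected for multiple testing before any set is called enriched.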
Database resources for understanding high-level functions and utilities of the biological system
• Database tools:
• KEGG (Kyoto Encyclopedia of Genes and Genomes)
• (https://www.genome.jp/kegg/)
• Disadvantage – does not provide statistical significance of particular pathways
• And many others available online

KEGG analysis
• Gene Ontology (GO) analysis (http://geneontology.org/)

Gene-set analysis (GSA)/Pathway analysis
• GO enrichment analysis
• One of the main uses of the GO is to perform enrichment analysis on gene sets. For example, given a set of genes that are up-regulated under certain conditions, an enrichment analysis will find which GO terms are over-represented (or under-represented) using annotations for that gene set.
• 3 main GO aspects (molecular function, biological process, cellular component)
• http://geneontology.org/docs/go-enrichment-analysis/

GO analysis
Example data of GO enrichment analysis

Reactome Knowledgebase
• More info at:
• https://www.youtube.com/user/Reactome/videos

Metabolomics
https://polypdx.com/for-healthcare-providers/metabolomics
• Metabolomics – large-scale systematic study of the metabolome
• Metabolome - total complement of metabolites present in a biological sample under given genetic, nutritional or environmental conditions - the unique biochemical fingerprint of all cellular processes
• Metabolite - low-molecular-weight (usually 50 – 1,500 Da) organic compound, typically involved in a biological process as a substrate or product.
• Metabolomics yields many insights into basic biological research in areas such as systems biology, metabolic modelling, pharmaceutical research, nutrition and toxicology

Metabolites are important
Metabolomics can therefore be seen as bridging the gap between genotype and phenotype

Metabolomics technologies

Metabolomics – ‚a snapshot‘ in time

Conceptual approaches in metabolomics:
• Metabolomics: employs complementary analytical methodologies, for example LC-MS/MS, GC-MS, and/or NMR, in order to determine and quantify as many metabolites as possible, whether identified or unknown compounds.
• Metabolic fingerprinting: a metabolic “signature” or mass profile of the sample of interest is generated and then compared across a large sample population to screen for differences between the samples. Only when signals that significantly discriminate between samples are detected are the corresponding metabolites identified and their biological relevance elucidated, which greatly reduces the analysis time.
• Target analysis: has been applied for many decades and includes the determination and quantification of a small set of known metabolites (targets) using the one analytical technique that performs best for the compounds of interest.
• Metabolite profiling: aims at the analysis of a larger set of compounds, both identified and unknown with respect to their chemical nature. This approach has been applied to many different biological systems using GC-MS, including plants, microbes, urine, and plasma samples.
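
The screening step of metabolic fingerprinting (find the signals that discriminate between sample groups first, identify the metabolites afterwards) can be illustrated with a short Python sketch. It assumes two sample groups and uses Welch's t statistic plus a log2 fold change as the discrimination score; this is one simple choice for illustration, not the method of any specific metabolomics platform. The feature names (`mz_...`), the intensities and the function names are all hypothetical.

```python
from math import log2, sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances).

    Assumes each sample has at least two values and non-zero variance overall.
    """
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def rank_features(group_a, group_b):
    """Rank signals by |t|, strongest discriminator first.

    Each group maps a feature name to a list of intensities, one per sample.
    Returns (feature, log2 fold change, t statistic) tuples.
    """
    rows = []
    for feature in group_a:
        a, b = group_a[feature], group_b[feature]
        rows.append((feature, log2(mean(a) / mean(b)), welch_t(a, b)))
    return sorted(rows, key=lambda r: abs(r[2]), reverse=True)


# Hypothetical signal intensities for four samples per group.
cases = {
    "mz_180.06": [200.0, 220.0, 210.0, 190.0],
    "mz_132.08": [50.0, 52.0, 48.0, 51.0],
}
controls = {
    "mz_180.06": [100.0, 110.0, 105.0, 95.0],
    "mz_132.08": [49.0, 53.0, 47.0, 50.0],
}

for feature, lfc, t in rank_features(cases, controls):
    print(f"{feature}: log2FC={lfc:+.2f}, t={t:+.2f}")
```

Only the top-ranked signals would then be taken forward for metabolite identification. With real data, which contain thousands of signals, a multiple-testing correction and proper normalization would be needed before trusting any ranking.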
Metabolomics data analysis

Where to look for metabolomics data
https://www.slideshare.net/TNAUgenomics/metabolomics-13725538?next_slideshow=1

Cutting edge: Single cell -omics
• Application of whole genome sequencing, whole transcriptome sequencing and other –omics methods to single cells; sc RNA-seq is currently the leading method
https://community.10xgenomics.com/t5/10x-Blog/Single-Cell-RNA-Seq-An-Introductory-Overview-and-Tools-for/ba-p/547

sc RNA-seq data visualization
Molecular Architecture of the Mouse Nervous System DOI:https://doi.org/10.1016/j.cell.2018.06.021
Approx. 500,000 analyzed cells

Common applications of sc RNA-seq
https://f1000research.com/articles/5-182/v1
For more info go to: https://omicstools.com

Single-cell multi-omics
https://www.frontiersin.org/articles/10.3389/fcell.2018.00028/full
Challenges:
• There are no commercial kits available yet for any single-cell multi-omics techniques, and many are technically challenging.
• Researchers must modify existing single-cell protocols so that they’re compatible with multiple types of molecules and take great care to minimize the loss or contamination of samples
https://www.the-scientist.com/lab-tools/integrating-multiple--omics-in-individual-cells-64829

Summary
• Omics technologies are revolutionizing science and medicine
• From data to actionable knowledge
Integrated Omics data
• Precision medicine is the ultimate goal of many –omics efforts
• Despite the progress made we still have a long way to go …
• Omics technologies - „the data deluge“
• Genomics and Transcriptomics rely on two main approaches: microarrays (hybridization) and NGS (sequencing by synthesis)
• Proteomics and Metabolomics rely heavily on mass spectrometry

Take home messages
• We have been generating Big data, but we hardly understand it
• Big data is publicly available, go through the databases before you even start planning your experiment – it can save you enormous time and money
• Databases contain huge datasets of patients you would never
be able to gather by yourself; test your hypothesis in silico before the „wet-lab“ work
• If you cannot find the „yes/no“ or „a few genes“ answer, use the Cellular/functional/pathway analyses to help you out
• Learning bioinformatics skills (e.g. programming in R) is a good investment in your future career
• Jay Flatley, Executive Chairman of Illumina:
• „Everyone is going to get sequenced, it is gonna be part of their health record and it will be used to manage their health care throughout their lifetime“.

Thank you for your attention
Any Questions?