C6215 Advanced Biochemistry and its Methods
Lesson 1: Introduction to Genomics

Jan Hejátko
Functional Genomics and Proteomics of Plants, Mendel Centre for Plant Genomics and Proteomics, Central European Institute of Technology (CEITEC), Masaryk University, Brno
hejatko@sci.muni.cz, www.ceitec.muni.cz

Outline
• Definition Of Genomics
• Forward vs Reverse Genetics
• Genes Structure and Identification
• Nucleic Acid Sequencing
• Analysis of Gene Expression

Outline: Definition Of Genomics

GENOMICS – What is it?
• Sensu lato (in the broad sense): it is concerned with the STRUCTURE and FUNCTION of genomes
• Sensu stricto (in the narrow sense): it is concerned with the FUNCTION of INDIVIDUAL GENES, i.e. FUNCTIONAL GENOMICS
  • It uses mainly reverse genetics approaches
• Necessary prerequisite: knowledge of the genome (sequence), i.e. work with databases

Outline: Forward vs Reverse Genetics

GENOMICS – What is it?
[Figure: Forward ("classical") Genetics Approaches (label: 3 : 1) compared with Reverse Genetics Approaches (labels: ?, Insertional mutagenesis, 5'TTATATATATATATTAAAAAATAAAATAAAA GAACAAAAAAGAAAATAAAATA….3')]

The role of BIOINFORMATICS in FUNCTIONAL GENOMICS
[Figure labels: BIOINFORMATICS, FUNCTIONAL GENOMICS]

Outline: Genes Structure and Identification

Genes Structure
• Promoter
• Transcriptional start
• 5' UTR
• Translational start
• Splicing sites
• Stop codon
• 3' UTR
• Polyadenylation signal
[Figure: gene schematic with labels TATA, ATG….ATTCATCAT, ATTATCTGATATA, 5' UTR, 3' UTR, ….ATAAATAAATGCGA]

Identification of Genes Ab Initio
• Omitting 5' and 3' UTR
• Identification of translation start (ATG) and stop codon (TAG, TAA, TGA)
• Finding donor (typically GT) and acceptor (AG) splicing sites
• Many ORFs are not real coding sequences: in Arabidopsis, there are on average approximately 350 million ORFs in every 900 bp of sequence(!)
• Using various statistical models (e.g. Hidden Markov Model, HMM; see recommended literature, Majoros et al., 2003) to evaluate and score the weight of identified donor and acceptor sites

Experimental Gene Identification
Principles of experimental identification of genes using forward and reverse genetics:
• Alteration of phenotype after mutagenesis
  • Forward genetics
• Identification of sequence-specific mutant and analysis of its phenotype
  • Reverse genetics
• Analysis of expression of a particular gene and its spatiotemporal specificity

Forward Genetics
• Alteration of phenotype after mutagenesis

Identification of CKI1 via Activation Mutagenesis
• CKI1 overexpression mimics cytokinin response (Kakimoto, Science, 1996)
[Figure labels: NO hormones, tZ, ctrl1, ctrl2, Plasmid Rescue, Pro35S::CKI1]

Signal Transduction via MSP
[Figure labels: CYTOKININ, PM, NUCLEUS, REGULATION OF TRANSCRIPTION, INTERACTION WITH EFFECTOR PROTEINS]
AHK sensor histidine kinases
• AHK2
• AHK3
• CRE1/AHK4/WOL
HPt Proteins
• AHP1-6
Response Regulators
• ARR1-24

Reverse Genetics
• Identification of insertional mutant and analysis of its phenotype

Identification of insertional cki1 mutant allele

CKI1 Regulates Female Gametophyte Development
• CKI1 is necessary for proper megagametogenesis in Arabidopsis
[Figure labels: CKI1/CKI1, CKI1/cki1-i]
Hejátko et al., Mol Genet Genomics (2003)

CKI1 and Megagametogenesis
• cki1-i is not transmitted through the female gametophyte
Crosses:
A. ♂ wt x ♀ CKI1/cki1-i
B. ♂ CKI1/cki1-i x ♀ wt
C. ♂ wt x ♀ CKI1/cki1-i
D. ♂ CKI1/cki1-i x ♀ wt
Primers: CKI1 specific primers (PCR positive control); cki1-i specific primers
[Figure labels: FG0, FG1, FG2, FG3, FG4]
[Figure labels: cki1-i, CKI1; late FG5, FG6, FG7; 24 HAE, 48 HAE]
Hejátko et al., Mol Genet Genomics (2003)

Forward and Reverse Genetics
• Analysis of expression of a particular gene and its spatiotemporal specificity

CKI1 is Expressed During Megagametogenesis
[Figure labels: FG0-FG1, FG3-FG4, FG4-FG5, FG7; 12 HAP (hours after pollination), 24 HAP, 48 HAP, 72 HAP]

Paternal CKI1 is Expressed in the Arabidopsis Sporophyte Early after Fertilization
[Figure labels: ♀ wt x ♂ ProCKI1:GUS; 24 HAP]
Hejátko et al., Mol Genet Genomics (2003)

Outline: Nucleic Acid Sequencing

Sanger Sequencing
Frederick Sanger
• 1958: Nobel prize for the structure of insulin
• 1977: Dideoxy sequencing method
• 1980: second Nobel prize, for NA sequencing

Sanger Sequencing
https://youtu.be/3M0PyxFPwkQ

NGS Sequencing
https://youtu.be/-7GK1HXwCtE

Outline: Analysis of Gene Expression

Gene Expression Assays
Methods of gene expression analysis:
• Quantitative analysis of gene expression
  • DNA chips
  • Next generation transcriptional profiling
• Qualitative analysis of gene expression
  • Preparation of a transcriptional fusion of the promoter of the analysed gene with a reporter gene
  • Preparation of a translational fusion of the coding region of the analysed gene with a reporter gene
  • Use of the data available in public databases
  • Tissue- and cell-specific gene expression analysis

Gene Expression Assays: Quantitative analysis of gene expression, DNA chips

DNA Chips
• A method which provides a quick comparison of a large number of genes/proteins between the test sample and the control
• Oligo DNA chips are used the most
• There are commercially available kits for the whole genome
  • Company Operon (Qiagen): 29,110 70-mer oligonucleotides representing 26,173 protein-coding genes, 28,964 transcripts and 87 microRNA genes of Arabidopsis thaliana
• Possibility of use for the preparation of photolithography chips, which facilitates oligonucleotide synthesis: e.g. for the whole human genome (about 3.1 x 10^9 bp) it is possible to prepare 25-mers in only 100 steps (4 nucleotides x 25 positions) by this technique
  • Affymetrix ATH1 Arabidopsis genome array
• Chips are used not only for the analysis of gene expression, but also e.g. for genotyping (SNPs, sequencing with chips, …)

DNA Chips
• For the correct interpretation of the results, good knowledge of advanced statistical methods is required
• Control of accuracy of the measurement (repeated measurements on several chips with the same sample, comparing the same samples analysed on different chips with each other)
• It is necessary to include a sufficient number of controls and repeats
• Control of reproducibility of measurements (repeated measurements with different samples isolated under the same conditions on the same chip, comparing them with each other) (Che et al., 2002)
• Identification of a reliable measurement threshold
  [Figure labels: unreliable, reliable]
• Finally, comparing the experiment with the control, or comparing different conditions with each other -> the result
• Currently, a great number of results of various experiments are available in publicly accessible databases

Gene Expression Assays: Quantitative analysis of gene expression, Next generation transcriptional profiling

Next Gen Transcriptional Profiling
• Transcriptional profiling via RNA sequencing
[Figure labels: WT, hormonal mutant; mRNA, cDNA; sequencing by Illumina and determination of the number of transcripts]

Results of –omics Studies vs Biologically Relevant Conclusions
• Transcriptional profiling yielded more than 7K differentially regulated genes…

gene       locus                 sample_1  sample_2  status  value_1     value_2   log2(fold_change)  test_stat     p_value      q_value      significant
AT1G07795  1:2414285-2414967     WT        MT        OK      0           1.1804    1.79769e+308       1.79769e+308  6.88885e-05  0.000391801  yes
HRS1       1:4556891-4558708     WT        MT        OK      0           0.696583  1.79769e+308       1.79769e+308  6.61994e-06  4.67708e-05  yes
ATMLO14    1:9227472-9232296     WT        MT        OK      0           0.514609  1.79769e+308       1.79769e+308  9.74219e-05  0.000535055  yes
NRT1.6     1:9400663-9403789     WT        MT        OK      0           0.877865  1.79769e+308       1.79769e+308  3.2692e-08   3.50131e-07  yes
AT1G27570  1:9575425-9582376     WT        MT        OK      0           2.0829    1.79769e+308       1.79769e+308  9.76039e-06  6.647e-05    yes
AT1G60095  1:22159735-22162419   WT        MT        OK      0           0.688588  1.79769e+308       1.79769e+308  9.95901e-08  9.84992e-07  yes
AT1G03020  1:698206-698515       WT        MT        OK      0           1.78859   1.79769e+308       1.79769e+308  0.00913915   0.0277958    yes
AT1G13609  1:4662720-4663471     WT        MT        OK      0           3.55814   1.79769e+308       1.79769e+308  0.00021683   0.00108079   yes
AT1G21550  1:7553100-7553876     WT        MT        OK      0           0.562868  1.79769e+308       1.79769e+308  0.00115582   0.00471497   yes
AT1G22120  1:7806308-7809632     WT        MT        OK      0           0.617354  1.79769e+308       1.79769e+308  2.48392e-06  1.91089e-05  yes
AT1G31370  1:11238297-11239363   WT        MT        OK      0           1.46254   1.79769e+308       1.79769e+308  4.83523e-05  0.000285143  yes
APUM10     1:13253397-13255570   WT        MT        OK      0           0.581031  1.79769e+308       1.79769e+308  7.87855e-06  5.46603e-05  yes
AT1G48700  1:18010728-18012871   WT        MT        OK      0           0.556525  1.79769e+308       1.79769e+308  6.53917e-05  0.000374736  yes
AT1G59077  1:21746209-21833195   WT        MT        OK      0           138.886   1.79769e+308       1.79769e+308  0.00122789   0.00496816   yes
AT1G60050  1:22121549-22123702   WT        MT        OK      0           0.370087  1.79769e+308       1.79769e+308  0.00117953   0.0048001    yes

AT4G15242  4:8705786-8706997     WT        MT        OK      0.00930712  17.9056   10.9098            -4.40523      1.05673e-05  7.13983e-05  yes
AT5G33251  5:12499071-12500433   WT        MT        OK      0.0498375   52.2837   10.0349            -9.8119       0            0            yes
AT4G12520  4:7421055-7421738     WT        MT        OK      0.0195111   15.8516   9.66612            -3.90043      9.60217e-05  0.000528904  yes
AT1G60020  1:22100651-22105276   WT        MT        OK      0.0118377   7.18823   9.24611            -7.50382      6.19504e-14  1.4988e-12   yes
AT5G15360  5:4987235-4989182     WT        MT        OK      0.0988273   56.4834   9.1587             -10.4392      0            0            yes

Ddii et al., unpublished
(1.79769e+308 is the largest double-precision value; it stands in for an infinite log2 fold change where value_1 is 0.)

Gene Expression Assays: Qualitative analysis of gene expression, transcriptional fusion

Transcriptional Fusion
• Identification and cloning of the promoter region of the gene
• Preparation of recombinant DNA carrying the promoter and the reporter gene (uidA, GFP)
  [Figure labels: TATA box, initiation of transcription, promoter, 5' UTR, ATG…ORF of reporter gene]
• Preparation of transgenic organisms carrying this recombinant DNA and their histological analysis

GUS Reporter in Mouse Embryos

Gene Expression Assays: Qualitative analysis of gene expression, translational fusion

Translational Fusion
• Identification and cloning of the promoter and coding region of the analysed gene
• Preparation of a recombinant DNA carrying the promoter and the coding sequence of the studied gene in a fusion with the reporter gene (uidA, GFP)
  [Figure labels: TATA box, promoter, 5' UTR, ATG…ORF of analysed gene…..….ATG…ORF of reporter gene….….....STOP]
• Preparation of transgenic organisms carrying the recombinant DNA and their histological analysis
• Compared to transcriptional fusion, translational fusion allows analysis of the intracellular localization of the gene product (protein) or its dynamics
[Figures: Histone 2A-GFP in Drosophila embryo (by PAM); PIN1-GFP in Arabidopsis]

Gene Expression Assays: Tissue- and cell-specific gene expression analysis

Fluorescence-Activated Cell Sorting (FACS)

Expression Maps - RNA
• High-Resolution Expression Map in Arabidopsis Root (Brady et al., Science, 2007)
• BAR ePlant: https://bar.utoronto.ca/eplant/

Expression Maps - RNA
• High-Resolution Expression Map in Drosophila (Nikos Karaiskos et al. Science 2017;science.aan3235)
• Drosophila Virtual Expression eXplorer: https://shiny.mdc-berlin.de/DVEX/

Expression Maps - Proteins
• Human Protein Atlas (http://www.proteinatlas.org/) (Ponten et al., J Int Med, 2011)

Summary
• Definition Of Genomics
• Forward vs Reverse Genetics
• Genes Structure and Identification
• Nucleic Acid Sequencing
• Analysis of Gene Expression

Discussion
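
Worked example: Identification of Genes Ab Initio
The first ab initio steps listed earlier (ignore the UTRs, find a translation start ATG, find a stop codon TAG, TAA or TGA) can be sketched as a naive open-reading-frame scan. The Python below is only an illustration under stated assumptions, not the method used in the lecture: the function name find_orfs and the min_codons filter are hypothetical, only the three forward reading frames are scanned, and introns (GT donor and AG acceptor sites) and HMM scoring are ignored. It mainly shows why many ORFs are not real coding sequences: every in-frame ATG followed by a stop codon is reported.

```python
# Naive ORF scan on the forward strand (illustrative sketch, not a gene predictor).

STOP_CODONS = {"TAG", "TAA", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return sorted (start, end, frame) tuples for ATG...stop stretches.

    start is the 0-based index of the A in ATG, end is the index just past
    the stop codon. min_codons (a hypothetical filter) is the minimum number
    of codons counted from ATG up to, but not including, the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):                       # three forward reading frames
        start = None
        for i in range(frame, len(seq) - 2, 3):  # walk codon by codon
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                        # first in-frame start codon
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                     # look for the next ORF
    return sorted(orfs)


print(find_orfs("CCATGAAATAGCC"))  # → [(2, 11, 2)]
```

In the example sequence the only ORF is ATG AAA TAG, starting at index 2 in reading frame 2. A real gene finder would additionally score splice sites and coding potential, as the HMM bullet indicates.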
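
Worked example: log2(fold_change) in the differential expression table
The log2(fold_change) column of the table credited to Ddii et al. can be recomputed from value_1 and value_2. The sketch below assumes that the fold change is value_2 / value_1 (sample_2 relative to sample_1), which agrees with the tabulated rows where both values are nonzero. The function name log2_fold_change and the handling of the case where both values are 0 are assumptions made for this illustration. Where value_1 is 0 the ratio is infinite, which the table prints as 1.79769e+308.

```python
import math


def log2_fold_change(value_1, value_2):
    """log2(value_2 / value_1) with explicit handling of zero expression."""
    if value_1 == 0 and value_2 == 0:
        return 0.0        # assumption: not expressed in either sample, no change
    if value_1 == 0:
        return math.inf   # detected only in sample_2
    if value_2 == 0:
        return -math.inf  # detected only in sample_1
    return math.log2(value_2 / value_1)


# Row AT4G15242 of the table: value_1 = 0.00930712, value_2 = 17.9056
print(round(log2_fold_change(0.00930712, 17.9056), 4))
# Row AT1G07795 of the table: value_1 = 0, value_2 = 1.1804
print(log2_fold_change(0, 1.1804))
```

An infinite fold change is a statement about detection (no reads in one sample), not about the size of the effect, which is one reason why a long list of significant genes does not translate directly into biologically relevant conclusions.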