CG020 Genomics
BI7201 Basics of Genomics

High throughput approach. Systems biology

Kamil Růžička
Functional Genomics and Proteomics of Plants, Mendel Centre for Plant Genomics and Proteomics, Central European Institute of Technology (CEITEC), Masaryk University, Brno
kamil.ruzicka@ceitec.muni.cz, www.ceitec.muni.cz

Overview
■ High throughput biology
■ Automation
■ Omics
■ Transcriptomics and high throughput transcriptomics
■ High throughput interactomics and how to read it
■ High throughput of anything
■ 1000(+1) genomes, GWAS
■ ENCODE
■ Epigenome and epitranscriptome
■ Little about Systems biology
■ Omics
■ Holism and modules
■ Gene regulation in E. coli

Examples of automation in human history

Automation in transcriptomics
- transcriptome sequencing
- microarray
- pipetting robot
- Fl(2)d gene in Drosophila embryos (insitu.fruitfly.org)
- KIAA1841 in mouse, expressed in neurons (emouseatlas.org)
- Protoplasting/cell sorting

Yeast two-hybrid (Y2H) summary
- protein-protein interaction hunt
[Figure: Y2H scheme with transcription of the reporter gene]

High throughput yeast two hybrid for various organisms
- A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Peter Uetz, Loic Giot, Gerard Cagney et al. (2000)
- A Protein Interaction Map of Drosophila melanogaster. L. Giot, J. S. Bader, C. Brouwer, A. Chaudhuri et al. (2003)
- Evidence for Network Evolution in an Arabidopsis Interactome Map. Arabidopsis Interactome Mapping Consortium (2011)

TAP purification
- affinity purification interaction hunt
[Figure: tandem affinity purification scheme with TEV protease cleavage, followed by MALDI-TOF; source: Nature Reviews | Molecular Cell Biology]

So far, the high throughput affinity purification approach is slightly less popular
- Functional organization of the yeast proteome by systematic analysis of protein complexes (2002)
- A Protein Complex Network of Drosophila melanogaster. K.G. Guruharsha et al. (2011)

thebiogrid.org - highly relevant when searching for interactors, but look elsewhere as well!

Interactors of EMB2016
- use databases if you have a conserved complex
- EMB2016, MTA-A, FIP37, HAKAI
- tandem affinity purification, Geert de Jaeger lab

EMB2016 interactors
- RNA methylase
- RING finger/HAKAI was also shown to associate with splicing factors (human)
[Figure: interaction of MTA with AtFIP37, with empty vector controls; Zhong et al. 2009]

MTA-A - homolog of MTA
All of them come back when MTA-A is used as bait:
- MTA-a (At4g10760)
- At5g21326
- At1g32360
- At3g13470
- …g76010
- FIP37 (At3g54170)
- MTA-b (At4g09980)
- At3g44110
- HAKAI (At5g01160)
- EMB2016 (At3g05680)
(Immunoprecipitation)

Inferred protein complex
- Flybase: EMB2016 interacts with HAKAI (no data on Biogrid)
[Figure: inferred protein complexes, experimental knowledge (Guruharsha et al., 2011)]

T-DNA insertion at random locations in the genome
- T-DNA plasmid: LB | Selection | YFG | RB
- Examples of possible insertions: Gene a, Gene b, Gene c
- You can order your mutant from the stock center

[Genome browser screenshot: Arabidopsis thaliana (TAIR V10), protein coding gene models and SNP tracks for individual accessions]
- several single nucleotide polymorphisms (SNP) in the selected gene

What could be natural variation good for?
Quantitative trait loci (QTL)
- nature makes the genetic screen for you
- a QTL is analogous to a gene in a genetic screen

Genome wide association studies (GWAS)
[Figure: plates imaged on several scanners; Slovak et al. 2014, Busch lab, Vienna]

Genome wide association studies (GWAS)

Trait No.   Trait
1           Total length
2           Euclidian length
3           Root tortuosity
4           Root growth rate
5           Relative root growth rate
6           Root angle
7           Root direction index
8           Root horizontal index
9           Root vertical index
10          Root linearity
11          Average root width
12          Root width 20
13          Root width 40
14          Root width 60
15          Root width 80
16          Root width 100

- 163 accessions (ecotypes), several replicates (8 x 3)
- searching for those that are different (say how different they might be!)
Slovak et al. 2014
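The core GWAS idea on these slides (for each SNP, ask whether accessions carrying different alleles differ in a trait) can be sketched in a few lines. This is only a toy illustration under stated assumptions: the SNP names, genotypes and trait values are hypothetical and are not data from Slovak et al. 2014, and a plain one-way ANOVA F statistic is used, whereas real GWAS pipelines typically also correct for population structure.

```python
from statistics import mean

def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of trait values."""
    values = [v for g in groups for v in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def scan_snps(genotypes, trait):
    """Return {snp_id: F} for every SNP that splits the accessions into
    at least two allele groups with at least two accessions each."""
    scores = {}
    for snp_id, alleles in genotypes.items():
        by_allele = {}
        for allele, value in zip(alleles, trait):
            by_allele.setdefault(allele, []).append(value)
        groups = list(by_allele.values())
        if len(groups) >= 2 and all(len(g) >= 2 for g in groups):
            scores[snp_id] = anova_f(groups)
    return scores

# Hypothetical toy data: one trait value (e.g. root growth rate) per accession.
trait = [1.0, 1.1, 0.9, 1.2, 2.0, 2.1, 1.9, 2.2]

# Hypothetical genotypes: one allele per accession, in the same order as `trait`.
genotypes = {
    "snp_A": "CCCCTTTT",  # allele tracks the trait
    "snp_B": "CTCTCTCT",  # allele unrelated to the trait
    "snp_C": "CCTTCCTT",  # allele unrelated to the trait
}

scores = scan_snps(genotypes, trait)
best_snp = max(scores, key=scores.get)
print(best_snp, scores)
```

A large F corresponds to a small p-value, that is, to a high peak in a Manhattan plot. Converting F into a p-value requires the F distribution, which is left out here to keep the sketch free of dependencies.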
Genome wide association studies (GWAS)
- 163 accessions (ecotypes), several replicates (8 x 3)
- searching for those that are different (e.g. root growth, slim root, resistance to exogenous treatment)
Slovak et al. 2014

Genome wide association studies (GWAS)
- high -log10(p), i.e. low p-value => SNP specifically in the "resistant" line (N-way ANOVA etc.)
[Figure: association plot for relative root growth rate (day 2 - 3) along the chromosomes, with the peak locus near AT5G23060 and AT5G23065]
- In contrast to human: how to test it?

Genome wide association studies (GWAS)
- Root growth rate (agar plates)
- At5g23060 = CaS
- the cas-1 mutant indeed has a shorter root than wt
Slovak et al. 2014

Genome wide association studies (GWAS)
- Manhattan plot in human

The status of cytosine methylation can be explored in various tissues (human)

Epigenetic modifications
- How to find methylated bases in the genome?
- Which bases are methylated?
- How to sequence methylation of the genome?

Bisulfite sequencing
- acgactacgc -> bisulfite -> acgautacgu -> sequencing
- unmethylated cytosine is converted to uracil, 5-methylcytosine stays unchanged
[Figure: structures of cytosine, 5-methylcytosine and uracil]

What is methylation of cytosine good for?
Are there other covalent modifications?
- >100 base modifications detected in nucleic acids, incl. RNA
- N6-methyladenosine is the most common in mRNA (0.5 - 5 % of adenosines methylated)

MeRIP - detecting adenine methylation on RNA
[Figure: MeRIP followed by RNA-seq; source: Nature Reviews | Molecular Cell Biology]

The ENCODE project
The Encyclopedia of DNA Elements
Is really only ~1 % of the human genome functional?
1 % = gene coding regions

September 2012
ENCODE - think big
- 80 million dollars (1/2 of the yearly budget of GACR, the Czech Science Foundation)
- 1,640 data sets
- 147 cell types
- Nature (6), Genome Biology (18), Genome Research (6 papers)

The ENCODE project
Mainly cancer cells, lymphocytes etc.
- RNA transcribed regions: RNA-seq, CAGE, RNA-PET and manual annotation
- Protein-coding regions: mass spectrometry
- Transcription-factor-binding sites: ChIP-seq, DNase-seq
- Chromatin structure: DNase-seq, FAIRE-seq, histone ChIP-seq and MNase-seq
- DNA methylation sites: RRBS assay (cheaper version of bisulfite seq)

ENCODE - summary
- ~80 % of the genome associated with biochemical function:
  - enhancers, promoters
  - transcribed to non-coding RNA
- 75 % of the genome transcribed, at least a little bit
- number of recognition sequences of DNA binding proteins doubled
- E.g. is 75 % a meaningful number?
- ModENCODE on the way

proteomics, molecular biology -> (functional) genomics
• a real new concept?
"Multidimensional biology"
o Genomics
o Epigenomics
o Transcriptomics
o Epitranscriptomics
o Translatomics / Proteomics
o Metabolomics
o Interactomics
o Fluxomics
o NeuroElectroDynamics
o Phenomics
o Biomics

Systems theory
Forget about reductionism, think holistically.
ὅλος [hol'-os] - Greek: all, the whole, entire, complete

Reductionism vs. holism
[Figure: components view (time-dependent concentrations) vs. systems view (reaction network, steady state flux map, flux computed for function, needed homeostasis)]

Ludwig von Bertalanffy (1901-1972)
GENERAL SYSTEM THEORY
Gathered here are Ludwig von Bertalanffy's writings on general system theory, selected and edited to show the evolution of systems theory and to present its applications to problem solving. An attempt to formulate common laws that apply to virtually every scientific field, this conceptual approach has had a profound impact on such widely diverse disciplines as biology, economics, psychology, and demography. A German-Canadian biologist and philosopher,
Ludwig von Bertalanffy (1901-1972) was the creator and chief exponent of general system theory. He is the author of ten books including Robots, Men, and Minds and Modern Theories of Development, both of which have been published by George Braziller, Inc.
[Book cover: GENERAL SYSTEM THEORY, with chapters including Open Systems in Biology and Open Systems and Cybernetics]

Omics-revolution shifts paradigm to large systems
■ High Throughput Data
■ Cellular Complexity
■ Integrative bioinformatics
■ (Network) modeling

E. coli genome and proteome is small

Level                   Interactions   Components
Biochemical reactions   >25,000        2,000 metabolites, 1,000 proteins
Proteome                >100,000       6-10,000
Transcriptome           >5,000         5,000
Genome                  0              4,000 genes

[Figure labels: metabolism, information processing, structure, stress, other functions, translation, transcription]

Reductionism within holism
Let's e.g. assume that transcription and translation are one module.

Conclusions
- computing capacities allow handling large data sets
- fashionable
- modelling whole cell processes in silico?
- the story is frequently missing, there will always be question marks

Great web sites
http://www.yeastgenome.org/ S. cerevisiae
http://www.pombase.org/ S. pombe
http://flybase.org/ Drosophila
http://www.wormbase.org/ C. elegans
http://www.arabidopsis.org/ A. thaliana

Also nice web sites
http://encodeproject.org/
http://www.thebiogrid.org/
http://www.genemania.org/
http://string-db.org/
...and many others
...check whether they are still maintained and curated

Additional literature
Venter, J.C. (2008). A life decoded: my genome, my life (London: Penguin).
Albert-László Barabási (2005) V pavučině sítí. (Paseka) (an excellent book on the mathematics of networks, a dynamically developing field, by a leading world scientist)
PA052 Introduction to Systems Biology, lectures.
Faculty of Informatics, MU
http://www.youtube.com/watch?v=Z BHVFPOLk and further: excellent talks about systems biology by Uri Alon (Weizmann Institute), absolutely the best
http://www.pnas.org/content/110/29/11952 (a paper that challenges some of the ENCODE conclusions)