CG020 Genomika (Genomics)
Bi7201 Základy genomiky (Basics of Genomics)
High throughput approaches. Systems biology
Kamil Růžička
Functional Genomics and Proteomics of Plants, Mendel Centre for Plant Genomics and Proteomics, Central European Institute of Technology (CEITEC), Masaryk University, Brno
kamil.ruzicka@ceitec.muni.cz, www.ceitec.muni.cz

Overview
• High throughput biology
  - Automation
  - Omics
  - Transcriptomics and high throughput transcriptomics
  - High throughput interactomics and how to read it
  - High throughput of anything
  - 1000(+1) genomes, GWAS
  - ENCODE
• A little about systems biology
  - Omics
  - Holism and modules
  - Gene regulation in E. coli
  - Negative autoregulatory loops
  - Robustness of negative autoregulatory networks
  - Positive autoregulatory networks

Examples of automation in human history
blacksmith -> manufacture -> assembly line -> robotic automation

High throughput sequencing
gene -> genes -> genome -> genomes -> populations?

Automation in transcriptomics
qRT-PCR -> multichannel pipette -> bigger multichannel pipette -> pipetting robot -> microarray -> deep sequencing

Protoplasting/cell sorting

eFP browser
http://bar.utoronto.ca/efp/cgi-bin/efpWeb.cgi

insitu.fruitfly.org
Fl(2)D gene in Drosophila embryos

emouseatlas.org
KIAA1841 in mouse, expressed in neurons

Genevestigator – check your gene's transcriptome networks
• Arabidopsis and also other species
• free for academic users

Database of protein families in plants
http://www.phytozome.net/
great for checking conservation of splicing events etc.

Yeast two-hybrid (Y2H) summary
protein-protein interaction hunt

High throughput yeast two-hybrid for various organisms
(2000), (2005), (2009)

TAP purification
affinity purification interaction hunt, MALDI-TOF

So far, the high throughput affinity purification approach has been slightly less popular.
thebiogrid.org - highly relevant when searching for interactors, but look elsewhere as well!
(2011), (2002)

Interactors of EMB2016
EMB2016, MTA-A, FIP37, HAKAI
Geert de Jaeger lab, tandem affinity purification
Use databases if you have a conserved complex.

EMB2016 interactors – RNA methylase
EMB2016, MTA-B, FIP37, HAKAI
Zhong et al. 2009
• MTA-A – homolog of MTA
• RING finger/HAKAI was also shown to associate with splicing factors (human)
• All of them are recovered again when MTA-A is used as bait (immunoprecipitation)

Inferred protein complex
EMB2016, FIP37, MTA-A/MTA-B, HAKAI
Flybase: EMB2016 interacts with HAKAI (no data on BioGRID)

Assumption
[Figure panels: mta-a, mta-b, wild type, fip37-1, emb2016]
all of them: even very strong knockdowns are viable
-> MTA-A and MTA-B are probably both necessary
-> MTA-A and MTA-B probably interact
The yeast homologs of MTA-A and MTA-B interact, and so does the FIP37 homolog.

You can order your mutant from the stock center
signal.salk.edu
the same for Drosophila, mouse, worm etc.

You can order your cDNA clone from the stock center
signal.salk.edu
• the same for yeast, Drosophila, mouse, worm etc.
• some clones are missing: you probably need to clone this one yourself
• even basic fusions (GFP, myc, TAP etc.) are often ready for you

You can order your RNAi/amiRNA
• even cloned in a binary vector
• just google…
Commercial services are available as well.

You can order antibodies against your protein
• Arabidopsis is lagging so far; agrisera.com covers it perhaps a little. It is mostly a commercial service.
• googling human proteins: http://www.scbt.com/, www.acris-antibodies.com/ etc. - you even get western blot and immunocytochemistry data in advance

Phenoscope
PHENOSCOPE: an automated large-scale phenotyping platform
Tisné et al.
2013

Phenoscope
• leaf area (camera)
• photosynthesis (spectra)
• weight
• temperature (thermal camera)
• in a dynamic manner
• …
• various ecotypes only, so far
• commercially promising

GrowScreen-Root software
Phenoscope – adaptation to other tissues is certainly possible, perhaps in the future

Check your phenotype online
seedgenes.org – database of plant embryonic mutants (in-depth)
http://rarge.psc.riken.jp/phenome/ - RIKEN Arabidopsis Phenome Information Database (an attempt to cover the adult plant)

1000 genomes
• 1000 human genomes from all over the world (wikipedia.org)
• 1001 genomes - Arabidopsis, http://1001genomes.org/
• in both cases, many more lines have already been sequenced

How the ecotypes are collected
Olivier Loudet web page

1001 genomes user interface
several single nucleotide polymorphisms (SNPs) in the selected gene
[Accession labels: Tü-WHO (DE), Caucasus, Cent. Asia, S. Africa]

Genome wide association studies (GWAS)
Slovak et al. 2014, Busch lab, Vienna
• 163 accessions (ecotypes), several replicates (8 x 3)
• searching for those that differ (say how different they might be!), e.g. in root growth, slim root, resistance to exogenous treatment
• low p-value (a high peak on the -log10 scale of the plot) => SNP found specifically in the “resistant” lines (N-way ANOVA etc.)
• plot axis: chromosome (locus)
• In contrast to human: how to test it?
• = CaS: the cas-1 mutant indeed has a shorter root (Slovak et al. 2014)

Genome wide association studies (GWAS)
Manhattan plot for human data

The ENCODE project
Is really only ~1 % of the human genome functional?
The Encyclopedia of DNA Elements, September 2012
1 % = gene coding regions

ENCODE – think big
• 80 million dollars (1/2 of the yearly budget of GAČR, the Czech Science Foundation)
• 1,640 data sets
• 147 cell types
• papers: Nature (6), Genome Biology (18), Genome Research (6)

The ENCODE project
Mainly cancer cells, lymphocytes etc.
• RNA transcribed regions: RNA-seq, CAGE, RNA-PET and manual annotation
• Protein-coding regions: mass spectrometry
• Transcription-factor-binding sites: ChIP-seq, DNase-seq
• Chromatin structure: DNase-seq, FAIRE-seq, histone ChIP-seq and MNase-seq
• DNA methylation sites: RRBS assay

ENCODE - summary
~80 % of the genome is associated with a biochemical function:
- enhancers, promoters
- transcribed to non-coding RNA
- 75 % of the genome is transcribed, at least a little
- the number of recognition sequences of DNA-binding proteins doubled
E.g. is 75 % a meaningful number?

ModENCODE on the way
http://www.modencode.org/
Drosophila tissue sources:
• Adult eclosion + several days
• Adult female
• Adult male
• Embryos 0-1, 0-2, 0-12, 10-12 hr etc.
• Larvae in various instars
• Pupae in various stages
• Mated males or females etc.

Question: where do you see the limits of high throughput biology?
Cons
- sometimes low quality data or artifacts
- occasionally data are missing
- biological material is quite complex
- what to do with so much data?
- where is the idea?

What is systems biology
• just another name for something between biology and chemistry?
  biochemistry -> proteomics
  molecular biology -> (functional) genomics
• a really new concept?

“Multidimensional biology”
• Genomics
• Epigenomics
• Transcriptomics
• Epitranscriptomics
• Translatomics / Proteomics
• Metabolomics
• Interactomics
• Fluxomics
• NeuroElectroDynamics
• Phenomics
• Biomics

Systems theory
Forget about reductionism, think holistically.
ὅλος [hol'-os] – Greek: all, the whole, entire, complete

Reductionism vs. holism
Ludwig von Bertalanffy (1901-1972)

The omics revolution shifts the paradigm to large systems
- Integrative bioinformatics
- (Network) modeling

The E. coli genome and proteome are small

Reductionism within holism
Let us assume, for example, that transcription and translation form one module.
E. coli: generation time 20 min

Description of gene regulation
$\alpha$ [units: time$^{-1}$]: cells grow (dilution) + protein is degraded
(imagine Baťa and gym shoes)

1. Steady state
$Y_{st} = \frac{\beta}{\alpha}$
[Plot: Y against t, levelling off at $Y_{st}$]

2. Production of Y stops
(log here means the natural logarithm, written ln in Czech notation)
• Large $\alpha$ -> rapid degradation
• Stable proteins (most E. coli proteins): $\tau$ = cell generation time
• For stable proteins the response time is one generation.

3. Production of Y starts from zero
$Y(t) = \frac{\beta}{\alpha}\left(1 - e^{-\alpha t}\right)$
• Y grows almost linearly at first
• Response time: the same as in case 2
• The response time does not depend on the production rate!
• Degradation gives a faster response time. However, it is energetically demanding.

F-box regulatory ubiquitin genes per organism
• Arabidopsis: 700
• Saccharomyces: 14
• Drosophila: 24
• Human: 38
Arabidopsis does not have problems with energy.

Great web sites
• http://www.yeastgenome.org/ - S. cerevisiae
• http://www.pombase.org/ - S. pombe
• http://flybase.org/ - Drosophila
• http://www.wormbase.org/ - C. elegans
• http://www.arabidopsis.org/ - A. thaliana

Also nice web sites
• http://encodeproject.org/
• http://www.thebiogrid.org/
• http://www.genemania.org/
• http://string-db.org/
…and many others
…check whether they are still kept alive and curated

• Source literature (systems biology)
  - http://sybila.fi.muni.cz/cz/index - a field of study at the Faculty of Informatics.
  - http://www.youtube.com/watch?v=Z__BHVFP0Lk and the following videos – excellent talks about systems biology by Uri Alon (Weizmann Institute) – absolutely the best
  - Alon U. Network motifs: theory and experimental approaches. Nat Rev Genet. 2007 Jun;8(6):450-61.
    A review on the same topic.
  - Alon, U. (2006). An Introduction to Systems Biology: Design Principles of Biological Circuits (Chapman and Hall/CRC).

Literature
• For enthusiasts
  - Venter, J.C. (2008). A life decoded: my genome, my life (London: Penguin).
  - Albert-László Barabási (2005) V pavučině sítí. (Paseka) (an excellent book about the mathematics of networks, a dynamically developing field, by a leading world scientist)
  - PA052 Úvod do systémové biologie (Introduction to Systems Biology), lectures. Faculty of Informatics, MU
  - http://www.pnas.org/content/110/29/11952 (a paper which challenges some conclusions of ENCODE)
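
Worked example for the gene regulation slides. The three cases described there (steady state, production stops, production starts from zero) can be checked numerically with a few lines of Python. This is only a minimal sketch: it assumes the simple model dY/dt = β − αY, which is consistent with the formula for Y(t) on the slides but is not spelled out in them. The function names and the values of β are made up for illustration; only the 20 min generation time comes from the slides.

```python
import math


def steady_state(beta, alpha):
    """Case 1: steady-state level Y_st = beta / alpha."""
    return beta / alpha


def y_after_switch_on(t, beta, alpha):
    """Case 3: production starts from zero, Y(t) = Y_st * (1 - exp(-alpha * t))."""
    return steady_state(beta, alpha) * (1.0 - math.exp(-alpha * t))


def y_after_switch_off(t, y0, alpha):
    """Case 2: production stops, Y decays from y0 (assumes first-order removal)."""
    return y0 * math.exp(-alpha * t)


def response_time(alpha):
    """Time to cover half the distance to the new steady state: ln(2) / alpha."""
    return math.log(2.0) / alpha


# Stable protein: removal only by dilution, so alpha = ln(2) / generation time.
GENERATION_TIME_MIN = 20.0  # E. coli generation time from the slides
alpha_stable = math.log(2.0) / GENERATION_TIME_MIN

# Illustrative (hypothetical) production rates: the response time ignores beta.
for beta in (1.0, 10.0):
    t_half = response_time(alpha_stable)
    fraction = y_after_switch_on(t_half, beta, alpha_stable) / steady_state(beta, alpha_stable)
    print(f"beta={beta}: t_half={t_half:.1f} min, Y/Y_st at t_half={fraction:.2f}")

# Active degradation on top of dilution (here an arbitrary 5x larger alpha).
alpha_fast = 5.0 * alpha_stable
print(f"5x larger alpha: t_half={response_time(alpha_fast):.1f} min")
```

With a stable protein the half-rise time equals one generation whatever β is chosen, and a larger α shortens it, which is the point of cases 2 and 3.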