CG020 Genomics
Bi7201 Basics of Genomics

High throughput approaches
Systems biology

Kamil Růžička
Functional Genomics and Proteomics of Plants, Mendel Centre for Plant Genomics and Proteomics, Central European Institute of Technology (CEITEC), Masaryk University, Brno
kamil.ruzicka@ceitec.muni.cz, www.ceitec.muni.cz

Overview
• High throughput biology
• Automation
• High throughput of anything
• 1000(+1) genomes, natural variation, GWAS
• Epigenome and epitranscriptome
• ENCODE
• A little about systems biology
• Omics
• Holism and modules

Pavel Kantorek - RIP

Examples of automation in human history
• blacksmith
• manufacture
• assembly line
• robotic automation

Automation in transcriptomics
• qRT-PCR
• multichannel pipette
• bigger multichannel pipette
• pipetting robot
• microarray
• transcriptome sequencing

High throughput sequencing
gene, genes, genome, genomes, ecosystems, transcriptomes, epigenomes etc.

In situ gene expression atlases
find the expression pattern of your gene
insitu.fruitfly.org, emouseatlas.org
• KIAA1841 in mouse: expressed in neurons
• Fl(2)D in Drosophila

Light sheet microscopy - high throughput
Tomancak lab, MPI Dresden
…perhaps the video will work

Protoplasting/cell sorting
eFP browser: http://bar.utoronto.ca/efp/cgi-bin/efpWeb.cgi

Single cell transcriptomics takes over
Cao et al. Science 2017

Yeast two-hybrid (Y2H) summary
protein-protein interaction hunt

High throughput yeast two-hybrid for various organisms
(2000) (2005) (2009)

TAP purification
affinity purification, interaction hunt, MALDI-TOF

High throughput affinity purification
(2002) (2011)
thebiogrid.org - a very nice tool

Interactors of EMB2016
tandem affinity purification, Geert de Jaeger lab
EMB2016, MTA-A, FIP37, HAKAI
use databases if you have a conserved complex

EMB2016 interactors - RNA methylase
EMB2016, MTB, FIP37, HAKAI
• Zhong et al. 2009
• MTA-A - homolog of MTA
• RING finger/HAKAI was also shown to associate with splicing factors (human)

EMB2016, FIP37, MTA/MTB, HAKAI
Do other proteins specifically bind?
Advantage of conserved proteins in high throughput data
EMB2016, FIP37, MTA/MTB, HAKAI
Flybase: EMB2016 interacts with HAKAI (no data on Biogrid)

You can order your mutant from the stock center
signal.salk.edu
the same for Drosophila, mouse, worm etc.
What to do if you cannot find an insertion line for your gene?
• RNAi/amiRNA (can also be ordered)
• CRISPR

You can even order various constructs for your gene from stock centers
signal.salk.edu
even basic fusions (GFP, myc, TAP etc.) are often ready for you (in particular in human)

You can order antibodies against your protein
• Plants - underdeveloped (Agrisera)
• several providers for human proteins (mostly commercial): http://www.scbt.com/ www.acris-antibodies.com/ etc.
- you even get western blot and immunocytochemistry results in advance

Phenoscope
PHENOSCOPE: an automated large-scale phenotyping platform, Tisné et al. 2013
• leaf area (camera)
• photosynthesis (spectra)
• weight
• temperature (thermo camera)
• in a dynamic manner
• …
• various ecotypes only, so far
• commercially promising

Check your phenotype online
• seedgenes.org - database of plant embryonic mutants (in-depth)
• http://rarge.psc.riken.jp/phenome/ - RIKEN Arabidopsis Phenome Information Database (a kind of attempt for the adult plant)

1000 genomes
1000 human genomes sequenced all over the world (already history)
wikipedia.org

1001 genomes - Arabidopsis
http://1001genomes.org/
in both cases, many more lines have already been sequenced

How the ecotypes are collected
Olivier Loudet web page

1001 genomes user interface
several single nucleotide polymorphisms (SNP) in the selected gene
accessions shown: Tü-WHO (DE), Caucasus, Cent. Asia, S. Africa

What could natural variation be good for?
Quantitative trait loci (QTL)
- nature makes the genetic screen for you
- a QTL is analogous to a gene in a genetic screen

Genome wide association studies (GWAS)
Slovak et al. 2014, Busch lab, Vienna
• 163 accessions (ecotypes), several replicates (8 x 3)
• searching for those that differ (e.g. root growth, slim root, resistance to exogenous treatment); say how different they might be!
• low p-value (a high peak in the Manhattan plot) => SNP found specifically in the "resistant" lines (N-way ANOVA etc.); x-axis: chromosome (locus)
• the associated locus = CaS; the cas-1 mutant indeed has a shorter root
• In contrast to human: how to test it?
Manhattan plot (human)

Epigenetic modifications
The status of cytosine methylation can be explored in various tissues (human)

How to find methylated bases in the genome?
• Which bases are methylated?
• How to sequence methylation of the genome?
bisulfite sequencing
cytosine, 5-methylcytosine, uracil
non-methylated cytosine is converted to uracil; 5-methylcytosine is not
• What is methylation of cytosine good for?
• Are there other covalent modifications?

MeRIP - detecting adenine methylation on RNA (m6A)
• >130 base modifications detected in nucleic acids, incl. RNA
• N6-methyladenosine is the most common in mRNA (0.5 - 5 % of adenosines methylated)
• A similar technique was also adapted to DNA in C. elegans (6mA): Greer et al., DNA Methylation on N6-Adenine in C. elegans, Cell 2015

HOT! Novel avenues in m6A sequencing
• an even better one: nanopore 6mA sequencing, https://www.biorxiv.org/content/early/2017/04/13/127100
• Safra et al. 2017

Presence of pseudouridine in mammalian mRNA
highly dependent on method, lab, conditions...
uridine, pseudouridine

The ENCODE project
Is really only ~1 % of the human genome functional?
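A minimal sketch of the bisulfite sequencing logic from the bisulfite slide above, meant as an illustration and not as a real pipeline. It assumes perfect conversion, a single strand, and a read that is already aligned to the reference; the function names and the example sequence are hypothetical. Unmethylated cytosine is converted to uracil, which reads as thymine after amplification, so a reference C that still reads as C is called methylated.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulfite treatment followed by amplification.

    Unmethylated C becomes U and is read as T; 5-methylcytosine stays C.
    `methylated_positions` holds 0-based indices of methylated cytosines.
    """
    converted = []
    for i, base in enumerate(seq):
        if base == "C" and i not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def call_methylation(reference, converted_read):
    """Return 0-based positions where a reference C still reads as C."""
    if len(reference) != len(converted_read):
        raise ValueError("reference and read must have the same length")
    return {
        i
        for i, (ref_base, read_base) in enumerate(zip(reference, converted_read))
        if ref_base == "C" and read_base == "C"
    }


# Hypothetical example: cytosines at positions 1 and 5 are methylated.
reference = "ACGTCCGATC"
methylated = {1, 5}

read = bisulfite_convert(reference, methylated)
called = call_methylation(reference, read)

print(read)            # converted read
print(sorted(called))  # positions called as methylated
```

Real data would need per-position read counts across many reads and an estimate of the conversion efficiency, both of which this sketch leaves out.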
The Encyclopedia of DNA Elements, September 2012
1 % = gene coding regions

ENCODE - think big
• 80 million dollars (1/2 of the yearly GAČR budget)
• 1,640 data sets
• 147 cell types
• papers: Nature (6), Genome Biology (18), Genome Research (6)

The ENCODE project
Mainly cancer cells, lymphocytes etc.
• RNA transcribed regions: RNA-seq, CAGE, RNA-PET and manual annotation
• Protein-coding regions: mass spectrometry
• Transcription-factor-binding sites: ChIP-seq, DNase-seq
• Chromatin structure: DNase-seq, FAIRE-seq, histone ChIP-seq and MNase-seq
• DNA methylation sites: RRBS assay (cheaper version of bisulfite seq)

ENCODE - summary
~80 % of the genome is associated with a biochemical function:
- enhancers, promoters
- transcribed to non-coding RNA
- 75 % of the genome is transcribed, at least a little bit
- the number of recognition sequences of DNA binding proteins doubled

ModENCODE on the way
http://www.modencode.org/
Drosophila tissue sources:
- Adult eclosion + several days
- Adult female
- Adult male
- Embryos 0-1, 0-2, 0-12, 10-12 hr etc.
- Larvae in various instars
- Pupae in various stages
- Mated males or females
- etc.

Question: where do you see the limits of high throughput biology?
Cons
- sometimes low quality data or artifacts
- occasionally data missing
- biological material is quite complex
- what to do with so much data?
- where is the idea?

What is systems biology
• another name for something between biology and chemistry?
  biochemistry -> proteomics
  molecular biology -> (functional) genomics
• a really new concept?
Inst. Plant Systems Biology, Gent, BE

"Multidimensional biology"
• Genomics
• Epigenomics
• Transcriptomics
• Epitranscriptomics
• Translatomics / Proteomics
• Metabolomics
• Interactomics
• Fluxomics
• NeuroElectroDynamics
• Phenomics
• Biomics

Systems theory
Forget about reductionism, think holistically.
ὅλος [hol'-os] - Greek: all, the whole, entire, complete

Reductionism vs. holism
Ludwig von Bertalanffy (1901-1972)
The omics revolution shifts the paradigm to large systems
- Integrative bioinformatics
- (Network) modeling
The E. coli genome and proteome are small

Reductionism within holism
Let's e.g. assume that transcription and translation are one module.

Conclusions - systems biology
• computing capacities allow handling large data sets
• fashionable
• modelling whole cell processes in silico?
• the story is frequently missing, there will always be question marks

Great web sites for organismal models
• S. cerevisiae: http://www.yeastgenome.org/
• S. pombe: http://www.pombase.org/
• Drosophila: http://flybase.org/
• C. elegans: http://www.wormbase.org/
• A. thaliana: http://www.arabidopsis.org/ and https://www.araport.org/

Also nice web sites
• http://encodeproject.org/
• http://www.thebiogrid.org/
• http://www.genemania.org/
• http://string-db.org/
…and many others
…pay attention to whether they are kept alive and curated

Additional literature
• Venter, J.C. (2008). A life decoded: my genome, my life (London: Penguin).
• Albert-László Barabási (2005) V pavučině sítí. (Paseka) (an excellent book on the mathematics of networks, a dynamically developing field, by a leading world scientist)
• PA052 Introduction to Systems Biology, lectures. Faculty of Informatics, MU
• http://www.youtube.com/watch?v=Z__BHVFP0Lk and further - excellent talks about systems biology by Uri Alon (Weizmann Institute) - absolutely the best
• http://www.pnas.org/content/110/29/11952 (a paper which challenges some conclusions of ENCODE)

feel free to ask: kamil.ruzicka@ceitec.muni.cz