Machines Learning what makes Biology tick
Panagiotis Alexiou
CORE019 Pokroky a výzvy v moderní biologii (Advances and Challenges in Modern Biology), autumn 2021

Image: Gregor Mendel: The Friar Who Grew Peas (Cheryl Bardoe, Jos. A. Smith; ISBN 9781419718403)

Francis Crick identified himself as a molecular biologist as a way of shortening his previous description of himself as "a mixture of a crystallographer, biophysicist, biochemist, and geneticist."

Arthur Samuel of IBM developed a computer program for playing checkers. The program used a scoring function to assess moves and learned from previous games. (1962)

Image: Rosenblatt's perceptron. Source: "Professor's perceptron paved the way for AI – 60 years too soon", Cornell Chronicle.

1960s: The perceptron is able to solve simple problems, such as classifying linearly separable data.

What is the AI winter?
Image: Minsky and Papert (http://cyberneticzoo.com/wp-content/uploads/Minsky-Papert-71-cSolomon-x640.jpg)
Minsky and Papert: the problem of non-linearly separable functions.
Image: Briefing for US Vice President Gerald Ford in 1973 on the junction-grammar-based computer translation model.

Margaret Dayhoff
Atlas of Protein Sequence and Structure, 1967-68 (Margaret Dayhoff and Richard Eck)
"there is a tremendous amount of information regarding evolutionary history and biochemical function implicit in each sequence and the number of known sequences is growing explosively. We feel it is important to collect this significant information, correlate it into a unified whole and interpret it…"
Book of protein sequences: contained 65 protein sequences from various species.

Margaret Dayhoff: "Each protein sequence that is established, each evolutionary mechanism that is illuminated, each major innovation in phylogenetic history that is revealed will improve our understanding of the history of life."

Where did DNA sequencing begin?
Frederick Sanger

Image: AI timeline (https://i1.wp.com/sitn.hms.harvard.edu/wp-content/uploads/2017/08/Anyoha-SITN-Figure-2-AI-timeline-2.jpg)

Yann LeCun, Yoshua Bengio, Geoffrey Hinton
https://playground.tensorflow.org/

The Long Short-Term Memory (LSTM) cell can process data sequentially and keep its hidden state through time.
Long Short-Term Memory (LSTM): applications in natural language processing and translation.

Protein folding
Image: Christian Anfinsen (https://www.nobelprize.org/images/anfinsen-13232-content-portrait-mobile-tiny.jpg)
In theory, a protein’s amino acid sequence should fully determine its structure.
Image: CCDC138 primary protein sequence (Wikimedia Commons)
Image: T1037, part of a protein from (Cellulophaga baltica crAss-like) phage phi14:2, a virus that infects bacteria.

Given an amino-acid sequence, predict the protein structure.
Establishes the ‘Protein Folding’ problem as the holy grail of machine learning in biology.
Participants must blindly predict the structure of the proteins, and these predictions are subsequently compared to the ground truth experimental data when they become available.
John Moult (Institute for Bioscience and Biotechnology Research)

Fig. 1, https://www.nature.com/articles/s41586-021-03819-2

Demis Hassabis on Twitter: "The #AlphaFold 2 papers on the methods and human proteome predictions are out today in hard copy in @Nature! A really proud moment to see our work featured…"

After decades of effort, only ~18% of the total residues in human protein sequences are covered by experimentally determined structures at this time. AlphaFold doubles this number overnight.
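
The LSTM idea mentioned above (process a sequence one step at a time while carrying a hidden state forward) can be sketched in a few lines. This is only an illustration under assumptions: it uses the standard LSTM cell equations with input, forget and output gates, random untrained weights, and a one-hot encoded DNA string as input. The names `lstm_step`, `one_hot` and `run_lstm`, the hidden size and the seed are hypothetical and do not come from the lecture.

```python
import math
import random

GATES = ("input", "forget", "output", "candidate")


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def one_hot(base, alphabet="ACGT"):
    """Encode one nucleotide as a list of 0.0/1.0 values (assumed encoding)."""
    return [1.0 if base == letter else 0.0 for letter in alphabet]


def lstm_step(x, h_prev, c_prev, weights, biases):
    """Advance a standard LSTM cell by one time step.

    weights[gate] holds one row per hidden unit; each row multiplies the
    concatenation of the current input x and the previous hidden state.
    """
    inputs = x + h_prev  # list concatenation: [x ; h_prev]

    def pre_activation(gate):
        return [
            sum(w * v for w, v in zip(row, inputs)) + b
            for row, b in zip(weights[gate], biases[gate])
        ]

    i = [sigmoid(z) for z in pre_activation("input")]        # how much new information to write
    f = [sigmoid(z) for z in pre_activation("forget")]       # how much old cell state to keep
    o = [sigmoid(z) for z in pre_activation("output")]       # how much of the cell to expose
    g = [math.tanh(z) for z in pre_activation("candidate")]  # proposed new cell values

    # New cell state mixes the kept old state with the gated candidate values.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    # New hidden state is the gated, squashed cell state.
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


def run_lstm(sequence, hidden_size=4, seed=0):
    """Run an untrained, randomly initialised LSTM over a DNA string."""
    rng = random.Random(seed)
    n_inputs = 4 + hidden_size  # 4 one-hot channels plus the hidden state
    weights = {
        gate: [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
               for _ in range(hidden_size)]
        for gate in GATES
    }
    biases = {gate: [0.0] * hidden_size for gate in GATES}

    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    hidden_states = []
    for base in sequence:
        h, c = lstm_step(one_hot(base), h, c, weights, biases)
        hidden_states.append(h)
    return hidden_states


if __name__ == "__main__":
    for step, state in enumerate(run_lstm("ACGTAC")):
        print(step, [round(v, 3) for v in state])
```

Because the weights are random, the printed numbers carry no biological meaning; the point is only that each hidden state depends on the current letter and on everything read before it. In practice the weights would be learned from data, typically with a deep learning library rather than hand-written loops.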
In the near future, machine learning should be explored for predicting structures of protein–nucleic acid complexes... experimentally resolved protein–RNA complex structures remain low in number, and training sets are thus small, which may impair success at this time.

Types of medical diagnostic imaging analysis by deep learning AI (Vikram Singh Bisen, Medium)

Year    Count
2021    3080
2020    2419
2019    1327
2018    602
2017    227
2016    62
2015    24

Image: https://www.mdpi.com/cancers/cancers-13-04740/article_deploy/html/images/cancers-13-04740-g004.png

Difficult circumstances might ensue in which a recommendation for treatment might be given in the absence of a well defined abnormality detected by routine imaging.

In April 2018, the US Food and Drug Administration approved the first AI-based diagnostic, IDx-DR, which detects diabetic retinopathy in people with diabetes by analyzing retinal images. Machine learning will soon be applied to many other medical conditions, from cardiology to neurodegenerative diseases and beyond…

On balance, it is likely that more and more microcomputer-based medical expert systems will become available. One can already find surprisingly complex expert systems that run on a microcomputer, although the scope is usually narrow… Clinicians with an interest in expert systems should find that there are many opportunities to examine them through the increasing number of publications and conferences devoted to all facets of medicine and computing, including medical expert systems.

This product has been CE marked as a Class I medical device in the EU. It is not available in the United States.

Dotyk: What DNA revealed about the origin of the Czechs: we do not come from Europe, the ancestral mother was Helena

GWAS Catalog on Twitter: "In 2001, #OnThisDay the first Human Genome was published in @nature and tomorrow in @sciencemagazine, so many advances since then and more to come!!… https://t.co/SsaoPYEd0r"

The Human Genome Project: what was all the hype about?
Cost: ~300M USD

Illumina (Solexa) sequencing
2006: Solexa Genome Analyser
2007: Solexa bought by Illumina
Image: Illumina Genome Analyzer sequencing. Adapter-modified, single-stranded...
$1,000 genome

Next Generation Sequencing / New Generation Sequencing (NGS)

Introduction to the Human Genome, 2014, Alexei Fedorov
2015, JGI GOLD
The Genomic Era (2000-)

Fig. 2, https://www.nature.com/articles/s43586-021-00018-1

ENCODE Project Writes Eulogy for Junk DNA
30 papers representing the integration and analysis of ENCODE data

How much of our DNA is ‘junk’?
Can we identify the location of functional genomic elements?
OR
Image: https://www.pnas.org/content/pnas/111/17/6131/F2.large.jpg?width=800&height=600&carousel=1
UCLA Scientists Find 3000 New Genes in “Junk DNA” of Immune Stem Cells (The Stem Cellar)

CTGTGGTGCTCAACTGTGATTCCTTTTCACATTCACCCTGGATGTTCTCTTCACTGTGGGATGAGGTAGTAGGTTGTATAGTTTTAGGGTCACACCCAC
CACTGGGAGATAACTATACAATCTACTGTCTTTCCTAACGTGATAGAAAAGTCTGCATCCAGGCGGTCTGATAGAAAGTCAGTTAACTAATTGTACAAT
ATCTGTGGTGCTCAACTGTGATTCCTTTTCACCATTCACCCTGGATGTTCTCTTCACTGTGGGATGAGGTAGTAGGTTGTATAGTTTTAGGGTCACACC
CACCACTGGGAGATAACTATACAATCTACTGTCTTTCCTAACGTGATAGAAAAGTCTGCATCCAGGCGGTCTGATAGAAAGTCAGTTAACTAATTGTAC
AATA
TCTGTGGTGCTCAACTGTGATTCCTTTTCACCATTCACCCTGGATGTTCTCTTCACTGTGGGATGAGGTAGTAGGTTGTATAGTTTTAGGGTCACACCC
ACCACTGGGAGATAACTATACAATCTACTGTCTTTCCTAACGTGATAGAAAATGCA
GTCTGCATCCAGGCGGTCTGATAGAAAGGG
AGTCAGTTAACTAATTGTACAACTCCTTATAT
ATATTCTGCATCCAGGCGGTCTCTTATAAGC
CTGCATCCAGGCGGTCGCGGTAGTATTAGT
TTAGGGTCATTAGGGTCAGTCCTATTAGTAC

Find a hairpin in a junkyard

Integrated Gradients: highlight the “important” nucleotides
RNA sequence (100 nt)
Probability of G4 formation
PENGUINN-RNA

Genomic Annotation Benchmarks
• Ready-to-use genomic classification datasets (cleaned, train/test split)
• Get the benchmark to your machine with one line of Python code
• Pre-trained models can be used for transfer learning

Name                        Number of seqs    Seq length    Baseline model accuracy
Human non-TATA promoters    36131             251           84.5%
Human enhancers             28000             500           87.9%
Coding vs. intergenic       100000            200           84.8%

(bit.ly/genbench)

Friday 19th November 2021: Hackathon! (in hybrid mode). ALL are welcome. Email Panagiotis Alexiou for details.
https://bit.ly/ENNGene

Genomic or transcriptomic functional elements in need of identification:
• RNA Binding Proteins
• Transcription Factor Binding Sites
• RNA Modification Sites
• Enhancers
• Small RNA Loci
• Non-coding RNAs
• miRNA targets

Image: The CEITEC MU research institute opened pavilions for natural sciences and medicine (CEITEC)

Machines Learning what makes Biology tick
Panagiotis Alexiou
CEITEC-MU, Brno, CZ
Thank you for your attention!

Team: Tomas Majtner, Katarina Gresova, Panagiotis Alexiou, Eliska Chalupova, Ilektra Giassa, Kriti Bhaghat, Petr Simecek, Eva Klimentova, Jakub Polacek, Ondrej Vaculik, Vlastimil Martinek, David Cechak
Roles shown on the slide: Group Leader, postDoc, PhD student, student; several members are marked NEW MEMBER.