Interactive syllabus
[Vlastimil Martinek]: Deep learning for genomic data 11. 11. 2021
Abstract
Biological sequences have traditionally been analyzed using rule-based methods and handcrafted features. Following the success of deep learning in natural language processing, the field of genomics is now adopting many similar techniques. I will introduce various types of biological sequences, namely DNA, RNA, and proteins. Further, we will talk about papers that successfully applied NLP methods to these sequences. These methods include ULMFiT and transformer architectures. Finally, I will introduce the topics we are working on in the CEITEC Bioinformatics lab.
Readings
Strodthoff, N., Wagner, P., Wenzel, M., & Samek, W. (2019). UDSMProt: Universal Deep Sequence Models for Protein Classification. Cold Spring Harbor Laboratory. https://doi.org/10.1101/704874
Rives, A., Meier, J., Sercu, T., Goyal, S., Lin, Z., Liu, J., Guo, D., Ott, M., Zitnick, C. L., Ma, J., & Fergus, R. (2021). Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proceedings of the National Academy of Sciences, 118(15), e2016239118. https://doi.org/10.1073/pnas.2016239118